4 Data Transformation
For data with a clear structure, there is a set of five transformation techniques we need to master. In this lesson, we'll introduce them step by step.
Data is the new oil, at least according to the mathematician :
“Data is the new oil. Like oil, data is valuable, but if unrefined, it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity. So, must data be broken down, analysed for it to have value.”
If we take this analogy seriously, the data, like oil, needs to be refined and turned into something of value. Two important tools for refining data into a valuable output are data transformation and data visualization, both of which are the main focus of this book. In this part of the book, we first need to learn how to transform data from one form into another so that we can apply .
To master data transformation, we need five basic techniques that work on structured data. We always start with a given data frame that we want to change into something else. In doing so, we typically want to do one of the following:
Remove variables we don’t currently need (or specify those we do need). You will learn how to do this in Select Columns.
Remove any records we don’t currently need (or specify those we do need). We'll introduce ways to do this in Filter Rows.
Change the order of the records. That's an easy one covered in Sort Rows.
Add new variables that we need but that don’t exist yet. We'll learn the general techniques and look at some examples in Add Or Change Columns.
Summarize many records into one or a few numbers. That's what data analysis is all about, and in the lesson Summarize Rows we'll look at concrete examples.
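To make the five operations concrete before each gets its own lesson, here is a minimal, library-free sketch in plain Python. It assumes a data frame can be modeled as a list of dictionaries, one dictionary per record. The helper names (`select_columns`, `filter_rows`, `sort_rows`, `add_column`, `summarize_rows`) and the small `cars` table are hypothetical, chosen only to mirror the lesson titles. They are not the API of any particular data library.

```python
from statistics import mean

# Hypothetical example data: each dict is one record (row) of a small data frame.
cars = [
    {"model": "A", "cyl": 4, "mpg": 30.0, "hp": 100},
    {"model": "B", "cyl": 6, "mpg": 22.0, "hp": 150},
    {"model": "C", "cyl": 4, "mpg": 27.0, "hp": 110},
    {"model": "D", "cyl": 8, "mpg": 16.0, "hp": 240},
]

def select_columns(rows, columns):
    """Select Columns: keep only the named variables in every record."""
    return [{col: row[col] for col in columns} for row in rows]

def filter_rows(rows, predicate):
    """Filter Rows: keep only the records for which predicate(row) is True."""
    return [row for row in rows if predicate(row)]

def sort_rows(rows, key, descending=False):
    """Sort Rows: return the records ordered by one variable."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)

def add_column(rows, name, compute):
    """Add Or Change Columns: add (or overwrite) a variable computed per record."""
    return [{**row, name: compute(row)} for row in rows]

def summarize_rows(rows, column, func):
    """Summarize Rows: collapse one variable across all records into one number."""
    return func([row[column] for row in rows])

slim = select_columns(cars, ["model", "mpg"])
four_cyl = filter_rows(cars, lambda r: r["cyl"] == 4)
by_mpg = sort_rows(cars, "mpg", descending=True)
with_ratio = add_column(cars, "hp_per_cyl", lambda r: r["hp"] / r["cyl"])
avg_mpg = summarize_rows(cars, "mpg", mean)

print(slim[0])                            # {'model': 'A', 'mpg': 30.0}
print([r["model"] for r in four_cyl])     # ['A', 'C']
print([r["model"] for r in by_mpg])       # ['A', 'C', 'B', 'D']
print(with_ratio[0]["hp_per_cyl"])        # 25.0
print(avg_mpg)                            # 23.75
```

Each helper returns a new list instead of modifying `cars` in place, which matches the idea that a transformation starts from a given data frame and produces a different one.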
The following figure shows schematically how each of the five transformations works: