How to use the Tableau Prep to clean the dataset

CLEANing the DATASET

  • The cleaning step helps to clean omitted data, that will show up as NULL values, define new parameters by using calculation based on tables, change the data type of each field or column, or rename the fields.

  • In fact, cleaning step is the first step in the workflow to prepare a suitable dataset for visualization in Tableau Desktop.

  • In order to add cleaning step, use the plus sign and choose Clean step.

  • After successful creation of Clean step, we should see the fields and their distributions for their values.

The first thing that we notice here is that there is a NULL value in the fips field or column. If we right click on that NULL, we could actually clean our dataset and exclude the NULL and all other associated data to it to make sure there is no NULL value present in the final dataset sent to Tableau Desktop.

The Clean step in Tableau Prep has a smart recognition of header names and guessing their data types. As a result, it gives some useful recommendations. For this dataset, we have one recommendation to change the data type of the state field to State/Province which indeed is a correct recognition of data type for this field that represent the US states.

  • Another useful option in the Clean step is to Create Calculated Field. With this option, we could create customized fields or columns by writing formulas. The main goal of this training is to be able to calculate moving averages of the cases and deaths of the COVID-19 for different counties of South Carolina state. We would calculate the moving averages in three different time windows of:

  1. Weekly: we need to truncate the timeline or date field into weekly chunks to be able to calculate the moving averages on a weekly basis.

  2. Biweekly: we need to truncate the timeline or date field into biweekly chunks to be able to calculate the moving averages on a biweekly basis.

  3. Ten days window: we need to truncate the timeline or date field into ten days chunks to be able to calculate the moving averages on a ten days basis.

  • Each of these calculations for weekly, biweekly, or ten days window could be done by using Create Calculated Field. The formula for each time window is provided here:

  1. Weekly: DATETRUNC("week", [date], "monday")

  2. Biweekly:

IF DATEDIFF('week', DATETRUNC('week', [date]), #2020-01-01#, 'monday') % 2 = 0 THEN

DATEADD('week', 2, DATETRUNC('week', [date])) -1

ELSE

DATEADD('week', 1, DATETRUNC('week', [date])) -1

END

  1. Ten days window:

IF INT([date]) % 10 = 0 THEN

[date]

ELSE

[date] - INT([date]) % 10

END

  • In the above formulas for weekly, we truncate the date field into weekly chunks by using built-in function of Tableau as DATETRUNC where the first argument tells the chunk, which is week here, the second argument tells the name of the field to be truncated, and the third argument shows the first day of the week.

  • For the biweekly formula, we need to tweak this function to only keep the weeks that their number is even. For the ten days windows we divide the days into ten days chunks:

Also, in the Clean step, it is possible to change the data type. For weekly and biweekly fields calculated based on the formulas above, the Tableau Prep automatically find their types as Date & Time but we want to change it to Date.

The packaged flow created in this section is shared here for reproducibility: https://bit.ly/3jAgcoi