Another common occurrence with hand-entered data is users entering data in formats that are inconsistent with what would be recognized by calculations. In order to make the most of our data, the data type must be consistent.
One of the most common stumbling blocks when trying to import raw data into Tableau is that the data shows up but it isn’t acting right when you try to aggregate it. Often, the reason for these difficulties is your data doesn’t have consistent data types within a field.
Still in the Clean Step, scroll back to the left until you find “Outreach Time”. This summary histogram represents when the data was collected by our staff across the first round of outreach. The range of the histogram lets you look for outliers - here we can see that the first run of the program went from February through April.
You can tell by the small calendar and clock icon in the top left corner that Tableau recognizes this data as “Date & Time” data.
That the data is represented by a blue summary histogram means Tableau recognizes this as a continuous variable - all of our data is numerical and of a consistent data type.
Now let’s scroll to the right to look at “Birthday”. What do you notice about this field? Why do you think the data is showing up as a grey detail histogram rather than a blue summary histogram?
Consistent data types means all data within a single column are formatted the same. Dates have the same date format (e.g. all mm/dd/yyyy), all numerical data is just numbers (e.g. all 1, 2, 3 instead of ‘one’, ‘two’, ‘three’) and all categorical data is text.
Tableau Desktop will try and make its best guess about data types but that often means mismatched data types will be either replaced with null or the whole column will be treated as text/string.
This would prevent you from aggregating your data accurately (i.e. how would a computer average 13 and ‘fifteen’?).
Document labeling standards and train your staff to use consistent labels.
Use gentle reminders if you notice consistent errors.
When designing data collection tools, set up data validation rules to restrict input to the right data type.
Use the Clean Step in Tableau Prep to look for mismatches and the Data Type Dropdowns to ensure your data is the right data type.
Looking at the "Birthday" field, Tableau is treating what looks like date data as text/string as indicated by the Abc icon in the top left of the field. This means, as is, we won’t be able to do any sort of calculations like calculating age.
Scroll down to the bottom of the list to find the mismatched data. Looks like someone hand-entered the data and typed out the birthdays rather than using the prevailing date format.
Use the Edit Value button we learned above to rewrite these dates in a m/d/yyyy format.
Once the data is all in a m/d/yyyy date format, change the data type of the field by clicking on the icon in the top left corner and selecting Date.
To change the visualization to the blue summary histogram view, click the three-dot menu and select View State: Summary.
Now that the data is consistent and the right data type, Tableau recognizes that it can perform calculations on it (in this case, binning for the histogram) and we can view our data in a blue summary histogram format.
Does this data look right? What’s the range of the histogram?
You may notice there’s a single record with a client’s birthday listed as somewhere between 1900 and 1910.
If you click the sliver of data right after null, you can isolate the row of data and view it at the bottom of the screen.
Looks like this may have been a typo! To fix the error, switch back to View State: Detail and use the Edit Value button or right-click, Edit Value. Perhaps they meant 2003? Though normally it would be best practice to consult the staff member who entered the data, for the sake of the activity, edit the year to 2003.
After fixing the data, go back to View State: Summary - the histogram now looks a lot more logical.
Take a look at the rest of your data to make sure your data type is correct. Dates and numbers should be showing up as blue summary histograms and text/string should be grey detail bar graphs.
Spoiler, that’s all of the errors in the data!
Birthdays are useful but they are hard to analyze by themselves. Use the Create Calculated Field button or click the three-dot menu in “Birthday” to create a new “Age” field.
Hint: Use the DATEDIFF() and TODAY() functions.
Now that our data is clean, we just need to reshape one aspect of our data by making it tall vs. wide.