HMIS data is often dynamic and we sometimes encounter hand-entered data. Before that data can be analyzed with any consistency, it must be cleaned and standardized.
Though you can connect Tableau Prep to Google Sheets, we're going to start simple - download the Targeted Outreach data as an excel file to your computer.
Open the file in excel and take a look at the original data. What sticks out to you? Do you notice any complications? Right off the bat, do you have a guess which organizations used the web-form and which used paper surveys?
Open Tableau Prep, click Connect to Data, then click on Microsoft Excel. Open your downloaded Targeted Outreach data and double click “Round 1” to drop your data source into the workflow.
Click the ➕ next to the data source and add a “Clean Step”. The Clean Step allows you to see the distribution of every value in a given column through through blue 'summary' and grey 'detail' histograms.
Take some time here to explore the Clean Step by hovering over different data points and seeing what you can learn about each field.
Scroll through the data and the field breakouts until you reach "Gender" and "Race".
What do you notice about the data?
When data representing the same piece of information are not labeled or spelled the same way, your analysis will be fractured across those inconsistencies.
Consistent data labels means consistent spelling, capitalization, and format (e.g. percentages or date formats).
Document labeling standards and train your staff to use consistent labels.
Use gentle reminders if you notice consistent errors.
Use data validation rules or list-pickers on your front end data collection or data entry systems when possible.
Use the Clean Step in Tableau Prep to group or fix inconsistencies before they reach your analysis.
Create data quality dashboards to incentivize clean data.
Let’s work on cleaning up “Gender”. Ctrl+clicking each of “F”, “femael”, and “Female”, right-click, and select Group. Alternatively, select a data point and click the Edit Value button to manually relabel the data. Repeat to group together “M” and “Male”.
Do the same with the “Race” field until our data is clean and consolidated.
Now that our data has consistent data labels within each field, our visualizations won't be fractured by data entry errors.
So much easier than scrolling through row-by-row and doing endless find-and-replaces! The best part about Tableau Prep is that every workflow can be saved and re-run whenever you'd like. That means if more data is entered into the data set, any cleaning you've saved will already be done.
On that note, save your workflow before moving on!
In order for data to be aggregated the way you want, it must have consistent data types.