A data set that's structured to be intuitive for humans often doesn't make much sense when read by a computer. Shaping data to be "tall" early on as opposed to "wide" pays big dividends for analysis later on.
Another common issue that comes up when working with data (especially if you are trying to visualize data exported for a "report") is that while data is often structured to be intuitive and easily human-readable, our computers have a much harder time making sense of the structure. For example, reports that show different population counts over time are often laid out with the population on different rows and the monthly counts from left to right. Unfortunately, computers are not great at parsing out how to read a spreadsheet - by default they read row-by-row.
To solve this, we often have to pivot our data.
It is data-structure best-practice to have “tall” data as opposed to “wide” data. In tall data, each column fully and uniquely represents a single dimension. In wide data, multiple columns represent variations of a single dimension. This ties into a fundamental database structure concept called database normalization.
This data is considered wide because it has three different variations of a single dimension - in this case, three different months - split out across three columns.
Tableau treats those columns as completely independent pieces of information as it doesn't understand there's a relationship between the three fields.
As a result, it’s tougher to build a viz that compares populations month-over-month.
Pivoting our data to be tall, however, means all of our data about the month is in a single column and the count of clients is also in a single column. The computer recognizes that the each different month is simply a variation of a single dimension.
Notice that each row now represents a single bit of data and "Veteran" and "Chronic" is repeated on every line. No merged cells!
With "month" information isolated in its own dimension, it's now much easier to analyze the data in Tableau and compare counts between months.
Ultimately, tall data is about having each row represent a single "unit" of data. In this example, the base unit of data is a population count so each row represents a single count - other times the base unit may be a client or a service provided.
Help your teammates design datasets and surveys to ensure tall data to start.
Structure data sets to be tall from the start.
If possible, restructure existing data to be tall.
Use the Pivot Step in Tableau Prep or the Pivot Function in Tableau Desktop’s data source view to shape your data from wide to tall.
Returning to the data, can you spot where multiple variations of a single dimension are broken out across multiple columns? What structure would make this data tall instead of wide?
There are five columns that break out which organization provided a service but contain the same information about what service was provided - aka wide data. If this data were imported into Tableau as is, each organization would have its own column and it would be very difficult to compare across organizations. In order to make this data tall, we need to use the Pivot Step.
Click the ➕ next to the Clean Step and add a Pivot Step.
In the Pivot Step, drag the five organizations into the Pivoted Fields space.
Rename the fields as Organization and Services by right-clicking the headers.
Looking at the Service fields, it looks like the vast majority of Service data is null.
Why is that? Why are there 122 records for each organization?
When the data was unpivoted, there were lots of blank cells for when a client didn’t meet with a given organization. Therefore, when Tableau pivoted the organization columns, all of those blank cells turned into null services.
Filter out the null data by right-clicking, Exclude.
Add another Clean Step to examine your data. You’ll notice that your data is simplified and each client only has one organization column.
In our Targeted Outreach data, each client is only visited once by one organization. What would this data set look like if it was pivoted to be tall? What else needs to be done to this data to be structured correctly?
Hint: What is the lowest level of granularity this data is tracking? Clients? Services rendered?
If everything looks clean, it's time to export the data for analysis!
To export your data, add an Output Step. Go through the output settings such as the save destination and file name. Under Output type, select what you’d like your data to come out - Tableau data source or csv.
Click the go button and you’re done! Reminder: save your workflow so you can rerun the flow or make edits.
Congratulations on completing the data shaping modules!
In these sections, we learned that consistent data labels keep our analyses from being fractured, consistent data types allows for correct aggregation, and tall vs. wide data structures make it much easier to compare data within a dimension. We used the Clean Step and Pivot Step in Tableau Prep to clean and shape our data in a save-able workflow.
At this point, feel free to connect Tableau Desktop to your cleaned and shaped data set and see if you can find any interesting trends.
But what if we want to supplement the targeted outreach data with data from our master BNL? Continue on to learn how to join data together - one of the most powerful concepts in data shaping.