Data Shaping with Tableau Prep

Less time cleaning, more time analyzing

There’s nothing more tedious than scrolling line by line through an excel file trying to fix errors or manually cutting and pasting data to transform your raw data into something usable. This activity will guide you through some foundational data structure concepts and show you how to build an automated data cleaning workflow in Tableau Prep!


It will take about 45-60 minutes total to complete the three data shaping modules and then however long you'd like for the Next Level modules. Use the nav bar or go down to the Skill Glossary to jump to a specific section.

Scenario - Targeted Outreach

A group of agencies have come together to kick off a targeted outreach program with a variety of services for their local youth populations. Most of the agencies are using a web-based data collection tool (e.g. Google Forms) that allow for consistent, geolocated data while one agency is using paper and pencil surveys. The data has been consolidated into a single spreadsheet.

The agencies are asking for a high-level report on what services are requested most frequently and whether there are any relationships between client demographics and requested services. The steps in this process will be to clean inconsistent data, shape the data to be “tall” as opposed to "wide", then finally visualize and analyze.


Exercise Materials

These exercises will primarily use the dataset compiled from the outreach staff and Tableau Prep. Though you can connect to the data directly through Tableau Prep, it may be easier to download the data as an excel file and connect to your local copy.

You can find the dataset here.

It will also be helpful to have Tableau Desktop available so you can see first-hand the impact shaping your data has on your analyses - and so you can dive into your clean data once you're done!

Ready to start shaping your data?

We'll start with examining our data and ensuring we have consistent data labels within each column.

Go to Consistent Data Labels

Skill Glossary

Consistent Data Labels

What does this mean?

Ensuring data that represents the same information is spelled the same across all instances.

What's the Impact?

Inconsistent labels leads to more fractured or confusing analyses.

Consistent Data Types

What does this mean?

Ensuring all data within a field or column is either all numerical, all date/time, or all text.

What's the Impact?

Mismatched data within a column prevents proper aggregation and analysis.

Tall vs. Wide Data

What does this mean?

Ensuring each column uniquely represents every variation of a single dimension instead of multiple columns representing variations of a single dimension.

What's the Impact?

Wide data makes it much harder to analyze data across a single dimension.

Next Level: Data Joins

What does this mean?

Super VLOOKUP - combining two different tables by a common field so the resulting table has fields from both original tables.

What's the Impact?

Having common fields to join across datasets greatly expands the universe of possible analyses and makes your data easier to manage.

Next Level: Data Unions

What does this mean?

Appending additional data to the bottom of an existing table with common column headers.

What's the Impact?

Using unions avoids having to repeatedly copy/paste data from similar tables into a "master" table.