7. OpenRefine and Tableau

In this exercise we will combine the power of OpenRefine for formatting data with the capabilities of Tableau for visualizing time series.

The data that we will be using is:

- Persons obtaining lawful permanent resident status by state or territory of residence, 2005 to 2014. Source: US Department of Homeland Security, Yearbook of Immigration Statistics.

The goal is to visualize the number of legal permanent residents accepted by each State from 2005 to 2014.

Download the data. As you can see, it is quite clean and ready for analysis. The problem is that for the chart that we want to create, the data is in wide format instead of long format.

We will use OpenRefine to easily clean the data and convert it to long format.

7.1 Cleaning and Formatting Data with OpenRefine

Import the excel source table (2014_table4.xls) in OpenRefine. (Note: If your are not familiar with OpenRefine, please sign up for our workshop on cleaning data with OpenRefine).

Cleaning Data

In the Preview panel, make sure you select the following options:

- Ignore first 3 lines at beginning of file

- Parse next 1 line as column headers

- Discard 1 row of initial data (gets rid of the Totals row)

- Load at most 54 rows of data (gets rid of all the rows after Wyoming)

- Store blank cells as nulls

Uncheck store blank rows

Click the Update Preview and make sure your data looks clean. Click Create Project

One your preview you pretty much cleaned the Excel dataset without having to use any special operation.

Transposing Data

Observe the data values. For each state, we have the values for each year.

7.2 Visualizing Time Series with Tableau

Open Tableau and import the new table: 2014_table4-xls.csv. Notice that both the Year and Permanent Residents fields are floating with several decimal points.

To change this, go to the top left corner of each field, press the drop-drown arrow and select Create Calculated Field. In the calculation, type: INT([Year]) to create a new field Year_Int.

Repeat the same process with Permanent Residents (Permanent_Residents).

Now your data is ready for visualization with Tableau. Open a new worksheet.

Drag Year to the Columns card.

Add Permanent_Residents to the Rows card.

In the Show Me card, select Area charts.

Add State to the Color cards.

In the pop-up window that appears, select Add all members.

To sort the chart so the states with the higher number of permanent residents is at the bottom of the chart, click on the legend card for State on the drop-down arrow on the top right corner and click Sort... Select Field: Permanent_Residents Aggregation Sum. Click OK.

We also want to fix the labeling for the years (without decimal). Right-click on the Year X axis and select Edit axis. In the window that appears, select the Tick Marks tab. Choose Fixed every 1 unit. Click OK.

Your graphic should look like this:

For Tableau and other programs such as R and SPSS, in order to analyze and visualize the time series, the data needs to be in long format:

State - Year - Value.

To do this automatically in OpenRefine, click on the column with the first year (in this case, 2005) on the drop-down arrow to the left of the column name, and select Transpose > Transpose cells across columns into rows...

In the window that appears follow the settings below:

- From Column - 2005

- To Column - 2014

Transpose into Two new columns:

Key Column - Year.

Value Column: PermanentResidents

Check Ignore blank cells and Fill down in other columns

Click Transpose. See the results.

The table now is in a long format. The only thing to do now is to transform your Year column to number (drop-down arrow to the left of Year, click on Edit cells > Common transforms > To number.

Your table is ready to export.

Click on Export - Comma Separated Value.

Your table should look exactly the same as this table.

The table is now ready to be imported into Tableau for a time series visualization.