Pivot Tables & OpenRefine

Example Data: Countries reporting on Scientific Progress to the United Nations

This data was compiled by students at Clemson on September 2016 in collaboration with AAAS to produce a report that visualizes how and when different countries who have signed the International Covenant on Economic, Social and Cultural Rights have reported on the Human Right to Science to the United Nations. Download the data to your computer.

Pivot Tables are available in Excel and Google Spreadsheets to summarize your data, which sometimes allows you to look for errors/inconsistencies in your data.

Pivot Tables: Summarizing Data by Country

In Excel:

Go to INSERT - PivotTable. The default will select the entire Table and will place the PivotTable in a new worksheet. Click OK.

Rename the new Sheet to Countries_excerpts

Choose fields to add to report: check Country, it gets automatically added to the Rows area.

Select Number of Excerpts. Drag it to the Values area. You will have the total number of excerpts each country has reported.

Select Type of Right. Notice how immediately gets added to the Rows area. You can see the number of excerpts by type and country.

Move Type of Right to the Columns area. Now you have each country with the number of excerpts for each type and the total number of excerpts.

In Google Spreadsheets:

Click on Data - Pivot table...

Rows: Add field - Country

Values: Add field - Number of excerpts by Sum and CountA. Notice the difference.

Values: Add field - Type of Right by CountA and CountUnique. Discuss the differences with your partner.

Pivot Tables: Summarizing Data by Year

Go back to the data sheet (Intro GIS scientific progress). Click INSERT - PivotTable to create a new one. Leave the defaults and click OK.

Rename the new Sheet: Year_Countries.

Add Year to the Rows area (notice the 360 mistake).

Add Country to Values. This will give you the total number of countries reporting on this right by year.

From the Fields area, drag Country to the Columns field.

Now you have the number of excerpts that each country reported by year.

Review: Basic operations in Refine

Launch OpenRefine. This should automatically open your web browser to the correct address: 127.0.0.1:3333.

In OpenRefine, on the right-hand side, select Create Project - Get data from This Computer and click the Choose Files button. Locate the Scientific progress data (download it here) and click Open.

Press the Next button to preview your data.

In the screen that appears, make sure to select Columns are separated by commas (CSV) and uncheck Store blank rows (bottom right corner of your screen).

If your data looks correct, press the Create Project button on the top right corner.

Let's get familiar with common operations that were covered earlier. For example, change the number of records displayed by selecting 50 records on the Show menu.

Rename the Text column to Excerpt_content by selecting Edit column > Rename this column. Under Enter new column name, type: Excerpt_content. Click OK.

Exploring and Transforming your data using Facets

Tale a look at the Country column. This column contains information about each country reporting to the UN. There should be only 164 values. Notice the variation. To know exactly how many different values we have, go to this column and select Facet > Text Facet.

You should see 170 different choices for the results. Scroll to the bottom and notice the problem with Vietnam. Next to each choice you have the number of records in your table with that choice.

To manually change the choices, click on Viet Nam. and edit. In the box that appears, change it to Vietnam. Click Apply. Notice how you have only 169 choices now.

We are going to cluster the data to combine the choices in a faster, coherent way.

In the same column, select Edit cells > Cluster and edit...

Explore different methods. Click Merge and edit the new cell value for all the countries with the same name and press Merge Selected & Re-Cluster.

Editing Column Text with ReplaceChars

replaceChars(string s, string f, string r)

Returns the string obtained by replacing any character in s that is also in f with the character r. For example, replaceChars("commas , and semicolons ; are separators", ",;", "**") returns the string commas ** and semicolons ** are separators. (Tip: you can also do this by importing the data with character encoding).

Splitting a column

Notice there is a 3 number, 30 records that are blank, and problems with f3 and f6.

Fix them in the facet column.We want to split the type of right so we can then summarize the overall types that we have reported.

First, click on Facet - Text Facet to see if you have consistency in your records.

In the Type of Right column, select Edit column - Split into several columns.

In the window that appears, select the following settings:

Split Column by field lengths - type 1,3

Uncheck Remove this column

Concatenating two columns

Let's say we want to concatenate country and year.

In the new column, click on Edit column - Add column based on this column

Other way (by Nick Mills): cells['Year'].value.toString().replace(/\..*/,"")+ " " + value