Introduction

Question: Who is responsible for what CO2 emissions?

1.1. Data selection

We’ll use EPA’s FLIGHT database, a list of large industrial/commercial facilities across the U.S. and their GHG emissions. Later, we’ll join this dataset with the EPA TRI database, a list of facilities and their toxic releases, and with the EIA power-plant data to get electricity generation numbers.

FLIGHT data looks like this:

Looking at just the column headers, we get:

Now, there are far too many rows here to work with in Excel alone. Let’s load the data into Tableau.

What is Tableau

Open Tableau and connect to a text file. Navigate to the GHGRP data.

1.2. Dimensions / measures and interface

Basic Tableau Interface:

A) This pane contains all the headers of your data, split into Dimensions and Measures. What are Dimensions and Measures? Typically, dimensions are your main ID variables and are almost always discrete data. Measures are usually measurements, often continuous data that are dependent on dimensions.

B) This pane contains the Columns and Rows shelves. Tableau is almost all drag-and-drop. You can drag dimensions and / or measures to these shelves to build a graph.

C) This is the pane where a graph will show up. If empty, you can drag a dimension / measure directly to this area.

D) This pane contains the Filters shelf and Mark control. Marks are Tableau’s name for data-points. You can modify the look of data-points by adding dimensions to “color” or “size” or just clicking on these buttons. Filters are an option to crop out or refine the data. Drag a dimension or measure to the Filters shelf in order to filter the data being graphed.
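To make the Dimensions / Measures split more concrete, here is a minimal Python sketch of the idea: columns holding only numbers are treated as measures, everything else as dimensions. This is an illustration only and not how Tableau actually decides; the function name `split_fields` and all the row values are made up, and only the column names echo the tutorial.

```python
# Toy rows with made-up values; only the column names echo the tutorial.
rows = [
    {"Facility": "Plant A", "State": "OH", "Reported CO2e emissions": 120000.0},
    {"Facility": "Plant B", "State": "TX", "Reported CO2e emissions": 45000.0},
    {"Facility": "Plant C", "State": "TX", "Reported CO2e emissions": 980000.0},
]


def split_fields(rows):
    """Sort column names into dimensions (labels) and measures (numbers).

    Hypothetical helper: a column counts as a measure only if every
    value in it is an int or float.
    """
    dimensions, measures = [], []
    for name in rows[0]:
        values = [row[name] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        (measures if is_numeric else dimensions).append(name)
    return dimensions, measures


dimensions, measures = split_fields(rows)
print("Dimensions:", dimensions)
print("Measures:", measures)
```

In Tableau you can also drag a numeric field (say, a ZIP code) from Measures to Dimensions when the numbers are really IDs, which a simple type check like this one would get wrong.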

1.3. Create 1-D graph (aka, a histogram)

Question: What’s the distribution of facility CO2e emissions? Are there many large emitters vs small emitters? (We’ll need a histogram to answer this)

To start figuring this out, we likely need to do something with the “reported CO2e emissions” measure. Let’s drag that onto the Rows shelf and see what we get:

Because we haven’t differentiated by any dimensions, this just shows the sum across all facilities. Notice that, in the Rows shelf, “reported CO2e emissions” has SUM in front of it. We can click on that and change it to average or standard deviation, but let’s change it to COUNT. This is almost a histogram, since it counts the number of facilities, but it doesn’t split the count into bins based on facility emission output.
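Here is a small Python sketch of why we get a single bar: with no dimension to split on, each aggregation collapses the whole column into one number. The emission values are made up for illustration, and the `aggregations` dictionary is just a stand-in for Tableau’s aggregation menu.

```python
from statistics import mean

# Made-up emission values (metric tons CO2e), one per facility.
emissions = [120000.0, 45000.0, 980000.0, 30000.0]

# Each aggregation turns the whole column into a single number.
aggregations = {"SUM": sum, "AVG": mean, "COUNT": len}
results = {name: fn(emissions) for name, fn in aggregations.items()}

for name, value in results.items():
    print(f"{name}: {value}")
```

COUNT tells us how many facilities there are in total, but nothing about how they are spread across emission sizes. That is the piece the histogram adds.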

There is an easy way to change this to a histogram: click on the “Show me” button in the top right of the screen.

The “Show me” panel lists all the basic graph types that Tableau can generate. Most are greyed out, because we don’t have enough Dimensions / Measures on the Columns / Rows shelves to generate anything interesting. It’s always worthwhile to click “Show me” if you’re not sure how to graph something.

Let’s click on Histogram, the only option not greyed out under “Show me”. Its icon is a bell curve.

There are a ton of small facilities and only a few large ones; the distribution looks roughly like a power law. So we’ve answered our initial question. Yay!

Note that Tableau created a new dimension “reported CO2e emissions (bin)”. This allowed Tableau to count the CO2e emissions reports within each bin size.
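The binning step can be sketched in a few lines of Python: round each value down to the start of its bin, then count how many values land in each bin. This is a rough sketch of the idea, not Tableau’s implementation; the bin size of 50,000, the helper names `bin_start` and `histogram`, and the emission values are all assumptions for illustration.

```python
from collections import Counter


def bin_start(value, bin_size):
    """Return the lower edge of the bin that `value` falls into."""
    return (value // bin_size) * bin_size


def histogram(values, bin_size):
    """Count how many values fall into each fixed-width bin."""
    return Counter(bin_start(v, bin_size) for v in values)


# Made-up emission values (metric tons CO2e), one per facility.
emissions = [12000, 48000, 51000, 99000, 130000, 910000]
BIN_SIZE = 50000  # assumed bin width; Tableau picks its own default

counts = histogram(emissions, BIN_SIZE)
for start in sorted(counts):
    print(f"{start:>9,} to {start + BIN_SIZE:>9,}: {counts[start]}")
```

The bin start plays the role of the new “(bin)” dimension, and the count per bin is the height of each bar.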

1.4. Create a group from existing dimension (select groups of states to form regions)

Question: Might the distribution be different for different regions in the U.S.?

Start a new worksheet through the menu bar at the top: Worksheet -> New Worksheet

Our data’s largest geographic Dimension is State; we do not have a region Dimension. So, we need to create one. The easiest way to do this is to drag State onto the middle of the graph area, where it says “Drop field here”.

This should cause a map to pop up, with a mark (circle) in the middle of each state. We can click and drag on the map to select groups of states.

Right-click on the selected group of states and click the “Group” option.

This creates a legend on the right side of the screen, displaying two groups: the one we just made and all other states. Let’s change the label of the new group to improve readability. Right-click on the group in the legend, then click “Edit Alias”.

Now, group the rest of the states into their respective regions: Northwest, Southwest, Midwest, Northeast, and Southeast. Ignore Alaska, Hawaii, and other US territories.
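Under the hood, a group is just a lookup from each state to a region label, with everything left ungrouped falling into a catch-all. Here is a minimal Python sketch of that idea. The region membership below is partial and assumed purely for illustration (which states belong where is up to you), and the names `REGIONS` and `region_of` are hypothetical.

```python
# Assumed, partial region membership for illustration only.
REGIONS = {
    "Northwest": ["WA", "OR", "ID"],
    "Southwest": ["CA", "AZ", "NM"],
    "Midwest": ["OH", "IL", "MI"],
    "Northeast": ["NY", "PA", "MA"],
    "Southeast": ["FL", "GA", "NC"],
}

# Invert the mapping so each state points at its region.
STATE_TO_REGION = {
    state: region
    for region, states in REGIONS.items()
    for state in states
}


def region_of(state):
    """Return the region for a state, or "Other" if it was never grouped."""
    return STATE_TO_REGION.get(state, "Other")


for state in ["OH", "FL", "AK"]:
    print(state, "->", region_of(state))
```

Ungrouped states such as Alaska and Hawaii end up in the catch-all, much like the leftover states in the Tableau legend.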

Now we have a good Region dimension to use on other graphs. And we can answer our question: what are the regional distributions of GHG emissions? Go back to our histogram, and add the new dimension to the Columns shelf.

1.5. Modify the first histogram worksheet to generate a histogram for each region

Add the new regions dimension to the Columns shelf

Takeaway from graph: Northwest and “other” (AK, HI) have few facilities. Midwest has a ton of small facilities. South has many larger facilities.
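Adding the region dimension to the histogram amounts to counting per (region, bin) pair instead of per bin. A minimal Python sketch, assuming made-up facility records and an assumed bin width of 100,000 (neither comes from the FLIGHT data):

```python
from collections import Counter, defaultdict

# Made-up (region, emissions) records, one per facility.
facilities = [
    ("Midwest", 12000),
    ("Midwest", 30000),
    ("Midwest", 40000),
    ("Southeast", 260000),
    ("Southeast", 310000),
    ("Northwest", 20000),
]
BIN_SIZE = 100000  # assumed bin width

# One histogram (bin start -> facility count) per region.
per_region = defaultdict(Counter)
for region, emissions in facilities:
    bin_start = (emissions // BIN_SIZE) * BIN_SIZE
    per_region[region][bin_start] += 1

for region in sorted(per_region):
    for start in sorted(per_region[region]):
        print(f"{region:<10} {start:>9,}: {per_region[region][start]}")
```

Each region gets its own set of bars, which is what the side-by-side histograms in Tableau show.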

Cool. So, we’ve seen some regional patterns, but there’s another question: what are the average CO2e emissions per industry? On to the next section.
