Session 2: WaPo Distribution Data
2.1 Understanding the data
Open the WaPo transaction dataset within Tableau. The file has a .hyper extension - this is a Tableau specific data file. A CSV version of this file is about 1.2 GB.
There are over 3 million records of transaction data.
We need filter a lot of columns out of this dataset to better work with it. We'll filter out:
Reporter Addl co info
Address1
Address2
Buyer Addl Co Info
Transaction Code: All Sales
Drug Code
NDC number
Unit
Action Indicator
Order Form
Correction
Product Name
Ingredient Name
MME Conversion Factor
Combined Labeler Name
Revised Company Name
Then, when we refresh the data, we'll likely need to save an updated version of the data extract. This extract is effectively Tableau's local data file. Typically, you'll want to use extracts because they're faster for Tableau to process (but, extracts take longer to update when new data comes in.)
2.2 Data Exploration
Let's try to understand which drugs exist within this dataset. Drag "Drug Name" to the Rows Shelf. Your screen should show this: Two drugs, Hydrocodone and Oxycodone.
How many drugs were purchased? Add "quantity" to the Columns shelf. Additionally, switch Tableau's charting bounds to "entire view", so that the whole screen is used to display the graph. Here is the result.
So, Hydrocodone was purchased about three times more than Oxycodone. Which stores ordered them? Add "Buyer Name" to the Rows shelf. There are a lot of buyers here... can we sort the list better? Sort by quantity. Apparently the VA is huge, because this is one of two country-wide VA distributors. Let's go ahead and exclude the VA. Right click and select "exclude".
Scrolling through the list, there's a lot of "BI LO" and other individual stores. Would the sorted list change if we grouped these together? Let's try grouping some things - right-click on "Buyer Name" under the Dimensions library, then go to "Create -> Groups." Use this menu to group many of the big chains together. For today, we'll only group together:
BI LO
Ingles
Kroger
Piggly Wiggly
Publix
Walmart
Tableau has created a new dimension for us to use. Let's grab the "Buyer Name (Group)" dimension and drag it to replace the original "Buyer Name" dimension on the Rows shelf. Now, we see some more familiar names at the top of the list.
Let's rename this sheet Who are the big buyers?
2.3 Data Mapping
Where are the drugs coming from? Let's look through our dimensions and see what geospatial ones we have. It looks like the highest level-of-detail geospatial dimension is Zip Code, so we'll use that to map distributors. Let's color the map by "Distributor or Manufacturer?" and size each dot by "Quantity." Here is the result.