Session 2: WaPo Distribution Data

2.1 Understanding the data

Open the WaPo transaction dataset within Tableau. The file has a .hyper extension - this is a Tableau specific data file. A CSV version of this file is about 1.2 GB.

There are over 3 million records of transaction data.

We need filter a lot of columns out of this dataset to better work with it. We'll filter out:

  1. Reporter Addl co info

  2. Address1

  3. Address2

  4. Buyer Addl Co Info

  5. Transaction Code: All Sales

  6. Drug Code

  7. NDC number

  8. Unit

  9. Action Indicator

  10. Order Form

  11. Correction

  12. Product Name

  13. Ingredient Name

  14. MME Conversion Factor

  15. Combined Labeler Name

  16. Revised Company Name

Then, when we refresh the data, we'll likely need to save an updated version of the data extract. This extract is effectively Tableau's local data file. Typically, you'll want to use extracts because they're faster for Tableau to process (but, extracts take longer to update when new data comes in.)

2.2 Data Exploration

Let's try to understand which drugs exist within this dataset. Drag "Drug Name" to the Rows Shelf. Your screen should show this: Two drugs, Hydrocodone and Oxycodone.

How many drugs were purchased? Add "quantity" to the Columns shelf. Additionally, switch Tableau's charting bounds to "entire view", so that the whole screen is used to display the graph. Here is the result.

So, Hydrocodone was purchased about three times more than Oxycodone. Which stores ordered them? Add "Buyer Name" to the Rows shelf. There are a lot of buyers here... can we sort the list better? Sort by quantity. Apparently the VA is huge, because this is one of two country-wide VA distributors. Let's go ahead and exclude the VA. Right click and select "exclude".

Scrolling through the list, there's a lot of "BI LO" and other individual stores. Would the sorted list change if we grouped these together? Let's try grouping some things - right-click on "Buyer Name" under the Dimensions library, then go to "Create -> Groups." Use this menu to group many of the big chains together. For today, we'll only group together:

    1. BI LO

    2. Ingles

    3. Kroger

    4. Piggly Wiggly

    5. Publix

    6. Walmart

Tableau has created a new dimension for us to use. Let's grab the "Buyer Name (Group)" dimension and drag it to replace the original "Buyer Name" dimension on the Rows shelf. Now, we see some more familiar names at the top of the list.

Let's rename this sheet Who are the big buyers?

2.3 Data Mapping

Where are the drugs coming from? Let's look through our dimensions and see what geospatial ones we have. It looks like the highest level-of-detail geospatial dimension is Zip Code, so we'll use that to map distributors. Let's color the map by "Distributor or Manufacturer?" and size each dot by "Quantity." Here is the result.