Collecting, Analyzing And Interacting With Data

Byte 3 v1

Description: Your final product will be a visualization of an animal adoption data set ([yourname]-byte3.appspot.com)
Due date: 2/4
Hand In: Fill out the peer grading form on blackboard.

Overview

In this project, you will create a visualization of the data from Louisville Metro Animal Services. In order to do this, you will need to make use of the D3 visualization toolkit. Along the way, you will learn about:

Using a Javascript visualization library (D3)
Passing data from your python code to javascript running in the user's browser
Designing a visualization of your data

Detailed Instructions for Byte 3

Byte 3 will build on the data set we used in Byte 2. You should start by setting up a second google application, called [yourname]-byte3.appspot.com. You will need to repeat some of the steps in Byte 2 to ensure that your application has access to your Fusion Table. We walk you through the use of Google Charts and D3 in this tutorial. However, there are many other platforms you may want to investigate in the future. D3 has derivatives such as NVD3 and Vida; other options include HighCharts and gRaphaël. You can see a comparison of the full set of options at socialcompare.com, and this 2012 article from datamarket is also helpful.

Creating a Custom Visualization in Google Charts

For the first phase of this tutorial, we will start with one of the simplest options for visualization that is available, Google's Charts API.

To use Google Charts, we will need to get data from the google fusion table using python code in 'main.py' and send it through jinja all the way to google charts (which is embedded in a webpage using the javascript language).

There are three important pieces to visualizing the data: gathering the data from the fusion table, setting up the plumbing to display it in a chart, and adding the chart itself to 'index.html'. In the interest of having something to show as soon as possible (which supports debugging), we will do this in reverse order.

Adding a custom visualization to your web page

We will first create a chart in 'index.html' showing fake data. To do this, you can literally copy the javascript code found in google's chart api documentation into 'index.html'. In particular, we will copy the code for a column chart into the <head></head> portion of 'index.html'. In order to display the chart, we need to add <div id="chart_div"></div> somewhere in the body of 'index.html' as well. At the end we should have something like this:

Notice that the data for this chart is defined directly in the javascript we copied over, in the lines that say:

var data = google.visualization.arrayToDataTable([
    ['Year', 'Sales', 'Expenses'],
    ['2004',  1000,      400],
    ['2005',  1170,      460],
    ['2006',  660,       1120],
    ['2007',  1030,      540]
]);

In addition, the title and axis specifications are found in the javascript (and can be customized further):

var options = {
    title: 'Company Performance',
    hAxis: {title: 'Year', titleTextStyle: {color: 'red'}}
};

We will want to replace this with our own chart. Since we know something about what our data will look like, let's first create fake data that is more realistic:

data = [
  ['Age', 'Adopted', 'Euthanized'],
  ['< 6 months',  1000,      400],
  ['6-12 months',  1170,      460],
  ['1-5 years',  660,       1120],
  ['>5 years',  1030,      540]
]

You may find that the labels on the horizontal axis are cut off with this data. I updated my options as follows:

var options = {
    title: 'Animal Outcomes based on Age at Arrival',
    width: 400, height: 200,
    chartArea: {height: '50%'},
    hAxis: {title: 'Age', titleTextStyle: {color: 'red'}}
};

Setting up the plumbing for passing data from 'main.py' to the visualization

Our next goal is to move the fake data to python and successfully pass it to the javascript we just added to 'index.html'.

1) Place data into a table in 'main.py'. As it turns out, the data structure syntax is identical in python and javascript, so we can literally copy the data = [... definition above into 'main.py'.

2) Next we need to JSON encode the data (this will turn it into a simple string); and store it in the context to pass to jinja (which will pass it on to 'index.html').

Taking these two steps together, we get:

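def get(self):
    """default web page (index.html)"""
    data = [...]  # the table defined above
    # assuming json here is the webapp2_extras helper; with the standard
    # library json module the equivalent call is json.dumps(data)
    context = {'data': json.encode(data)}
    self.render_response('index.html', context)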

3) Finally, we need to update the javascript in 'index.html' to retrieve the data. This simply requires us to write {{data|safe}} wherever we want to access the data. For example:

var data = google.visualization.arrayToDataTable({{data|safe}})
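To see what actually arrives in the browser, it can help to print the encoded string on its own. The sketch below is only an illustration: it uses the standard library's json.dumps as a stand-in for the json.encode call above, and a plain string replace as a stand-in for what jinja does with {{data|safe}}. The shortened table and the variable names are assumptions made for the example.

```python
import json

# A shortened version of the fake table used in the chart above.
data = [
    ['Age', 'Adopted', 'Euthanized'],
    ['< 6 months', 1000, 400],
    ['6-12 months', 1170, 460],
]

# Encoding turns the nested list into a single string of JSON text.
encoded = json.dumps(data)

# Stand-in for jinja's {{data|safe}}: paste the string into the
# javascript source unchanged (no HTML escaping).
template = "var data = google.visualization.arrayToDataTable({{data|safe}});"
rendered = template.replace("{{data|safe}}", encoded)

print(rendered)
```

Because JSON text is also valid javascript literal syntax, the rendered line can be used by the browser as-is. Without the safe filter, jinja would HTML-escape the quotes and the script would no longer parse.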

When you are passing information from your python code to jinja to javascript for the visualization, it is important to understand what information is available at each end. You'll want to use your browser's console to debug this (along with the 'console.log' function in javascript). In chrome, you open the console with an operating system specific key combination (Ctrl+Shift+J on Windows and Linux, Cmd+Option+J on a Mac).

Debugging Hints

The flow of information in this code is multi-faceted. You are (hopefully) loading data from somewhere in Python, and packaging it up to send to javascript. Inside of javascript you may do further processing, and visualize the data, which creates DOM elements. Because of these complexities, you need to trace errors across several possible locations. If there is an error in your python code, it is most easily caught by looking at the Google Appspot log file, where you can print things out using the familiar logging.info(). Crashes that come from your python code will show up in the same log.

Assuming that your code doesn't crash somewhere in python, you may also need to debug on the javascript side. For this, you will want to use the javascript console, to which you can write (from within javascript scripts) using console.log(). Crashes in your javascript code will also show up in your console. As discussed in class, you can also inspect the DOM using the elements tab that shows up among the developer tools that include your console. You may have to go back and forth between debugging in your browser and in your python log files.
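On the python side, a useful habit is to log a short summary of the data just before it is handed to the template. The following is a minimal sketch, not part of the sample code: the describe helper and the sample table are hypothetical, and the basicConfig call is only there so the messages appear when you run the snippet locally.

```python
import json
import logging

# Only needed when running this snippet on its own.
logging.basicConfig(level=logging.INFO)


def describe(table):
    """Return a one-line summary of a list-of-rows table (header first)."""
    header = table[0]
    return "%d rows, columns=%s" % (len(table) - 1, header)


data = [
    ['Age', 'Adopted', 'Euthanized'],
    ['< 6 months', 1000, 400],
]

# Log a summary rather than the whole table, so the log stays readable.
logging.info("sending to template: %s", describe(data))
# Log the start of the encoded string to confirm it looks like JSON.
logging.info("encoded prefix: %s", json.dumps(data)[:40])
```

If the summary in the python log looks right but console.log(data) in the browser does not, the problem is in the template or the javascript rather than in 'main.py'.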

Using real data from fusion tables to show the relationship between age and outcome

Although we have now created a custom visualization, it only functions with the fake data we gave it. Our next step is to hook it up to the data in the fusion table from Byte2.

We can use the same code as in Byte 2 to load the full data set directly from Google Fusion Tables. However, I have found that the speed of Fusion Tables can be variable, to say the least. An alternative is to use the same mechanism as in 'explore.py' from Byte 2 to load the data from a file. You will need to download the data set into a file (such as 'data.json') ahead of time using the code from explore.py, because a google app engine application is not allowed to write to disk (it can write to a data store, but we will not be covering that in this class).

Once you have a file with data in it (you could just use the one from Byte 2), it needs to be placed into a static directory. We'll need to create a directory ('data/') inside [yourname]-byte3 and place 'data.json' in that directory. We'll also need to update app.yaml to tell google about the directory and make it application readable. NOTE: If you choose to do this, GOOGLE WILL CHARGE YOU A SMALL FEE FOR THE SPACE on an ongoing basis.

handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico

- url: /data
  static_dir: data
  application_readable: true

- url: .*
  script: main.app
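Once the file is in place, reading it back is a few lines of python. The sketch below assumes the file holds a saved query response with 'columns' and 'rows' keys, and that the columns of interest are named 'Age' and 'Outcome'; both are assumptions, so check them against your own 'data.json'. The load_rows helper is hypothetical, and the sample file is written to a temporary directory only so that the example runs on its own.

```python
import json
import os
import tempfile


def load_rows(path):
    """Read a saved query response and return (columns, rows)."""
    with open(path) as f:
        response = json.load(f)
    # Assumed layout: {'columns': [...], 'rows': [[...], ...]}
    return response['columns'], response['rows']


# Write a tiny stand-in file so the sketch is self-contained.
sample = {'columns': ['Age', 'Outcome'],
          'rows': [['Puppy', 'Adopted'], ['Adult', 'Foster']]}
path = os.path.join(tempfile.mkdtemp(), 'data.json')
with open(path, 'w') as f:
    json.dump(sample, f)

columns, rows = load_rows(path)
# Look up column positions by name instead of hard-coding them.
ageid = columns.index('Age')
outcomeid = columns.index('Outcome')
print(ageid, outcomeid, len(rows))
```

In your application the path would point at the 'data/data.json' file inside your project directory rather than at a temporary file.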

Next, we'll use python to collect the parts of the data we care about (without serial SQL queries). For example, to map ages to outcomes we need to initialize an array that contains an entry for each age something like this:

# one dictionary per age group, with every outcome count starting at 0
age_by_outcome = []
for age in ages:
    res = {'Age': age}
    for outcome in outcomes:
        res[outcome] = 0
    age_by_outcome.append(res)

and then fill it with data:

# loop through each row
for row in rows:
    # get the age of the dog in that row
    age = age_mapping[row[ageid]]
    # get the outcome for the dog in that row
    outcome = row[outcomeid]
    # if the age is a known value (good data) find
    # out which of the items in our list it corresponds to
    if age in ages:
        age_position = ages.index(age)
    # otherwise we will store the data in the 'Other' age column
    else:
        age_position = ages.index('Other')
    # if the outcome is a bad value, we call it 'Other' as well
    if outcome not in outcomes:
        outcome = 'Other'
    # now get the current number of dogs with that outcome and age
    outcomes_for_age = age_by_outcome[age_position]
    # and increase it by one
    outcomes_for_age[outcome] = outcomes_for_age[outcome] + 1
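The two fragments above can be combined into one function that is easy to try on a handful of rows before running it on the full data set. This is a sketch under stated assumptions: the tally function name, the labels, the age_mapping contents and the sample rows are all made up for illustration. It also differs from the fragment above in one respect: it uses age_mapping.get so that a raw age missing from the mapping falls into 'Other' instead of raising a KeyError.

```python
def tally(rows, ageid, outcomeid, ages, outcomes, age_mapping):
    """Count animals for every (age group, outcome) pair."""
    # One dictionary per age group, every outcome count starting at 0.
    age_by_outcome = []
    for age in ages:
        res = {'Age': age}
        for outcome in outcomes:
            res[outcome] = 0
        age_by_outcome.append(res)

    for row in rows:
        # Unknown raw ages fall back to 'Other' instead of crashing.
        age = age_mapping.get(row[ageid], 'Other')
        outcome = row[outcomeid]
        if age not in ages:
            age = 'Other'
        if outcome not in outcomes:
            outcome = 'Other'
        age_by_outcome[ages.index(age)][outcome] += 1
    return age_by_outcome


# Hypothetical labels and rows, for illustration only.
ages = ['<6mo', '6mo-1yr', 'Other']
outcomes = ['Adopted', 'Euthanized', 'Other']
age_mapping = {'Puppy': '<6mo', 'Young': '6mo-1yr'}
rows = [['Puppy', 'Adopted'], ['Puppy', 'Adopted'],
        ['Young', 'Euthanized'], ['', 'Lost']]

result = tally(rows, 0, 1, ages, outcomes, age_mapping)
for record in result:
    print(record)
```

Checking that the counts add up to the number of input rows is a quick way to confirm that no row was silently dropped.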

Moving to D3

To use D3 (much more sophisticated than Google Charts), first download the latest version from the D3 website and unzip it into the [yourname]-byte3 directory. Next, be sure to update your 'app.yaml' file so that your application knows where to find d3. We'll also want to make use of CSS stylesheets when using d3, so we'll add a directory for stylesheets to 'app.yaml' as well. You should change the handlers section to look like this:

handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico

- url: /data
  static_dir: data
  application_readable: true

- url: /d3
  static_dir: d3

- url: /stylesheets
  static_dir: stylesheets

- url: .*
  script: main.app

Scott Murray's D3 Fundamentals tutorial (or his free online book) will acquaint you with the basics of D3 (you may also find d3 tips and tricks useful). At a minimum, you'll want to produce a bar chart similar to the one we produced above using Google's chart capabilities. However, D3 can do so much more! This section of the tutorial will walk you through how I created the stacked bar chart at jmankoff-byte3.appspot.com (which I based on mbostock's example stacked bar chart).

First, I organized the data in main.py into a list of dictionaries, containing the number of animals in each outcome. Here is what the final output looks like in the log:

[{'Foster': 0, 'Returned to Owner': 0, 'Age': '<6mo', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0}, {'Foster': 0, 'Returned to Owner': 0, 'Age': '6mo-1yr', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0}, {'Foster': 0, 'Returned to Owner': 0, 'Age': '1yr-6yr', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0}, {'Foster': 0, 'Returned to Owner': 0, 'Age': '>7yr', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0}, {'Foster': 0, 'Returned to Owner': 0, 'Age': 'Unspecified', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0}]

This is created using about 30 lines of code in 'main.py'. Here is the key section of that code. We are simply looping through all of the rows of data and collecting up information into the structure described above. The remainder of the code simply sets up the structure necessary for this to work.

        # loop through each row
        for row in rows:
            # get the age of the dog in that row
            age = age_mapping[row[ageid]]
            # get the outcome for the dog in that row
            outcome = row[outcomeid]
            # if the age is a known value (good data) find
            # out which of the items in our list it corresponds to
            if age in ages:
                age_position = ages.index(age)
            # otherwise we will store the data in the 'Other' age column
            else:
                age_position = ages.index('Other')
            # if the outcome is a bad value, we call it 'Other' as well
            if outcome not in outcomes:
                outcome = 'Other'
            # now get the current number of dogs with that outcome and age
            outcomes_for_age = age_by_outcome[age_position]
            # and increase it by one
            outcomes_for_age[outcome] = outcomes_for_age[outcome] + 1

Once we have done this we pass it to 'index.html' as context:

        # add it to the context being passed to jinja
        variables = {'data': json.encode(age_by_outcome),
                     'y_labels': outcomes,
                     'x_labels': ages}
        # and render the response
        self.render_response('index.html', variables)

Now we need to set up index.html. First we need to tell it about d3:

  <script type="text/javascript" src="d3/d3.v3.js"></script>

Next we start on the script for displaying the data. First we move the data into variables accessible to javascript:

  <script>
       // ----------- EVERY CHART NEEDS DATA --------------
       // this is the data we passed from main.py
       // the format for data is:
       // [{outcome1: amount1, ..., outcomen: amountn,
       // Age:'<6mo'}, ..., {outcome1: amount1, ... , Age: '>7yr'}]
       var data = {{data|safe}}
       // x_labels is an array of all the ages
       var x_labels = {{x_labels|safe}}
       // y_labels is an array of all the outcomes
       var y_labels = {{y_labels|safe}}

Now we can easily loop through the data to calculate information we will need later for graph creation. We want to create a graph that stacks rectangles for each outcome on top of each other. This means that only the first outcome is at position y=0, the remaining will be proportionally higher based on the amount of data in each previous outcome. Looping is done in javascript by saying:

       data.forEach(function(d) {

We calculate a y0 and y1 (bottom and top) position for each rectangle. We also calculate the total height of all of the stacked bars (from the bottom of the bottom bar (0) to the top of the top bar).

           // the y0 position (lowest position) for the first stacked bar will be 0
           var y0 = 0;
           // we'll store everything in a list of dictionaries, d.outcomes
           d.outcomes = y_labels.map(function(name) {
               // each outcome has a name, a y0 position (its bottom),
               // and a y1 position (its top).
               var res = {name: name, y0: y0, y1: y0 + d[name]};
               // and we also have to update y0 for the next rectangle.
               y0 = y0 + d[name];
               return res;
           });
           // we also store the total height for this stacked bar
           d.total = d.outcomes[d.outcomes.length - 1].y1;
       });  // end of data.forEach
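The stacking arithmetic is easy to get wrong by one segment, so it can be worth checking by hand on a small record. The sketch below mirrors the javascript loop in python purely as a way to verify the numbers; the stack function name and the counts are made up for illustration.

```python
def stack(record, y_labels):
    """Return (segments, total) for one age group's stacked bar."""
    y0 = 0
    segments = []
    for name in y_labels:
        y1 = y0 + record[name]
        segments.append({'name': name, 'y0': y0, 'y1': y1})
        # The next segment starts where this one ended.
        y0 = y1
    # The bar's total height is the top of the last segment.
    total = segments[-1]['y1'] if segments else 0
    return segments, total


# Made-up counts for one age group.
record = {'Age': '<6mo', 'Adopted': 30, 'Euthanized': 10, 'Other': 5}
segments, total = stack(record, ['Adopted', 'Euthanized', 'Other'])
for seg in segments:
    print(seg['name'], seg['y0'], seg['y1'])
print('total', total)
```

Each segment's y0 equals the previous segment's y1, and the total equals the sum of all the counts for that age group.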

The next section of the d3 code, labeled

       // ----------- EVERY CHART NEEDS SOME SETUP --------------

sets up the axes and color scales. You should check the d3 documentation to understand more about what is going on here. For color picking, it can be helpful to use a site such as colorbrewer2.org.

The meat of any D3 visualization happens through DOM manipulation. D3 uses an SVG element for drawing, which in this case we place inside of the body of the HTML. In D3, a series of commands can be carried out as chained method calls, so for example we set up the svg using:

       // the svg element is for drawing. We set its size based
       // on the margins defined earlier
       var svg = d3.select("body").append("svg")
           .attr("width", width + margin.left + margin.right)
           .attr("height", height + margin.top + margin.bottom)
         // and add a group that is inside the margins
         .append("g")
           .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

D3 also has a very unusual way of looping through data: you simply reference it as yet another function call using something like .data(data). In the sample code we first create a group DOM item for each bar by looping through the ages. Note that we select all '.Age' elements before we have created them (that happens in .append("g")).

       // Create a group for each age
       var age = svg.selectAll(".Age")
           .data(data)
         .enter().append("g")
           .attr("class", "g")
           .attr("x_position", function (d) {return x(d.Age);})
           .attr("transform", function(d) {return "translate(" + x(d.Age) + ",0)"; });

Next we create a rectangle for each outcome. Again, we are selecting all rects before we actually append them to the visualization. This is non-intuitive but allows d3 code to be written without loops.

       // create a rectangle for each outcome (for each age)
       age.selectAll("rect")
           // bind the outcome data for that age to that rectangle
           .data(function(d) { return d.outcomes; })
         .enter().append("rect")
           .attr("width", x.rangeBand())
           // use the outcome data to determine y position and height
           .attr("y", function(d) { return y(d.y1); })
           .attr("height", function(d) { return y(d.y0) - y(d.y1); })
           // use the color scale to determine the fill color
           .attr("fill", function(d) { return color(d.name); })
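The "y" and "height" attributes can look backwards at first. In SVG, y grows downward, so the y scale is normally set up to map large data values to small pixel values. The sketch below assumes y is a linear scale from the data range [0, max_total] onto the pixel range [height, 0]; that setup is an assumption here, since the scale definition is not shown above. It is written in python only to make the arithmetic easy to check, and make_y_scale is a hypothetical helper.

```python
def make_y_scale(max_total, height):
    """Map a data value in [0, max_total] to a pixel y in [height, 0]."""
    def y(value):
        # Value 0 sits at the bottom (pixel `height`); max_total at pixel 0.
        return height - (value * height) / float(max_total)
    return y


y = make_y_scale(max_total=100, height=200)

# A segment covering data values 30..40 (y0 = 30, y1 = 40):
top = y(40)                   # pixel row of the segment's top edge
pixel_height = y(30) - y(40)  # bottom pixel minus top pixel
print(top, pixel_height)
```

This is why the rectangle's "y" attribute uses d.y1 (its top edge) and its height is y(d.y0) - y(d.y1): the bottom edge has the larger pixel value.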

At this point, you should be able to display a stacked bar chart in your browser generated using the code we just went through. However it is easy to add a little bit of interactivity. First, let's create a style sheet ('d3.css') which we reference in 'index.html' as:

  <link href="stylesheets/d3.css" rel="stylesheet" type="text/css">

Next we can make our bars respond to hovering:

rect {
        -moz-transition: all 0.3s;
        -o-transition: all 0.3s;
        -webkit-transition: all 0.3s;
        transition: all 0.3s;
}

rect:hover {
        fill: orange;
}

This is nice, but what if we want tooltips as well? A simple way to do this is to create a hidden div that we position and show based on mouse over events. To do this we need to add to our stylesheet:

#tooltip.hidden {
        display: none;
}

and add the div to our HTML inside the <body>:

   <div id="tooltip" class="hidden">
       <p><strong>Number of Animals:</strong></p>
       <p><span id="value">100</span></p>
   </div>

Finally, we need to add two more function calls to how we define our "rects":

           .on("mouseover", function(d) {
               //Get this bar's x/y values, then augment for the tooltip
               var xPosition = parseFloat(d3.select(this.parentNode).attr("x_position")) +
                   x.rangeBand() / 2;
               var yPosition = parseFloat(d3.select(this).attr("y")) + 14;
               //Update the tooltip position and value
               d3.select("#tooltip")
                   .style("left", xPosition + "px")
                   .style("top", yPosition + "px")
                   .select("#value")
                   .text(d.y1-d.y0 + " animals were " + d.name + ".");
               //Show the tooltip (it's a div that is otherwise always hidden)
               d3.select("#tooltip").classed("hidden", false);
           })
           // and cause it to disappear when the mouse exits
           .on("mouseout", function(d) {
               d3.select("#tooltip").classed("hidden", true);
           });

When you create your visualization, be sure to give it some sort of interactive aspect.

Hand In Expectations

You will be asked two things about your handin.

First, what did you do to improve the data itself? My sample code simply lumps missing values with 'Other' outcomes, and labels missing ages as 'Other' as well. This is a very minimalistic (and, in the case of Outcomes, even misleading) way to treat missing data. An answer that only identifies a problem would be incomplete, since that was already done in the previous assignment. If the answer discusses actually fixing errors, imputing missing values, etc., then it should receive a higher score. The best answers would talk about the relationship between the choices that are made about data related problems and the visualization (and how those choices help the user to get the most from the visualization). Obviously, if you choose to visualize some other aspect of the data, you'd address missing values or other errors with respect to that rather than Outcomes/Ages.

Second, what did you do to improve the visualization? A minimal change would be to improve the meaningfulness of the existing visualization. As it stands, the data you are displaying is hard to interpret because it consists of raw numbers. For example, it's hard to get a sense of the difference in the percentage of animals with different outcomes, because the shelter simply dealt with fewer puppies than older animals. This could be addressed in the visualization by changing the types of numbers used. You could also play with alternative visualizations (non bar charts), visualizations of other aspects of the data set, and other types of interactivity. For example, the D3 book we pointed you at walks through how to update the visualization when the user clicks on something like a radio button. You could allow the user to change the style or content of the visualization.
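As one illustration of "changing the types of numbers used", the counts for each age group could be converted to percentages of that group before they are sent to the template. This is only a sketch of one possible approach, not a required solution; the to_percentages function and the counts below are made up for the example.

```python
def to_percentages(age_by_outcome, outcomes):
    """Return a copy where each outcome is a percentage of its age group."""
    result = []
    for record in age_by_outcome:
        total = sum(record[name] for name in outcomes)
        converted = {'Age': record['Age']}
        for name in outcomes:
            # Guard against empty age groups to avoid dividing by zero.
            converted[name] = 100.0 * record[name] / total if total else 0.0
        result.append(converted)
    return result


# Made-up counts, for illustration only.
outcomes = ['Adopted', 'Euthanized']
counts = [{'Age': '<6mo', 'Adopted': 30, 'Euthanized': 10},
          {'Age': '>7yr', 'Adopted': 100, 'Euthanized': 300},
          {'Age': 'Other', 'Adopted': 0, 'Euthanized': 0}]

percentages = to_percentages(counts, outcomes)
for record in percentages:
    print(record)
```

With percentages, every non-empty bar has the same total height, so differences in outcome mix between age groups become visible even when the group sizes differ widely. The trade-off is that the chart no longer shows how many animals are in each group.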

Going Further

You could also go deeper outside the parameters of the assignment. You could set your byte up to use user specific authentication so that you can store and retrieve user specific data (further documentation on authentication and the fusion API you may want to explore for this). You could also develop more interactive visualizations where the user can (for example) choose what two variables are being compared (stacked) in your bar chart. This would require sending data back from javascript to main.py or storing a much bigger array of data in your javascript code.
