Exploring Transit and Driving Behavior in MA, with Google Fusion Tables

Does the presence of good, abundant transit lead to fewer miles driven in personal vehicles? To explore this question, I will use a few tools. The first is "Fusion Tables," an experimental web app from Google that lets you take data from a source (like a spreadsheet) and turn it into a fully functional Google Map with a data display layer. The tables shown below were generated by taking the output of Postgres/PostGIS and uploading it to Google's app. The data come from the "37 Billion Miles Data Challenge" hosted by MAPC, MassDOT, and MTC. Thanks go to everyone who was involved in putting it together.

Let's take a look at the contest dataset. The first fusion table combines the walkshed shapes from the "Datathon Treat" with the MBTA's table of stop locations, taken from its official GTFS files, which represent the entire MBTA schedule in a computer-friendly format. The walksheds are colorized by route. Note that only the subway lines, commuter rail lines, and certain key bus stops are included in the data furnished by MAPC.

Feel free to click and interact with each map and explore. The various features each have some kind of information window that pops up when you click on them.

This next map is a visual depiction of the "mipday_phh" field, per 250 meter grid square. That is, "Miles Per Day Per Household," or the number of vehicle miles driven by a typical household within each colored square. I have color-coded the mileage into quintiles. The lowest quintile is colored light blue, the next is green, then yellow, orange, and finally red. You can play with this map just like the previous one. As you scroll around you might notice a certain pattern to the colors: they tend towards blue and green near MBTA corridors, and towards red out in the suburbs. There is a clear difference in the amount of daily driving between the city and the countryside.
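The quintile coloring described above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the map: the function names are made up, and the nearest-rank rule used to pick the cut points is an assumption, since the exact rule used for the map is not stated here.

```python
from bisect import bisect_right

# Colors from the text, lowest-mileage quintile first.
QUINTILE_COLORS = ["light blue", "green", "yellow", "orange", "red"]


def quintile_cutoffs(values):
    """Return four cut points that split the values into five roughly equal groups."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    # Nearest-rank cut points at 20%, 40%, 60%, and 80% (an assumed convention).
    return [ordered[min(n - 1, (n * k) // 5)] for k in range(1, 5)]


def color_for(value, cutoffs):
    """Map one mipday_phh value to the color of the quintile it falls in."""
    # bisect_right counts how many cut points are <= value, giving a bin index 0..4.
    return QUINTILE_COLORS[bisect_right(cutoffs, value)]
```

Given the list of mipday_phh values for all grid squares, calling quintile_cutoffs once and then color_for per square yields the five-color scheme.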

For further analysis I decided that I needed a model of "transit effectiveness" that might be correlated with travel behavior. The following map shows a square-by-square "T score" that evaluates, quantitatively, how "decent" the service provided by the MBTA is in the immediate area. The score is based on the frequency of service at nearby stops, divided by the distance to those stops, and adjusted by a "quality factor" based on mode. The darker green squares are the ones with the best access to transit. You can click on any of the squares to find out the rating behind it. The score is based on all available MBTA services, not just the key bus routes and subways, but it is scaled by frequency, so the minor bus routes have only a minor effect.

With this data at hand, I can ask the question: is good transit access correlated with less driving?

MilesPerDayPerHouseHold vs T-Score

There does seem to be an inverse correlation between this model of "T score" (X axis) and vehicle-miles per day (Y axis). Basically, the higher the "T score," the more likely it is that the typical mileage per day falls on the lower end of the scale. There are a fair number of outliers, but keep in mind that there are over 22,000 data points on this chart and most of them are piled up in the corner.
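One way to put a number on that visual impression would be a correlation coefficient. The sketch below computes a plain Pearson coefficient in Python. It was not part of the original analysis, the sample values are invented for illustration, and because Pearson's r only measures linear association it may understate a relationship shaped like the one on the chart.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Invented sample values, NOT taken from the contest dataset.
sample_t_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_miles_per_day = [60.0, 50.0, 40.0, 30.0, 20.0]
r = pearson_r(sample_t_scores, sample_miles_per_day)  # negative: mileage falls as score rises
```

A value near -1 would indicate a strong inverse relationship, and a value near 0 would indicate little linear relationship.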

The following fusion table shows the vehicle-mileage per day per household for the grid squares in the vicinity of the Worcester Regional Transit Authority (WRTA). The colors are split up into quintiles, where cyan represents the lowest mileage and red the highest.

I have also applied a similar "scoring" analysis to WRTA data, with an "RTA score" map adjusted for the city of Worcester.

And here is a chart comparing Worcester "RTA score" (X axis) to vehicle-miles per day (Y axis).

The scatter plot has a similar shape to the previous one but is much foreshortened, because Worcester's public transit system does not run as frequently as the MBTA's and therefore scores much lower.

Springfield and the Pioneer Valley
The Pioneer Valley RTA serves this area, along the Connecticut River valley within Massachusetts. The following fusion table shows the vehicle-miles per day per household for each grid square. The color codes are broken down into quintiles, as before.

And here is a map depicting the relative "RTA score" of the grid cells within the general vicinity. Also, be sure to scroll around and view the other cities and towns in the Pioneer Valley region, including Northampton and Amherst, among others.

And finally, here is a scatter plot of "RTA score" (X axis) vs vehicle-miles per day (Y axis). Again, it looks similar to the previous two charts. PVRTA manages to score higher in certain areas than WRTA, and as a result, there is more of a long tail to the chart, as with the MBTA.

I believe the graphs show that there is a fairly compelling inverse correlation between good, abundant transit and daily driving, and that it holds up in multiple regions of the Commonwealth. Examining the outliers can also be useful and interesting, as they are often caused by a strange situation within the grid square. For example, there is one particularly high-mileage grid square in downtown Boston that turns out to contain almost nothing but the Four Seasons Hotel, with its unusually high proportion of vehicles.

Unfortunately, although the correlation shown here between transit and driving is intriguing, these data alone cannot establish a causal relationship between the two variables. In fact, I would hypothesize that both are strongly affected by a third variable: land use. Good transit and low vehicle-mileage are both features of diverse urban cores, and those characteristics may come about independently of each other in such a place.

The "T score" is calculated for each route of interest using a formula in which:
  • F is the number of trips per ordinary weekday,
  • D is the typical straight-line distance from points in the grid square to the station stop,
  • (a, b) are defined by the following table:
    • If mode is trolley or light rail, then (500, 1.0)
    • If mode is subway or metro, then (600, 1.0)
    • If mode is intercity or commuter rail, then (600, 1.0)
    • If mode is bus, then (500, 1.1)
These values came about mostly through trial and error by looking at various known locations throughout the city and using my own experience to decide what the relative rankings should be. This formula is only intended to be used as a means of quantifying the "feel" of transit for the purpose of this exercise.

When there were multiple stops for the same route within range, the algorithm chose the closest stop for each route. The "T score" for a grid cell is the sum of all the "T scores" for each route that has a walkable station stop within about 2 km of the grid cell.
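The aggregation step described above (keep the closest stop per route, ignore stops beyond about 2 km, then sum the per-route scores) can be sketched in Python as follows. This is a sketch under stated assumptions, not the original implementation: the Stop class, the mode keys, and the function names are hypothetical, distances are assumed to be in meters, and the per-route formula is passed in as a parameter because it is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

MAX_RANGE_M = 2000.0  # "about 2 km" from the text; meters are an assumption

# (a, b) per mode, copied from the table above; the key names are hypothetical.
MODE_PARAMS = {
    "light_rail": (500, 1.0),     # trolley or light rail
    "subway": (600, 1.0),         # subway or metro
    "commuter_rail": (600, 1.0),  # intercity or commuter rail
    "bus": (500, 1.1),
}


@dataclass
class Stop:
    route: str                # route identifier
    mode: str                 # one of the MODE_PARAMS keys
    trips_per_weekday: float  # F in the text
    distance_m: float         # D in the text


def cell_t_score(stops: Iterable[Stop],
                 route_score: Callable[[float, float, float, float], float]) -> float:
    """Sum route_score(F, D, a, b) over the closest in-range stop of each route."""
    closest = {}
    for stop in stops:
        if stop.distance_m > MAX_RANGE_M:
            continue  # outside the walkable range for this grid cell
        best = closest.get(stop.route)
        if best is None or stop.distance_m < best.distance_m:
            closest[stop.route] = stop
    total = 0.0
    for stop in closest.values():
        a, b = MODE_PARAMS[stop.mode]
        total += route_score(stop.trips_per_weekday, stop.distance_m, a, b)
    return total


def count_trips(f, d, a, b):
    """Placeholder scorer that returns F only. It is NOT the T score formula."""
    return f


example_stops = [
    Stop("Red", "subway", 200, 300),
    Stop("Red", "subway", 200, 900),  # farther stop on the same route: ignored
    Stop("1", "bus", 100, 2500),      # beyond 2 km: ignored
    Stop("66", "bus", 80, 400),
]
example_total = cell_t_score(example_stops, count_trips)
```

Keeping the per-route formula as a parameter means the selection and summation logic can be checked on its own, and the actual formula can be dropped in as the route_score argument.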