
City of Chicago Crime and CTA Data in CAVE2


The goal of this project was to visualize large-scale crime data from the City of Chicago in a single view on a large display.  To complete this project, I mapped crime data and crime statistics from the City of Chicago onto a large 2D map rendered with osgearth and omegalib in CAVE2.  Omegalib is a framework for virtual reality and visualization on cluster-based systems.  Much of the complexity of synchronizing nodes in a cluster is handled behind the scenes, allowing computer scientists and artists to develop for CAVE2 with relative ease.  The Python interface to CAVE2 is particularly easy to use.

I will describe the process of building and running my application, the resulting visualization, and the challenges in producing it.  I will also discuss some of the technical implementation details.

Demo in CAVE2

Building and running the application:
Get source code and data here

Instructions on how to build omegalib and osgearth can be found here: 
Once built, you can run my code using the orun script:  orun -s chicagoflat.py .  

This application cannot be run on old graphics hardware, since the shaders are designed for the CAVE.  For instance, my 3-year-old MacBook Pro does not support GLSL 1.5, so it cannot run this program.

Dependencies:

Follow the build advice on the wiki.  Dependencies include:
python
omegalib
osgearth

All three must be installed for the application to run.

The Data:

The data presented in this application comes from the City of Chicago public data portal.  The data sets include:
- Crime data from 2001 to 2013, consisting of date, time, latitude, longitude, neighborhood, and crime type.  Posted crime types range from assault to disturbing the peace to motor vehicle theft.   
- Coordinates of the CTA train lines, which specify the path of each line
- Coordinates of Chicago neighborhood boundaries, such as Lincoln Park and the South Loop. 
- Streaming CTA train data
- Simulated real-time crime data, taken from 2012 but treated as real-time data input




Data Processing:

Significant data parsing was required to reduce the data size so it could be loaded efficiently into my program. There were over 5 million crimes listed between 2001 and the present.  To handle this volume, I wrote the data to a binary file and used GLSL shaders to visualize the data in parallel, taking advantage of CAVE2's cluster of 36 GPUs, described further under 'shaders'.
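
As an illustration of the binary-file step, here is a minimal sketch of packing crime records into fixed-size binary records with Python's struct module.  The record layout (two 32-bit floats followed by small integer fields) and the function names are assumptions made for this example, not the layout used in the project.

```python
import struct

# Hypothetical record layout (an assumption, not the project's format):
# longitude and latitude as 32-bit floats, year as uint16,
# then month, weekday, hour and crime-type code as uint8 each.
RECORD = struct.Struct("<ffHBBBB")

def pack_crimes(crimes):
    """Serialize an iterable of crime tuples into one bytes object."""
    return b"".join(RECORD.pack(*crime) for crime in crimes)

def unpack_crimes(blob):
    """Recover the list of crime tuples from packed bytes."""
    return [RECORD.unpack_from(blob, offset)
            for offset in range(0, len(blob), RECORD.size)]
```

With this layout each crime takes 14 bytes, so 5 million crimes fit in roughly 70 MB, and the file can be read back without any text parsing.
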
Data parsing was also needed to extract coordinates from KML files in order to load CTA stations, CTA train line segments and streaming train positional data.
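
The coordinate extraction can be sketched with the standard library alone.  This example assumes the usual KML convention of whitespace-separated "lon,lat[,alt]" tuples inside each coordinates element; the function name is hypothetical and the project's own parser may differ.

```python
import xml.etree.ElementTree as ET

def kml_coordinates(kml_text):
    """Return one list of (lon, lat) pairs per <coordinates> element."""
    root = ET.fromstring(kml_text)
    paths = []
    for elem in root.iter():
        # Compare the local tag name so any KML namespace version matches.
        if elem.tag.rsplit("}", 1)[-1] == "coordinates" and elem.text:
            pairs = []
            for token in elem.text.split():
                lon, lat = token.split(",")[:2]  # ignore optional altitude
                pairs.append((float(lon), float(lat)))
            paths.append(pairs)
    return paths
```

Each returned list corresponds to one path or boundary, so a train line with several segments yields several lists.
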




Crime Data Points and Shaders:

To load and visualize the 5 million crimes in parallel, I followed the examples in the omegalib point cloud module, as well as the code for the Endurance project written by Alessandro Febretti.  This approach involves loading the crime data as a series of models, where each individual crime is encoded as a vertex with a position and associated data.  In the shaders, a sphere is drawn for a vertex when the crime data point is in range, which is evaluated from the crime's data and the uniform values passed in from the main program.  For instance, I set a uniform value indicating whether crimes from 2001 are to be drawn, passed this value to the shader, and only rendered crimes from that year if that value was set to true.  The decision to render a sphere for a crime data point was made in the geometry shader.
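
The per-vertex test can be illustrated on the CPU in Python.  This is only a sketch of the filtering logic, not the shader code: the dictionary keys and the sets of enabled values are stand-ins for the vertex attributes and uniform values, and all names are hypothetical.

```python
def is_drawn(crime, enabled_years, enabled_types):
    """Return True when a crime passes every active filter.

    In a shader, each membership test would be a uniform lookup instead.
    """
    return crime["year"] in enabled_years and crime["type"] in enabled_types

def visible_crimes(crimes, enabled_years, enabled_types):
    """Keep only the crimes that would be expanded into spheres."""
    return [c for c in crimes if is_drawn(c, enabled_years, enabled_types)]
```

On the GPU the same test runs once per vertex in parallel, which is why toggling a filter only requires updating a uniform rather than reloading any data.
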

This approach worked remarkably well, permitting me to quickly load, visualize and filter all 5 million crime data points.  This would not be possible without parallel processing.

Selecting crime data points can take place in the 'view by' menu.  



Statistical Data Visualization:

To compare crime patterns in different city neighborhoods, I used the 'community ID' recorded with each crime, along with community border KML data files, to show bar charts of crime volume in each neighborhood over different periods of time.  For instance, it is possible to view the volume of motor vehicle theft on Saturdays during the summer of 2003, and then toggle to view Fridays during the summer of 2003, to see if different days of the week have different volumes of a particular crime.  You could then toggle other years on/off to see if 2003 was unusual.

Limitations:  If I were to continue working on this project, some means of direct, side-by-side comparison between statistical values in a neighborhood would be useful.  In addition, I would display the exact counts of these crimes in a given period, something which is not possible to ascertain in the current version of the program.

Producing the statistical view was a challenge.  I opted for a series of flat files, which I converted into a tree of counts for different filters.  However, traversing the tree is often slow, and loading the large statistical data files slowed the launch of my program.  An SQL database might help address this problem.
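
A tree of counts along these lines can be sketched with nested dictionaries.  The nesting order (community, crime type, year, weekday) and the function names are assumptions chosen for the example; the project's tree may be organized differently.

```python
from collections import defaultdict

def build_tree(records):
    """Nest counts as tree[community][crime_type][year][weekday]."""
    tree = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(int))))
    for community, crime_type, year, weekday in records:
        tree[community][crime_type][year][weekday] += 1
    return tree

def count(tree, community, types, years, weekdays):
    """Sum the counts for one community over the enabled filter values."""
    total = 0
    by_type = tree.get(community, {})  # .get avoids creating empty branches
    for crime_type in types:
        by_year = by_type.get(crime_type, {})
        for year in years:
            by_day = by_year.get(year, {})
            total += sum(by_day.get(day, 0) for day in weekdays)
    return total
```

The cost of a query grows with the product of the enabled filter values, which is consistent with the slow traversal described above; an indexed database table would push that summation into the query engine.
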

Additional limitations include conflicts of choices where the user selects a statistical view, but runs a filter that only applies to individual crime points.  I did not adequately check for all these cases.  

Selecting statistical data can be done in the 'view by' menu.




Data Filters:

The 'filter' menu permits the user to select crime types, years, seasons, times of day and days of the week to toggle on/off in both the display of statistical data and individual crime points.  

Limitations:  Some filter values appear to fail.  For instance, de-selecting Spring does not appear to change the height of the statistical bars.  Perhaps an error in the earlier data parsing produced this problem.






Real Time Data Display:

The 'real time' menu option activates reading from the CTA train tracker server and the presentation of simulated real-time crime data, taken from 2012.  Unfortunately, in many cases there are few crimes recorded at any given instant, so I have opted to show all the crimes from the past hour, with new crimes popping onto the scene as they are recorded.
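
The "past hour" display can be sketched as a sliding window over a queue.  This assumes crimes arrive in chronological order; the class and method names are hypothetical and not taken from the project.

```python
from collections import deque
from datetime import timedelta

class RecentCrimes:
    """Keep only the crimes recorded within a trailing time window."""

    def __init__(self, window=timedelta(hours=1)):
        self.window = window
        self.items = deque()  # (timestamp, crime) pairs, oldest first

    def add(self, timestamp, crime):
        """Record a newly reported crime (timestamps must not decrease)."""
        self.items.append((timestamp, crime))

    def visible(self, now):
        """Drop crimes older than the window and return the rest."""
        while self.items and now - self.items[0][0] > self.window:
            self.items.popleft()
        return [crime for _, crime in self.items]
```

Calling visible() once per update with the simulated clock gives the set of points to draw, and each old crime is removed exactly once.
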

Limitations:  While train positions update, I have not implemented this in a way that would permit the program to run indefinitely.  For instance, I do not delete trains that are out of service once they reach the end of the line.  In future versions I would also improve the look of the train cars to show direction and heading, making this data easier to interpret.
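
One possible way to address the out-of-service trains is to drop any train that has not appeared in the feed for some time.  This is a sketch of a fix that is not implemented in the program; the run IDs, the feed format and the 120-second threshold are all assumptions.

```python
def update_trains(trains, feed, now, max_age=120.0):
    """Refresh train positions in place and remove trains that went silent.

    trains maps run_id -> (position, last_seen_seconds);
    feed maps run_id -> position for the trains in the latest update.
    Returns the list of run IDs that were removed.
    """
    for run_id, position in feed.items():
        trains[run_id] = (position, now)
    stale = [run_id for run_id, (_, seen) in trains.items()
             if now - seen > max_age]
    for run_id in stale:
        del trains[run_id]
    return stale
```

The returned IDs tell the caller which scene objects to delete, so the number of train models stays bounded however long the program runs.
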

Animation:

It is possible to animate through the individual crime points by selecting 'start animation' in the main menu.  The animation advances one week at a time.  Future versions of the program ought to permit the user to vary the rate and the start/end time of the animation, to watch crime unfold over time.  Nevertheless, this fast animation over 5 million crimes could only be achieved with parallelism.
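
A configurable animation could be driven by a generator of time windows.  This is a sketch of the suggested extension, not existing code; the function name and parameters are hypothetical, and the default step matches the weekly rate described above.

```python
from datetime import date, timedelta

def animation_frames(start, end, step=timedelta(weeks=1)):
    """Yield (frame_start, frame_end) windows covering [start, end)."""
    current = start
    while current < end:
        # Clamp the final window so it never runs past the end date.
        yield current, min(current + step, end)
        current += step
```

Each window's bounds could then be passed to the shaders as uniform values, so that only crimes inside the current window are drawn.
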

Navigation:

I have included options to 'jump' to a neighborhood, using the InterpolActor from the omegalib util module.  Neighborhood positions are computed from the neighborhood shape files.  However, for this feature to work more effectively, I need to control camera orientation and stop the camera further from the target location.  Therefore, further work is needed to improve this feature.
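
One common way to derive a single position from a boundary polygon is the area-weighted centroid.  The sketch below uses the standard shoelace formula on planar (x, y) pairs; whether the program uses this method or a simpler average of the vertices is an assumption, and the function name is hypothetical.

```python
def polygon_centroid(points):
    """Return the area-weighted centroid of a simple polygon.

    points is a list of (x, y) vertices in order; repeating the first
    vertex at the end, as KML rings do, is harmless.
    """
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("polygon has zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)
```

Treating longitude and latitude as planar coordinates is a reasonable approximation at the scale of a single neighborhood.
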

  

Music:

Music plays on program launch.  I found the music on Free Music Archive:  Kai Engel's Moonlight Reprise.  I converted it to .wav using Audacity.  One problem with my music:  it does not loop.  I needed to insert a 'play' command in the update function, once the music stopped.


Findings:

It is not surprising that lots of crime occurs in Chicago.  However, the rate and distribution of crime in the city do not appear to change significantly.  In many neighborhoods, crime is near constant over all seasons, days of the week and times of day.  There is a need to better visualize trends, perhaps by showing how far above or below average a given period is.  Alternatively, comparative views are needed to effectively see differences in crime rates.
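
The above-or-below-average idea could be computed as each period's relative deviation from the mean.  This is a sketch of a possible future feature, not something in the current program; the function name and the input format are hypothetical.

```python
def deviation_from_mean(counts):
    """Map each period to (count - mean) / mean for a dict of counts.

    A value of 0.1 means the period is 10% above the average period.
    """
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    if mean == 0:
        return {period: 0.0 for period in counts}
    return {period: (value - mean) / mean for period, value in counts.items()}
```

Scaling bar heights or colors by these values, rather than by raw counts, would make small differences between periods visible even when the totals are nearly constant.
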

Few crimes are recorded in 2001, likely because the program to digitally record crime data was in its infancy.  

Locations of crime are always noted on streets, suggesting that GPS coordinates are obtained from police cars.  Parks appear to have no crime in them.  




