Data Sets

CHICAGO

The city of Chicago publishes a dataset of reported crime incidents that occurred in the city from 2001 to the present, excluding the most recent seven days.

The dataset provides many valuable attributes for each crime incident, including date, primary type (crime type), district, community area, description, location description, latitude, and longitude.

The dataset is well maintained and organized by the city of Chicago, and the data does not contain any noise. However, it has a few issues. First, some attributes, such as ID and FBI Code, are neither necessary nor useful for predicting crime patterns. Second, there are missing values for x coordinate, y coordinate, latitude, longitude, and location. These omissions may be intentional, because the city does not want to disclose the locations of certain incidents, or they may simply be recording mistakes.
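The two issues above can be illustrated with a minimal pandas sketch. The toy rows and the exact column names ("ID", "FBI Code", "Primary Type", "Latitude", "Longitude") are assumptions made for illustration, and dropping rows without coordinates is only one possible way to handle the missing values, not necessarily the one used here.

```python
import pandas as pd

# Toy rows standing in for the Chicago export; column names are assumed.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "FBI Code": ["06", "08B", "14"],
    "Primary Type": ["THEFT", "BATTERY", "CRIMINAL DAMAGE"],
    "Latitude": [41.88, None, 41.75],
    "Longitude": [-87.63, None, -87.60],
})

# Issue 1: drop attributes that are not useful for predicting crime patterns.
df = df.drop(columns=["ID", "FBI Code"])

# Issue 2: count the missing values in each remaining column.
missing_counts = df.isna().sum()
print(missing_counts)

# One possible handling (an assumption): keep only rows with a known position.
located = df.dropna(subset=["Latitude", "Longitude"])
print(len(located), "of", len(df), "rows have coordinates")
```

Whether to drop, impute, or keep rows with missing locations depends on the downstream model; the sketch only shows how the gaps can be measured first.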

We derive two new attributes:

  • day_of_week from date
  • time_section from date
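As a rough sketch of how these two attributes could be derived with pandas: the timestamp format, the four six-hour bins, and the labels used for time_section are assumptions, since the exact section boundaries are not specified here.

```python
import pandas as pd

# Toy timestamps; the 12-hour format string is an assumption about the export.
df = pd.DataFrame({
    "date": [
        "01/05/2015 02:30:00 PM",
        "01/06/2015 11:45:00 PM",
        "01/10/2015 03:15:00 AM",
    ]
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y %I:%M:%S %p")

# day_of_week: the weekday name of each incident.
df["day_of_week"] = df["date"].dt.day_name()

# time_section: bucket the hour of day into four six-hour sections.
# The boundaries and labels below are hypothetical.
bins = [0, 6, 12, 18, 24]
labels = ["night", "morning", "afternoon", "evening"]
df["time_section"] = pd.cut(
    df["date"].dt.hour, bins=bins, labels=labels, right=False
)

print(df)
```

Using `right=False` makes each bin include its lower bound, so hour 0 falls in the first section and hour 23 in the last.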

MONTGOMERY COUNTY

The county provides the public with direct access to its crime statistics databases. The dataset contains information on all founded crimes reported after July 2013.

The important attributes of the dataset are class, class description, police district name, block address, city, zip code, place, start date/time, end date/time, and location. The attributes are given for each crime incident that occurred in Montgomery County. This dataset is also very clean and well maintained by Montgomery County; it does not contain any noise. However, it has a few minor issues.

The first issue is that some attributes, such as Beat, PRA, and Incident ID, are not useful for predicting crime patterns.

The second issue is that there are missing values in zip code (unlike in the Chicago dataset), end date/time, and location. Some end dates/times may be missing on purpose, because the Montgomery County police have not yet caught the offenders, or they may simply not have been entered. Missing locations and zip codes may be intentional, because the county does not want to disclose the locations of certain incidents, or they may simply be recording mistakes.
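Because some of these gaps may be deliberate rather than accidental, one option is to keep the affected rows and record the missingness explicitly. The sketch below assumes hypothetical column names (start_date, end_date, zip_code) and toy values; it is an illustration, not the preprocessing actually applied to the county data.

```python
import pandas as pd

# Toy rows standing in for the Montgomery County export; names are assumed.
df = pd.DataFrame({
    "start_date": ["2014-03-01 22:10", "2014-03-02 08:00"],
    "end_date": ["2014-03-01 23:00", None],
    "zip_code": ["20850", None],
})

# Parse both timestamps; missing or unparseable entries become NaT.
for col in ["start_date", "end_date"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Keep the rows, but flag which values were absent in the source.
df["end_date_missing"] = df["end_date"].isna()
df["zip_code_missing"] = df["zip_code"].isna()

# Share of missing values per attribute, useful for deciding how to proceed.
missing_share = df[["end_date", "zip_code"]].isna().mean()
print(missing_share)
```

The indicator columns preserve the distinction between a missing and a recorded value, which would be lost if the rows were dropped or the gaps filled silently.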

We derive two new attributes:

  • day_of_week from start_date
  • time_section from start_date