A longstanding goal of government and society is to reduce criminal activity. To do so, we must first understand patterns in criminal activity -- where does it happen, and what factors, both geographical and socioeconomic, are associated with these places?
Crime forecasting is the process of predicting future crimes based on historical patterns of crime within a city or neighborhood. It is important because it arms local police departments with information about what types of crime are likely to occur in a given area, how frequently they occur, and which areas are more likely to be crime "hot spots" in the future. When forecasts are related to data such as time of day, weather, and location, police can better plan crime prevention strategies, patrol routes, and educational programs. Beyond improving policing, crime prediction can give communities and community leaders an understanding of the challenges they face and help them implement policy that may prevent crimes from happening altogether.
Our project attempts to predict the category of crime committed in the Boston area based on location data, temporal data, and data about socioeconomic factors.
Intuitively, trends in crime should be affected by the temporal environment in which the crimes take place. In fact, seasonal trends in crime prevalence have been observed for centuries. According to routine activity theory, crime occurs more often during time periods when people spend more time on activities outside [1]. More people outside increases the population of potential victims as well as potential perpetrators. Crime rates see an uptick during the summer months and are suppressed during the winter. Additionally, crime rates are elevated on weekends and holidays. Patterns are also evident in crime rates by hour of the day and day of the week. Crimes have been found to be elevated in the evenings, and certain crimes such as sexual assault follow distinctive temporal patterns [2]. Given the differences between types of crime and the opportunities to commit them, we believe that temporal data will also be useful in distinguishing between different types of crime.
There have been many attempts to use static data in crime incident prediction. Specific types of crime have been predicted using historical crime data. Chung-Hsien Yu et al. predicted burglary by dividing a city into geographic blocks and counting the number of burglaries within each block over the past month. After running the data through various classification models, they determined that a neural network outperformed the other models in predicting crime hot spots [3].
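The block-counting step described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation used in [3]: the square grid measured in degrees, the grid origin, and the function names `grid_cell` and `count_by_cell` are all hypothetical.

```python
from collections import Counter
from math import floor


def grid_cell(lat, lon, origin_lat, origin_lon, cell_deg):
    """Map a coordinate to an integer (row, col) index on a square grid.

    The grid origin and the cell size in degrees are assumptions made
    for illustration; a real analysis might use projected coordinates
    or census block boundaries instead.
    """
    row = floor((lat - origin_lat) / cell_deg)
    col = floor((lon - origin_lon) / cell_deg)
    return row, col


def count_by_cell(incidents, origin_lat, origin_lon, cell_deg):
    """Count incidents per grid cell.

    `incidents` is an iterable of (lat, lon) pairs, assumed to be
    pre-filtered to one crime type and one time window (for example,
    burglaries in the past month).
    """
    return Counter(
        grid_cell(lat, lon, origin_lat, origin_lon, cell_deg)
        for lat, lon in incidents
    )
```

The resulting per-cell counts could then serve as input features to a classifier that labels each cell as a hot spot or not.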
Looking at temporal and spatial crime data from the cities of Los Angeles and Denver, Tahani Almanie et al. applied a Bayesian classifier as well as a decision tree classifier to predict crime hot spots and differentiate between types of crime. After determining hot spots for crime, they related their model results to demographic data to profile high-risk neighborhoods in Denver. They found that hot spot neighborhoods were associated with large populations, large numbers of housing units, many vacant houses, and a gender disparity favoring males [4].
While these studies are a good starting point for our project, we aim to differentiate types of crime using historical data while incorporating temporal and location data into our model.
Xiaofeng Wang et al. related location data to demographic data in crime prediction. They again divided a city into blocks, additionally accounting for population, median house value, race, and marriages, as well as geographic data such as distance to the nearest road, school, and business. They performed stepwise feature reduction to reduce the model to 11 predictive variables and predicted crime likelihood using a generalized additive model [5]. We hope to extend this analysis by using the various models we've learned in class to predict types of crime in Boston.
Though beyond the scope of our project, other approaches have focused on using "real time" or more dynamic data to predict the likelihood of crime in a given area.
The effect of population mobility has been explored in crime prediction. Looking specifically at property crimes, Carlos Caminha et al. found that greater rates of property crime occurred in areas of a city with higher flows of people through public transit. They argue that the floating population in an area at a given time, not the residential population, should be the relevant population of analysis when performing crime prediction [6].
Location-based social media platforms offer further glimpses into human mobility. A study by Shakila Khan Rumi et al. using data from Foursquare check-ins extended the analysis to account for features such as visitor homogeneity, region popularity, types of visitors to a region, and how much a region is visited relative to its neighbors, and found relationships between these dynamic factors and the type of crime committed [7].
Within the scope of this project, we focused on including additional static predictors that we could collate with the crime location data, as well as on expanding the time data into usable predictors.
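As a rough illustration of what expanding the time data into usable predictors can look like, the sketch below derives a handful of features from a single incident timestamp. The feature names, the function name `temporal_features`, and the cutoffs for "summer" and "evening" are assumptions chosen for this example, not the exact predictors used in our models.

```python
from datetime import datetime


def temporal_features(ts: datetime) -> dict:
    """Expand one timestamp into simple temporal predictors.

    Assumed definitions (illustrative only):
      - weekend: Saturday or Sunday
      - summer:  June through August
      - evening: 18:00 up to midnight
    """
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        "month": ts.month,
        "is_weekend": ts.weekday() >= 5,
        "is_summer": ts.month in (6, 7, 8),
        "is_evening": 18 <= ts.hour < 24,
    }
```

Binary indicators such as `is_weekend` make the seasonal and weekly patterns discussed earlier directly available to a model, rather than leaving it to infer them from a raw timestamp.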
For our extra data sets, we wanted to expand the spatial data that we had available. To that end, we added data from the United States Census and the City of Boston's Analyze Boston website.
Much work has been done to study the relationship between inequality and crime. Studies have found poverty level to have an effect on violent crime and income levels to have an effect on property-related crimes [8]. The Gini coefficient, a statistical measure of inequality, has been suggested to correlate either positively or negatively with certain types of crime rates, depending on the unit of analysis and the populations being studied [9]. We will try to use measures of income disparity, poverty level, and the Gini coefficient in our analysis to differentiate types of crime based on inequality.
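For reference, the Gini coefficient of a set of incomes can be computed directly from its definition. The sketch below is illustrative only: it assumes a list of non-negative individual incomes and uses the standard sorted-rank formula, and the function name `gini` is our own. The analysis itself may rely on published area-level figures rather than on this calculation.

```python
def gini(values):
    """Gini coefficient of non-negative values via the sorted-rank formula.

    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_1 <= ... <= x_n and i runs from 1 to n.
    Returns 0 for perfect equality and (n - 1) / n when a single
    member holds everything.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

A value near 0 indicates incomes that are close to equal, while values approaching 1 indicate that income is concentrated among very few members of the population.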
[1] Andresen, M.A. & Malleson, N. Crime Sci (2015) 4: 12. https://doi.org/10.1186/s40163-015-0024-7
[2] Vania Ceccato & Adriaan Cornelis Uittenbogaard (2014) Space–Time Dynamics of Crime in Transport Nodes, Annals of the Association of American Geographers, 104:1, 131-150, DOI: 10.1080/00045608.2013.846150
[3] C. Yu, M. W. Ward, M. Morabito and W. Ding, "Crime Forecasting Using Data Mining Techniques," 2011 IEEE 11th International Conference on Data Mining Workshops, Vancouver, BC, 2011, pp. 779-786. doi: 10.1109/ICDMW.2011.56
[4] Almanie, Tahani, et al. “Crime Prediction Based on Crime Types and Using Spatial and Temporal Criminal Hotspots.” International Journal of Data Mining & Knowledge Management Process, vol. 5, no. 4, 2015, pp. 01–19., doi:10.5121/ijdkp.2015.5401.
[5] Wang, Xiaofeng, and Donald E. Brown. “The Spatio-Temporal Generalized Additive Model for Criminal Incidents.” Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, 2 July 2011, doi:10.1109/isi.2011.5984048.
[6] Caminha, Carlos et al. “Human mobility in large cities as a proxy for crime.” PloS one vol. 12,2 e0171609. 3 Feb. 2017, doi:10.1371/journal.pone.0171609
[7] Rumi, S.K., Deng, K. & Salim, F.D. Crime event prediction with dynamic features. EPJ Data Sci. 7, 43 (2018) doi:10.1140/epjds/s13688-018-0171-7
[8] Kelly, Morgan. (2000). Inequality And Crime. The Review of Economics and Statistics. 82. 530-539. 10.1162/003465300559028.
[9] Jesse Brush, "Does income inequality lead to more crime? A comparison of cross-sectional and time-series analyses of United States counties," Economics Letters, Volume 96, Issue 2, 2007, Pages 264-268, ISSN 0165-1765, https://doi.org/10.1016/j.econlet.2007.01.012.