The San Francisco Department of Public Works, founded in 1900, is responsible for the care and maintenance of San Francisco's streets and infrastructure. "The staff designs and manages construction of civic buildings and streets, cleans buildings and streets, cleans and greens the right of way, maintains civic buildings; trains people for jobs, keeps the right of way free of hazards, paves the streets, repairs bridges and public stairways, expands accessibility and works at the forefront addressing some of San Francisco’s biggest challenges, including homelessness" [1]. One of the projects founded and maintained by the Public Works Department is the Pit Stop program, which provides clean and safe public toilets, used-needle receptacles, and dog waste stations. The first three sites were built in 2014 in San Francisco's Tenderloin neighborhood, and the program has since expanded to 24 sites in 13 neighborhoods. The Pit Stop locations were chosen based on the high volume of requests for Public Works to steam clean the sidewalks to remove human waste. These locations also happen to be in the "most impacted" neighborhoods in San Francisco [2]. In the first year of the program, the number of requests for Public Works' steam cleaning services related to human waste on sidewalks decreased by 60% [3]. The Pit Stop program has clearly improved street cleanliness, but has it reduced crime as well?
Do Public Works projects in San Francisco (specifically, the Pit Stop program) have an impact on crime? If so, in which neighborhoods?
Is there a correlation between Public Works projects and crime in San Francisco?
Can crime be predicted based on the placement of Public Works projects?
1. Police Department Incident Reports: Historical 2003 to May 2018: Provided by the City and County of San Francisco. Dataset includes an incident number, description of incident, date, location, etc. There are 2,160,953 rows and 14 columns (253 MB).
2. Police Department Incident Reports: 2018 to Present: Provided by the San Francisco Police Department. Dataset includes an incident number, description of incident, date, location, etc. There are 385,394 rows and 26 columns (127 MB).
3. San Francisco Pit Stops: Provided by San Francisco Public Works. Dataset includes name, address, neighborhood, location, current police districts, year built, etc. There are 26 rows and 11 columns.
4. San Francisco County Census 2010-2019: Provided by the United States Census Bureau. Dataset includes total population, number of males, number of females, age groups, ethnicity, etc. There are 10 rows and 27 columns.
5. San Francisco Neighborhood Census 2010-2019: Provided by the San Francisco Planning Department. Dataset includes total population, percentage of females, household information, ethnicity information, age distributions, educational attainment, employment information, etc. about residents in each neighborhood in San Francisco. There are 63 rows and 80 columns.
6. San Francisco Tourism 2008-2019: Provided by the San Francisco Travel Association. Dataset includes number of tourists per year and spending amount by tourists per year. There are 12 rows and 3 columns.
The number of crimes in a neighborhood will decrease after Pit Stop locations are placed there.
Load the datasets, clean them, and perform exploratory data analysis on them (a minimal loading and cleaning sketch follows this list).
Create a map that displays where each Pit Stop location is in San Francisco.
Create static and dynamic maps of crime prevalence in San Francisco neighborhoods for each year for which data is available (2003-present).
Check if there is a correlation between Pit Stop location and crime.
Use machine learning algorithms (e.g., random forests, decision trees, LSTMs, Prophet) to predict crime in areas where Pit Stops are available.
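As a rough illustration of the first step, here is a minimal loading and cleaning sketch in pandas. The file names and column names are assumptions based on the DataSF exports, not a definitive reproduction of the project's code.

```python
import pandas as pd

# File names are placeholders for the DataSF CSV exports.
incidents = pd.read_csv("police_incidents_2018_present.csv")
pit_stops = pd.read_csv("san_francisco_pit_stops.csv")

# Basic cleaning: parse dates, drop rows without coordinates, and
# normalize neighborhood names so the datasets can be joined.
incidents["Incident Date"] = pd.to_datetime(incidents["Incident Date"], errors="coerce")
incidents = incidents.dropna(subset=["Latitude", "Longitude"])
incidents["Analysis Neighborhood"] = incidents["Analysis Neighborhood"].str.strip()

# Quick exploratory summaries: top incident categories and counts per year.
print(incidents["Incident Category"].value_counts().head(15))
print(incidents.groupby(incidents["Incident Date"].dt.year).size())
print(len(pit_stops), "pit stop locations")
```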
Papers have been written about the efficacy of public works projects, but not many about those projects' effects on crime. None of the papers evaluating the effectiveness of public works projects used machine learning algorithms. Mackintosh and Blomquist stated that "[t]he available evidence suggests that well-designed public work programs can be successful both in targeting benefits to the poor and in furthering social welfare goals" [4, p. 2]. Chalfin et al. [5] cited previous research suggesting that social and physical features of an urban landscape could impact crime. Continuing this line of research, Chalfin et al. evaluated "whether and how violence can be successfully reduced by environmental design changes" [5, p. 3] by looking at the effect of street lighting on crime. They found that street lighting reduced the number of outdoor, nighttime index crimes in New York City by about 36%. This study provides evidence that some government-funded projects can reduce crime.
Most of the relevant research regarding my project topic has been done on machine learning techniques to analyze and predict crime. Wheeler and Steenbeek [6] used a Random Forest model to accurately predict the occurrence of crime in micro areas (grid cells of 200 by 200 feet) in Dallas, Texas. They also dove into the "black box" of machine learning algorithms, using interpretable statistical methods to understand why a certain place is predicted to be a crime hot spot rather than simply where crime will happen. Alves, Ribeiro, and Rodrigues [7] also used Random Forests, predicting homicides in Brazilian cities with 97% accuracy and identifying the urban indicator variables that mattered most to the prediction: unemployment and illiteracy. "The accurate predictions obtained through statistical learning suggest that crime is quite dependent on urban indicators" [7, p. 440]. Bogomolov et al. [8] distinguished themselves from many previous researchers by using behavioral data captured by cellphones and basic demographic information to predict crime, rather than historical records or offender profiling. The researchers trained several algorithms, such as logistic regression, support vector machines, neural networks, and decision trees; the Random Forest algorithm gave the highest accuracy, correctly predicting whether a specific area in London would be a crime hotspot 69% of the time [8]. Iqbal et al. [9] found that a Decision Tree algorithm outperformed a Naïve Bayesian algorithm on multiple performance measurements, including precision, recall, accuracy, and F-measure, and were able to predict crime categories for different U.S. states [9].
It is evident that research has been performed on the efficacy of public works projects and on crime prediction using machine learning techniques. My work differs from these projects because it combines the two topics: the goal is to evaluate whether Public Works projects in San Francisco, the Pit Stop program in particular, have an effect on crime, and, if so, in which direction.
Most Pit Stop locations are clustered on the eastern side of the city.
Fig 4. San Francisco Pit Stop Locations by Year Built (map below)
The figure name above is a clickable link to an HTML page with hover-over information about each Pit Stop location. Most of the Pit Stops were built between 2014 and 2017.
Incident Dataset EDA
This graph only contains data from 2018-Present because the Police Department Incident Reports: Historical 2003 to May 2018 dataset does not include neighborhood information.
Fig 6. Top 15 Crime Categories
The most common crime is larceny/theft.
Fig 7. Number of Crimes per Year
The highest number of crimes occurred in 2014.
Most Pit Stop locations are in the Tenderloin and Mission neighborhoods.
Tenderloin and Mission are two of the neighborhoods where a large number of crimes have occurred from 2003-Present.
I found a more detailed dataset with census information broken down by neighborhood in San Francisco. The "Datasets" section above reflects these changes.
The following graphs include census information about each of the neighborhoods that have at least one pit stop. Most of the graphs include a neighborhood labeled "All," which is simply the city of San Francisco as a whole; it is included so that each pit stop neighborhood can be compared with the city overall.
From the graphs above, we can see that some neighborhoods have more affluent and more educated residents, higher home values, lower percentages of residents in poverty, and so on. We can place neighborhoods into simplified groups based on these characteristics.
These graphs only contain incident information about neighborhoods that have at least one pit stop. The incident categories in these graphs are the 14 most common incident categories. For most graphs (except for Number of Total Incidents and Number of Larceny/Theft Offenses), the y-axis values are the same so that comparisons between graphs can be made.
We can see that the highest numbers of most of the incident categories occur in Group B neighborhoods (Mission, Tenderloin, and South of Market).
Pit stops were placed in Group B neighborhoods first.
There are higher numbers of pit stops in Group B neighborhoods.
In the map to the left, the colors of the dots correspond to the neighborhood while the size of the dots corresponds to the number of total incidents that occurred in that neighborhood each year. The bigger the dot, the more incidents that occurred.
In the map to the left, the colors of the dots correspond to the number of total incidents that occurred in each neighborhood each year. The darker and redder the color gets, the more incidents that occurred there. The size of the dot corresponds to the number of pit stops that are in the neighborhood each year. The bigger the dots, the more pit stops the neighborhood has.
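Maps like these can be built with Plotly Express. Below is a minimal sketch of the second map, assuming a tidy DataFrame with one row per neighborhood per year; the frame, its column names, and the sample values are all hypothetical.

```python
import pandas as pd
import plotly.express as px

# Hypothetical tidy frame: one row per (neighborhood, year), with
# illustrative coordinates and counts.
yearly = pd.DataFrame({
    "neighborhood": ["Tenderloin", "Tenderloin", "Mission", "Mission"],
    "year": [2016, 2017, 2016, 2017],
    "lat": [37.7847, 37.7847, 37.7599, 37.7599],
    "lon": [-122.4145, -122.4145, -122.4148, -122.4148],
    "total_incidents": [5200, 5450, 6100, 6350],
    "num_pit_stops": [3, 4, 2, 3],
})

fig = px.scatter_mapbox(
    yearly,
    lat="lat",
    lon="lon",
    color="total_incidents",         # darker/redder = more incidents
    size="num_pit_stops",            # bigger dot = more pit stops
    color_continuous_scale="YlOrRd",
    animation_frame="year",          # one frame per year
    hover_name="neighborhood",
    zoom=11,
)
fig.update_layout(mapbox_style="open-street-map")  # no Mapbox token needed
fig.show()
```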
Mann-Kendall Test [10]: used to analyze data collected over time for consistently increasing or decreasing trends
Pearson Correlation Coefficient Test [10]: measures the strength of a linear association between two variables
Before testing, the data were separated so that each neighborhood was only compared against itself.
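A minimal sketch of how the two tests can be run for a single neighborhood, using the pymannkendall package [11] and SciPy; the yearly series below are hypothetical inputs, not results from the project.

```python
import pymannkendall as mk
from scipy.stats import pearsonr

# Hypothetical inputs for one neighborhood: yearly incident counts and
# the number of pit stops open in that neighborhood each year.
incidents_per_year = [3120, 3345, 3410, 3602, 3710, 3888]
pit_stops_per_year = [0, 0, 1, 2, 2, 3]

# Mann-Kendall: is there a monotonic trend in incidents over time?
trend = mk.original_test(incidents_per_year)
print(trend.trend, trend.p)  # e.g., "increasing" with p < 0.05

# Pearson: linear association between pit stop count and incident count.
r, p_value = pearsonr(pit_stops_per_year, incidents_per_year)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
```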
Mann-Kendall [11]: There is a statistically significant increasing trend in incidents in all Pit Stop Neighborhoods.
Pearson Correlation Coefficient: Each of the most common incident categories was tested against the number of pit stops to check for a correlation. Most of the incident categories were positively correlated with the number of pit stops. Highlighted in gray are the neighborhoods with at least one statistically significant negative correlation between an incident category and the number of pit stops; the negatively correlated incident category is shown in parentheses.
Below are scatter plots of the statistically significant negative correlations between an incident category and the number of pit stops, separated by neighborhood.
Golden Gate Park
Haight Ashbury
Sunset/Parkside
In Phase 3, I tested each of the 47 incident categories against the number of pit stops in each Pit Stop Neighborhood. The heatmaps below show the strengths of the correlation coefficients, separated into Group A and Group B neighborhoods (a sketch of how these matrices can be assembled follows the figures).
Golden Gate Park
Sunset/Parkside
Financial District/South Beach and North Beach did not have any statistically significant negative correlations between incident categories and pit stops.
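A minimal sketch of how such a correlation matrix might be assembled and drawn. The panel DataFrame, its column names, and the synthetic counts are hypothetical; the real analysis used all 47 incident categories.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Hypothetical panel: one row per (neighborhood, year), with pit stop
# counts and yearly counts per incident category (2 categories shown).
rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "neighborhood": ["Tenderloin"] * 6 + ["Mission"] * 6,
    "pit_stops": [0, 1, 2, 3, 4, 4, 0, 0, 1, 2, 2, 3],
    "Larceny/Theft": rng.integers(800, 1200, 12),
    "Assault": rng.integers(200, 400, 12),
})
incident_categories = ["Larceny/Theft", "Assault"]

# One Pearson r per (incident category, neighborhood) pair.
corr = pd.DataFrame({
    hood: {cat: pearsonr(grp["pit_stops"], grp[cat])[0]
           for cat in incident_categories}
    for hood, grp in panel.groupby("neighborhood")
})

sns.heatmap(corr, cmap="RdBu_r", center=0, vmin=-1, vmax=1, annot=True)
plt.title("Pearson r: incident category vs. number of pit stops")
plt.tight_layout()
plt.show()
```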
Summary of Correlation Coefficient Test Results
8 out of 10 pit stop neighborhoods had at least 1 statistically significant negative correlation between incident categories and pit stops.
6 out of 10 pit stop neighborhoods had more than 1 statistically significant negative correlation between incident categories and pit stops.
16 out of 56 (29%) incident categories had a statistically significant negative correlation with pit stops.
Number of Pit Stops helped with the prediction of neighborhood for the Pit Stop Neighborhoods sub-dataset (see ML results below).
Machine Learning Methodology
Split dataset into two sub-datasets:
Pit Stop Neighborhoods sub-dataset (includes all neighborhoods with pit stops)
Non-Pit Stop Neighborhoods sub-dataset (includes all neighborhoods without pit stops)
Use Label Encoder on all categorical variables.
Split datasets into train/test sets.
Train two different algorithms on each sub-dataset:
Decision Tree Classifier
Random Forest Classifier
Assess results with accuracy scores, feature importance reports, and confusion matrices (a minimal sketch of this pipeline follows below).
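Here is a minimal end-to-end sketch of the pipeline above in scikit-learn. The sub-dataset, its feature columns, and the synthetic rows are hypothetical stand-ins for one of the two sub-datasets.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical sub-dataset (stand-in for Pit Stop or Non-Pit Stop
# Neighborhoods); the real features differ.
sub = pd.DataFrame({
    "neighborhood": ["Tenderloin", "Mission", "Tenderloin", "Mission"] * 25,
    "incident_category": ["Larceny/Theft", "Assault", "Burglary", "Robbery"] * 25,
    "year": [2016, 2017, 2018, 2019] * 25,
    "num_pit_stops": [3, 2, 4, 3] * 25,
})

# Label-encode all categorical columns.
encoders = {}
for col in ["neighborhood", "incident_category"]:
    encoders[col] = LabelEncoder()
    sub[col] = encoders[col].fit_transform(sub[col])

# Predicting neighborhood here; incident category was predicted the same way.
X = sub.drop(columns=["neighborhood"])
y = sub["neighborhood"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

for model in (DecisionTreeClassifier(random_state=42),
              RandomForestClassifier(n_estimators=50, random_state=42)):
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(type(model).__name__, accuracy_score(y_test, preds))
    print(confusion_matrix(y_test, preds))
    print(dict(zip(X.columns, model.feature_importances_)))
```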
Decision Tree Classifier
Plotly Maps of Neighborhood and Incident Category Predictions from Decision Tree Classifiers
Each of the dots on the following map indicates the neighborhood prediction made by the Decision Tree Classifier. Hovering over any dot shows the predicted neighborhood, the correct neighborhood, the predicted incident category, and the actual incident category. There is a separate Plotly scatter mapbox for the Pit Stop Neighborhoods and Non-Pit Stop Neighborhoods sub-datasets.
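A minimal sketch of how such a prediction map can be drawn with hover tooltips; the results frame and its columns are hypothetical.

```python
import pandas as pd
import plotly.express as px

# Hypothetical results frame: one row per test observation, with the
# classifier's predicted and actual labels plus coordinates.
results = pd.DataFrame({
    "lat": [37.7847, 37.7599],
    "lon": [-122.4145, -122.4148],
    "predicted_neighborhood": ["Tenderloin", "Mission"],
    "actual_neighborhood": ["Tenderloin", "South of Market"],
    "predicted_category": ["Larceny/Theft", "Assault"],
    "actual_category": ["Larceny/Theft", "Assault"],
})

fig = px.scatter_mapbox(
    results, lat="lat", lon="lon",
    color="predicted_neighborhood",
    hover_data=["actual_neighborhood", "predicted_category", "actual_category"],
    zoom=11,
)
fig.update_layout(mapbox_style="open-street-map")
fig.show()
```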
Random Forest Classifier
Which model did best?
Both classifiers did well predicting neighborhood for both sub-datasets and predicting incident category for the Non-Pit Stop Neighborhoods sub-dataset. Both struggled to predict incident category for the Pit Stop Neighborhoods sub-dataset, though the Decision Tree Classifier did better there (61.3% vs. 54%). Therefore, the Decision Tree Classifier is the best model for this data.
Why did both models not do well in predicting incident category for the Pit Stop Neighborhood sub-dataset?
Neither classifier did well predicting incident category for the Pit Stop Neighborhoods sub-dataset. While investigating why, I discovered that because I had sorted the data by year before splitting it into train and test sets, some incident categories appeared only in the train set and others only in the test set: the train set contained 38 incident categories while the test set contained 47. When I split the dataset into train and test sets without sorting by year, the accuracy of the Decision Tree Classifier's incident category prediction increased to 99.9%, and the Random Forest Classifier's accuracy increased to 96.4% with only 50 trees in the model.
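The difference comes down to whether the split shuffles the rows. A minimal sketch of the two splits, reusing the X and y from the pipeline sketch above:

```python
from sklearn.model_selection import train_test_split

# With the data sorted by year, an unshuffled split sends the later
# years (and any incident categories that appear only in them)
# entirely to the test set: the mismatch described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)

# Shuffling before the split mixes all years, so each incident
# category can appear in both the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)
```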
The hypothesis is supported by the statistically significant negative correlations between incident categories and the number of pit stops summarized above.
What could be done differently?
Run the models on the entire dataset instead of splitting it up by neighborhood.
This may have helped with predicting incident category.
Use techniques to avoid overfitting (e.g., removing features, cross-validation).
What could be done next?
Use LSTMs or other deep learning algorithms to see if they increase the accuracy of the model.
[1] San Francisco Public Works, "About us," sfpublicworks.org. [Online]. Available: http://sfpublicworks.org/about.
[2] San Francisco Public Works, "San Francisco pit stop: Where neighborhood challenges meet civic pride." [Online]. Available: https://sfpublicworks.wixsite.com/pitstop.
[3] Hoodline, "Two months in, pit stop program in SoMa is going strong," hoodline.com, June 22, 2015. [Online]. Available: https://hoodline.com/2015/06/two-months-in-pit-stop-pilot-program-in-soma-is-going-strong.
[4] F. Mackintosh, J. Blomquist, "Systemic shocks and social protection: The role and effectiveness of public works programs," Social Safety Nets Primer Notes, no. 1, pp. 1-2, 2003.
[5] A. Chalfin, B. Hansen, J. Lerner, L. Parker, "Reducing crime through environmental design: Evidence from a randomized experiment of street lighting in New York City," NBER Working Paper No. 25798, pp. 1-29, 2019.
[6] A. Wheeler, W. Steenbeek, "Mapping the risk terrain for crime using machine learning," Journal of Quantitative Criminology, pp. 1-34, 2020.
[7] L. Alves, H. Ribeiro, F. Rodrigues, "Crime prediction through urban metrics and statistical learning," Physica A: Statistical Mechanics and its Applications, pp. 435-443, 2018.
[8] A. Bogomolov, B. Lepri, J. Staiano, N. Oliver, F. Pianesi, A. Pentland, "Once upon a crime: Towards crime prediction from demographics and mobile data," Proceedings of the 16th International Conference on Multimodal Interaction, pp. 427-434, 2014.
[9] R. Iqbal, M. Murad, A. Mustapha, P. Panahy, N. Khanahmadliravi, "An experimental study of classification algorithms for crime prediction," Indian Journal of Science and Technology, vol. 6, no. 3, pp. 4219-4225, 2013.
[10] XLSTAT, "Which statistical test should you use?," help.xlstat.com. [Online]. Available: https://help.xlstat.com/s/article/which-statistical-test-should-you-use?language=en_US.
[11] M. Hussain, I. Mahmud, "pyMannKendall: a python package for non parametric Mann Kendall family of trend tests," Journal of Open Source Software, vol. 4, no. 39, pp. 1-3, 2019.