Mumtaz Uddin

Github

Phase I

Introduction

When people think of 911, they may think first of emergency medical services. But a significant portion of the 911 calls made every year in the United States are routed to police departments.[2] 911 call-takers and dispatchers have always played a critical role in improving the police response to critical incidents of all types, including incidents that have the potential for use of lethal force.[3] Citizens call 911 for all sorts of emergency and non-emergency related issues.

In this project, my goal is to identify patterns in 911 emergency calls, determine which areas are prone to incidents based on the priority of the calls, and estimate how many police officers a particular 911 emergency call would require. I will achieve this using K-Means Clustering.

Image Credits- Caldwell County [5]

Hypothesis Questions-

When to expect more 911 calls and where do these calls come from?
When does the need for police officers/first responders increase/decrease?
Can we schedule their shifts in a more efficient manner?

Overview of the data-

Implementation

The goal of this project is to examine the different types of reports made, identify patterns among them, and determine which reasons account for the highest number of calls. I will accomplish this goal by using Exploratory Data Analysis, K-Means Clustering and Data Visualization. This research should help guide improvements to current services and focus them on the areas most affected and most prone to incident reports.

For EDA & Data Cleaning-
I will use libraries such as pandas, numpy, matplotlib, seaborn and scikit-learn.
ML Models-
K-Means Clustering

Presentation Phase - I

Phase 1

Phase II

Exploratory Data Analysis
Data Cleaning & Data Visualization

Data Cleaning

Before Cleaning After Cleaning

The image on the left shows the data before cleaning, with many null values and outliers. In this step I cleaned the data and removed all null values from the data set. I also removed several columns (such as VRIZones, Census_Tracts etc.) that I felt were unnecessary and carried little to no information, which made the data set better suited to the goal of this project.
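The cleaning step described above can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the exact code used in the project: only VRIZones and Census_Tracts are column names taken from the text, and the other column names (callDateTime, priority, Neighborhood) are assumptions.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the 911 calls data set.
# Only VRIZones and Census_Tracts come from the text; the other
# column names are assumptions for illustration.
raw = pd.DataFrame({
    "callDateTime": ["2020-07-01 08:15", "2020-07-01 09:30", None, "2020-07-02 23:05"],
    "priority": ["High", "Medium", "Low", None],
    "Neighborhood": ["Downtown", "Inner Harbor", "Downtown", "Canton"],
    "VRIZones": [np.nan, np.nan, np.nan, np.nan],
    "Census_Tracts": [np.nan, 12.0, np.nan, np.nan],
})

# Drop the columns judged uninformative; errors="ignore" skips any that are absent.
cleaned = raw.drop(columns=["VRIZones", "Census_Tracts"], errors="ignore")

# Remove every row that still has a null in the remaining columns.
cleaned = cleaned.dropna().reset_index(drop=True)

print(raw.isna().sum())      # null counts per column before cleaning
print(cleaned.isna().sum())  # null counts per column after cleaning
```

Dropping the mostly empty columns before calling dropna matters: done in the opposite order, rows would be discarded just because an unused column was empty.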

Data Visualization-

Graph 1

Graph 2

In Graph 1 and Graph 2, the pie chart shows the percentage of calls by priority. There are 3 priority levels, namely High, Medium and Low. The bar chart shows the count of the same, drawn using the countplot function of the seaborn library.
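As a rough sketch of how the numbers behind these two charts can be produced, the example below computes the counts and percentage shares with pandas and wraps the plotting in a helper. The sample data, the priority column name and the plot_priority helper are assumptions made for illustration.

```python
import pandas as pd

# Made-up priorities; the real data set would supply this column.
calls = pd.DataFrame(
    {"priority": ["High", "Low", "Medium", "Medium", "Low", "Medium", "High", "Medium"]}
)

order = ["High", "Medium", "Low"]

# Counts per priority: the bar heights in a count plot.
counts = calls["priority"].value_counts().reindex(order, fill_value=0)

# Percentage share per priority: the slices of the pie chart.
shares = (counts / counts.sum() * 100).round(1)


def plot_priority(calls, counts):
    """Draw the pie chart and the count plot side by side (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))
    ax_pie.pie(counts, labels=counts.index, autopct="%1.1f%%")
    sns.countplot(data=calls, x="priority", order=order, ax=ax_bar)
    return fig
```

Calling plot_priority(calls, counts) returns a matplotlib figure with both charts; fixing the category order keeps the pie slices and the bars consistent with each other.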

The following graphs in the carousel report the number of calls on an hourly, daily and monthly basis and by day of the week, each also showing the call priority.

The graphs address the hypothesis questions-

Using the EDA and time-series analysis, we can tell when to expect more 911 calls and where these calls come from.

The time series graph shows the monthly trend in the number of incidents reported during 2020.

Based on the time-series analysis and the EDA of monthly, daily and weekly trends, we can see when the need for police officers/first responders increases or decreases.

We can see from the time series graph that the first week of July 2020 had the most incidents reported.
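The hourly, weekday, monthly and weekly views discussed above all come from grouping the call timestamps at different granularities. A minimal sketch, assuming a datetime column named callDateTime (an assumed name) and using made-up timestamps:

```python
import pandas as pd

# Made-up timestamps; "callDateTime" is an assumed column name.
calls = pd.DataFrame({
    "callDateTime": pd.to_datetime([
        "2020-06-29 08:10", "2020-07-01 08:45", "2020-07-01 17:20",
        "2020-07-03 23:05", "2020-07-04 08:30", "2020-08-15 12:00",
    ])
})

ts = calls["callDateTime"]

# Calls per hour of day, per weekday, and per month.
by_hour = ts.dt.hour.value_counts().sort_index()
by_weekday = ts.dt.day_name().value_counts()
by_month = ts.dt.to_period("M").value_counts().sort_index()

# Weekly time series: number of calls in each week (weeks end on Sunday).
weekly = calls.set_index("callDateTime").resample("W").size()

# The busiest week is the index label of the largest weekly count.
busiest_week_end = weekly.idxmax()
print(busiest_week_end, weekly.max())
```

With pandas, resample("W") labels each week by its closing Sunday, so the busiest week is reported by its end date rather than its start date.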

This graph shows the number of 911 calls based on the geography of the Baltimore area.

Blue shows the range from 0-5000 calls.

Violet shows the range from 5000-10000 calls.

Pink shows the range from 15000-20000 calls.

Orange-Yellow shows the range of 20000+ calls.

This plot gives a visual representation similar to Google Maps. This was achieved using the Google API.

The markers show the locations the calls came from.

Presentation Phase-II

911 calls phase 2

Phase III

Machine Learning

Execution and Interpretation

Clustering Using K-Means-

In my data set, the Baltimore area has 5 police districts and 4 sub-districts, namely,

Northern
Eastern
Western
Central
Southern
North-Eastern
North-Western
South-Eastern
South-Western

According to the data, these 9 police districts contain 350 neighborhoods. By using K-Means we were able to determine the priority and to estimate how many police officers will be required for a 911 call in each individual neighborhood of a police district.
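The general idea of clustering neighborhoods by their call priorities can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which features were clustered or how the priority score is derived, so the per-neighborhood counts, the choice of k=3 and the High=3, Medium=2, Low=1 weighting are all hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up per-neighborhood call counts for one police district.
# Apart from Downtown and Inner Harbor, the neighborhood names are placeholders.
counts = pd.DataFrame(
    {
        "High":   [120, 110, 10, 12, 40, 45],
        "Medium": [200, 190, 30, 28, 90, 95],
        "Low":    [80, 85, 60, 55, 70, 65],
    },
    index=["Downtown", "Inner Harbor", "Area C", "Area D", "Area E", "Area F"],
)

# Standardize so that no single priority column dominates the distance.
features = StandardScaler().fit_transform(counts)

# k=3 is an arbitrary choice for this toy sample.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
counts["cluster"] = model.fit_predict(features)

# Hypothetical priority score: weighted call volume (High=3, Medium=2, Low=1),
# averaged over the neighborhoods in each cluster.
weights = {"High": 3, "Medium": 2, "Low": 1}
counts["score"] = sum(counts[col] * w for col, w in weights.items())
cluster_scores = counts.groupby("cluster")["score"].mean().sort_values(ascending=False)
print(cluster_scores)
```

In this toy sample the neighborhoods with similar call mixes end up in the same cluster, and the cluster with the highest score would be the first candidate for additional officers. Converting a score into an actual officer count needs a rule that is not specified here.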

Below are 2 of the 9 districts above (Southern & Central), as examples of how the clustering works.

The figure above shows the clusters for the Southern police district. Each cluster represents the neighborhoods that fall into it, and the priority score gives the number of police officers required to respond in those neighborhoods.

The figure above shows the clusters for the Central police district. Each cluster represents the neighborhoods that fall into it, and the priority score gives the number of police officers required to respond in those neighborhoods.

For example, Cluster 5, which contains neighborhoods such as [Downtown, Inner Harbor], is more prone to high-priority or medium-priority 911 calls, which is why around 32 police officers will be needed in that area.

NOTE: The pie charts are only an illustration of those clusters; they show the percentage distribution.

Based on this K-Means clustering, I can tell how many police officers will be required based on the priority score and neighborhood.

Using the results, we can schedule their shifts more efficiently and prioritize neighborhoods based on historical data.