To create predictors measuring the distance from each crime to its nearest landmark, we needed to calculate haversine (great-circle) distances between each crime (as many as ~115,000) and each landmark location for a given predictor (as many as ~85,000). To minimize computational cost, we found an existing haversine function and adapted it to run using only NumPy arrays and vectorized calculations.
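A minimal sketch of what such a NumPy-only haversine function might look like is given here. The name `haversine_np`, the argument order, and the Earth-radius constant are assumptions for illustration, not the function actually used. Inputs are decimal degrees, and any broadcastable array shapes are accepted.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed constant)


def haversine_np(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in decimal degrees.

    All arguments may be scalars or NumPy arrays; standard broadcasting applies.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    # Clip guards against tiny floating-point overshoot outside [0, 1].
    return 2.0 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Made-up coordinates: 2 "crimes" and 3 "landmarks" on the equator.
crime_lat = np.array([0.0, 0.0])
crime_lon = np.array([0.0, 1.0])
landmark_lat = np.array([0.0, 0.0, 0.0])
landmark_lon = np.array([0.0, 1.0, 2.0])

# Adding an axis to each side makes broadcasting produce a (crimes x landmarks) matrix.
pairwise = haversine_np(
    crime_lat[:, None], crime_lon[:, None],
    landmark_lat[None, :], landmark_lon[None, :],
)
```

Here `pairwise[i, j]` holds the distance from crime `i` to landmark `j`, so row-wise reductions such as `pairwise.min(axis=1)` give one value per crime.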
We calculated the distances in meters between every crime and every streetlight and appended the minimum distance for each crime to our "distances" variable. We also counted the number of streetlights within 80 meters of each crime and appended those counts to our "light density" variable.
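One way this step could be sketched is shown here. A full 115,000 x 85,000 distance matrix would need on the order of 78 GB in float64, so the sketch processes crimes in chunks. The chunking, the function names, the `<=` treatment of the 80-meter boundary, and the sample coordinates are all assumptions rather than the original implementation.

```python
import numpy as np


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Vectorized haversine distance in meters (inputs in decimal degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * np.arcsin(np.sqrt(np.clip(a, 0, 1)))


def nearest_and_count(crime_lat, crime_lon, light_lat, light_lon,
                      radius_m=80.0, chunk=256):
    """Return (minimum distance, count within radius_m) for every crime."""
    distances, light_density = [], []
    for start in range(0, len(crime_lat), chunk):
        stop = start + chunk
        # (chunk x lights) distance matrix for this slice of crimes only.
        d = haversine_m(crime_lat[start:stop, None], crime_lon[start:stop, None],
                        light_lat[None, :], light_lon[None, :])
        distances.append(d.min(axis=1))
        light_density.append((d <= radius_m).sum(axis=1))
    return np.concatenate(distances), np.concatenate(light_density)


# Made-up coordinates for illustration only.
crime_lat = np.array([40.0000, 40.0100])
crime_lon = np.array([-75.0000, -75.0000])
light_lat = np.array([40.0000, 40.0005, 40.0200])
light_lon = np.array([-75.0000, -75.0000, -75.0000])

distances, light_density = nearest_and_count(crime_lat, crime_lon, light_lat, light_lon)
```

In this toy input the first crime has two streetlights within 80 meters and the second has none. A smaller `chunk` lowers peak memory at the cost of more loop iterations.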
We calculated the distances between all crimes and all properties and, for each crime, appended the mean value of all properties within 240 meters to our "three_block_mean" variable.
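The neighborhood mean can be sketched along the same lines. This example assumes each property has a single numeric value (the hypothetical `prop_value` array) and that a crime with no property within 240 meters receives NaN. How the original code handled that case is not stated, so treat both choices as assumptions.

```python
import numpy as np


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Vectorized haversine distance in meters (inputs in decimal degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * np.arcsin(np.sqrt(np.clip(a, 0, 1)))


def mean_value_within(crime_lat, crime_lon, prop_lat, prop_lon, prop_value,
                      radius_m=240.0, chunk=256):
    """Mean of prop_value over properties within radius_m of each crime."""
    out = []
    for start in range(0, len(crime_lat), chunk):
        stop = start + chunk
        d = haversine_m(crime_lat[start:stop, None], crime_lon[start:stop, None],
                        prop_lat[None, :], prop_lon[None, :])
        within = d <= radius_m                      # boolean mask, (chunk x properties)
        counts = within.sum(axis=1)
        totals = (within * prop_value[None, :]).sum(axis=1)
        means = np.full(counts.shape, np.nan)       # NaN where no property is in range
        np.divide(totals, counts, out=means, where=counts > 0)
        out.append(means)
    return np.concatenate(out)


# Made-up coordinates and values for illustration only.
crime_lat = np.array([40.000, 41.000])
crime_lon = np.array([-75.000, -75.000])
prop_lat = np.array([40.001, 40.002, 40.010])
prop_lon = np.array([-75.000, -75.000, -75.000])
prop_value = np.array([100_000.0, 300_000.0, 900_000.0])

three_block_mean = mean_value_within(crime_lat, crime_lon, prop_lat, prop_lon, prop_value)
```

The first crime averages the two properties roughly 111 and 222 meters away, while the second crime has no property in range and is left as NaN, which would need to be imputed or dropped before modeling.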
We also calculated the minimum distance from each crime to each of our other landmark predictors.
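Repeating the minimum-distance calculation for several landmark types can be sketched as a loop over a dictionary. The keys `landmark_a` and `landmark_b` are placeholders, since the actual landmark types are not listed here, and the helper names and coordinates are likewise hypothetical.

```python
import numpy as np


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Vectorized haversine distance in meters (inputs in decimal degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * np.arcsin(np.sqrt(np.clip(a, 0, 1)))


def min_distance(crime_lat, crime_lon, lm_lat, lm_lon, chunk=256):
    """Distance in meters from each crime to its nearest landmark."""
    out = []
    for start in range(0, len(crime_lat), chunk):
        stop = start + chunk
        d = haversine_m(crime_lat[start:stop, None], crime_lon[start:stop, None],
                        lm_lat[None, :], lm_lon[None, :])
        out.append(d.min(axis=1))
    return np.concatenate(out)


# Made-up coordinates for illustration only.
crime_lat = np.array([40.000, 40.010])
crime_lon = np.array([-75.000, -75.000])

# Placeholder landmark types, each mapped to (latitudes, longitudes).
landmarks = {
    "landmark_a": (np.array([40.001, 40.020]), np.array([-75.000, -75.000])),
    "landmark_b": (np.array([40.000]), np.array([-75.000])),
}

min_distances = {
    name: min_distance(crime_lat, crime_lon, lat, lon)
    for name, (lat, lon) in landmarks.items()
}
```

Each entry of `min_distances` is a one-dimensional array aligned with the crime rows, so it can be attached to the dataset as a column without reordering.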
After performing all of the calculations, we added those variables to our dataset as predictor columns.
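Attaching the computed arrays might look like the following, assuming the dataset is held in a pandas DataFrame (the text does not say which container was used). The column names, the `crime_type` column, and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Placeholder dataset with one row per crime.
crimes = pd.DataFrame({"crime_type": ["theft", "assault"]})

# Stand-ins for the arrays computed in the earlier steps (made-up values).
new_columns = {
    "streetlight_dist": np.array([0.0, 1056.35]),
    "light_density": np.array([2, 0]),
    "three_block_mean": np.array([200_000.0, np.nan]),
}

# Each array must be aligned row-for-row with the crimes.
for name, values in new_columns.items():
    if len(values) != len(crimes):
        raise ValueError(f"{name} has {len(values)} rows, expected {len(crimes)}")

crimes = crimes.assign(**new_columns)
```

The length check is a cheap guard against attaching a predictor that was computed on a differently filtered copy of the crime data.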
We performed the same calculations for our dataset with the "New" crime type categories and added those variables as well.
Data Exploration for Original Categories
Data Exploration for New Categories