Featured Projects
2023 Spring
In collaboration with Jie Li and Mia Cherayil
R, Python, Google Cloud Platform
Knowing when buses will arrive is helpful when planning trips. Unfortunately, SEPTA buses often do not follow their schedules. They arrive earlier or later than expected because they are unequally affected by congestion, disruptions such as illegal parking, and long dwell times caused by heavy ridership and slow boarding. This leads to bus bunching, in which two consecutive buses arrive at a stop very soon after one another. Bunching is almost always accompanied by bus gapping, in which there are long gaps between the arrivals of consecutive buses at a given stop. As a result, people are delayed in reaching their destinations and may be forced to ride very crowded buses.
This project aims to provide reliable and frequent transit service to residents during their commute by predicting and responding to bus-bunching along key transit corridors.
The outcome we predict in this analysis is the initiation of bunching, that is, the stop at which bunching begins. Once two buses start to bunch, they are likely to remain bunched at subsequent stops. Knowing where bunching starts is therefore the most effective basis for prevention, because interventions can be targeted at that point.
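To illustrate what "initiation of bunching" means, the sketch below scans the headways between two consecutive buses along a route and returns the first stop where the headway falls below a threshold. The 2-minute threshold, the function name, and the sample headways are assumptions made for illustration; they are not the project's actual definition or data.

```python
from typing import Optional, Sequence


def first_bunching_stop(
    headways_min: Sequence[float], threshold_min: float = 2.0
) -> Optional[int]:
    """Return the index of the first stop where the headway between two
    consecutive buses drops below the threshold, or None if it never does."""
    for stop_index, headway in enumerate(headways_min):
        if headway < threshold_min:
            return stop_index
    return None


# Hypothetical headways (minutes) between a bus and its follower at each stop.
headways = [10.0, 8.5, 6.0, 3.5, 1.5, 1.0, 0.5]
initiation = first_bunching_stop(headways)
print(initiation)
```

With these made-up headways, the buses first fall within the threshold at the stop with index 4, which is where an intervention would be targeted.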
Jie Li is the primary author of the real-time app built on Google Cloud Platform.
2022 Fall
In collaboration with Ran Wang and Zoe Yoo
Python, Jekyll
On August 18, 2022, the Boston MBTA’s Orange Line was shut down for a month for safety repair work. To open more transportation options for Orange Line commuters, Mayor Wu announced that Bluebikes, Boston’s bikeshare system, would be free during the month of the shutdown. During that month, bikeshare demand surged, surpassing even the demand expected for the free period. Monthly ridership increased by over 100,000 in August and September 2022, a growth of nearly 40% from the previous year.
This project explores the effect of the Orange Line shutdown and the free bikeshare service on ridership. To identify these effects, it answers the following questions:
What new trends emerged during the shutdown period?
Which areas’ ridership was most affected by the Orange Line shutdown and the free bikeshare service?
After bikeshare became free, which variables contributed to the increase in demand?
We found that free bikeshare benefited not only Orange Line commuters but also other subway commuters, most likely by meeting last-mile connection demand.
2022 Spring
R
Pedestrian fatalities have been rising for the past decade in the US. The Fatality Analysis Reporting System (FARS) reports that 6,515 pedestrians died in vehicle crashes in 2020, accounting for 18% of traffic fatalities. The number grew by 50% from 2010, while total traffic fatalities increased by only 18%.
In addition, 75% of these pedestrian crashes occurred in dark conditions. This is a more critical issue in low-income neighborhoods, where most people walk for at least a portion of their commute. These residents include seniors and children, who rely on walking and public transit to access food, health, education, and other public services. It is also an equity issue: a 2022 study found that Black and Native American pedestrians are much more likely to be involved in fatal crashes at night than pedestrians of other races.
In Philadelphia, between 2016 and 2020, around 33% of pedestrian crashes happened overnight (18:00 to 7:00). The share jumps to 62% for fatal crashes. The goal of this report is to examine how physical, demographic, and environmental factors correlate with the number of nighttime pedestrian crashes in Philadelphia, PA, and to assess whether pedestrian infrastructure needs to be improved for nighttime conditions. The data are gathered from PennDOT, OpenData Philly, Philadelphia Tree Inventory, Philadelphia Police District, and the 5-year American Community Survey. The first section discusses lighting conditions in Philadelphia, since area brightness is thought to be one of the major factors contributing to nighttime pedestrian crashes.
2021 Fall
In collaboration with Hanpu Yao
R
Santa Monica has one of the country's most inefficient emergency medical service (EMS) systems. The national average time for an ambulance to respond to an emergency call is 7 minutes. Even in a rural setting, it is 14 minutes. In Santa Monica, however, city data from March to November 2021 show that it takes 30 minutes on average from receiving the call to arriving at the destination, much longer than the national average.
We traced the problem to the ambulance dispatch system. The traditional way to respond to an EMS call is to dispatch an ambulance after the station receives the call. When the nearest station runs out of ambulances, it must request one from a station farther away, which adds to the wait time.
Thus, we aim to develop the app "beepo" to make ambulance dispatch more efficient in Santa Monica. Our purpose is to manage ambulance vehicles in all four stations citywide, predict when and where the demand for emergency calls rises, and dispatch ambulances to the nearest station when the demand rises.
2021 Fall
R
"Re-balancing", in a bikeshare system, refers to the process of reallocating available bikes at a given time according to demand. Failure to meet demand is undesirable for both users and bikeshare companies: the former lose a means of transportation and the latter lose potential revenue. It is therefore important to have a sense of how many people will pick up a bike at a certain time and station, so that excess bikes at other locations can be redistributed to the stations with high demand. Re-balancing is usually done manually, with small trucks moving the bikes around.
The analysis below examines bike demand in Boston, MA, in August and September 2019. "Bluebikes" is a private company that operates the bike share system in the city. It employs 4-5 rebalancing vans, each with a payload of 20-25 bikes, to redistribute bicycles 24 hours a day, 7 days a week. A regression model is developed based on weather and on time lags ranging from one hour to one week to predict the number of trips that occur at a particular time and station. Since the company redistributes bikes every hour, a one-hour time lag is appropriate, because it lets van drivers know the demand for bikes at different locations an hour ahead. The other time lags are added to strengthen the model.
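The time-lag idea can be sketched as follows. This is a minimal Python illustration rather than the R code used in the analysis; the `lag` helper and the hourly trip counts are hypothetical, and only one-hour and two-hour lags are shown because the sample series is short (a one-week lag would be `lag(series, 168)` on hourly data).

```python
from typing import List, Optional


def lag(series: List[int], k: int) -> List[Optional[int]]:
    """Shift a series forward by k steps; the first k entries have no history."""
    if k <= 0:
        raise ValueError("k must be a positive number of time steps")
    return [series[i - k] if i >= k else None for i in range(len(series))]


# Hypothetical hourly trip counts at a single station.
hourly_trips = [3, 5, 2, 8, 6, 4]

lag_1h = lag(hourly_trips, 1)
lag_2h = lag(hourly_trips, 2)

# Keep only the hours for which every lag is available. Each row holds
# (trips one hour ago, trips two hours ago, trips now) and could be passed
# to a regression together with weather variables.
rows = [
    (lag_1h[i], lag_2h[i], hourly_trips[i])
    for i in range(len(hourly_trips))
    if lag_1h[i] is not None and lag_2h[i] is not None
]
print(rows)
```

Each lag column gives the model the demand observed at that station a fixed number of hours earlier, which is the information a van driver would have available when planning the next hour.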
2021 Fall
In collaboration with Sabir Nazarov and Will Friedrichs
R
Zillow's housing market predictions are an integral part of its business model, helping the company achieve a greater understanding of how the market will value properties. Our team is confident that through our geospatial machine learning-based model that considers not only attributes of homes, but local factors as well, we can improve Zillow's house price predictions for the Boulder County study area and provide a template that can be adapted to other localities.
This project outlines and analyzes the process by which we created our model in three broad stages: data gathering, regression modeling, and finally, prediction and validation. Our process involves splitting our total dataset into a training set consisting of 75% of the points, and a testing set consisting of the other 25%. These data are used to create our model, which predicts the 100 observations of the challenge set.
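A 75/25 split of this kind can be sketched as below. This is an illustrative Python version, not the project's R code; the function name, the fixed seed, and the stand-in record IDs are assumptions.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    records: Iterable[T], train_share: float = 0.75, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle the records reproducibly, then cut them into two parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for home sale records: 1,000 hypothetical record IDs.
train, test = split_train_test(range(1000))
print(len(train), len(test))
```

Shuffling before cutting keeps the two sets comparable, and fixing the seed makes the split repeatable between runs.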
We found that our model's predictive capabilities were bolstered by the addition of geographic features. While many variables from the given dataset were also central to the model’s predictions, variables such as distance to the nearest park and the mean distance to the five closest trailheads were very important to the model.
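A feature such as the mean distance to the five closest trailheads can be computed along the following lines. The sketch is in Python rather than R, assumes coordinates in a projected (planar) system so that straight-line distance is meaningful, and uses made-up points; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_knn_distance(point: Point, targets: Sequence[Point], k: int = 5) -> float:
    """Average straight-line distance from a point to its k nearest targets."""
    if not targets:
        raise ValueError("at least one target is required")
    distances = sorted(math.dist(point, target) for target in targets)
    nearest = distances[:k]  # uses fewer than k if fewer targets exist
    return sum(nearest) / len(nearest)


# Hypothetical home and trailhead locations in projected units.
home = (0.0, 0.0)
trailheads = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0), (0.0, 2.0), (5.0, 12.0), (9.0, 12.0)]

feature = mean_knn_distance(home, trailheads)
print(feature)
```

Averaging over several nearest trailheads rather than using only the closest one makes the feature less sensitive to a single nearby point.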
2021 Fall
R
This report evaluates the effectiveness of geospatial processes in controlling selection bias (i.e. reducing errors) when predicting stalking risk in Chicago, Illinois. First, the report employs feature engineering strategies to make the data usable for correlation analysis. Next, Local Moran’s I is calculated to test whether stalking incidents occur in clusters. Racial context is also included in the evaluation.
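For readers unfamiliar with the statistic, the sketch below computes Local Moran’s I in one common form: each cell's deviation from the mean, divided by the variance, times the weighted sum of its neighbors' deviations. It is a plain Python illustration, not the R code used in the report; the four-cell grid, the incident counts, and the row-standardized weights are all made up.

```python
from typing import List, Sequence


def local_morans_i(
    values: Sequence[float], weights: Sequence[Sequence[float]]
) -> List[float]:
    """Local Moran's I for each observation, given a spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance = sum(d * d for d in deviations) / n
    if variance == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    return [
        deviations[i] / variance
        * sum(weights[i][j] * deviations[j] for j in range(n) if j != i)
        for i in range(n)
    ]


# Hypothetical incident counts in four grid cells arranged in a row.
counts = [10, 9, 1, 2]

# Row-standardized weights: each cell's immediate neighbors share a weight of 1.
w = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]

scores = local_morans_i(counts, w)
print(scores)
```

A positive score means a cell resembles its neighbors (a high cell among high cells, or a low cell among low cells), which is the pattern expected if incidents cluster. Judging significance would additionally require a permutation test, which is omitted here.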
2021 Fall
R
Transit Oriented Development (TOD) has received much attention in the field of urban planning. Ideally, this planning model produces a denser, more walkable, and more sustainable city; critics, however, argue that it makes housing prices spike and displaces residents of poor neighborhoods. This project evaluates the effect and potential of TOD in Chicago, a large American city with a mature transit system. How have demographics changed within potential TOD areas over ten years? What is the relationship between these areas, rent, and the poverty rate? Are people willing to pay more to buy or rent homes near these areas?
This brief compares data from 2009 and 2019. The year 2009 is chosen because some data are only available from 2009 onward. A ten-year period is long enough to reveal changes. Because the most recent data are from 2019, the data set also reflects conditions close to the present (2021).