Project Question:
How can we model interactions in crowded environments?
Team Members: Enya Ma, Mary Han, Nia Khardzeishvili, Renee Sharma, Rohan Rai, and Siana Kabaria
Faculty/Graduate Students: Sarjana Sachidanandam
Our project aims to develop a model that can accurately predict the future positions of pedestrians. This is key for applications that rely on autonomous robot navigation. To achieve this, we used a neural network called a Spatio-Temporal Graph Convolutional Neural Network (STGCNN).
For this project, we used Python 3.6 in Google Colab for scripting and environment management. We built our machine learning model with PyTorch and its neural network functions, along with other libraries such as NumPy. For our simulation environment and motion planner, we used PyGame. The research and process behind this project were based on the paper linked here.
A convolutional neural network, or CNN, filters information using a mathematical operation called convolution. The operation can detect the presence of features anywhere in a given image by sliding a search frame, or kernel, over each group of pixels and searching for a match.
Placing this operation between the input and the hidden layers of a regular neural network creates a convolutional neural network.
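To make the kernel idea concrete, here is a minimal NumPy sketch of sliding a kernel over an image. It is illustrative only and is not the project's code; the function name `convolve2d`, the 5x5 image, and the edge-detecting kernel are all made up for the example. Like most CNN layers, it applies the kernel without flipping it.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # group of pixels under the kernel
            out[i, j] = np.sum(patch * kernel)  # how strongly the patch matches
    return out

# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical kernel that responds to a dark-to-bright vertical edge.
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

response = convolve2d(image, kernel)
print(response)  # each row is [3. 3. 0.]
```

The response is large wherever the kernel window spans the dark-to-bright edge and zero where the window covers only bright pixels, which is what "searching for a match" means in practice.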
Our Process
Step 1: Data Acquisition
The model was trained on human trajectory datasets gathered from video of crowded public environments. The datasets are called ETH, HOTEL, ZARA1, ZARA2, and UNIV. Each dataset represents a different location where people's walking behaviors differ slightly, which increases the diversity of the data and helps the model perform well in more types of locations. The data is sampled from a bird's-eye view and formatted in groups of 8 frames to give information to our model.
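One way to picture the 8-frame grouping is as a sliding window over each pedestrian's track. The sketch below is an assumption about how such grouping could be done, not a description of the datasets' actual loader; the function name `make_windows`, the stride of one frame, and the synthetic track are hypothetical.

```python
import numpy as np

OBS_LEN = 8  # frames per group, as described above

def make_windows(track, window=OBS_LEN):
    """Split one track of shape (T, 2) into overlapping groups of shape (window, 2)."""
    n = len(track) - window + 1
    if n <= 0:
        return np.empty((0, window, 2))  # track too short for even one group
    return np.stack([track[s:s + window] for s in range(n)])

# Hypothetical track: 10 frames of a pedestrian moving 0.5 units per frame along x.
track = np.column_stack([0.5 * np.arange(10), np.zeros(10)])

windows = make_windows(track)
print(windows.shape)  # (number of groups, frames per group, xy)
```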
Step 2: Data Analysis
Our data is represented as graphs using nodes and edges. Nodes represent pedestrians at different time steps, while edges represent their interactions based on proximity. The model we used is called the Social-STGCNN (Social Spatio-Temporal Graph Convolutional Neural Network). It takes our spatio-temporal graphs as input and outputs the predicted trajectories of pedestrians. Within this model, our graphs go through a convolution process where spatial and temporal data are combined. Then, the Time Extrapolator CNN predicts future positions based on past positions.
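As a rough illustration of proximity-based edges, the sketch below builds the edge weights for a single time step. Weighting each edge by the inverse of the distance between two pedestrians is an assumption made for this example (closer pedestrians get stronger edges); the function name `proximity_adjacency` and the three positions are hypothetical.

```python
import numpy as np

def proximity_adjacency(positions):
    """Return an (N, N) matrix whose entry [i, j] is 1 / distance between pedestrians i and j.

    Entries for zero distance (including the diagonal) are left at 0.
    """
    diff = positions[:, None, :] - positions[None, :, :]  # pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                  # pairwise distances
    adj = np.zeros_like(dist)
    nonzero = dist > 0
    adj[nonzero] = 1.0 / dist[nonzero]
    return adj

# Hypothetical positions of three pedestrians (nodes) in one frame.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 4.0]])

adj = proximity_adjacency(positions)
print(adj)
```

Here the pedestrians one unit apart get an edge weight of 1.0, while the pair four units apart gets 0.25, so nearby pedestrians influence each other more.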
Step 3: Model Training and Testing
To train our model, we used an optimization algorithm called gradient descent.
Gradient descent minimizes the loss function of a model by changing the model's parameters. The model parameters are the adjustable values that are tweaked to reduce the difference between predictions and actual values. To control how much we adjust the parameters during each step, we define a learning rate. A learning rate that is too high can overshoot the optimal values, while one that is too low makes training take a long time.
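The effect of the learning rate can be seen on a toy problem. The sketch below runs gradient descent on a made-up one-parameter loss, (w - 3)^2, whose minimum is at w = 3; the loss, the function name `gradient_descent`, and the three learning rates are chosen only for illustration.

```python
def gradient_descent(lr, steps=50, w=0.0):
    """Minimize the toy loss (w - 3)**2 starting from w, using learning rate lr."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of the loss with respect to w
        w -= lr * grad          # step against the gradient
    return w

w_good = gradient_descent(lr=0.1)    # converges close to 3
w_high = gradient_descent(lr=1.1)    # overshoots further on every step and diverges
w_low = gradient_descent(lr=0.001)   # moves in the right direction, but barely

print(w_good, w_high, w_low)
```

With the same number of steps, only the moderate learning rate ends up near the minimum.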
Forward pass:
To compute the predicted positions of pedestrians for future time steps, we pass the input data through the model.
Loss function:
Loss is calculated through a loss function that measures the difference between the predicted positions and the true positions.
Backward pass:
The backward pass calculates gradients, which describe how a change in each model parameter affects the loss. These gradients are then used to adjust the model parameters.
Iteration:
One epoch is one complete pass through the full training dataset. To balance accuracy against training time, we tested and adjusted the number of epochs. All of the steps above were repeated for the set number of epochs.
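The four steps above fit together in a loop like the one sketched here. This is a simplified stand-in and not the project's training code: it uses a small linear model in NumPy with the gradient written out by hand, where the real model is a neural network whose gradients the deep learning library computes automatically. The synthetic data, the learning rate, and the number of epochs are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: each row is [x, y, vx, vy]; the target is the next position.
X = rng.normal(size=(200, 4))
Y = X[:, :2] + X[:, 2:]

W = np.zeros((4, 2))   # model parameters
learning_rate = 0.1
num_epochs = 100
losses = []

for epoch in range(num_epochs):
    pred = X @ W                                # forward pass: predicted positions
    error = pred - Y
    loss = np.mean(np.sum(error ** 2, axis=1))  # loss: mean squared distance to the truth
    grad = 2.0 * X.T @ error / len(X)           # backward pass: d(loss) / d(W)
    W -= learning_rate * grad                   # adjust the parameters
    losses.append(loss)

print(losses[0], losses[-1])
```

Each trip through the loop body is one epoch here because every update uses the whole dataset; the recorded losses shrink as the parameters approach values that reproduce the targets.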
Metrics:
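To evaluate the model's predictions, we used two metrics: Average Displacement Error (ADE), the average distance between the predicted and true positions over all predicted time steps, and Final Displacement Error (FDE), the distance between the predicted and true positions at the final time step.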
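Assuming the two metrics are the ADE and FDE reported in the results, and taking their usual definitions (average distance error over all predicted steps, and distance error at the last predicted step), they can be computed as in the sketch below. The function name `ade_fde` and the two short trajectories are made up for the example.

```python
import numpy as np

def ade_fde(pred, truth):
    """Return (ADE, FDE) for one pedestrian; pred and truth have shape (T, 2)."""
    dist = np.linalg.norm(pred - truth, axis=-1)  # distance error at each time step
    return dist.mean(), dist[-1]

# Hypothetical true and predicted positions over three future time steps.
truth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.array([[0.0, 0.0], [1.0, 0.3], [2.0, 0.6]])

ade, fde = ade_fde(pred, truth)
print(ade, fde)
```

In this example the prediction drifts away from the true path over time, so the FDE (about 0.6) is larger than the ADE (about 0.3).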
Step 4: Motion Planner Mapping and Results
After creating the STGCNN model, we used PyGame to create a motion planner for how our robot would move through a crowded area like the ones used in our testing. The robot relies on the model to find the most efficient path through a crowd, avoiding pedestrians and other obstacles, in order to reach the goal.
Our project consisted of two main machine-learning algorithms. The data is first processed through a graph convolutional network, or GCN, to map pedestrian distance and influence data. Then, a time extrapolator CNN predicts future pedestrian behavior. Finally, a motion planner maps ideal paths for our robot to take in order to maneuver through the crowd without hitting any obstacles.
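To show how predicted pedestrian positions can feed a planner, here is a deliberately simple greedy sketch. It is not the planner we built in PyGame; the function `plan_path`, the step size, the safety distance, and the single stationary pedestrian are all assumptions for illustration. At each time step it tries a fixed set of headings and takes the move that ends closest to the goal while staying a safe distance from every predicted pedestrian position.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def plan_path(start, goal, predicted_peds, step=0.5, safe_dist=1.0,
              n_headings=16, max_steps=200):
    """Greedy planner: at each step, take the safe move that ends closest to the goal.

    predicted_peds[t] is a list of (x, y) pedestrian positions predicted for step t;
    the last entry is reused once t runs past the end of the list.
    """
    pos = start
    path = [pos]
    for t in range(max_steps):
        if dist(pos, goal) <= step:  # close enough to the goal: stop
            break
        peds = predicted_peds[min(t, len(predicted_peds) - 1)]
        best = None
        for k in range(n_headings):
            angle = 2.0 * math.pi * k / n_headings
            cand = (pos[0] + step * math.cos(angle), pos[1] + step * math.sin(angle))
            if all(dist(cand, p) >= safe_dist for p in peds):
                if best is None or dist(cand, goal) < dist(best, goal):
                    best = cand
        if best is not None:  # if every move is unsafe, wait in place this step
            pos = best
        path.append(pos)
    return path

# Hypothetical scene: one pedestrian predicted to stand directly between start and goal.
start, goal = (0.0, 0.0), (10.0, 0.0)
predicted_peds = [[(5.0, 0.0)]]

path = plan_path(start, goal, predicted_peds)
print(len(path), path[-1])
```

A greedy planner like this can get stuck in cluttered scenes, so it is best read as a picture of the inputs and outputs rather than as a robust method.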
Deliverables/Results
We created a motion planner that effectively and accurately navigates through a crowd model based on a given dataset. It is able to follow a path to reach a set target while also avoiding collisions with simulated pedestrians and other stationary objects.
Final Average ADE: 0.2088858547785462
Final Average FDE: 0.32067492012386234
Our final ADE and FDE were low, meaning that the model's predicted trajectories stay close to the pedestrians' true trajectories.
Conclusion
Applications:
When applied to a robot motion planner, our model has many applications involving interactions between robots and humans. These include planning safe paths for delivery robots and for assistive robots that help people with disabilities such as blindness or movement disorders.
Changes or Future Improvements:
Allowing the model to train and test in a reasonable amount of time on a computer's CPU.
Making it more accessible to people (e.g., creating a physical device that is programmed with this model).