Research conducted as a member of the Applied Motion Lab and the Vision Lab, tackling problems in quantifying and simulating human behavior.
Advised by Dr. Stephen J. Guy and Dr. Hyun Soo Park.
We leverage Ego-Exo4D demonstrations to augment VLMs in two ways: by understanding spatial task affordances, and by localizing those tasks relative to the egocentric viewer. We then demonstrate this system on a simulated robot.
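As a rough illustration of the localization step, here is a minimal sketch assuming the VLM produces a task-affordance heatmap over the egocentric image and that an aligned depth map and pinhole intrinsics are available; the peak-affordance pixel is back-projected into the viewer's frame to give the robot a 3D task location. All names and shapes here are illustrative, not the actual system interface.

```python
import numpy as np

def localize_task(affordance: np.ndarray, depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project the most task-affordant pixel into the egocentric camera frame.

    affordance : (H, W) heatmap scored by the VLM (hypothetical output format)
    depth      : (H, W) metric depth image aligned with the heatmap
    K          : (3, 3) pinhole camera intrinsics
    """
    # Pixel with the highest task-affordance score.
    v, u = np.unravel_index(np.argmax(affordance), affordance.shape)
    z = depth[v, u]

    # Standard pinhole back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

# Toy example with synthetic inputs.
H, W = 480, 640
K = np.array([[500.0, 0, W / 2], [0, 500.0, H / 2], [0, 0, 1]])
affordance = np.zeros((H, W)); affordance[300, 400] = 1.0
depth = np.full((H, W), 1.5)
print(localize_task(affordance, depth, K))  # task location in the egocentric frame
```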
A massive-scale exo+egocentric video-language dataset and benchmark suite for skilled activities.
My contributions included assisting with dataset standardization, on-site data collection, and annotation.
(a) Example of paired video data
(b) Example 3D Reconstruction (Basketball)
Using a small dataset of human poses, we learn a geometry-aware pose prediction network that augments the reward function for reinforcement learning. Our system improves robot efficiency over the state of the art on house-cleaning tasks.
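A minimal sketch of the reward-shaping idea, assuming a toy MLP standing in for the geometry-aware pose network and a simple L2 pose comparison; the names `PosePrior`, `augmented_reward`, and the weight `w_pose` are illustrative, not the actual implementation.

```python
import torch
import torch.nn as nn

class PosePrior(nn.Module):
    """Toy stand-in for a geometry-aware pose prediction network: given scene
    geometry features, predict where a human would likely place their body."""
    def __init__(self, geom_dim=32, pose_dim=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(geom_dim, 64), nn.ReLU(), nn.Linear(64, pose_dim))

    def forward(self, geom_feat):
        return self.net(geom_feat)

def augmented_reward(task_reward, robot_pose, geom_feat, pose_net, w_pose=0.1):
    """Combine the task reward with a shaping term that rewards the robot for
    matching the human-like pose predicted from the scene geometry
    (assumption: poses are compared with a plain L2 distance)."""
    with torch.no_grad():
        predicted_pose = pose_net(geom_feat)
    shaping = -torch.norm(robot_pose - predicted_pose)
    return task_reward + w_pose * shaping

# Toy example.
pose_net = PosePrior()
r = augmented_reward(task_reward=torch.tensor(1.0),
                     robot_pose=torch.zeros(3),
                     geom_feat=torch.randn(32),
                     pose_net=pose_net)
print(float(r))
```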
A massive-scale egocentric video-language dataset and benchmark suite for everyday activities.
My contributions included 3D reconstruction of first-person walking videos and a benchmark implementation to predict future trajectories given a first-person image.
(a) Necessary Geometry
(b) Future trajectory prediction
Our system jointly predicts the scene's navigational affordances and the observer's future motion as implicit fields aligned with image-space features.
(a) First-Person Image
(b) Inferred Walkability
(c) Goal at House
(d) Goal at Horizon
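The sketch below illustrates the implicit-field idea under simplified assumptions: a toy convolutional backbone produces image-space features, `grid_sample` interpolates them at continuous query coordinates, and a small MLP decodes a walkability score and a future-motion offset per query. The architecture sizes and output parameterization are placeholders, not the actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitNavigationField(nn.Module):
    """Sketch of an implicit-field decoder: image-space features are sampled at
    continuous query coordinates and decoded into a walkability score and a
    future-motion offset. Backbone and head sizes are illustrative only."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3)
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 2, 128), nn.ReLU(),
            nn.Linear(128, 3),  # [walkability logit, motion dx, motion dy]
        )

    def forward(self, image, queries):
        # image:   (B, 3, H, W) first-person frame
        # queries: (B, N, 2) coordinates in [-1, 1] image space
        feats = self.backbone(image)                               # (B, C, H', W')
        grid = queries.unsqueeze(2)                                # (B, N, 1, 2)
        sampled = F.grid_sample(feats, grid, align_corners=True)   # (B, C, N, 1)
        sampled = sampled.squeeze(-1).permute(0, 2, 1)             # (B, N, C)
        out = self.decoder(torch.cat([sampled, queries], dim=-1))
        walkability = torch.sigmoid(out[..., :1])
        future_motion = out[..., 1:]
        return walkability, future_motion

# Toy query: one image, four points.
model = ImplicitNavigationField()
img = torch.randn(1, 3, 256, 256)
pts = torch.rand(1, 4, 2) * 2 - 1
w, m = model(img, pts)
print(w.shape, m.shape)  # (1, 4, 1), (1, 4, 2)
```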
We optimize differentiable fields based on sparse user-defined rules to represent navigation policies defined over all of space for mobile agents.
Example of mobile robot following a navigation field around obstacles.
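Below is a minimal sketch of optimizing such a field with two assumed example rules ("point toward the goal" and "point away from the obstacle when close"); the rule forms, loss weights, and network size are illustrative only.

```python
import torch
import torch.nn as nn

# Sketch: a differentiable navigation field mapping 2D position -> desired velocity,
# fit by gradient descent to two sparse user-defined rules (assumed forms):
#   1) everywhere, the velocity should point toward the goal;
#   2) near the obstacle, the velocity should point away from it.
field = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
goal = torch.tensor([4.0, 0.0])
obstacle, obstacle_radius = torch.tensor([2.0, 0.0]), 1.0

optimizer = torch.optim.Adam(field.parameters(), lr=1e-2)
for step in range(500):
    pts = torch.rand(256, 2) * 6 - 1          # sample workspace positions
    vel = field(pts)

    # Rule 1: align the field with the direction toward the goal.
    to_goal = nn.functional.normalize(goal - pts, dim=-1)
    goal_loss = (1 - nn.functional.cosine_similarity(vel, to_goal, dim=-1)).mean()

    # Rule 2: near the obstacle, penalize velocities pointing toward it.
    from_obstacle = pts - obstacle
    near = (from_obstacle.norm(dim=-1) < obstacle_radius).float()
    avoid_loss = (near * nn.functional.relu(
        -nn.functional.cosine_similarity(vel, from_obstacle, dim=-1))).mean()

    loss = goal_loss + 5.0 * avoid_loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Query the optimized policy field at an arbitrary position.
print(field(torch.tensor([[0.0, 0.5]])))
```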
Applying a novel deep learning framework to discovering and simulating the equations that govern how agents in a crowd move as a continuum. Images show estimates of crowd flow and density.
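For intuition about the continuum view (not the learned framework itself), the sketch below evaluates the finite-difference residual of the standard continuity equation on density and flow grids; the discovered governing equations play a role analogous to this hand-written conservation law.

```python
import numpy as np

def continuity_residual(rho, vx, vy, dt=1.0, dx=1.0, dy=1.0):
    """Finite-difference residual of the continuity equation
        d(rho)/dt + d(rho*vx)/dx + d(rho*vy)/dy = 0
    for crowd density rho(t, x, y) and flow (vx, vy)(t, x, y).
    A small residual means the estimated density and flow are consistent
    with mass-conserving continuum crowd motion."""
    drho_dt = np.gradient(rho, dt, axis=0)
    dflux_dx = np.gradient(rho * vx, dx, axis=1)
    dflux_dy = np.gradient(rho * vy, dy, axis=2)
    return drho_dt + dflux_dx + dflux_dy

# Toy example: a density bump advected at constant velocity satisfies continuity.
T, H, W = 8, 32, 32
t, x, y = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
rho = np.exp(-((x - 10 - t) ** 2 + (y - 16) ** 2) / 20.0)
vx, vy = np.ones_like(rho), np.zeros_like(rho)
print(np.abs(continuity_residual(rho, vx, vy)).mean())  # small residual
```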
Using a set of images captured at the same moment, we estimate the interactions between people by extracting the socially salient features of the scene. Several people are reconstructed in 3D, and an estimate of each person's gaze direction is used to determine the socially salient features. This is the first step of an ongoing project.
In-depth analysis of the method
Abbreviated version of final project paper
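As a simplified illustration of how gaze convergence can expose socially salient locations, the sketch below scores candidate 3D points by how many reconstructed gaze rays point at them; this aggregation rule is an assumption for illustration, not the project's final feature extractor.

```python
import numpy as np

def social_saliency(head_positions, gaze_directions, grid, threshold=0.95):
    """Score each grid point by how many people's gaze directions point at it
    (a simple assumed proxy for social saliency via gaze convergence).

    head_positions  : (P, 3) reconstructed 3D head positions
    gaze_directions : (P, 3) unit gaze vectors
    grid            : (G, 3) candidate 3D locations
    """
    # Unit vectors from each head to each grid point: (P, G, 3).
    to_grid = grid[None, :, :] - head_positions[:, None, :]
    to_grid /= np.linalg.norm(to_grid, axis=-1, keepdims=True)
    # Cosine alignment between each gaze and the direction to each grid point.
    alignment = np.einsum("pgc,pc->pg", to_grid, gaze_directions)
    return (alignment > threshold).sum(axis=0)

# Toy scene: three people all looking toward the origin.
heads = np.array([[2.0, 0, 1.6], [-2.0, 0, 1.6], [0, 2.0, 1.6]])
gazes = -heads / np.linalg.norm(heads, axis=-1, keepdims=True)
xs, ys = np.meshgrid(np.linspace(-3, 3, 13), np.linspace(-3, 3, 13))
grid = np.stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)], axis=-1)
scores = social_saliency(heads, gazes, grid)
print(grid[scores.argmax()])  # near the origin, where the gazes converge
```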
Using an RGB-D camera, I developed a robot-mounted vision system that interprets point clouds to support localization and planning. The robot can recognize and avoid obstacles in its environment using any planning algorithm that takes a collection of objects to avoid as input.
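A minimal sketch of the point-cloud interpretation step, assuming pinhole intrinsics, a crude height-based floor filter, and voxel-level grouping into obstacle centroids; the real pipeline may segment obstacles differently, so treat the helper names and thresholds as placeholders.

```python
import numpy as np

def depth_to_obstacles(depth, K, min_height=0.05, voxel=0.25):
    """Back-project a depth frame into a point cloud and summarize non-floor
    points as voxel-level obstacle centroids (a simple assumed clustering; any
    segmentation that yields objects to avoid could be substituted).

    depth : (H, W) metric depth image from the RGB-D camera
    K     : (3, 3) pinhole intrinsics
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    points = points[points[:, 2] > 0]                  # drop invalid pixels

    # Crude floor filter: with the camera y-axis pointing down, keep points
    # sufficiently above the camera's horizontal plane (assumes a level mount;
    # a fitted ground plane would be more robust).
    obstacles = points[-points[:, 1] > min_height]

    # Group the remaining points into coarse voxels, one centroid per voxel.
    clusters = {}
    for point, key in zip(obstacles, map(tuple, np.floor(obstacles / voxel).astype(int))):
        clusters.setdefault(key, []).append(point)
    return np.array([np.mean(c, axis=0) for c in clusters.values()])

# Toy frame: a far wall with a closer blob in the middle acting as an obstacle.
depth = np.full((120, 160), 3.0)
depth[40:80, 60:100] = 1.0
K = np.array([[100.0, 0, 80], [0, 100.0, 60], [0, 0, 1]])
print(depth_to_obstacles(depth, K).shape)  # (num_obstacle_voxels, 3)
```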
Dr. Guy and I are using the game Flappy Bird as a testbed for teaching an AI to play a game the way a human would. My play is recorded using only the pixel state of the screen and fed into a neural network, and the resulting weights serve as the policy for an automated controller. The controller then "sees" the game and decides what to do based on the image. The capture framework is written in C++ on Windows, handling OS-level timing, frame capture, and input capture; the machine learning component is implemented separately in Python. This ongoing project is temporarily on hiatus.
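A minimal sketch of the Python learning side under assumed specifics: grayscale 84x84 frames, a binary flap/no-flap action, and a small CNN trained by behavioral cloning on recorded play. The tensors below are placeholders for the data captured by the C++ layer.

```python
import torch
import torch.nn as nn

# Behavioral cloning from recorded (screen frame, flap/no-flap) pairs.
class FlappyPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, 2),      # logits for [do nothing, flap]
        )

    def forward(self, frames):      # frames: (B, 1, 84, 84) grayscale screens
        return self.net(frames)

policy = FlappyPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch standing in for recorded human play (frames + chosen actions).
frames = torch.rand(64, 1, 84, 84)
actions = torch.randint(0, 2, (64,))

for epoch in range(5):
    logits = policy(frames)
    loss = loss_fn(logits, actions)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# At play time the controller "sees" the current frame and picks an action.
action = policy(torch.rand(1, 1, 84, 84)).argmax(dim=1).item()
print(action)
```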