COPILOT: Human-Environment Collision Prediction and Localization from Multi-view Egocentric Videos

Boxiao Pan, Bokui Shen*, Davis Rempe*, Despoina Paschalidou, Kaichun Mo, Yanchao Yang, Leonidas J. Guibas
(* Equal contribution)

Stanford University, NVIDIA, The University of Hong Kong

International Conference on Computer Vision (ICCV), 2023

[arXiv] [Data]

Introduction

We propose the problem of predicting human-scene collisions from multi-view egocentric RGB videos captured by body-mounted cameras. Specifically, the problem consists of predicting: (1) whether a collision will happen in the next H seconds; (2) which body joints might be involved in the collision; and (3) where in the scene the collision might occur, in the form of a spatial heatmap.
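For concreteness, the three outputs can be organized as in the minimal sketch below. The container name, tensor shapes, joint count, and per-view heatmap layout are illustrative assumptions, not the paper's exact interface.

```python
from dataclasses import dataclass
import torch

@dataclass
class CollisionPrediction:
    """Hypothetical container for the three prediction targets (shapes are illustrative)."""
    collision_prob: torch.Tensor   # (B,)          probability of a collision within the next H seconds
    joint_probs: torch.Tensor      # (B, J)        per-joint probability of being involved in the collision
    scene_heatmaps: torch.Tensor   # (B, V, H, W)  per-view spatial heatmap of likely collision regions

# Example with an assumed batch size 2, 21 body joints, 4 views, and 56x56 heatmaps.
pred = CollisionPrediction(
    collision_prob=torch.rand(2),
    joint_probs=torch.rand(2, 21),
    scene_heatmaps=torch.rand(2, 4, 56, 56),
)
```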

To solve this problem, we present COPILOT, a COllision PredIction and LOcalization Transformer that tackles all three sub-tasks in a multi-task setting, effectively leveraging multi-view video inputs through a proposed 4D attention operation across space, time, and viewpoint.
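The sketch below illustrates one simple way attention across space, time, and viewpoint could be realized: tokens from all spatial patches, frames, and camera views are flattened into a single sequence and attended to jointly. This is a simplified PyTorch sketch under assumed token shapes, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SpaceTimeViewAttention(nn.Module):
    """Joint self-attention over spatial patches, frames, and camera views (illustrative sketch)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, V, T, S, D) = batch, views, frames, spatial patches, channels
        B, V, T, S, D = x.shape
        tokens = x.reshape(B, V * T * S, D)   # flatten view/time/space into one token sequence
        normed = self.norm(tokens)
        attended, _ = self.attn(normed, normed, normed)
        tokens = tokens + attended            # residual connection
        return tokens.reshape(B, V, T, S, D)

# Example: 4 body-mounted views, 8 frames, 49 patches per frame, 256-dim tokens (all assumed).
x = torch.randn(2, 4, 8, 49, 256)
out = SpaceTimeViewAttention(dim=256)(x)
print(out.shape)  # torch.Size([2, 4, 8, 49, 256])
```

Flattening all axes into one sequence lets every token attend to every other token across views and frames; a factorized variant (attending over each axis in turn) would be a cheaper alternative under the same token layout.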

To train and evaluate the model, we further develop a synthetic data pipeline that simulates virtual humans walking and possibly colliding in photo-realistic 3D environments. This pipeline is then used to establish a large-scale dataset consisting of ~8.6M egocentric RGBD frames. 

We perform extensive experiments that demonstrate COPILOT's promising performance, especially on sim-to-real transfer. Notably, we also apply COPILOT to a downstream collision avoidance task, and successfully reduce collision cases by 29% on scenes unseen during training.

Video Presentation

Data examples

The third-person view is not provided to the model.

Sim-to-real transfer

Per-frame collision predictions are overlaid on the observation videos.

Model predictions in simulation

Per-frame collision predictions are overlaid on the third-person rendering.

Collision avoidance assistance

Uncolored meshes indicate the motion history, orange shows the original future, and blue shows the future with collision avoidance assistance.