NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action

Summary:

We aim to bridge the gap between monocular human mesh recovery (HMR) methods and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. To achieve this, we introduce the Neural Motion (NeMo) field, which is optimized to represent the underlying 3D motion shared across a set of videos of the same action.
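To make the idea concrete, here is a minimal sketch of what a neural motion field could look like: an MLP that maps a normalized action phase plus a learnable per-video latent code to 3D pose parameters, so the shared motion lives in the network weights and the per-instance "variations" live in the codes. The class name, layer sizes, and the 72-dimensional (SMPL-style) pose output are illustrative assumptions, not the paper's exact configuration, and the optimization that fits the field to the videos is omitted.

```python
import torch
import torch.nn as nn

class NeMoField(nn.Module):
    """Sketch of a neural motion field (illustrative, not the paper's
    exact architecture): (phase t, per-video code) -> 3D pose."""

    def __init__(self, num_videos, latent_dim=32, pose_dim=72, hidden=256):
        super().__init__()
        # One learnable code per video instance captures that
        # instance's variation of the shared action.
        self.codes = nn.Embedding(num_videos, latent_dim)
        self.mlp = nn.Sequential(
            nn.Linear(1 + latent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, pose_dim),  # e.g. SMPL axis-angle pose
        )

    def forward(self, t, video_idx):
        # t: (B,) normalized phase in [0, 1]; video_idx: (B,) long
        z = self.codes(video_idx)
        return self.mlp(torch.cat([t.unsqueeze(-1), z], dim=-1))

field = NeMoField(num_videos=4)
t = torch.linspace(0, 1, 8)             # 8 time samples along the action
idx = torch.zeros(8, dtype=torch.long)  # all queries for video 0
poses = field(t, idx)
print(poses.shape)  # torch.Size([8, 72])
```

In practice such a field would be optimized so that its posed mesh, projected into each camera, matches per-video 2D evidence (e.g. keypoints or HMR estimates); querying it at any phase then yields a full 3D motion.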

Empirically, we show that NeMo can recover 3D sports motions from videos in the Penn Action dataset as well as from a MoCap dataset we collected mimicking Penn Action actions, and that NeMo achieves better 3D reconstruction than various baselines.

Visit our GitHub page for code and the NeMo-MoCap dataset!

Comparison with Baselines

Tennis Serve

Baseball Pitch

Tennis Swing

Extra NeMo Results on our MoCap Dataset from Different Views

Baseball Pitch

Baseball Swing

Tennis Serve

Tennis Swing

Golf Swing

Visualizing "Variations"

Below we show the variations across NeMo fields learned from our MoCap dataset and from the Penn Action dataset.

Interestingly, the motions learned from the Penn Action dataset are often more exaggerated: Penn Action videos typically come from advanced athletes performing the action, whereas our MoCap dataset was collected with a non-expert player.

Tennis Serve examples

Baseball Pitch examples

Baseball Pitch

Motions rendered in RED are from our MoCap dataset, and those in GRAY are from the Penn Action dataset.

Tennis Serve

Motions rendered in RED are from our MoCap dataset, and those in GRAY are from the Penn Action dataset.