NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action
Kuan-Chieh Wang, Zhenzhen Weng, Maria Xenochristou, Joao Pedro Araujo, Jeffrey Gu, C. Karen Liu, Serena Yeung
Stanford University
CVPR 2023 Highlight
[Paper 📄] - [Code / Data 🛠️]
Summary:
We aim to bridge the gap between monocular human mesh recovery (HMR) methods and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. To achieve this, we introduce the Neural Motion (NeMo) field, which is optimized to represent the underlying 3D motions across a set of videos of the same action.
Empirically, we show that NeMo can recover 3D motion in sports, using videos from both the Penn Action dataset and a MoCap dataset we collected mimicking Penn Action sequences, and that it achieves better 3D reconstruction than various baselines.
Visit our GitHub page for code and the NeMo-MoCap dataset!
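As a rough illustration of the idea (not the released implementation; see the repo above for that), a neural motion field can be sketched as a shared MLP that maps a normalized phase value in [0, 1] and a per-video latent code to per-frame pose parameters. Everything below is an illustrative assumption: the class name `NeMoField`, the per-video embedding table, and the dimensions (e.g., a 72-D SMPL-style axis-angle pose output).

```python
import torch
import torch.nn as nn

class NeMoField(nn.Module):
    """Minimal sketch of a neural motion field (assumed architecture,
    not the released code): a shared MLP maps a normalized phase value
    t in [0, 1], concatenated with a per-video latent code, to the pose
    parameters of that frame."""

    def __init__(self, num_videos, latent_dim=32, pose_dim=72, hidden=256):
        super().__init__()
        # One learnable latent code per video instance of the action.
        self.instance_codes = nn.Embedding(num_videos, latent_dim)
        self.mlp = nn.Sequential(
            nn.Linear(1 + latent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, pose_dim),  # e.g., SMPL-style 72-D pose
        )

    def forward(self, t, video_idx):
        # t: (B, 1) phase values in [0, 1]; video_idx: (B,) instance indices.
        z = self.instance_codes(video_idx)
        return self.mlp(torch.cat([t, z], dim=-1))
```

In this sketch, the MLP weights are shared across all videos of an action, so the field captures the motion common to the action, while the per-video codes absorb instance-specific variation; a field like this would be fit against the videos' 2D evidence.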
Comparison with Baselines
Tennis Serve
Notice the "swapping of arms" and the discontinuity in the arm swing in the baseline predictions.
Baseball Pitch
Notice the swapping of both arms and legs in all baselines.
Tennis Swing
Notice the "swapping of legs" and the offset in the legs for GLAMR, and the jitter and offset in the arm swing for VIBE and PARE.
Extra NeMo Results on our MoCap Dataset from Different Views
Baseball Pitch
Baseball Swing
Tennis Serve
Tennis Swing
Golf Swing
Visualizing "Variations"
Below we show the variations across NeMo fields learned from our MoCap dataset and from the Penn Action dataset.
Interestingly, the motions learned from the Penn Action dataset are often more exaggerated. This is because Penn Action videos often feature advanced athletes performing the action, whereas our MoCap dataset was not collected with an advanced player.
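Continuing the illustrative sketch above (same hypothetical `NeMoField`; the number of videos and the 60-step phase grid are arbitrary choices), one way to produce such per-video variations is to query the shared field with each video's learned instance code over a dense phase grid:

```python
import torch
# Assumes the NeMoField sketch defined earlier on this page.

field = NeMoField(num_videos=8)  # in practice, a trained field
t = torch.linspace(0.0, 1.0, steps=60).unsqueeze(-1)  # (60, 1) phase grid

motions = []
for i in range(field.instance_codes.num_embeddings):
    idx = torch.full((t.shape[0],), i, dtype=torch.long)
    with torch.no_grad():
        motions.append(field(t, idx))  # (60, pose_dim) poses for video i
```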
Tennis Serve examples
Baseball Pitch examples
Baseball Pitch
Rendered in RED are from our MoCap dataset, and in GRAY are from the Penn Action dataset.
Tennis Serve
Rendered in RED are from our MoCap dataset, and in GRAY are from the Penn Action dataset.