Harmonizing Stochasticity and Determinism:
Scene-responsive Diverse Human Motion Prediction
Anonymous author
in submission to NIPS 2024 (do not distribute)
● Existing approaches to diverse motion prediction focus on the stochastic nature of human movement and often overlook the surrounding environment. When applied to real-world scenes, this leads to predictions that penetrate scene geometry or are otherwise inconsistent with the scene.
● Cross-modal analysis of the observed motion and the scene is needed to understand likely human intentions within 3D scenes.
● Scene-aware motion prediction requires the predicted motion to be consistent with the scene context, including obstacle avoidance.
DiMoP3D integrates two input modalities: 1) past motion sequences and 2) 3D scene point clouds.
● Context-Aware Intermodal Interpreter identifies interactive objects in the scene, analyzes potential human interest through a cross-modal InterestNet, and samples one object as the movement target based on this analysis.
● Behaviorally-Consistent Stochastic Planner first predicts the human-object interaction pose as the final state of the predicted motion, then searches for an obstacle-free trajectory from the observation to that destination.
● Self-Prompted Motion Generator generates diverse human motions while preserving the observation and the planned trajectory by overwriting intermediate results at each denoising step.
● MotionCLIP is introduced as additional supervision so that the predicted motion stays consistent with the target object.
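The overwriting idea behind the Self-Prompted Motion Generator can be illustrated with a toy sketch. This is not the DiMoP3D implementation: `toy_denoiser` is a hypothetical stand-in for a learned denoiser (it only smooths neighbouring frames), the frame layout (channels 0 and 1 as the planned root position, remaining channels as free pose values) is an assumption, and the known values are written back clean rather than noised to the current step.

```python
import random


def toy_denoiser(motion, step):
    """Hypothetical stand-in for a learned denoiser.

    It ignores `step` and simply averages each frame with its neighbours.
    """
    last = len(motion) - 1
    smoothed = []
    for i, frame in enumerate(motion):
        prev_f = motion[max(i - 1, 0)]
        next_f = motion[min(i + 1, last)]
        smoothed.append([(p + c + n) / 3.0 for p, c, n in zip(prev_f, frame, next_f)])
    return smoothed


def overwrite_known(motion, observed, planned_xy):
    """Write the observed frames and the planned root trajectory back in."""
    n_obs = len(observed)
    for i in range(n_obs):
        motion[i] = list(observed[i])
    for j, (x, y) in enumerate(planned_xy):
        motion[n_obs + j][0] = x  # assumed root x channel
        motion[n_obs + j][1] = y  # assumed root y channel
    return motion


def self_prompted_sampling(observed, planned_xy, dims, total_steps=20, seed=0):
    """Start from noise, denoise, and overwrite the known parts at every step."""
    rng = random.Random(seed)
    n_frames = len(observed) + len(planned_xy)
    motion = [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(n_frames)]
    for step in reversed(range(total_steps)):
        motion = toy_denoiser(motion, step)
        motion = overwrite_known(motion, observed, planned_xy)
    return motion
```

Because the overwrite runs after every denoising step, the observed frames and the planned trajectory are reproduced exactly in the output, while the free channels still depend on the initial noise, so different seeds give different motions along the same plan.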
Experiments show that DiMoP3D predicts motions with diverse actions, can also vary the motion toward a fixed target object, and keeps each motion sequence physically consistent.
[Figure: BelFusion (ICCV'23) vs. DiMoP3D (Ours). Case 1: bedroom; case 2: seminar room.]
[Figure: BiFU (deterministic) vs. DiMoP3D (closest sample) vs. Ground Truth. Case 1: bedroom; case 2: laboratory.]
BelFusion: Barquero et al. BelFusion: Latent diffusion for behavior-driven human motion prediction. ICCV 2023.
Code is archived in the .zip package of the supplementary material.