Multimodal Sense-Informed Forecasting of 3D Human Motions
SIF3D incorporates three modalities of input: 1) the past motion sequence, 2) the 3D scene point cloud, and 3) the human gaze.
● A MotionEncoder encodes the past motion sequence into a motion embedding, while the 3D scene point cloud is encoded by PointNet++.
● Two novel cross-modal attention mechanisms are proposed: Semantic Coherence-aware Attention (SCA) and Ternary Intention-aware Attention (TIA).
● A TrajectoryPlanner and a PosePredictor predict the trajectory and the poses, respectively. Finally, the predicted motion sequence is generated by a MotionDecoder, which is supervised by a geometric discriminator.
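The fusion step above can be illustrated with a generic cross-modal attention: the motion embedding acts as queries over scene-point features (e.g. from PointNet++), so each motion frame aggregates the scene regions most relevant to it. This is a minimal numpy sketch under assumed toy shapes; the function and variable names are hypothetical and do not reproduce the exact SCA/TIA formulations.

```python
import numpy as np

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product cross-attention: one modality (queries, shape (T, d))
    attends to another (keys/values, shape (N, d)). Hypothetical sketch only,
    not the paper's exact SCA/TIA design."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (T, N) affinities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over scene points
    return weights @ values                         # (T, d) fused features

# Toy shapes: T past-motion frames, N scene points, d feature channels.
rng = np.random.default_rng(0)
T, N, d = 8, 128, 32
motion_emb = rng.normal(size=(T, d))   # stand-in for MotionEncoder output
scene_feat = rng.normal(size=(N, d))   # stand-in for PointNet++ features

fused = cross_modal_attention(motion_emb, scene_feat, scene_feat)
print(fused.shape)  # (8, 32)
```

The attention weights here are exactly the per-point saliency that the qualitative figures visualize: points with higher weights are the scene regions the model attends to when forecasting.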
[Figure: qualitative comparison on two cases, "case-1: Bedroom" and "case-2: Meeting room". Each case contrasts BiFu (ECCV'22) with SIF3D (Ours), and visualizes the saliency of the 3D point cloud captured by SIF3D, both global and local.]
BiFu: Zheng, Yang, et al. "GIMO: Gaze-Informed Human Motion Prediction in Context." ECCV 2022.