Multimodal Sense-Informed Forecasting of 3D Human Motions
SIF3D incorporates three modalities of input: 1) the past motion sequence, 2) the 3D scene point cloud, and 3) the human gaze.
● A MotionEncoder encodes the past motion sequence into a motion embedding, while the 3D scene point cloud is encoded by PointNet++.
● Two novel cross-modal attention mechanisms are proposed: Semantic Coherence-aware Attention (SCA) and Ternary Intention-aware Attention (TIA).
● A TrajectoryPlanner and a PosePredictor predict the trajectory and the poses, respectively. Finally, the predicted motion sequence is generated by a MotionDecoder, which is supervised by a geometric discriminator.
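The fusion step above can be illustrated with a generic cross-modal attention: the motion embedding acts as queries over scene-point features (e.g. from PointNet++), so each motion frame aggregates the scene regions most relevant to it. This is a minimal numpy sketch under assumed toy shapes; the function and variable names are hypothetical and do not reproduce the exact SCA/TIA formulations.

```python
import numpy as np

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product cross-attention: one modality (queries, shape (T, d))
    attends to another (keys/values, shape (N, d)). Hypothetical sketch only,
    not the paper's exact SCA/TIA design."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (T, N) affinities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over scene points
    return weights @ values                         # (T, d) fused features

# Toy shapes: T past-motion frames, N scene points, d feature channels.
rng = np.random.default_rng(0)
T, N, d = 8, 128, 32
motion_emb = rng.normal(size=(T, d))   # stand-in for MotionEncoder output
scene_feat = rng.normal(size=(N, d))   # stand-in for PointNet++ features

fused = cross_modal_attention(motion_emb, scene_feat, scene_feat)
print(fused.shape)  # (8, 32)
```

The attention weights here are exactly the per-point saliency that the qualitative figures visualize: points with higher weights are the scene regions the model attends to when forecasting.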
[Figure: qualitative comparison on two cases, "case-1: Bedroom" and "case-2: Meeting room". Each case contrasts BiFu (ECCV'22) with SIF3D (Ours), and visualizes the saliency of the 3D point cloud captured by SIF3D, both global and local.]
BiFu: Zheng, Yang, et al. "GIMO: Gaze-Informed Human Motion Prediction in Context." ECCV 2022.