Perception
Generation
Understanding
Video and Scene Understanding
2D/3D Reasoning
Reasoning Segmentation
Video Understanding (DSGG)
3D Pose Prediction