Image Captioning / Dense Video Captioning
Video Text Retrieval
Visual Question Answering
Sign Language Recognition
Language to Image / Video Generation
Audio to Image Generation
Semi- & Weakly-Supervised Learning
Zero-shot & Few-shot Learning
Data Bias
Active Learning
Domain Adaptation
Synthetic Dataset
Visual Relationship
Segmentation
Object Detection
Attribute Recognition
Human Action Recognition
Human-Object Interaction Detection
Action Localization
Human Pose