SKKU Pixel Lab
Perceptual Intelligence X Enhanced Learning Lab
Welcome to Pixel Lab!
Our lab works on a variety of machine learning topics, particularly Multimodal Learning, Self-supervised Learning, and Social Artificial Intelligence. We take interdisciplinary approaches—spanning computer vision, natural language processing, audio processing, and other relevant fields—to address real-world challenges. For more details on our research areas, please visit here.
Lab News
A paper on Speech-Driven 3D Facial Animation was accepted at ICCV 2025
MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization
Our lab has secured a prestigious research grant from the NRF Mid-Career Researcher Program (우수중견연구), amounting to 1.2 billion KRW over five years!
Five papers on Social Understanding, Multimodal Understanding, and Generative Models were accepted at CVPR 2025
SocialGesture: Delving into Multi-person Gesture Understanding
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (Highlight, Top 4%)
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
Question-Aware Gaussian Experts for Audio-Visual Question Answering (Highlight, Top 4%)
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation (Highlight, Top 4%)
A paper on Video Moment Retrieval & Highlight Detection was accepted at AAAI 2025
Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection
Sangmin was selected as an outstanding reviewer at BMVC 2024
A survey paper on Social AI was released on arXiv
Towards Social AI: A Survey on Understanding Social Interactions
A paper on Video Point Tracking was accepted at ECCVW 2024
Leveraging Object Priors for Point Tracking
A paper on Text-Video Retrieval was accepted in Pattern Recognition
Text-Guided Distillation Learning to Diversify Video Embeddings for Text-Video Retrieval
A paper on Social Interaction Modeling was accepted at CVPR 2024 (Oral, Top 0.8%)
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
A paper on Sound Source Localization was accepted at CVPR 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
A paper on Speech-Driven 3D Facial Animation was accepted at ICIP 2024
Analyzing Visible Articulatory Movements in Speech Production for Speech-Driven 3D Facial Animation