Chung-Ching Lin
I am a Principal Researcher at Microsoft, where I focus on pushing the boundaries of multimodal understanding and generation. I am also part of the Azure and OpenAI collaboration.
I have worked in the fields of computer vision, machine learning, and statistical deep learning. My previous research focused on algorithms for visual perception (object recognition, localization, segmentation, tracking, etc.), representation learning, and the interaction of vision and language.
News
[2025/02] Two papers accepted to ICLR 2025: (1) GenXD for 3D and 4D scene generation, (2) SlowFast-VGen for dual-speed action-driven video generation
[2024/11] Please check out our new paper: VisVM for VLM self-training
[2024/09] Motion Consistency Model accepted to NeurIPS 2024: Accelerating video diffusion
[2024/07] Two papers accepted to ECCV 2024: (1) Idea2Img, an LMM-based agent system for visual design and creation, (2) IDOL, joint video-depth generation for human dance videos
[2024/02] Two papers accepted to CVPR 2024: (1) MM-Narrator, audio description (AD) generation with GPT-4, (2) DisCo, human dance generation with disentangled controls
[2023/10] MaskComp accepted to ICML 2024: Completing visual objects
[2023/10] MPT accepted to WACV 2024: Human pose and mesh reconstruction
[2023/09] PaintSeg accepted to NeurIPS 2023: Training-free segmentation
[2023/02] Three papers accepted to CVPR 2023: (1) AdaM, video matting, (2) NVF, 3D hand pose estimation, (3) LAVENDER, unifying video-language understanding
[2022/02] Two papers accepted to CVPR 2022: (1) ResT, zero-shot action recognition, (2) SwinBERT, video captioning
[2021/02] Two papers accepted to ICLR 2021: (1) AdaFuse, efficient action recognition, (2) VA-RED2, efficient action recognition
[2020/06] AR-Net accepted to ECCV 2020: Efficient action recognition
[2020/02] VIST accepted to CVPR 2020: Video instance segmentation tracking
Selected Publications
arXiv preprints
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
Xiyao Wang, Zhengyuan Yang, Linjie Li, Hongjin Lu, Yuancheng Xu, Chung-Ching Lin, Kevin Lin, Furong Huang, Lijuan Wang
[PDF] [Code]
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Zhengyuan Yang*, Linjie Li*, Kevin Lin*, Jianfeng Wang*, Chung-Ching Lin*, Zicheng Liu, Lijuan Wang
[PDF] [Acknowledgments]
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Lin*, Faisal Ahmed*, Linjie Li*, Chung-Ching Lin*, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang
[PDF] [Project page]
Conference Papers
GenXD: Generating Any 3D and 4D Scenes
Yuyang Zhao, Chung-Ching Lin, Kevin Lin, Zhiwen Yan, Linjie Li, Zhengyuan Yang, Jianfeng Wang, Gim Hee Lee, Lijuan Wang
ICLR 2025
[PDF] [Project page]
SlowFast-VGen: Slow-Fast Learning for Action-Conditioned Long Video Generation
Yining Hong, Beide Liu, Maxine Wu, Yuanhao Zhai, Kai-Wei Chang, Linjie Li, Kevin Lin, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, Yingnian Wu, Lijuan Wang
ICLR 2025
[PDF] [Project page]
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang
NeurIPS 2024
[PDF] [Code] [Project page]
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang
ECCV 2024
[PDF] [Code] [Project page]
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
ECCV 2024
[PDF] [Project Page] [Video]
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang*, Linjie Li*, Kevin Lin*, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
CVPR 2024
[PDF] [Project page]
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
CVPR 2024 (Highlight)
[PDF] [Project Page]
Completing Visual Objects via Bridging Generation and Segmentation
Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu
ICML 2024
[PDF]
MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction
Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang
WACV 2024
[PDF]
PaintSeg: Painting Pixels for Training-free Segmentation
Xiang Li, Chung-Ching Lin, Yinpeng Chen, Zicheng Liu, Jinglu Wang, Rita Singh, Bhiksha Raj
NeurIPS 2023
[PDF] [Code]
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
ICCV 2023 (Oral)
[PDF] [Code]
Adaptive Human Matting for Dynamic Videos
Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
CVPR 2023
[PDF] [Video]
Neural Voting Field for Camera-Space 3D Hand Pose Estimation
Lin Huang, Chung-Ching Lin, Kevin Lin, Lin Liang, Lijuan Wang, Junsong Yuan, Zicheng Liu
CVPR 2023
[PDF] [Project page]
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang
CVPR 2023
[PDF] [Code]
Cross-modal Representation Learning for Zero-shot Action Recognition
Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
CVPR 2022
[PDF]
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Lin*, Linjie Li*, Chung-Ching Lin*, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang
CVPR 2022
[PDF] [Code]
VA-RED²: Video Adaptive Redundancy Reduction
Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris
ICLR 2021
[PDF] [Project page]
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris
ICLR 2021
[PDF] [Code] [Project page]
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris
ECCV 2020
[PDF] [Code] [Project page]
Video Instance Segmentation Tracking with a Modified VAE Architecture
Chung-Ching Lin, Ying Hung, Rogerio Feris, Linglin He
CVPR 2020
[PDF]
A Prior-less Method for Multi-face Tracking in Unconstrained Videos
Chung-Ching Lin, Ying Hung
CVPR 2018
[PDF]
Adaptive As-Natural-As-Possible Image Stitching
Chung-Ching Lin, Sharathchandra U Pankanti, Karthikeyan Natesan Ramamurthy, Aleksandr Y Aravkin
CVPR 2015
[PDF]
Workshops & Tutorials
CVPR 2022 tutorial on "Recent Advances in Vision-and-Language Pre-training"
ICCV 2019 workshop on "Moving Cameras: From Body Cameras to Drones"
US Patents
9,400,939; 10,204,291; 10,255,674; 10,217,225; 10,386,409; 10,553,005; 10,755,397; 10,755,404; 11,172,225; 11,282,249; 11,954,910; 12,205,306