Kevin Lin 林可昀
I am a Senior Researcher at Microsoft, working on multimodal understanding and generation.
I also work as part of the Azure and OpenAI collaboration, focusing on customizing and fine-tuning large OpenAI models.
I received my Ph.D. in Electrical Engineering from the University of Washington in 2020, and an M.S. degree from National Taiwan University in 2014.
Email: keli[at]microsoft.com or kvlin[at]uw.edu
Google Scholar / Github / Microsoft profile / Outdated CV
News
2024/09: MCM and Meta-DiffB were accepted at NeurIPS 2024.
2024/07: APTPose was accepted at BMVC 2024.
2024/07: SoM-LLaVA-1.5 was accepted at COLM 2024.
2024/07: Openleaf was accepted at ACMMM 2024.
2024/07: Idea2Image and IDOL were accepted at ECCV 2024.
2024/06: Check out our latest work: Motion Consistency Model (accelerating video diffusion) and MMWorld (world model evaluation).
2024/06: Check out our CVPR 2024 Tutorial on Recent Advances in Vision Foundation Models. Slides and recordings are now available.
2024/05: MM-Vet was accepted at ICML 2024.
2024/02: DisCo and MM-Narrator were accepted at CVPR 2024.
2024/01: LRV was accepted at ICLR 2024.
2023/11: MM-Narrator: narrating long-form videos with GPT-4, memory mechanism and in-context learning.
2023/11: MM-Navigator: a GPT-4V-based agent for smartphone GUI navigation.
2023/10: MM-VID: exploring GPT-4V for many interesting video tasks.
2023/10: DEsignBench: exploring and benchmarking DALL-E 3 for visual design.
2023/10: Idea2Img: automatic image design and generation.
2023/10: the Dawn of LMMs: exploring GPT-4V's capabilities and practical usages. [Link to OpenAI]
2023/10: Mesh pre-training was accepted at WACV 2024.
2023/10: Multimodal model merging was accepted at EMNLP 2023 Findings.
2023/09: MetaEX-GAN was accepted at TASLP.
2023/08: MM-Vet: a new benchmark for evaluating VL integrated capabilities of LMM.
2023/07: EQBen was accepted at ICCV 2023
2023/07: DisCo: a new diffusion model for human dance generation
2023/06: LRV-Instruction: a new dataset for robust instruction tuning.
2023/03: MM-REACT: prompting ChatGPT for multimodal reasoning and action.
2023/02: ReCo, VIOLET v2, LAVENDER, NVF and AdaM were accepted at CVPR 2023
2022/06: CVPR tutorial on recent advances in vision-language pre-training
2022/06: Florence-GIT: a new generative VL foundation model achieving new SOTA on many VL benchmarks.
2021/12: OVIS was accepted at AAAI 2022
2021/07: Mesh Graphormer was accepted at ICCV 2021
2021/03: METRO was accepted at CVPR 2021
Experience
Microsoft
Senior Researcher; 2022 - Now
Applied Scientist; 2020 - 2022
Manager: Lijuan Wang
UW ECE
Research/Teaching Assistant; 2016 - 2020
Advisor: Ming-Ting Sun
Microsoft Research
Research Intern; Spring 2018, Spring 2019, Summer 2019
Mentors: Lijuan Wang and Zicheng Liu
NVIDIA Research
Research Intern; Summer 2018
Mentor: Ming-Yu Liu
eBay Research Lab
Research Intern; Summer 2017
Mentors: Fan Yang and Robinson Piramuthu
Academia Sinica IIS
Research Assistant; 2014 - 2016
Advisor: Chu-Song Chen
ADSC Singapore
Research Engineer; Summer 2015
Mentor: Jiwen Lu
Yahoo Taipei
Software Engineering Intern; Summer 2014
Mentor: Jenhao Hsiao
NTU imLab
Research Assistant; 2012 - 2014
Advisors: Yi-Ping Hung and Chu-Song Chen
Service
Awarded as Top/Outstanding Reviewer
− CVPR 2021, ECCV 2020, NeurIPS 2019
Tutorial Organizer/Speaker
− 2024: CVPR Tutorial on Recent Advances in Vision Foundation Models
− 2024: ICME Tutorial on Recent Advances in Multimodal Foundation Models
− 2022: CVPR Tutorial on Recent Advances in Vision-and-Language Pre-training
− 2020: Open Data Science Conference Tutorial on Recent Advances in Image Captioning
Conference Reviewer
− CVPR, ICCV, ECCV, WACV; NeurIPS, ICLR, ICML, AAAI; ACL, NAACL, EMNLP, etc.
Journal Reviewer
− TPAMI, IJCV, TNNLS, TVCG, TCSVT, TIP, CVIU, Pattern Recognition, Signal Processing Letters, Multimedia Systems, APSIPA Transactions on Signal and Information Processing, IPSJ Transactions on Computer Vision and Applications, etc.
Teaching Assistant at UW ECE
- EE 568: Digital Image Processing (2020)
- EEPMP 586: Digital Video Coding Systems (2020)
- EE 440: Introduction to Digital Imaging Systems (2019)
- EE 341: Discrete Time Linear Systems (2019)
Teaching Assistant at NTU CSIE & GINM
- CSIE 5079: Pattern Recognition and Classification (2014)
Interns/Students
Yan-Bo Lin (UNC Chapel Hill), Spring 2024, Summer 2024
Yuanhao Zhai (SUNY Buffalo), Summer 2023, Spring 2024
Chaoyi Zhang (University of Sydney), Summer 2023
Tan Wang (Nanyang Technological University), Summer 2022, Spring 2023
Fuxiao Liu (University of Maryland, College Park), Spring 2023
Qing-Wen Yang (National Tsing Hua University), Spring 2023
Yun-Yen Chuang (National Taiwan University), Autumn 2022
Yi-Lin Sung (UNC Chapel Hill), Summer 2022
Lin Huang (SUNY Buffalo), Summer 2022
Tsu-Jui (Ray) Fu (UCSB), Summer 2021
Sheng Liu (SUNY Buffalo), Summer 2020
Selected Preprint
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang
arXiv preprint arXiv:2406.08407
[PDF][code][Project page]
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan*, Zhengyuan Yang*, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang
arXiv preprint arXiv:2311.07562
[PDF] [code]
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Lin*, Faisal Ahmed*, Linjie Li*, Chung-Ching Lin*, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang
arXiv preprint arXiv:2310.19773
[PDF] [Project Page]
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design
Kevin Lin*, Zhengyuan Yang*, Linjie Li, Jianfeng Wang, Lijuan Wang
arXiv preprint arXiv:2310.15144
[PDF] [Project Page]
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Zhengyuan Yang*, Linjie Li*, Kevin Lin*, Jianfeng Wang*, Chung-Ching Lin*, Zicheng Liu, Lijuan Wang
arXiv preprint arXiv:2309.17421
[PDF] [Link to OpenAI]
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang*, Linjie Li*, Jianfeng Wang*, Kevin Lin*, Ehsan Azarnasab*, Faisal Ahmed*, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang
arXiv preprint arXiv:2303.11381
[PDF] [project page]
Conference Paper
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang
NeurIPS 2024
[PDF][code][Project page]
Meta-Diffuβ: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration
Yunyen Chuang, Hung-Min Hsu, Kevin Lin, Chen-Sheng Gu, Ling Zhen Li, Ray-I Chang, Hung-yi Lee
NeurIPS 2024
[PDF] [code]
APTPose: Anatomy-aware Pre-Training for 3D Human Pose Estimation
Qing-Wen Yang, Kai-Wen Duan, Ting-Yi Lu, Kevin Lin, Cheng-Yen Yang, Lijuan Wang, Jenq-Neng Hwang, Shang-Hong Lai
BMVC 2024
[coming soon]
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang
COLM 2024
[PDF][code]
Openleaf: Open-domain interleaved image-text generation and evaluation
Jie An, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo
ACM MM 2024 Brave New Ideas track
[PDF]
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang
ECCV 2024
[project page]
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
ECCV 2024
[PDF] [Project Page] [Video]
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu*, Zhengyuan Yang*, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang
ICML 2024
[PDF] [code]
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
CVPR 2024 (Highlight)
[PDF] [Project Page]
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang*, Linjie Li*, Kevin Lin*, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
CVPR 2024
[PDF] [project page]
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu, Kevin Lin, Linjie Li, Jianfeng Wang, Yaser Yacoob, Lijuan Wang
ICLR 2024
[PDF] [project page]
MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction
Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang
WACV 2024
[PDF]
An Empirical Study of Multimodal Model Merging
Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang
EMNLP 2023 Findings
[PDF] [code]
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
ICCV 2023 (Oral)
[PDF] [code]
ReCo: Region-Controlled Text-to-Image Generation
Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang
CVPR 2023
[PDF] [code]
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-Jui Fu*, Linjie Li*, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu
CVPR 2023
[PDF] [code]
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang
CVPR 2023
[PDF] [code]
Neural Voting Field for Camera-Space 3D Hand Pose Estimation
Lin Huang, Chung-Ching Lin, Kevin Lin, Lin Liang, Lijuan Wang, Junsong Yuan, Zicheng Liu
CVPR 2023
[PDF][project page]
Adaptive Human Matting for Dynamic Videos
Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
CVPR 2023
[PDF] [code]
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Lin*, Linjie Li*, Chung-Ching Lin*, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang
CVPR 2022
[PDF] [code]
Cross-modal Representation Learning for Zero-shot Action Recognition
Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
CVPR 2022
[PDF]
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning
Sheng Liu, Kevin Lin, Lijuan Wang, Junsong Yuan, Zicheng Liu
AAAI 2022
[PDF]
Mesh Graphormer
Kevin Lin, Lijuan Wang, Zicheng Liu
ICCV 2021
[PDF] [code]
End-to-End Human Pose and Mesh Reconstruction with Transformers
Kevin Lin, Lijuan Wang, Zicheng Liu
CVPR 2021
[PDF] [code]
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu
AAAI 2021
[PDF] [Media] [Blog]
Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes
Kevin Lin, Lijuan Wang, Ying Jin, Zicheng Liu, Ming-Ting Sun
ICIP 2021
[PDF]
Adversarial Learning for Fine-Grained Image Search
Kevin Lin, Fan Yang, Qiaosong Wang, Robinson Piramuthu
ICME 2019 (Oral)
[PDF] [slide] [dataset]
Adversarial Ranking for Language Generation
Kevin Lin*, Dianqi Li*, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun
NeurIPS 2017
[PDF] [poster] [code]
Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks
Kevin Lin, Jiwen Lu, Chu-Song Chen, Jie Zhou
CVPR 2016
[PDF] [code]
Cross-Batch Reference Learning for Deep Classification and Retrieval
Huei-Fang Yang, Kevin Lin, Chu-Song Chen
ACM MM 2016
[PDF] [arXiv] [code]
Rapid Clothing Retrieval via Deep Learning of Binary Codes and Hierarchical Search
Kevin Lin, Huei-Fang Yang, Kuan-Hsien Liu, Jen-Hao Hsiao, Chu-Song Chen
ICMR 2015
[PDF]
Flower Classification with Few Training Examples via Recalling Visual Patterns from Deep CNN
Kevin Lin, Huei-Fang Yang, Chu-Song Chen
CVGIP 2015
[PDF]
Location-Aware Object Detection via Coherent Region Grouping
Shen-Chi Chen, Kevin Lin, Chu-Song Chen, Yi-Ping Hung
ICASSP 2015 (Oral)
[PDF]
Left-Luggage Detection from Finite-State-Machine Analysis in Static-Camera Videos
Kevin Lin, Shen-Chi Chen, Chu-Song Chen, Daw-Tung Lin, Yi-Ping Hung
ICPR 2014
[PDF]
Journal Article
MetaEx-GAN: Meta Exploration to Improve Natural Language Generation via Generative Adversarial Networks
Yun-Yen Chuang, Hung-Min Hsu, Kevin Lin, Ray-I Chang, Hung-Yi Lee
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023
[coming soon]
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu and Lijuan Wang
Transactions on Machine Learning Research (TMLR), 2022
[PDF]
Multimodal Graph Neural Network for Video Procedural Captioning
Lei Ji*, Rongcheng Tu*, Kevin Lin, Lijuan Wang, Nan Duan
Neurocomputing, 2022
[PDF]
Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation
Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2021
[PDF] [code] [video]
Cross-Batch Reference Learning for Deep Retrieval
Huei-Fang Yang, Kevin Lin, Ting-Yen Chen, Chu-Song Chen
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2020
[PDF]
Unsupervised Deep Learning of Compact Binary Descriptors
Kevin Lin, Jiwen Lu, Chu-Song Chen, Jie Zhou, Ming-Ting Sun
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
[PDF]
Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks
Huei-Fang Yang, Kevin Lin, Chu-Song Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018
[PDF] [code]
Abandoned Object Detection via Temporal Consistency Modeling and Back-Tracing Verification for Visual Surveillance
Kevin Lin, Shen-Chi Chen, Chu-Song Chen, Daw-Tung Lin, Yi-Ping Hung
IEEE Transactions on Information Forensics and Security (TIFS), 2015
[PDF] [code] [dataset]
Workshop Paper
Learning to Generate Multiple Style Transfer Outputs for an Input Sentence
Kevin Lin, Ming-Yu Liu, Ming-Ting Sun, Jan Kautz
ACL Workshop on Neural Generation and Translation, 2020
[PDF]
Deep Learning of Binary Hash Codes for Fast Image Retrieval
Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, Chu-Song Chen
CVPR Workshop on Deep Learning in Computer Vision, 2015
[PDF] [code] [FAQ] [slide]
Demo and Poster
Cross-Domain Complementary Learning with Synthetic Data for Multi-Person Part Segmentation
Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun
ICCV Demo, 2019
[PDF] [video] [code] [poster]
Teleport: Space Navigation by Detecting the Self-motion of a Mobile Device
Shen-Chi Chen, Chia-Wei Hsu, Shih-Yao Lin, Kevin Lin, Yi-Ping Hung
ACM SIGGRAPH Asia Posters, 2013
[PDF]