Kevin Lin 林可昀
I am a Senior Researcher at Microsoft GenAI, working on multimodal understanding and generation. I have broad research interests in computer vision, vision-and-language, and related problems.
I received my Ph.D. in Electrical Engineering from the University of Washington in 2020, and an M.S. degree from National Taiwan University in 2014.
Email: keli[at]microsoft.com or kvlin[at]uw.edu
Google Scholar / GitHub / Microsoft profile / Outdated CV
News
2024/02: DisCo and MM-Narrator were accepted at CVPR 2024.
2024/01: LRV was accepted at ICLR 2024.
2023/11: MM-Narrator: narrating long-form videos with GPT-4, a memory mechanism, and in-context learning.
2023/11: MM-Navigator: a GPT-4V-based agent for smartphone GUI navigation.
2023/10: MM-VID: exploring GPT-4V for a wide range of video understanding tasks.
2023/10: DEsignBench: exploring and benchmarking DALL-E 3 for visual design.
2023/10: Idea2Img: automatic image design and generation.
2023/10: The Dawn of LMMs: exploring GPT-4V's capabilities and practical uses. [Link to OpenAI]
2023/10: Mesh pre-training was accepted at WACV 2024.
2023/10: Multimodal model merging was accepted at EMNLP 2023 Findings.
2023/09: MetaEx-GAN was accepted at TASLP.
2023/08: MM-Vet: a new benchmark for evaluating the integrated vision-language capabilities of LMMs.
2023/07: EQBen was accepted at ICCV 2023.
2023/07: DisCo: a new diffusion model for human dance generation.
2023/06: LRV-Instruction: a new dataset for robust instruction tuning.
2023/03: MM-REACT: prompting ChatGPT for multimodal reasoning and action.
2023/02: 5 papers accepted at CVPR 2023.
2022/06: CVPR tutorial on recent advances in vision-language pre-training.
2022/06: Florence-GIT: a new generative VL foundation model that achieves new state-of-the-art results on many VL benchmarks.
2022/03: 2 papers accepted at CVPR 2022.
2021/12: OVIS was accepted at AAAI 2022.
2021/07: Mesh Graphormer was accepted at ICCV 2021.
2021/03: METRO was accepted at CVPR 2021.
Professional Experience
2020 - Now
Senior Researcher @ Microsoft
Manager: Lijuan Wang
2016 - 2020
Research/Teaching Assistant @ UW ECE
Advisor: Ming-Ting Sun
Spring 2018, Spring 2019, Summer 2019
Research Intern @ Microsoft Research
Advisor: Lijuan Wang and Zicheng Liu
Summer 2018
Research Intern @ NVIDIA Research
Advisor: Ming-Yu Liu
Summer 2017
Research Intern @ eBay Research Lab
Advisor: Fan Yang and Robinson Piramuthu
Summer 2015
Research Engineer @ ADSC Singapore
Advisor: Jiwen Lu
Summer 2014
Software Engineering Intern @ Yahoo Taipei
Advisor: Jenhao Hsiao
2014 - 2016
Research Assistant @ Academia Sinica IIS
Advisor: Chu-Song Chen
2012 - 2014
Research Assistant @ NTU imLab
Advisor: Yi-Ping Hung and Chu-Song Chen
2008 - 2012
Undergrad Student @ NTUST ECE
Professional Service
Awarded as Top/Outstanding Reviewer
− CVPR 2021, ECCV 2020, NeurIPS 2019
Tutorial Organizer/Speaker
− 2022: CVPR Tutorial on Recent Advances in Vision-and-Language Pre-training
− 2020: Open Data Science Conference Tutorial on Recent Advances in Image Captioning
Conference Reviewer
− CVPR, ICCV, ECCV, WACV; NeurIPS, ICLR, ICML, AAAI; ACL, NAACL, EMNLP, etc.
Journal Reviewer
− TPAMI, IJCV, TNNLS, TVCG, TCSVT, TIP, CVIU, Pattern Recognition, Signal Processing Letters, Multimedia Systems, APSIPA Transactions on Signal and Information Processing, IPSJ Transactions on Computer Vision and Applications, etc.
Teaching Assistant at UW ECE
- EE 568: Digital Image Processing (2020)
- EEPMP 586: Digital Video Coding Systems (2020)
- EE 440: Introduction to Digital Imaging Systems (2019)
- EE 341: Discrete Time Linear Systems (2019)
Teaching Assistant at NTU CSIE & GINM
- CSIE 5079: Pattern Recognition and Classification (2014)
Interns/Students
Chaoyi Zhang (University of Sydney), Summer 2023
Yuanhao Zhai (SUNY Buffalo), Summer 2023
Tan Wang (Nanyang Technological University), Summer 2022, Spring 2023
Fuxiao Liu (University of Maryland, College Park), Spring 2023
Qing-Wen Yang (National Tsing Hua University), Spring 2023
Yun-Yen Chuang (National Taiwan University), Autumn 2022
Yi-Lin Sung (UNC Chapel Hill), Summer 2022
Lin Huang (SUNY Buffalo), Summer 2022
Tsu-Jui (Ray) Fu (UCSB), Summer 2021
Sheng Liu (SUNY Buffalo), Summer 2020
Selected Preprints
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan*, Zhengyuan Yang*, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang
arXiv preprint arXiv:2311.07562
[PDF] [code]
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Lin*, Faisal Ahmed*, Linjie Li*, Chung-Ching Lin*, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang
arXiv preprint arXiv:2310.19773
[PDF] [Project Page]
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design
Kevin Lin*, Zhengyuan Yang*, Linjie Li, Jianfeng Wang, Lijuan Wang
arXiv preprint arXiv:2310.15144
[PDF] [Project Page]
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
arXiv preprint arXiv:2310.08541
[PDF] [Project Page] [Video]
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Zhengyuan Yang*, Linjie Li*, Kevin Lin*, Jianfeng Wang*, Chung-Ching Lin*, Zicheng Liu, Lijuan Wang
arXiv preprint arXiv:2309.17421
[PDF] [Link to OpenAI]
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu*, Zhengyuan Yang*, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang
arXiv preprint arXiv:2308.02490
[PDF] [code]
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang*, Linjie Li*, Jianfeng Wang*, Kevin Lin*, Ehsan Azarnasab*, Faisal Ahmed*, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang
arXiv preprint arXiv:2303.11381
[PDF] [project page]
Conference Papers
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
CVPR 2024 (Highlight)
[PDF] [Project Page]
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang*, Linjie Li*, Kevin Lin*, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
CVPR 2024
[PDF] [project page]
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu, Kevin Lin, Linjie Li, Jianfeng Wang, Yaser Yacoob, Lijuan Wang
ICLR 2024
[PDF] [project page]
MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction
Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang
WACV 2024
[PDF]
An Empirical Study of Multimodal Model Merging
Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang
EMNLP 2023 Findings
[PDF] [code]
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
ICCV 2023 (Oral)
[PDF] [code]
ReCo: Region-Controlled Text-to-Image Generation
Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang
CVPR 2023
[PDF] [code]
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-Jui Fu*, Linjie Li*, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu
CVPR 2023
[PDF] [code]
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang
CVPR 2023
[PDF] [code]
Neural Voting Field for Camera-Space 3D Hand Pose Estimation
Lin Huang, Chung-Ching Lin, Kevin Lin, Lin Liang, Lijuan Wang, Junsong Yuan, Zicheng Liu
CVPR 2023
[PDF] [project page]
Adaptive Human Matting for Dynamic Videos
Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
CVPR 2023
[PDF] [code]
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Lin*, Linjie Li*, Chung-Ching Lin*, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang
CVPR 2022
[PDF] [code]
Cross-modal Representation Learning for Zero-shot Action Recognition
Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
CVPR 2022
[PDF]
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning
Sheng Liu, Kevin Lin, Lijuan Wang, Junsong Yuan, Zicheng Liu
AAAI 2022
[PDF]
Mesh Graphormer
Kevin Lin, Lijuan Wang, Zicheng Liu
ICCV 2021
[PDF] [code]
End-to-End Human Pose and Mesh Reconstruction with Transformers
Kevin Lin, Lijuan Wang, Zicheng Liu
CVPR 2021
[PDF] [code]
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu
AAAI 2021
[PDF] [Media] [Blog]
Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes
Kevin Lin, Lijuan Wang, Ying Jin, Zicheng Liu, Ming-Ting Sun
ICIP 2021
[PDF]
Adversarial Learning for Fine-Grained Image Search
Kevin Lin, Fan Yang, Qiaosong Wang, Robinson Piramuthu
ICME 2019 (Oral)
[PDF] [slide] [dataset]
Adversarial Ranking for Language Generation
Kevin Lin*, Dianqi Li*, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun
NeurIPS 2017
[PDF] [poster] [code]
Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks
Kevin Lin, Jiwen Lu, Chu-Song Chen, Jie Zhou
CVPR 2016
[PDF] [code]
Cross-Batch Reference Learning for Deep Classification and Retrieval
Huei-Fang Yang, Kevin Lin, Chu-Song Chen
ACM MM 2016
[PDF] [arXiv] [code]
Rapid Clothing Retrieval via Deep Learning of Binary Codes and Hierarchical Search
Kevin Lin, Huei-Fang Yang, Kuan-Hsien Liu, Jen-Hao Hsiao, Chu-Song Chen
ICMR 2015
[PDF]
Flower Classification with Few Training Examples via Recalling Visual Patterns from Deep CNN
Kevin Lin, Huei-Fang Yang, Chu-Song Chen
CVGIP 2015
[PDF]
Location-Aware Object Detection via Coherent Region Grouping
Shen-Chi Chen, Kevin Lin, Chu-Song Chen, Yi-Ping Hung
ICASSP 2015 (Oral)
[PDF]
Left-Luggage Detection from Finite-State-Machine Analysis in Static-Camera Videos
Kevin Lin, Shen-Chi Chen, Chu-Song Chen, Daw-Tung Lin, Yi-Ping Hung
ICPR 2014
[PDF]
Journal Articles
MetaEx-GAN: Meta Exploration to Improve Natural Language Generation via Generative Adversarial Networks
Yun-Yen Chuang, Hung-Min Hsu, Kevin Lin, Ray-I Chang, Hung-Yi Lee
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023
[coming soon]
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang
Transactions on Machine Learning Research (TMLR), 2022
[PDF]
Multimodal Graph Neural Network for Video Procedural Captioning
Lei Ji*, Rongcheng Tu*, Kevin Lin, Lijuan Wang, Nan Duan
Neurocomputing, 2022
[PDF]
Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation
Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2021
[PDF] [code] [video]
Cross-Batch Reference Learning for Deep Retrieval
Huei-Fang Yang, Kevin Lin, Ting-Yen Chen, Chu-Song Chen
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2020
[PDF]
Unsupervised Deep Learning of Compact Binary Descriptors
Kevin Lin, Jiwen Lu, Chu-Song Chen, Jie Zhou, Ming-Ting Sun
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
[PDF]
Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks
Huei-Fang Yang, Kevin Lin, Chu-Song Chen
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018
[PDF] [code]
Abandoned Object Detection via Temporal Consistency Modeling and Back-Tracing Verification for Visual Surveillance
Kevin Lin, Shen-Chi Chen, Chu-Song Chen, Daw-Tung Lin, Yi-Ping Hung
IEEE Transactions on Information Forensics and Security (TIFS), 2015
[PDF] [code] [dataset]
Workshop Papers
Learning to Generate Multiple Style Transfer Outputs for an Input Sentence
Kevin Lin, Ming-Yu Liu, Ming-Ting Sun, Jan Kautz
ACL Workshop on Neural Generation and Translation, 2020
[PDF]
Deep Learning of Binary Hash Codes for Fast Image Retrieval
Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, Chu-Song Chen
CVPR Workshop on Deep Learning in Computer Vision, 2015
[PDF] [code] [FAQ] [slide]
Demos and Posters
Cross-Domain Complementary Learning with Synthetic Data for Multi-Person Part Segmentation
Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun
ICCV Demo, 2019
[PDF] [video] [code] [poster]
Teleport: Space Navigation by Detecting the Self-motion of a Mobile Device
Shen-Chi Chen, Chia-Wei Hsu, Shih-Yao Lin, Kevin Lin, Yi-Ping Hung
ACM SIGGRAPH Asia Posters, 2013
[PDF]