Minsu Kim
Postdoctoral Researcher at Meta
Work Experience
Meta, London, UK (May 2024 - Present)
Postdoctoral Researcher
KAIST, Daejeon, Korea (Mar. 2024 - Mar. 2024)
Postdoctoral Researcher
Carnegie Mellon University (CMU), Pittsburgh, USA (Aug. 2023 - Oct. 2023)
Visiting Scholar (WAV Lab.)
Mentor: Prof. Shinji Watanabe
Education
KAIST, Daejeon, Korea (Feb. 2019 - Feb. 2024)
Ph.D. in Electrical Engineering
Advisor: Prof. Yong Man Ro
Yonsei Univ., Seoul, Korea (Feb. 2013 - Feb. 2019)
B.S. in Electrical and Electronic Engineering
Graduated with High Honors; Early Graduation (one year early)
Publications
International Journal
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, and Yong Man Ro
IEEE Transactions on Multimedia (TMM), 2024 [Paper]
CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition
Minsu Kim, Joanna Hong, Se Jin Park, and Yong Man Ro
IEEE Transactions on Multimedia (TMM), 2021 [Paper] [Code]
Speech Reconstruction with Reminiscent Sound via Visual Voice Memory
Joanna Hong, Minsu Kim, Se Jin Park, and Yong Man Ro
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021 [Paper]
International Conference
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Jeongsoo Choi*, Se Jin Park*, Minsu Kim*, and Yong Man Ro (* Co-First Authors)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Highlight) [Paper] [Code] [Demo]
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, and Yong Man Ro
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [Paper] [Code] [Demo]
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Jeong Hun Yeo*, Minsu Kim*, Shinji Watanabe, and Yong Man Ro (* Co-First Authors)
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [Paper] [Data]
Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models
Jeongsoo Choi, Minsu Kim, Se Jin Park, and Yong Man Ro
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [Paper]
Exploring Phonetic Context-Aware Lip-Sync for Talking Face Generation
Se Jin Park, Minsu Kim, Jeongsoo Choi, and Yong Man Ro
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [Paper]
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim*, Jeong Hun Yeo*, Jeongsoo Choi, and Yong Man Ro (* Co-First Authors)
IEEE/CVF International Conference on Computer Vision (ICCV), 2023 [Paper]
Intelligible Lip-to-speech Synthesis with Speech Units
Jeongsoo Choi, Minsu Kim, and Yong Man Ro
24th INTERSPEECH Conference, 2023 [Paper] [Code] [Demo]
Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
Joanna Hong*, Minsu Kim*, Jeongsoo Choi, and Yong Man Ro (* Co-First Authors)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 [Paper] [Code] [Data]
Lip-to-speech Synthesis in the Wild with Multi-task Learning
Minsu Kim*, Joanna Hong*, and Yong Man Ro (* Co-First Authors)
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 [Paper] [Code]
Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
Jeong Hun Yeo, Minsu Kim, and Yong Man Ro
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 [Paper]
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
Minsu Kim, Chae Won Kim, and Yong Man Ro
AAAI Conference on Artificial Intelligence (AAAI), 2023 [Paper]
Speaker-adaptive Lip Reading with User-dependent Padding
Minsu Kim, Hyunjun Kim, and Yong Man Ro
European Conference on Computer Vision (ECCV), 2022 [Paper] [Data] [Code]
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
Joanna Hong, Minsu Kim, and Yong Man Ro
European Conference on Computer Vision (ECCV), 2022 [Paper]
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong*, Minsu Kim*, Daehun Yoo, and Yong Man Ro (* Co-First Authors)
23rd INTERSPEECH Conference, 2022 [Paper] [Code]
Distinguishing Homophenes using Multi-head Visual-audio Memory for Lip Reading
Minsu Kim, Jeong Hun Yeo, and Yong Man Ro
AAAI Conference on Artificial Intelligence (AAAI), 2022 [Paper] [Code]
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, and Yong Man Ro
AAAI Conference on Artificial Intelligence (AAAI), 2022 [Paper]
Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim, Joanna Hong, and Yong Man Ro
Advances in Neural Information Processing Systems (NeurIPS), 2021 [Paper] [Code]
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim*, Joanna Hong*, Se Jin Park, and Yong Man Ro (* Co-First Authors)
IEEE/CVF International Conference on Computer Vision (ICCV), 2021 [Paper] [Code]
Interpretation of Lesional Detection via Counterfactual Generation
Junho Kim, Minsu Kim, and Yong Man Ro
IEEE International Conference on Image Processing (ICIP), 2021 [Paper]
Unsupervised Disentangling of Viewpoint and Residues Variations by Substituting Representations for Robust Face Recognition
Minsu Kim, Joanna Hong, Junho Kim, Hong Joo Lee, and Yong Man Ro
International Conference on Pattern Recognition (ICPR), 2020 [Paper]
Robust Video Facial Authentication with Unsupervised Mode Disentanglement
Minsu Kim, Hong Joo Lee, Sangmin Lee, and Yong Man Ro
IEEE International Conference on Image Processing (ICIP), 2020 [Paper]
Learning Style Correlation for Elaborate Few-Shot Classification
Junho Kim, Minsu Kim, Jung Uk Kim, Hong Joo Lee, Sangmin Lee, Joanna Hong, and Yong Man Ro
IEEE International Conference on Image Processing (ICIP), 2020 [Paper]
Awards & Honors
Ph.D. Dissertation Award (2024)
KAIST College of Engineering
Kim Choong-Ki Award: Best Research Achievement Award (2023)
School of Electrical Engineering, KAIST
KI Outstanding Researcher Award (2021)
KAIST Institutes (KI), KAIST
Outstanding Teaching Assistant Award (2021)
Korea Advanced Institute of Science and Technology (KAIST)
Professional Activities
Invited Talk
KHU
Title: Efficient Multi-modal Processing via Tokenization
Date: April 5, 2024
CMU - Speech (Sphinx) Lunch
Title: Solving Problems of a Single-modal Task with Multi-modality
Date: August 31, 2023
Program Committee & Reviewer
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
Reviewer
Speech Communication
Reviewer
ICCV, ECCV, CVPR, NeurIPS, ICLR, AAAI, EMNLP, ICASSP, SIGKDD
Reviewer
Teaching
EE837 Special Topics in Signal Processing: Multimedia Processing and Learning, KAIST (2022)
Teaching Assistant
EE548 Matrix Computations for Signal Processing, KAIST (2021)
Teaching Assistant
EE474 Introduction to Multimedia, KAIST (2020, 2021, 2022, 2023)
Teaching Assistant
EE636 Digital Video Processing, KAIST (2020)
Teaching Assistant
CoE202 Basics of Artificial Intelligence, KAIST (2019)
Teaching Assistant