Jeong Hun Yeo
Ph.D. Candidate
Integrated Vision & Language Lab.
Korea Advanced Institute of Science and Technology (KAIST)
e-mail: sedne246@kaist.ac.kr / [Google Scholar] / [LinkedIn]
Education
Korea Advanced Institute of Science and Technology (KAIST), South Korea (2022 - Present)
Ph.D. in Electrical Engineering (advisor: Prof. Yong Man Ro)
Korea Advanced Institute of Science and Technology (KAIST), South Korea (2020 - 2022)
M.S. in Electrical Engineering (advisor: Prof. Yong Man Ro)
Korea Advanced Institute of Science and Technology (KAIST), South Korea (2014 - 2020)
B.S. in Electrical Engineering
Publications
Preprints
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro
arXiv 2024, Under review [Paper][Code]
International Journal
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, and Yong Man Ro
IEEE Transactions on Multimedia (TMM), 2024 [Paper]
International Conference
1. Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo*, Seunghee Han*, Minsu Kim, and Yong Man Ro (* Co-First Authors)
Empirical Methods in Natural Language Processing (EMNLP) Findings, 2024 [Paper][Code]
2. Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim*, Jeong Hun Yeo*, Se Jin Park, Hyeongseop Rha, and Yong Man Ro (* Co-First Authors)
The Association for Computing Machinery's Annual Conference on Multimedia (ACMMM), 2024 [Paper][Code]
3. Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park*, Chae Won Kim*, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, and Yong Man Ro
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2024, Oral Presentation [Paper | Data | Demo]
4. Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Jeong Hun Yeo*, Minsu Kim*, Shinji Watanabe, and Yong Man Ro (* Co-First Authors)
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [Paper][Code]
5. Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, and Yong Man Ro
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [Paper][Code]
6. Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim*, Jeong Hun Yeo*, Jeongsoo Choi, and Yong Man Ro (* Co-First Authors)
IEEE/CVF International Conference on Computer Vision (ICCV), 2023 [Paper][Code]
7. Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
Jeong Hun Yeo, Minsu Kim, and Yong Man Ro
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 [Paper]
8. Distinguishing Homophenes using Multi-head Visual-audio Memory for Lip Reading
Minsu Kim, Jeong Hun Yeo, and Yong Man Ro
AAAI Conference on Artificial Intelligence (AAAI), 2022 [Paper][Code]
Professional Activities
Teaching
EE837 Special Topics in Signal Processing: Multimedia Processing and Learning, KAIST (2023)
Teaching Assistant