Jeong Hun Yeo

Postdoctoral Researcher

Integrated Vision & Language Lab.

Korea Advanced Institute of Science and Technology (KAIST)

e-mail: sedne246@kaist.ac.kr / [Google Scholar] / [LinkedIn]

Work Experience

KAIST, Daejeon, South Korea (Mar. 2026 - Present)

Postdoctoral Researcher, supported by the Jang Young Sil Fellowship, focusing on multimodal AI research

Education

Korea Advanced Institute of Science and Technology (KAIST), South Korea (2022 - 2026)

Ph.D. in Electrical Engineering (advisor: Prof. Yong Man Ro)

Korea Advanced Institute of Science and Technology (KAIST), South Korea (2020 - 2022)

M.S. in Electrical Engineering (advisor: Prof. Yong Man Ro)

Korea Advanced Institute of Science and Technology (KAIST), South Korea (2014 - 2020)

B.S. in Electrical Engineering

Publications

International Conference

1. Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier

Hyeongseop Rha, Jeong Hun Yeo, Yeonju Kim, Yong Man Ro

The Association for the Advancement of Artificial Intelligence (AAAI), 2026 [Paper][Code]

2. Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Jeong Hun Yeo*, Minsu Kim*, Chae Won Kim, Stavros Petridis, Yong Man Ro (* Co-First Authors)

IEEE/CVF International Conference on Computer Vision (ICCV), 2025 [Paper][Code]

3. MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens

Jeong Hun Yeo*, Hyeongseop Rha*, Se Jin Park, and Yong Man Ro (* Co-First Authors)

Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings), 2025 [Paper][Code]

4. Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro

The Association for the Advancement of Artificial Intelligence (AAAI), 2025 [Paper][Code]

5. Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Jeong Hun Yeo*, Seunghee Han*, Minsu Kim, and Yong Man Ro (* Co-First Authors)

Empirical Methods in Natural Language Processing (EMNLP Findings), 2024 [Paper][Code]

6. Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation

Minsu Kim*, Jeong Hun Yeo*, Se Jin Park, Hyeongseop Rha, and Yong Man Ro (* Co-First Authors)

The Association for Computing Machinery's Annual Conference on Multimedia (ACMMM), 2024 [Paper][Code]

7. Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Se Jin Park*, Chae Won Kim*, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, and Yong Man Ro

Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2024, Oral Presentation [Paper][Data][Demo]

8. Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

Jeong Hun Yeo*, Minsu Kim*, Shinji Watanabe, and Yong Man Ro (* Co-First Authors)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [Paper][Code]

9. Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, and Yong Man Ro

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [Paper][Code]

10. Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

Minsu Kim*, Jeong Hun Yeo*, Jeongsoo Choi, and Yong Man Ro (* Co-First Authors)

IEEE/CVF International Conference on Computer Vision (ICCV), 2023 [Paper][Code]

11. Multi-Temporal Lip-Audio Memory for Visual Speech Recognition

Jeong Hun Yeo, Minsu Kim, and Yong Man Ro

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 [Paper]

12. Distinguishing Homophenes using Multi-head Visual-audio Memory for Lip Reading

Minsu Kim, Jeong Hun Yeo, and Yong Man Ro

AAAI Conference on Artificial Intelligence (AAAI), 2022 [Paper][Code]

International Journal

1. GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory

Jeong Hun Yeo*, Sangyun Chung*, Sungjune Park, Dae Hoe Kim, Jinyoung Moon, and Yong Man Ro (* Co-First Authors)

IEEE Transactions on Multimedia (TMM), 2026 [Paper]

2. AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, and Yong Man Ro

IEEE Transactions on Multimedia (TMM), 2024 [Paper]

Preprints

1. Diffusion Large Language Models for Visual Speech Recognition

Jeong Hun Yeo, Chae Won Kim, Hyeongseop Rha, and Yong Man Ro

arXiv preprint arXiv:2605.28456 [Paper][Code]

2. Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding

Jeong Hun Yeo, Minsu Kim, Hyeongseop Rha, and Yong Man Ro

arXiv preprint arXiv:2605.29613 [Paper]

3. Learning What to Attend First: Modality-Importance-Guided Reasoning for Reliable Multimodal Emotion Understanding

Hyeongseop Rha*, Jeong Hun Yeo*, Junil Won, Se Jin Park, and Yong Man Ro (* Co-First Authors)

arXiv preprint arXiv:2512.02699 [Paper]

4. Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio

Jeong Hun Yeo, Hyeongseop Rha, Sungjune Park, Junil Won, and Yong Man Ro

arXiv preprint arXiv:2508.20476 [Paper]

Awards and Honors

Jang Young Sil Postdoctoral Fellowship - KAIST, 2026
KAIST Outstanding Paper Award for Graduate Students, 2024

Professional Activities

Program Reviewer

Journal Reviewer:
- Transactions on Image Processing (TIP)
- Transactions on Audio, Speech and Language Processing (TASLP)
Conference Reviewer:
- Computer Vision: CVPR (2026), ICCV (2025), ECCV (2026)
- Artificial Intelligence: AAAI (2026), ARR (ACL Rolling Review, 2025, 2026)
- Signal Processing: ICASSP (2025, 2026), ICIP (2024, 2025)

Invited Talk

ETRI

Title: Efficient LLM-Based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens

Date: June 26, 2025

Teaching

Teaching Assistant, EE837 Special Topics in Signal Processing: Multimedia Processing and Learning, KAIST (2023)
