About me
Hyeongseop Rha
Hello, my name is Hyeongseop Rha. I am a Ph.D. candidate in the School of Electrical Engineering at the Korea Advanced Institute of Science and Technology (KAIST), under the supervision of Prof. Yong Man Ro.
Integrated Vision Language Lab (IVYLab & IVLLab)
Email: ryool_1832@kaist.ac.kr
Multi-Modal Large Language Model
Audio-Visual Speech Recognition
Ph.D. Candidate (Mar 2023 - Present)
Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)
Daejeon, Korea
- Advisor: Prof. Yong Man Ro
B.S. (Feb 2017 - Feb 2023)
Electrical and Electronic Engineering, Yonsei University
Seoul, Korea
2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeonghun Yeo*, Hyeongseop Rha*, Se Jin Park, and Yong Man Ro
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings), 2025, [paper][Code]
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeonghun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, and Yong Man Ro
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2025, [paper][Code]
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
Se Jin Park, Yeonju Kim, Hyeongseop Rha, and Yong Man Ro
arXiv preprint, [paper]
2024
Efficient Multilingual Visual Speech Recognition by Modeling Discretized Visual Speech Units
Minsu Kim*, Jeonghun Yeo*, Se Jin Park, Hyeongseop Rha, and Yong Man Ro
Proceedings of the ACM International Conference on Multimedia (ACM MM), 2024, [paper]
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park*, Chae Won Kim*, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeonghun Yeo, and Yong Man Ro
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Oral, 2024, [paper]
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro