About me
Hyeongseop Rha
Hello, my name is Hyeongseop Rha. I am a Ph.D. candidate in the School of Electrical Engineering at the Korea Advanced Institute of Science and Technology (KAIST), under the supervision of Prof. Yong Man Ro.
Integrated Vision Language Lab (IVYLab & IVLLab)
Email: ryool_1832@kaist.ac.kr
Multi-Modal Large Language Model
Audio-Visual Speech Recognition
Ph.D. Candidate (Mar 2023 - Present)
Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)
Daejeon, Korea
- Advisor: Prof. Yong Man Ro
B.S. (Feb 2017 - Feb 2023)
Electrical and Electronic Engineering, Yonsei University
Seoul, Korea
2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeonghun Yeo*, Hyeongseop Rha*, Se Jin Park, and Yong Man Ro
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings), 2025, [paper][Code]
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeonghun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, and Yong Man Ro
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2025, [paper][Code]
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
Se Jin Park, Yeonju Kim, Hyeongseop Rha, and Yong Man Ro
arXiv preprint, [paper]
2024
Efficient Multilingual Visual Speech Recognition by Modeling Discretized Visual Speech Units
Minsu Kim*, Jeonghun Yeo*, Se Jin Park, Hyeongseop Rha, and Yong Man Ro
Proceedings of the ACM International Conference on Multimedia (ACM MM), 2024, [paper]
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park*, Chae Won Kim*, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeonghun Yeo, and Yong Man Ro
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Oral, 2024, [paper]
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro