Ji-Hoon Kim
Ph.D. Student, Multimodal AI Lab,
Korea Advanced Institute of Science and Technology (KAIST)
I am a final-year Ph.D. student advised by Prof. Joon Son Chung at KAIST. My research focuses on audio-visual generative models, with particular interests in text-to-speech and video-to-speech.
jihoon [at] mm.kaist.ac.kr
Google Scholar | LinkedIn | CV (updated May 2025)
Work Experience
Carnegie Mellon University, Pittsburgh, USA
Visiting Scholar
Advisor: Shinji Watanabe
May 2025 - Present
42dot, Seoul, South Korea
Full-time Research Engineer
Sep 2022 - Feb 2023
Hyundai Motor Group, Seoul, South Korea
Full-time Research Engineer
Jan 2022 - Sep 2022
Education
KAIST, Daejeon, South Korea
Ph.D. in Electrical Engineering
Advisor: Joon Son Chung
Mar 2023 - Present
Korea University, Seoul, South Korea
M.S. in Artificial Intelligence
Advisor: Seong-Whan Lee
Sep 2019 - Feb 2022
Kyung Hee University, Seoul, South Korea
B.S. in Mathematics
Mar 2013 - Aug 2019
Publications
Journals
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung
IEEE Transactions on Audio, Speech, and Language Processing, 2025 [IF: 4.1]
[PDF] [Project Page]
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian,
Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe
IEEE Open Journal of Signal Processing, 2025 [IF: 2.9]
[PDF] [Project Page]
Conference Proceedings
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
Jeongsoo Choi, Ji-Hoon Kim, Kim Sung-Bin, Tae-Hyun Oh, Joon Son Chung
ACM MM, 2025
[PDF] [Project Page]
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment
Jeongsoo Choi*, Zhikang Niu*, Ji-Hoon Kim, Joon Son Chung, Chen Xie (* equal contribution)
INTERSPEECH, 2025
[PDF] [Project Page] [Code]
The Text-to-Speech in the Wild (TITW) Dataset
Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um,
Jinchuan Tian, Hye-jin Shim, Nicholas Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe
INTERSPEECH, 2025
[PDF] [Project Page]
InfiniteAudio: Infinite-Length Audio Generation with Consistency
Chaeyoung Jung, Hojoon Ki, Ji-Hoon Kim, Junmo Kim, Joon Son Chung
INTERSPEECH, 2025
[PDF] [Project Page]
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim, Jeongsoo Choi, Jaehun Kim, Chaeyoung Jung, Joon Son Chung
CVPR, 2025 (Highlight presentation)
[PDF] [Project Page]
V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
Jeongsoo Choi*, Ji-Hoon Kim*, Jinyu Li, Joon Son Chung, Shujie Liu (* equal contribution)
ICASSP, 2025
[PDF] [Project Page] [Code]
AdaptVC: High Quality Voice Conversion with Adaptive Learning
Jaehun Kim, Ji-Hoon Kim, Yeunju Choi, Tan Dat Nguyen, Seongkyu Mun, Joon Son Chung
ICASSP, 2025
[PDF] [Project Page]
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen, Ji-Hoon Kim, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung
ICASSP, 2025
[PDF] [Project Page]
VoxSim: A Perceptual Voice Similarity Dataset
Junseok Ahn, Youkyum Kim, Yeunju Choi, Doyeop Kwak, Ji-Hoon Kim, Seongkyu Mun, Joon Son Chung
INTERSPEECH, 2024 (Best student paper finalist; Top 2%)
[PDF] [Project Page] [Code]
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung, Suyeon Lee, Ji-Hoon Kim, Joon Son Chung
INTERSPEECH, 2024
[PDF] [Code]
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang*, Ji-Hoon Kim*, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju,
Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung (* equal contribution)
CVPR, 2024
[PDF] [Project Page]
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
Tan Dat Nguyen*, Ji-Hoon Kim*, Youngjoon Jang, Jaehun Kim, Joon Son Chung (* equal contribution)
ICASSP, 2024
[PDF] [Project Page] [Code]
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim*, Jaehun Kim*, Joon Son Chung (* equal contribution)
AAAI, 2024 (Oral presentation)
Short version presented at the AV4D Workshop @ ICCV 2023
[PDF] [Project Page]
FACTSpeech: Speaking a Foreign Language Pronunciation Using Only Your Native Characters
Hong-Sun Yang, Ji-Hoon Kim, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Shuk-Jae Choi, Hyung-Yong Kim
INTERSPEECH, 2023
[PDF] [Project Page]
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Ji-Hoon Kim, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim
ICASSP, 2023
[PDF] [Project Page]
TriniTTS: Pitch-controllable End-to-End TTS without External Aligner
Yoon-Cheol Ju, Il-Hwan Kim, Hong-Sun Yang, Ji-Hoon Kim, Byeong-Yeol Kim, Soumi Maiti, Shinji Watanabe
INTERSPEECH, 2022
[PDF] [Project Page]
Fre-GAN 2: Fast and Efficient Frequency-consistent Audio Synthesis
Sang-Hoon Lee, Ji-Hoon Kim, Kang-Eun Lee, Seong-Whan Lee
ICASSP, 2022
[PDF] [Project Page]
PVAE-TTS: Adaptive Text-to-Speech via Progressive Style Adaptation
Ji-Hyun Lee, Sang-Hoon Lee, Ji-Hoon Kim, Seong-Whan Lee
ICASSP, 2022
[PDF] [Project Page]
VoiceMixer: Adversarial Voice Style Mixup
Sang-Hoon Lee, Ji-Hoon Kim, Hyunseung Chung, Seong-Whan Lee
NeurIPS, 2021
[PDF] [Project Page]
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Hong-Gyu Jung, Seong-Whan Lee
SMC, 2021
[PDF] [Project Page]
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee, Seong-Whan Lee
INTERSPEECH, 2021
[PDF] [Project Page]
Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis
Sang-Hoon Lee, Hyun-Wook Yoon, Hyeong-Rae Noh, Ji-Hoon Kim, Seong-Whan Lee
AAAI, 2021
[PDF] [Project Page]