Kim Sung-Bin
Hi! I am a PhD student in the Algorithmic Machine Intelligence (AMI) Lab, Dept. of Electrical Engineering, POSTECH, South Korea, advised by Prof. Tae-Hyun Oh. I received my Master's degree from the AMI Lab at POSTECH, and my Bachelor's degree from the Dept. of Electrical Engineering, Handong University, South Korea.
My research interests include, but are not limited to, multi-modal learning and cross-modal generation.
contact: sungbin [at] postech [dot] ac [dot] kr | sbkim052 [at] gmail [dot] com
Google scholar | LinkedIn | CV
Publications (conferences)
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models, ICLR 2025
Kim Sung-Bin*, Oh Hyun-Bin*, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh
[project page] [arxiv]
We introduce a comprehensive audio-visual hallucination benchmark specifically designed to evaluate the perception and comprehension capabilities of audio-visual LLMs
SoundBrush: Sound as a Brush for Visual Scene Editing, AAAI 2025
Kim Sung-Bin, Kim Jun-Seong, Junseok Ko, Yewon Kim, Tae-Hyun Oh
[project page] [arxiv]
We manipulate visual scenes to reflect the mood of the input audio or to insert sounding objects while preserving the original structure
MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset, Interspeech 2024
Kim Sung-Bin*, Lee Chae-Yeon*, Gihun Son*, Oh Hyun-Bin, Janghoon Ju, Suekyeong Nam, Tae-Hyun Oh
[project page] [dataset] [arxiv]
We generate a 3D talking head with enhanced performance on multilingual speech
Enhancing Speech-driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert, Interspeech 2024
Han Eungi*, Oh Hyun-Bin*, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh
[project page]
We enhance the lip accuracy of a 3D talking head using a lip reading expert
😀SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models, NAACL 2024 Findings
Lee Hyun*, Kim Sung-Bin*, Seungju Han, Youngjae Yu, Tae-Hyun Oh
[dataset] [arxiv]
We introduce video laugh reasoning, a new task for machines to understand the rationale behind laughter in video
Presented in [Workshop on AV4D, in conjunction with ICCV, 2023]
LaughTalk: Expressive 3D Talking Head Generation with Laughter, WACV 2024
Kim Sung-Bin, Lee Hyun, Da hye Hong, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh
[project page] [paper] [arxiv]
We generate a 3D talking head that simultaneously expresses speech articulation and laughter
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment, CVPR 2023
Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh
[project page] [paper] [arxiv]
We generate images from diverse in-the-wild environmental sounds
Covered by Korean news media (Yonhap News, etc.), and featured by YTN Science Channel
Invited talk in [Workshop on Sound and Sight, in conjunction with CVPR, 2023], and [Korean Artificial Intelligence Association, 2023]
Presented in [Workshop on AI4CC, in conjunction with CVPR, 2023], and [Workshop on AV4D, in conjunction with ICCV, 2023]
Prefix Tuning for Automated Audio Captioning, ICASSP 2023 [ORAL]
Minkyu Kim*, Kim Sung-Bin*, Tae-Hyun Oh
[project page] [paper] [arxiv]
We generate text descriptions from environmental sounds
Covered by Korean news media (Yonhap News, etc.)
Real-time Face Registration and Classification System using Fuzzy ARTMAP, ICROS 2020
Kim Sung-Bin, Wong Hyong Lee
[paper]
We register and classify faces in real-time
Publications (journals)
Revisiting Learning-based Video Motion Magnification for Real-time Processing, under review
Hyunwoo Ha*, Oh Hyun-Bin*, Kim Jun-Seong, Kwon Byung-Ki, Kim Sung-Bin, Linh-Tam Tran, Ji-Yun Kim, Sung-Ho Bae, Tae-Hyun Oh
[arxiv]
We magnify small and subtle motions into human-perceptible motion
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization, TMLR 2024
Kim Youwang, Lee Hyun*, Kim Sung-Bin*, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh
[project page] [arxiv]
We propose reliable 3D face mesh annotations on large-scale facial video datasets
Presented in [Workshop on AV4D, in conjunction with ICCV, 2023]
Invited to ICLR 2025 main conference as a poster presentation (6.61%)
The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation, TVCJ 2024 [IF: 3.5]
Kwon Byung-Ki, Kim Sung-Bin, Tae-Hyun Oh
[arxiv]
We generate a simple yet effective synthetic optical flow dataset
Lightweight Speaker Recognition in Poincaré Spaces, SPL 2021 [IF: 3.2]
Jieun Lee*, Kim Sung-Bin*, Seokhyeong Kang, Tae-Hyun Oh
[paper]
We design a Poincaré speaker embedding space for speaker recognition and verification
Awards & Honors
Full-ride scholarship from SBS Cultural Foundation (news), up to $75,000 over 4 years
Summa Cum Laude, Handong University
Best Student Paper Award, ICROS, 2020
Full-ride scholarship during the B.S. degree
Academic Services
Journal Reviewer: TASL 2023, TASL 2024
Conference Reviewer: CVPR 2024, NeurIPS 2024, ICLR 2025, CVPR 2025
Work Experiences
Military Service (discharged as a sergeant), Vanguard Unit, Republic of Korea Army, 09/2016 - 06/2018