Kim Sung-Bin

Hi! I am a PhD student in the Algorithmic Machine Intelligence (AMI) Lab, Dept. of Electrical Engineering, POSTECH, South Korea, advised by Prof. Tae-Hyun Oh. I received my Master's degree from the AMI Lab at POSTECH and my Bachelor's degree from the Dept. of Electrical Engineering, Handong University, South Korea.

My research interests include, but are not limited to, multi-modal learning and cross-modal generation.

Contact: sungbin [at] postech [dot] ac [dot] kr | sbkim052 [at] gmail [dot] com

Google Scholar | LinkedIn | CV

Publications (conferences)

FacEDiT: Unified Talking Face Editing and Generation via Facial Motion Infilling, arXiv 2025
Kim Sung-Bin, Joohyun Chang, David Harwath, Tae-Hyun Oh
[project page] [arxiv]
We propose a unified framework for talking face editing and talking face generation

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models, ICCV 2025
Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng, Joon Son Chung, Tae-Hyun Oh, David Harwath
[project page] [arxiv]
We introduce a model that synthesizes speech that is temporally and expressively aligned with the video

Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics, CVPR 2025
Lee Chae-Yeon, Oh Hyun-Bin, Han EunGi, Kim Sung-Bin, Suekyeong Nam, Tae-Hyun Oh
[project page] [arxiv]
We define three criteria to assess perceptual alignment between speech and lip movements of 3D talking heads

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models, ICLR 2025
Kim Sung-Bin*, Oh Hyun-Bin*, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh
[project page] [arxiv]
We introduce a comprehensive audio-visual hallucination benchmark specifically designed to evaluate the perception and comprehension capabilities of audio-visual LLMs

SoundBrush: Sound as a Brush for Visual Scene Editing, AAAI 2025
Kim Sung-Bin, Kim Jun-Seong, Junseok Ko, Yewon Kim, Tae-Hyun Oh
[project page] [arxiv]
We manipulate visual scenes to reflect the mood of the input audio or to insert sounding objects while preserving the original structure

MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset, Interspeech 2024
Kim Sung-Bin*, Lee Chae-Yeon*, Gihun Son*, Oh Hyun-Bin, Janghoon Ju, Suekyeong Nam, Tae-Hyun Oh
[project page] [dataset] [arxiv]
We generate a 3D talking head with enhanced performance on multilingual speech

Enhancing Speech-driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert, Interspeech 2024
Han Eungi*, Oh Hyun-Bin*, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh
[project page]
We enhance the lip accuracy of a 3D talking head using a lip-reading expert

😀SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models, NAACL 2024 Findings
Lee Hyun*, Kim Sung-Bin*, Seungju Han, Youngjae Yu, Tae-Hyun Oh
[dataset] [arxiv]
We introduce video laugh reasoning, a new task for machines to understand the rationale behind laughter in video

Presented at [Workshop on AV4D, in conjunction with ICCV, 2023]

LaughTalk: Expressive 3D Talking Head Generation with Laughter, WACV 2024
Kim Sung-Bin, Lee Hyun, Da hye Hong, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh
[project page] [paper] [arxiv]
We generate a 3D talking head that simultaneously expresses speech articulation and laughter

Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment, CVPR 2023
Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh
[project page] [paper] [arxiv]
We generate images from diverse in-the-wild environmental sounds

Covered by Korean news media (Yonhap News, etc.) and featured on the YTN Science Channel
Invited talks at [Workshop on Sound and Sight, in conjunction with CVPR, 2023] and [Korean Artificial Intelligence Association, 2023]
Presented at [Workshop on AI4CC, in conjunction with CVPR, 2023] and [Workshop on AV4D, in conjunction with ICCV, 2023]

Prefix Tuning for Automated Audio Captioning, ICASSP 2023 [ORAL]
Minkyu Kim*, Kim Sung-Bin*, Tae-Hyun Oh
[project page] [paper] [arxiv]
We generate text descriptions from environmental sounds

Covered by Korean news media (Yonhap News, etc.)

Real-time Face Registration and Classification System using Fuzzy ARTMAP, ICROS 2020
Kim Sung-Bin, Wong Hyong Lee
[paper]
We register and classify faces in real-time

Publications (journals)

Revisiting Learning-based Video Motion Magnification for Real-time Processing, under review
Hyunwoo Ha*, Oh Hyun-Bin*, Kim Jun-Seong, Kwon Byung-Ki, Kim Sung-Bin, Linh-Tam Tran, Ji-Yun Kim, Sung-Ho Bae, Tae-Hyun Oh
[arxiv]
We magnify small, subtle motions into human-perceptible motions

A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization, TMLR 2024
Kim Youwang, Lee Hyun*, Kim Sung-Bin*, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh
[project page] [arxiv]
We propose reliable 3D face mesh annotations on large-scale facial video datasets

Presented at [Workshop on AV4D, in conjunction with ICCV, 2023]
Invited to the ICLR 2025 main conference for a poster presentation (6.61%)

The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation, TVCJ 2024 [IF: 3.5]
Kwon Byung-Ki, Kim Sung-Bin, Tae-Hyun Oh
[arxiv]
We generate a simple yet effective synthetic optical flow dataset

Lightweight Speaker Recognition in Poincaré Spaces, SPL 2021 [IF: 3.2]
Jieun Lee*, Kim Sung-Bin*, Seokhyeong Kang, Tae-Hyun Oh
[paper]
We design a Poincaré speaker embedding space for speaker recognition and verification

Awards & Honors

Full-ride scholarship from SBS Cultural Foundation (news), up to $75,000 over 4 years
Summa Cum Laude, Handong University
Best Student Paper Award, ICROS, 2020
Full-ride scholarship during B.S. degree

Academic Services

Journal Reviewer: TASL 2023, TASL 2024
Conference Reviewer: CVPR 2024, NeurIPS 2024, ICLR 2025, CVPR 2025, ICML 2025

Work Experience

Military Service (discharged as a sergeant), Vanguard Unit, Republic of Korea Army, 09/2016 - 06/2018
