Jibin Song - Papers

❤️ : First / co-first author
🧡 : My favorite collaborative papers!

❤️ Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers

Jibin Song, Mingi Kwon, Jaeseok Jeong, Youngjung Uh

[ICLR 2026] Project Page / Paper

Syncphony is an audio-to-video generation framework that produces videos with precise audio-motion synchronization. It improves temporal alignment through motion-aware training and audio-emphasized inference, while maintaining strong visual quality across diverse audio inputs.

❤️ FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

Jibin Song, Mingi Kwon, Jaeseok Jeong, Youngjung Uh

[arXiv 2026] Project Page / Paper

FlowBlending is a stage-aware diffusion sampling method that accelerates video generation by using large models only where capacity matters (early and late timesteps) and small models elsewhere. It achieves significant speedups while preserving visual quality and temporal consistency.
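
The stage-aware idea can be illustrated with a toy sampler. This is only a minimal sketch, not the paper's implementation: the stage boundaries (`early_frac`, `late_frac`), the Euler update, the sign convention, and the scalar "denoisers" are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical denoiser signature: (sample, timestep) -> velocity estimate.
Denoiser = Callable[[float, float], float]


def pick_model(step: int, num_steps: int,
               early_frac: float = 0.2, late_frac: float = 0.2) -> str:
    """Return "large" for early and late steps, "small" for the middle.

    The 20% / 20% split is an assumed placeholder, not a value from the paper.
    """
    early_end = int(round(num_steps * early_frac))
    late_start = num_steps - int(round(num_steps * late_frac))
    return "large" if step < early_end or step >= late_start else "small"


def blended_sample(x: float, large: Denoiser, small: Denoiser,
                   num_steps: int = 10,
                   early_frac: float = 0.2,
                   late_frac: float = 0.2) -> Tuple[float, List[str]]:
    """Toy Euler integration from t=1 to t=0, switching models by stage."""
    dt = 1.0 / num_steps
    used: List[str] = []
    for step in range(num_steps):
        t = 1.0 - step * dt
        name = pick_model(step, num_steps, early_frac, late_frac)
        model = large if name == "large" else small
        x = x - dt * model(x, t)  # assumed sign convention for the update
        used.append(name)
    return x, used


if __name__ == "__main__":
    # Stand-in denoisers; real ones would be large and small video models.
    _, schedule = blended_sample(1.0, lambda x, t: x, lambda x, t: x)
    print(schedule)
```

In this sketch the large model handles only the first and last few steps, so with 10 steps it runs 4 times instead of 10; the actual speedup depends on the relative cost of the two models.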

❤️ TextAway: Mask-Free Video Text Removal with End-to-End Text-Aware Generation

Jibin Song, Mingi Kwon, Sooyeon Go, Youngjung Uh

[under review 2026] Project Page / Paper (coming soon)

TextAway removes subtitles, captions, and other overlaid text from videos without masks. It is an end-to-end text-aware generation framework that restores clean videos directly from corrupted inputs, without OCR, text detection, or external mask generation at inference time.
