Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, 2024. (Under Review) [Code]
[J7] HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation by Hierarchical Variational Inference for Zero-shot Speech Synthesis, S.-H. Lee, H.-Y. Choi, S.-B. Kim, and S.-W. Lee, IEEE Trans. on Neural Networks and Learning Systems, 2025. [Demo] [Code] [Gradio]
[C21] Parameter-Efficient Fine-Tuning for Low-Resource Text-to-Speech via Cross-Lingual Continual Learning, K.-J. Kwon, J.-H. So, and S.-H. Lee, Interspeech, 2025. [Demo]
[C20] PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, ICLR, 2025. [Code]
[J6] DurFlex-EVC: Duration-Flexible Emotional Voice Conversion with Parallel Generation, H.-S. Oh, S.-H. Lee, D.-H. Cho, and S.-W. Lee, IEEE Trans. on Affective Computing, 2025. [Demo] [Code]
[J5] Personalized and Controllable Voice Style Transfer with Speech Diffusion Transformer, H.-Y. Choi, S.-H. Lee, and S.-W. Lee, IEEE Trans. on Audio, Speech and Language Processing, 2025.
[J4] HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models, J.-S. Hwang, S.-H. Lee, and S.-W. Lee, Neural Networks, 2025. [Demo]
[J3] DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training, H.-S. Oh, S.-H. Lee, and S.-W. Lee, IEEE Trans. on Audio, Speech and Language Processing, 2024. [Demo] [Code]
[C19] Cross-lingual Text-to-Speech via Hierarchical Style Transfer, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, ICASSPW, 2024.
[J2] Audio Super-resolution with Robust Speech Representation Learning of Masked Autoencoder, S.-B. Kim, S.-H. Lee, H.-Y. Choi, and S.-W. Lee, IEEE Trans. on Audio, Speech and Language Processing, 2024.
[C18] TranSentence: Speech-to-Speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data, S.-B. Kim, S.-H. Lee, and S.-W. Lee, ICASSP, 2024.
[C17] MIDI-Voice: Expressive Zero-shot Singing Voice Synthesis via MIDI-driven Priors, D.-M. Byun, S.-H. Lee, J.-S. Hwang, and S.-W. Lee, ICASSP, 2024.
[C16] DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion, H.-Y. Choi*, S.-H. Lee*, and S.-W. Lee, AAAI, 2024. [Demo] [Code] [Poster]
[C15] HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer, S.-H. Lee*, H.-Y. Choi*, H.-S. Oh, and S.-W. Lee, Interspeech, 2023. (Oral) [arXiv] [Demo]
[C14] Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation, H.-Y. Choi, S.-H. Lee, and S.-W. Lee, Interspeech, 2023. (Oral) [Demo] [Code]
[C13] PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling, J.-S. Hwang, S.-H. Lee, and S.-W. Lee, ACPR, 2023. [Demo]
[C12] HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis, S.-H. Lee, S.-B. Kim, J.-H. Lee, E. Song, M.-J. Hwang, and S.-W. Lee, NeurIPS, 2022. [OpenReview] [Demo] [Poster]
[J1] Duration Controllable Voice Conversion via Phoneme-based Information Bottleneck, S.-H. Lee, H.-R. Noh, W. Nam, and S.-W. Lee, IEEE Trans. on Audio, Speech and Language Processing, 2022. (2022-JCR-IF: 5.4, JIF PERCENTILE TOP 8.10%)
[C11] StyleVC: Non-Parallel Voice Conversion with Adversarial Style Generalization, I. Hwang, S.-H. Lee, and S.-W. Lee, ICPR, 2022. [Demo] [Code]
[C10] Fre-GAN 2: Fast and Efficient Frequency-consistent Audio Synthesis, S.-H. Lee, J.-H. Kim, G.-E. Lee, and S.-W. Lee, ICASSP, 2022. [Demo] [Code]
[C9] PVAE-TTS: Progressively Style Adaptive Text-to-Speech via Progressive Variational Autoencoder, J.-H. Lee, S.-H. Lee, J.-H. Kim, and S.-W. Lee, ICASSP, 2022. [Demo]
[C8] EmoQ-TTS: Emotion Intensity Quantization for Fine-Grained Controllable Emotional Text-to-Speech, C.-B. Im, S.-H. Lee, and S.-W. Lee, ICASSP, 2022. [Demo]
[C7] VoiceMixer: Adversarial Voice Style Mixup, S.-H. Lee, J.-H. Kim, H. Chung, and S.-W. Lee, NeurIPS, 2021. [Demo]
[C6] Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis, S.-H. Lee, H.-W. Yoon, H.-R. Noh, J.-H. Kim, and S.-W. Lee, AAAI, 2021. [Demo]
[C5] GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints, J.-H. Kim, S.-H. Lee, J.-H. Lee, H.-G. Jung, and S.-W. Lee, SMC, 2021.
[C4] Fre-GAN: Adversarial Frequency-consistent Audio Synthesis, J.-H. Kim, S.-H. Lee, J.-H. Lee, and S.-W. Lee, Interspeech, 2021.
[C3] Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech, H. Chung, S.-H. Lee, and S.-W. Lee, Interspeech, 2021.
[C2] Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder, H.-W. Yoon, S.-H. Lee, H.-R. Noh, and S.-W. Lee, Interspeech, 2020.
[C1] Learning Machines Can Curl - Adaptive Deep Reinforcement Learning Enables the Robot Curly to Win against Human Players in an Icy World, D.-O. Won, S.-H. Lee, K.-R. Müller, and S.-W. Lee, NeurIPS 2019 Demonstration Track, 2019. [Video] [Poster]