Deep Learning for Monaural Source Separation

Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

Abstract

Monaural source separation is important for many real-world applications. It is challenging because only single-channel information is available, so without proper constraints there are infinitely many solutions.

In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising. Jointly optimizing the deep recurrent neural networks with an extra masking layer enforces a reconstruction constraint. Moreover, we explore a discriminative training criterion for the neural networks to further enhance the separation performance. We evaluate our proposed system on the TSP, MIR-1K, and TIMIT datasets for the speech separation, singing voice separation, and speech denoising tasks, respectively.
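As a rough illustration of the two ideas above, the sketch below applies a soft time-frequency mask to one frame of a mixture magnitude spectrum and computes a discriminative squared-error objective. It is a minimal sketch in plain Python, not the released implementation: the function names, the `eps` stabilizer, the toy input values, and the default `gamma` are assumptions made for this example. The raw network outputs are treated as given lists of numbers, and no recurrent network is built here.

```python
def soft_mask_layer(out1, out2, mixture, eps=1e-12):
    """Turn two raw network outputs into masked source estimates.

    out1, out2: raw per-bin outputs for source 1 and source 2 (one frame).
    mixture:    magnitude spectrum of the mixture for the same frame.
    eps:        small constant to avoid division by zero (an assumption).
    """
    est1, est2 = [], []
    for a, b, z in zip(out1, out2, mixture):
        # Soft mask for source 1; source 2 receives the complement.
        m1 = abs(a) / (abs(a) + abs(b) + eps)
        est1.append(m1 * z)
        est2.append((1.0 - m1) * z)
    return est1, est2


def discriminative_loss(est1, est2, ref1, ref2, gamma=0.05):
    """Squared error to the matching source, minus gamma times the
    squared error to the competing source. gamma=0.05 is illustrative."""
    def sq_err(x, y):
        return sum((p - q) ** 2 for p, q in zip(x, y))

    return (sq_err(est1, ref1) - gamma * sq_err(est1, ref2)
            + sq_err(est2, ref2) - gamma * sq_err(est2, ref1))


# Toy frame with three frequency bins (made-up numbers).
mixture = [1.0, 2.0, 0.5]
est1, est2 = soft_mask_layer([0.9, 0.2, 0.1], [0.1, 1.8, 0.4], mixture)

# Reconstruction constraint: the two estimates sum back to the mixture.
recon = [a + b for a, b in zip(est1, est2)]
```

Because the two masks sum to one in every bin, `recon` matches `mixture` up to floating-point rounding, which is the reconstruction constraint the masking layer is meant to enforce. Setting `gamma=0` reduces the objective to a plain sum of squared errors.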

Our approaches achieve 2.30~4.98 dB SDR gain compared to NMF models in the speech separation task, 2.30~2.48 dB GNSDR gain and 4.32~5.42 dB GSIR gain compared to previous models in the singing voice separation task, and outperform the NMF and DNN baselines in the speech denoising task.

ICASSP 2014 Video Demo

Download

Reference

1. Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2136–2147, Dec. 2015 (PDF, Bibtex)

[2020 IEEE SPS Best Paper Award]

2. Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural Networks

Proc. of the International Society for Music Information Retrieval (ISMIR), 2014 (PDF, Bibtex)

3. Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

Deep Learning for Monaural Speech Separation

Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014 (PDF, Slides, Bibtex)

[Starkey Signal Processing Research Student Grant]

Feedback

Email me if you have any questions.
