Causal and Relaxed-Distortionless Response Beamforming for Online Target Source Extraction

Abstract

In this paper, we propose a low-latency beamforming method for target source extraction. Beamforming is typically performed in the time-frequency domain and has achieved promising results in offline applications. However, it incurs a long algorithmic delay due to the frame analysis, and such a delay is unacceptable in various low-latency real-time applications, including hearing aids. To reduce this delay, we propose a causal variant of the minimum power distortionless response (MPDR) beamformer. The proposed method constrains the non-causal components of the spatial filter to zero in the optimization of the MPDR beamformer. The algorithmic delay is reduced to zero by applying the causal spatial filter in the time domain. We further propose to relax the distortionless constraint on the gain, which improves the extraction performance without introducing a phase delay. The Douglas-Rachford splitting method and its online extension are adopted to solve the optimization problems of the proposed methods. In our experiments, the relaxed method outperformed various low-latency beamforming methods in terms of extraction performance.
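As a reminder of the underlying formulation, the standard MPDR beamformer at each frequency minimizes the output power subject to a distortionless constraint toward the target steering vector. The notation below is generic and not taken from the paper; please refer to [1] for the exact problem statement.

\begin{align}
\min_{\mathbf{w}} \ \mathbf{w}^{\mathsf{H}} \mathbf{R} \mathbf{w}
\quad \text{s.t.} \quad \mathbf{w}^{\mathsf{H}} \mathbf{d} = 1,
\end{align}

where $\mathbf{R}$ denotes the spatial covariance matrix of the observed signal, $\mathbf{d}$ the steering vector of the target, and $\mathbf{w}$ the spatial filter. In the proposed causal variant, the time-domain filter corresponding to $\mathbf{w}$ is additionally constrained to have zero non-causal components, and the relaxed variant loosens the gain part of the equality constraint while keeping the phase response, as described in the abstract.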

Overview of the proposed method

The causal filter is optimized by the Douglas-Rachford splitting (DRS) method or its adaptive version based on the STFT of the observed signal (blue block). The target signal is then extracted by applying the obtained causal filter in the time domain (purple block), which is performed sample by sample on the observed signal.
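Since the time-domain filtering is what keeps the algorithmic delay at zero, a minimal sketch of the per-sample extraction step (the purple block) is given below. It assumes the causal multichannel filter has already been obtained by the DRS-based optimization; the variable names and shapes are illustrative and not taken from the paper.

# Minimal sketch of sample-by-sample extraction with a causal multichannel
# FIR filter. "w" is assumed to be the filter produced by the DRS-based
# optimization (illustrative name, not from the paper).
import numpy as np

def extract_per_sample(x, w):
    """Apply a causal multichannel FIR filter sample by sample.

    x : observed signal, shape (M, T) -- M microphones, T samples
    w : causal filter,   shape (M, L) -- L taps per channel
    Returns the extracted single-channel signal of length T.
    """
    M, T = x.shape
    _, L = w.shape
    buf = np.zeros((M, L))       # holds the L most recent samples per channel
    y = np.zeros(T)
    for t in range(T):
        buf = np.roll(buf, 1, axis=1)
        buf[:, 0] = x[:, t]      # newest sample at tap index 0
        y[t] = np.sum(w * buf)   # sum over channels and taps (causal only)
    return y

Because the output at time t depends only on the current and past samples, no look-ahead is required, which is the property the causal constraint is designed to guarantee.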

Audio Demos

We compared Prop-exact and Prop-relaxed [1] with existing low-latency beamforming techniques [2]-[4]. The clean utterances were taken from the Voice Conversion Challenge (VCC) 2018 dataset [5], and the CHiME-3 noise [6] was used to synthesize the diffuse noise. Please see our paper for more details.

Interference speaker suppression

Reference

[1] Y. Masuyama, K. Yamaoka, Y. Kinoshita, T. Nakashima, and N. Ono, "Causal and Relaxed-Distortionless Response Beamforming for Online Target Source Extraction," IEEE/ACM TASLP (accepted).

[2] S. U. N. Wood and J. Rouat, "Unsupervised low latency speech enhancement with RT-GCC-NMF," IEEE JSTSP, vol. 13, no. 2, pp. 332–346, May 2019.

[3] T. Nakatani and K. Kinoshita, "A unified convolutional beamformer for simultaneous denoising and dereverberation," IEEE SPL, vol. 26, no. 6, pp. 903–907, Jun. 2019.

[4] M. Sunohara, C. Haruta, and N. Ono, "Low-latency real-time blind source separation for hearing aids based on time-domain implementation of online independent vector analysis with truncation of non-causal components," Proc. IEEE ICASSP, Mar. 2017, pp. 216–220. 

[5] J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, and Z. Ling, "The voice conversion challenge 2018: Promoting development of parallel and nonparallel methods," Proc. Odyssey, Jun. 2018, pp. 195–202. 

[6] J. Barker, R. Marxer, E. Vincent, and S. Watanabe, "The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines," Proc. IEEE ASRU, Dec. 2015, pp. 504–511.