Speech Enhancement
Speech enhancement is a technique for separating and extracting speech with minimal distortion from audio that contains noise and reverberation. We are working on improving and extending the fundamental technology using acoustic signal processing and machine learning, as well as on developing application-specific technologies such as robot audition.
Deep neural networks are often used for sound source separation because of their high expressive power. However, training these networks requires a large amount of supervised data consisting of pairs of observed mixtures and source signals. Obtaining undistorted source signals is difficult in real-world environments, where only the mixture signal is available. To address this issue, we are developing and improving techniques for unsupervised training of source separation systems that use only the mixture signals captured by microphones, without relying on clean, distortion-free speech signals.
Unmix-Remix Consistent Learning
New unsupervised source separation framework that enables highly accurate separation and stable learning
We have developed a novel framework for unsupervised source separation that enables stable learning and generation of signals with low distortion by iteratively separating and remixing noisy signals to reconstruct the observed signal.
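The separate-remix-reconstruct idea can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: `separator` is a trivial stand-in for the trained network, and the loss simply compares the remixed signals with the observed mixtures, so no clean references are needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def separator(mix):
    """Placeholder separator: a trained DNN in the real system.
    Here it just splits the mixture into two equal halves."""
    return np.stack([0.5 * mix, 0.5 * mix])

def unmix_remix_loss(mixtures):
    """Separate each observed mixture, shuffle the separated sources
    across the batch, sum them into pseudo-mixtures, separate again,
    undo the shuffle, and remix. The remixed signals should then
    reconstruct the original observations."""
    obs = np.stack(mixtures)                             # (B, T)
    est = np.stack([separator(m) for m in obs])          # (B, S, T)
    B, S, T = est.shape
    perm = rng.permutation(B * S)
    shuffled = est.reshape(B * S, T)[perm].reshape(B, S, T)
    pseudo_mix = shuffled.sum(axis=1)                    # pseudo-mixtures
    est2 = np.stack([separator(m) for m in pseudo_mix])  # separate again
    unshuffled = est2.reshape(B * S, T)[np.argsort(perm)].reshape(B, S, T)
    recon = unshuffled.sum(axis=1)                       # remixed signals
    return float(np.mean((recon - obs) ** 2))

loss = unmix_remix_loss([rng.standard_normal(16) for _ in range(4)])
```

Minimizing this reconstruction loss drives the separator toward consistent source estimates even though only mixtures are observed.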
Relevant Publications:
Kohei Saijo, Tetsuji Ogawa, ``Self-Remixing: Unsupervised speech separation via separation and remixing,'' Proc. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2023), June 2023.
Kohei Saijo, Tetsuji Ogawa, ``Unsupervised training of sequential neural beamformer using coarsely-separated and non-separated signals,'' Proc. The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022), Sept. 2022. [DOI] [Scopus]
Kohei Saijo, Tetsuji Ogawa, ``Remix-cycle-consistent learning on adversarially learned separator for accurate and stable unsupervised speech separation,'' Proc. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2022), pp.4373-4377, May 2022. [DOI] [Scopus]
Mentoring and Reverse Mentoring Learning
New knowledge propagation framework for unsupervised learning that does not require pairs of noisy observed signals and desired undistorted signals
The senior system is a model that can be built with unsupervised learning but requires high-quality initial values, whereas the junior system is a deep learning model that needs good-quality supervised data to perform well. Our hypothesis is that the senior system can pass a pseudo-correct signal, estimated from a probabilistic model of the sound source, to the junior system, while the junior system can transfer data-driven knowledge, such as correlations across frequencies and across data samples, back to the senior system.
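A toy 1-D sketch of this two-way loop, under heavy simplifying assumptions: the "senior" is a scalar Wiener-style gain built from an explicit noise-power estimate, the "junior" is a single learnable gain trained on the senior's pseudo-target, and the junior's output in turn refines the senior's noise statistics. All names and signals are illustrative, not the systems from the publications.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
clean = np.sin(np.linspace(0, 20 * np.pi, T))
noise = 0.5 * rng.standard_normal(T)
obs = clean + noise

def senior_estimate(x, noise_power):
    """'Senior': a probabilistic-model-style Wiener gain derived from
    a noise-power estimate; needs no paired training data."""
    sig_power = max(np.mean(x ** 2) - noise_power, 1e-6)
    return sig_power / (sig_power + noise_power) * x

# Mentoring: the senior's output serves as a pseudo-target for the junior.
noise_power = 0.25                       # assumed known here for simplicity
pseudo_target = senior_estimate(obs, noise_power)

g = 0.0                                  # 'junior': one learnable gain
for _ in range(200):                     # gradient descent on pseudo-target
    grad = np.mean(2 * (g * obs - pseudo_target) * obs)
    g -= 0.1 * grad

# Reverse mentoring: the junior's estimate refines the senior's statistics.
junior_out = g * obs
refined_noise_power = np.mean((obs - junior_out) ** 2)
refined = senior_estimate(obs, refined_noise_power)
```

The point of the sketch is the information flow: pseudo-targets downward, re-estimated statistics upward, with neither direction requiring clean references.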
Relevant Publications:
Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi, ``Efficient and stable adversarial learning using unpaired data for unsupervised multichannel speech separation,'' Proc. The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH2021), pp.3051-3055, Aug. 2021. [DOI] [Scopus]
Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi, ``Mentoring-reverse mentoring for unsupervised multi-channel speech source separation,'' Proc. The 21st Annual Conference of the International Speech Communication Association (INTERSPEECH2020), pp.86-90, Oct. 2020. [DOI] [Scopus]
Compensation of Signal Processing Distortion
Technology to compensate for the unpleasant distortion, inherent in signal processing, that arises when removing noise from noisy speech with high accuracy
Removing noise from noisy speech with high accuracy usually introduces an unpleasant distortion that is inherent in signal processing; conversely, attempts to reduce this distortion leave residual noise. Our research aims at a method that resolves this trade-off by suppressing distortion while still removing noise with high accuracy.
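The trade-off can be made concrete with a toy suppression-gain model (an illustrative assumption, not the method in the papers below): applying a gain g to the noisy signal leaves residual noise proportional to g² while distorting the speech proportionally to (1-g)².

```python
import numpy as np

rng = np.random.default_rng(2)
clean = np.sin(np.linspace(0, 16 * np.pi, 800))
noise = 0.4 * rng.standard_normal(800)
noisy = clean + noise

def terms(g):
    """For a scalar suppression gain g (enhanced = g * noisy), return
    (residual noise power, speech distortion power)."""
    residual_noise = np.mean((g * noise) ** 2)
    distortion = np.mean(((1 - g) * clean) ** 2)
    return residual_noise, distortion

rn_strong, dist_strong = terms(0.2)   # aggressive suppression
rn_weak, dist_weak = terms(0.9)       # gentle suppression
# Aggressive suppression leaves less noise but distorts the speech more.
```

Resolving the trade-off means escaping this frontier, e.g. with a post-processor that repairs distortion after strong suppression.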
Relevant Publications:
Riku Ogino, Kohei Saijo, Tetsuji Ogawa, ``Design of discriminators in GAN-based unsupervised learning of neural post-processors for suppressing localized spectral distortion,'' Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2022 (APSIPA2022), pp.969-975, Nov. 2022. [DOI]
Naohiro Tawara, Hikari Tanabe, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa, ``Postfiltering using an adversarial denoising autoencoder with noise-aware training,'' Proc. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2019), pp.3282-3286, May 2019. [DOI] [Scopus]
Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, ``Associative memory model-based linear filtering and its application to tandem connectionist blind source separation,'' IEEE/ACM Trans. Audio, Speech, Lang. Process., vol.25, no.3, pp.637-650, March 2017. [DOI] [Scopus]
End-to-End Speech Enhancement
Technology for accurately estimating clean, undistorted signals from speech recorded in noisy environments using deep neural networks with high expressive power
As preparing a large number of pairs of noisy signals and undistorted source signals can be a significant barrier in practical applications, we are focusing on technologies that work robustly on real-world problems. Specifically, we are exploring compact network design, the incorporation of prior knowledge into network design, and joint learning with signal-processing-based filters.
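As a structural illustration of time-domain (STFT-free) enhancement, here is a minimal encoder-mask-decoder sketch in NumPy. The filters are random and training is omitted; it only shows the signal path and shapes, and is not the architecture of the papers below.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv1d(x, w):
    """Valid 1-D correlation of signal x with kernel w."""
    K = len(w)
    return np.array([np.dot(x[i:i + K], w) for i in range(len(x) - K + 1)])

T, K = 64, 8
noisy = rng.standard_normal(T)
enc_w = rng.standard_normal(K) / K            # learnable analysis filter
dec_w = rng.standard_normal(K) / K            # learnable synthesis filter

latent = conv1d(noisy, enc_w)                 # encode: (T - K + 1,)
mask = 1.0 / (1.0 + np.exp(-latent))          # sigmoid mask in [0, 1]
masked = mask * latent                        # suppress noisy components
enhanced = np.convolve(masked, dec_w)         # decode back to length T
```

In an end-to-end system, enc_w, dec_w, and the mask estimator are all learned jointly from waveform pairs (or, in our unsupervised setting, from mixtures alone).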
Relevant Publications:
Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi, ``Deep speech extraction with time-varying spatial filtering guided by desired direction attractor,'' Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020), pp.671-675, May 2020. [DOI] [Scopus]
Takuya Hasumi, Tetsunori Kobayashi, Tetsuji Ogawa, ``Investigation on network architecture for single-channel end-to-end denoising,'' Proc. The 2020 European Signal Processing Conference (EUSIPCO2020), pp.441-445, Jan. 2021. [DOI] [Scopus]
Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa, ``Multi-channel speech enhancement using time-domain convolutional denoising autoencoder,'' Proc. The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH2019), pp.86-90, Sept. 2019. [DOI] [Scopus]
Sound Source Separation Technology for Real-world Applications
Compact, fast, and robust sound source separation and noise suppression technology for real-world applications such as voice input interfaces on mobile devices and robot audition
To realize voice input interfaces for mobile terminals and robot audition, sound source separation and noise suppression technology must meet several requirements: i) the voice input device must be small and computationally efficient, ii) it must operate robustly even when the sound source (user) moves slightly, and iii) it must simultaneously suppress various types of directional and diffuse noise. We have studied sound source separation and noise suppression techniques that satisfy these three requirements simultaneously, developed microphone arrays that can be mounted on mobile terminals and robots, and confirmed their effectiveness. You can click here to see a demo movie of our robot audition technology.
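The simplest array-processing baseline behind such systems is delay-and-sum beamforming, sketched below under an assumed far-field setup with known integer-sample steering delays (a textbook illustration, not the specific array processing of the publications): aligning the microphones makes the speech add coherently while independent sensor noise averages out.

```python
import numpy as np

rng = np.random.default_rng(4)
T, M, delay = 2048, 4, 3          # samples, mics, per-mic delay in samples
clean = np.sin(np.linspace(0, 64 * np.pi, T))

# Each microphone observes the source delayed by a known number of
# samples plus independent sensor noise.
mics = [np.roll(clean, m * delay) + 0.5 * rng.standard_normal(T)
        for m in range(M)]

# Delay-and-sum: undo the known steering delays, then average.
# Coherent speech adds up; independent noise power drops by ~1/M.
beamformed = np.mean([np.roll(x, -m * delay) for m, x in enumerate(mics)],
                     axis=0)

mse_single = np.mean((mics[0] - clean) ** 2)
mse_beam = np.mean((beamformed - clean) ** 2)
```

Meeting requirements i)-iii) in practice demands far more than this baseline, e.g. compact array geometries and adaptive suppression of both directional and diffuse noise.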
Relevant Publications:
Tetsuji Ogawa, Shintaro Takada, Kenzo Akagiri, and Tetsunori Kobayashi, ``Speech enhancement using a square microphone array in the presence of directional and diffuse noise,'' IEICE Trans. Fundamentals, vol.E93-A, no.5, pp.926-935, May 2010. [IEICE] [Scopus]
Kosuke Hosoya, Tetsuji Ogawa, Tetsunori Kobayashi, ``Robot auditory system using head-mounted square microphone array,'' Proc. 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2009), pp.2736-2741, Oct. 2009. [DOI] [Scopus] [Demo]
Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi, ``Ears of the robot: direction of arrival estimation based on pattern recognition using robot-mounted microphones,'' IEICE Trans. Inf. & Syst., vol.E91-D, no.5, pp.1522-1530, May 2008. [IEICE] [Scopus]