Speech Enhancement


Speech enhancement refers to techniques that separate and extract speech with minimal distortion from audio containing noise and reverberation. We are working to improve and extend the fundamental technology using acoustic signal processing and machine learning, and we are also developing application-specific technologies such as robot audition.

Deep neural networks are widely used for sound source separation because of their high expressive power. However, training them requires a large amount of supervised data consisting of observed mixtures paired with their source signals. Obtaining undistorted source signals is difficult in real-world environments, where only the mixture signal is available. To address this issue, we are developing and improving techniques for unsupervised training of source separation systems. These techniques use only the mixture signals captured by microphones, without relying on clean, distortion-free speech signals.

Unmix-Remix Consistent Learning

New framework for unsupervised source separation that enables high separation accuracy and stable learning

We have developed a novel framework for unsupervised source separation that learns stably and produces signals with low distortion. It works by iteratively separating and remixing noisy signals so that the observed signals can be reconstructed.
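The separate-remix-reconstruct cycle can be sketched as a consistency loss. This is a minimal illustration under our own simplifying assumptions, not the published training procedure: two observed mixtures are each split into two estimated sources, one estimate is swapped between them to form remixtures, the remixtures are separated again, and the re-estimated sources are summed to rebuild the original observations. The separators `band_split_separator` and `halving_separator` and the function `unmix_remix_loss` are hypothetical stand-ins; in a real system the separator would be a trainable network and the loss would be minimized with respect to its parameters.

```python
import numpy as np

def band_split_separator(x, cutoff_bin=64):
    """Hypothetical separator: splits x into a low-band and a high-band
    component with the FFT. The two outputs always sum back to x."""
    spec = np.fft.rfft(x)
    low = spec.copy()
    low[cutoff_bin:] = 0
    high = spec - low
    n = len(x)
    return np.fft.irfft(low, n=n), np.fft.irfft(high, n=n)

def halving_separator(x):
    """Hypothetical poor separator: returns half the input as both sources."""
    return 0.5 * x, 0.5 * x

def unmix_remix_loss(x1, x2, separate):
    """Consistency loss: unmix both mixtures, remix by swapping the second
    source, unmix the remixtures, and rebuild the original observations."""
    a1, b1 = separate(x1)
    a2, b2 = separate(x2)
    # Remix: exchange the second estimated source between the two mixtures
    y1 = a1 + b2
    y2 = a2 + b1
    # Unmix the remixtures; ideally y1 -> (a1, b2) and y2 -> (a2, b1)
    a1_hat, b2_hat = separate(y1)
    a2_hat, b1_hat = separate(y2)
    # Reconstruct the observed mixtures from the re-estimated sources
    x1_hat = a1_hat + b1_hat
    x2_hat = a2_hat + b2_hat
    return np.mean((x1_hat - x1) ** 2) + np.mean((x2_hat - x2) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(size=512), rng.normal(size=512)
    print(unmix_remix_loss(x1, x2, band_split_separator))  # numerically near zero
    print(unmix_remix_loss(x1, x2, halving_separator))     # clearly above zero
```

The band-split separator scores (numerically) zero even though it knows nothing about the actual sources, so consistency alone does not guarantee meaningful separation.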

Relevant Publications:

Mentoring and Reverse Mentoring Learning

New knowledge propagation framework for unsupervised learning that does not require pairs of noisy observed signals and desired undistorted signals

The senior system is a model that can be built with unsupervised learning but requires high-quality initial values. The junior system, on the other hand, is a deep learning model that needs good-quality supervised data to perform well. Our hypothesis is that the senior system can pass a pseudo-correct signal, estimated from a probabilistic model of the sound source, to the junior system (mentoring). Conversely, the junior system can pass data-driven knowledge, such as correlations across frequencies and across data samples, back to the senior system (reverse mentoring).
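The mentoring cycle can be illustrated with a toy mask-estimation problem. The choices here are assumptions for illustration, not the method itself: the "senior" is a two-component Gaussian mixture fitted by EM to a per-bin feature (a probabilistic model that needs good initial values), the "junior" is a logistic regression that additionally sees a local average over neighboring bins (a stand-in for data-driven context), and all function names are hypothetical. The senior's posteriors serve as pseudo-targets for the junior, and the junior's predictions re-initialize the senior in the next round.

```python
import numpy as np

def senior_em(feat, resp, n_iter=20):
    """Hypothetical senior: 2-component 1-D Gaussian mixture fitted by EM
    (component 0 = noise, component 1 = speech). `resp` holds the initial
    speech posteriors; EM depends on this starting point."""
    for _ in range(n_iter):
        # M-step: weights, means, and variances from soft assignments
        w = np.stack([1.0 - resp, resp])          # shape (2, N)
        weight = w.sum(axis=1)
        pi = weight / w.shape[1]
        mu = (w * feat).sum(axis=1) / weight
        var = (w * (feat - mu[:, None]) ** 2).sum(axis=1) / weight + 1e-6
        # E-step: posterior probability of the speech component per bin
        log_p = np.log(pi)[:, None] - 0.5 * (
            np.log(2 * np.pi * var)[:, None] + (feat - mu[:, None]) ** 2 / var[:, None])
        log_p -= log_p.max(axis=0)
        p = np.exp(log_p)
        resp = p[1] / p.sum(axis=0)
    return resp

def context_features(feat, width=5):
    """Junior input: standardized feature, its local average, and a bias term."""
    z = (feat - feat.mean()) / feat.std()
    z_ctx = np.convolve(z, np.ones(width) / width, mode="same")
    return np.column_stack([z, z_ctx, np.ones_like(z)])

def junior_fit(X, target, lr=0.5, n_iter=500):
    """Hypothetical junior: logistic regression trained on soft pseudo-labels."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (prob - target) / len(target)
    return w

def mentoring_loop(feat, n_rounds=3):
    """Alternate mentoring (senior -> junior) and reverse mentoring (junior -> senior)."""
    X = context_features(feat)
    resp = (feat > np.median(feat)).astype(float)  # crude initial values for the senior
    for _ in range(n_rounds):
        resp = senior_em(feat, resp)               # senior produces pseudo-targets
        w = junior_fit(X, resp)                    # junior learns from them
        junior_prob = 1.0 / (1.0 + np.exp(-X @ w))
        resp = junior_prob                         # junior output re-initializes the senior
    return senior_em(feat, resp), junior_prob
```

In this toy, the junior's context information reaches the senior only through the initial values of EM.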

Relevant Publications:

Compensation of Signal Processing Distortion

Technology to compensate for the unpleasant distortion that signal processing inherently introduces when removing noise from noisy speech with high accuracy

In most cases, removing noise from noisy speech with high accuracy introduces unpleasant distortion inherent in the signal processing. Conversely, attempts to reduce this distortion can leave residual noise. Our research aims to resolve this trade-off by suppressing distortion while still removing noise with high accuracy.
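The trade-off can be made concrete with classic power spectral subtraction, a generic technique used here only to illustrate the problem, not our compensation method. An over-subtraction factor `alpha` scales the noise power estimate, and a spectral `floor` bounds the gain from below. Because the gain multiplies speech and noise alike, the enhanced signal splits into speech distortion `(1 - g) * speech` and residual noise `g * noise`. The synthetic signals, the noise level, the known noise power, and the function names are all assumptions.

```python
import numpy as np

def spectral_subtraction_gain(noisy_power, noise_power, alpha, floor=0.01):
    """Amplitude gain of power spectral subtraction with over-subtraction factor alpha."""
    gain_power = np.maximum(1.0 - alpha * noise_power / noisy_power, floor)
    return np.sqrt(gain_power)

def tradeoff(alpha, seed=0, n_bins=2000):
    """Return (normalized speech distortion, normalized residual noise) for one alpha."""
    rng = np.random.default_rng(seed)
    # Synthetic complex STFT bins: speech with varying magnitudes, white noise
    speech = rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins)
    speech *= rng.gamma(0.5, 2.0, n_bins)
    noise = 0.5 * (rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins))
    noisy = speech + noise
    noise_power = np.full(n_bins, 0.5)  # expected noise power, assumed known
    g = spectral_subtraction_gain(np.abs(noisy) ** 2, noise_power, alpha)
    # The gain acts on both components: (1 - g) * speech is lost speech, g * noise remains
    distortion = np.sum(np.abs((1 - g) * speech) ** 2) / np.sum(np.abs(speech) ** 2)
    residual = np.sum(np.abs(g * noise) ** 2) / np.sum(np.abs(noise) ** 2)
    return distortion, residual

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, tradeoff(alpha))
```

Raising `alpha` lowers the residual noise but raises the speech distortion, which is exactly the trade-off described above.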

Relevant Publications:

End-to-End Speech Enhancement

Technology for accurately estimating clean, undistorted speech signals from speech recorded in noisy environments using deep neural networks with high expressive power

Because preparing a large number of pairs of noisy signals and undistorted source signals can be a significant barrier in practical applications, we are focusing on technologies that work robustly on real-world problems. Specifically, we are exploring compact network designs, incorporating knowledge into network design, and joint learning of networks with signal-processing-based filters.

Relevant Publications:

Sound Source Separation Technology for Real-world Applications

Compact, fast, and robust sound source separation and noise suppression technology for real-world applications such as voice input interfaces on mobile devices and robot audition.

To realize voice input interfaces for mobile devices and robot audition, sound source separation and noise suppression technology must meet several requirements: i) the voice input device must be small and computationally efficient, ii) processing must remain robust even when the sound source (user) moves slightly, and iii) various types of directional and diffuse noise must be suppressed simultaneously. We have studied sound source separation and noise suppression techniques that satisfy all three requirements at once. We have also developed microphone arrays that can be mounted on mobile devices and robots and confirmed their effectiveness. A demo movie of our robot hearing technology is also available.
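To make requirements i) and ii) concrete, the following sketch uses a textbook frequency-domain delay-and-sum beamformer on a small uniform linear array. It is not the technique we developed; the geometry (4 microphones, 2 cm spacing), the 1 kHz analysis frequency, the speed of sound, and the function names are assumptions chosen for illustration. It shows unit gain toward the look direction, tolerance of a small direction mismatch, and an array gain equal to the number of microphones against spatially uncorrelated noise.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def steering_vector(n_mics, spacing, freq, angle_deg):
    """Far-field steering vector of a uniform linear array (hypothetical geometry)."""
    delays = np.arange(n_mics) * spacing * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq * delays)

def delay_and_sum_weights(n_mics, spacing, freq, look_deg):
    """Delay-and-sum weights: phase-align the microphones and average (unit gain at look_deg)."""
    return steering_vector(n_mics, spacing, freq, look_deg) / n_mics

def response(w, n_mics, spacing, freq, angle_deg):
    """Magnitude of w^H d(angle): beamformer gain for a plane wave from angle_deg."""
    return abs(np.vdot(w, steering_vector(n_mics, spacing, freq, angle_deg)))

def white_noise_gain(w, n_frames=20000, seed=0):
    """Input-to-output power ratio for spatially uncorrelated noise."""
    rng = np.random.default_rng(seed)
    shape = (len(w), n_frames)
    noise = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    out = np.conj(w) @ noise  # y = w^H n for every frame
    return np.mean(np.abs(noise[0]) ** 2) / np.mean(np.abs(out) ** 2)

if __name__ == "__main__":
    M, spacing, f = 4, 0.02, 1000.0
    w = delay_and_sum_weights(M, spacing, f, look_deg=0.0)
    for angle in (0.0, 10.0, 90.0):
        print(angle, response(w, M, spacing, f, angle))
    print("white-noise gain:", white_noise_gain(w))
```

With such a small aperture, the beam is so broad that a 10° mismatch barely changes the response, which tolerates slight source movement; the same broadness means little rejection of directional noise at low frequencies, so requirement iii) calls for more than delay-and-sum.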

Relevant Publications: