Online Adaptation of Fourier Series Based Acoustic Transfer Function Model to Improve Sound Source Localization and Separation (2022)

This paper proposes an online adaptation method for Fourier-series-based acoustic transfer function (TF) models, aimed at robot audition systems that rely on microphone array signal processing.

The TF describes how a signal propagates from a sound source to a microphone, and it is an essential component of real-world auditory scene analysis, including sound source localization and separation.

Real-world applications of TF-based array signal processing require two properties: 1) adaptability to changes in the acoustic environment, that is, changes in the signal propagation characteristics between the sound source and the microphones, and 2) a lightweight TF set suitable for embedded systems, such as robots, with limited memory and computational resources.

The proposed method performs online adaptation of a lightweight TF model based on the Fourier series expansion, and thus satisfies both requirements.

The conventional TF model stores a separate TF for every direction on a predetermined grid, whereas the Fourier-based TF model interpolates the TF set over direction using a Fourier series expansion.

Specifically, instead of storing a TF H for each direction, the model stores a smaller set of Fourier coefficients C.

For localization and separation, the TF H is computed from the Fourier coefficients C and a matrix S of phasor functions.
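A minimal sketch of the Fourier-based TF model for a single frequency bin, assuming the standard complex Fourier series over azimuth, H_m(θ) = Σ_{n=-N}^{N} C_{n,m} e^{jnθ}, so that a TF set sampled at directions θ_1, ..., θ_D can be written as H = S C with S[d][n] = e^{jnθ_d}. The paper's exact parameterization may differ; the function names and example values below are hypothetical.

```python
import cmath
import math

def phasor_matrix(thetas, order):
    # S[d][k] = exp(j * n * theta_d) with n = k - order, so n runs over [-order, order]
    return [[cmath.exp(1j * (k - order) * th) for k in range(2 * order + 1)]
            for th in thetas]

def synthesize_tf(S, C):
    # H = S C for one frequency bin: one row per direction, one column per microphone
    n_coef, n_mics = len(C), len(C[0])
    return [[sum(row[k] * C[k][m] for k in range(n_coef)) for m in range(n_mics)]
            for row in S]

def fit_coefficients(thetas, H, order):
    # On a uniform grid of D >= 2*order + 1 directions, S^H S = D * I,
    # so the least-squares fit reduces to C = S^H H / D
    D, n_mics = len(thetas), len(H[0])
    S = phasor_matrix(thetas, order)
    return [[sum(S[d][k].conjugate() * H[d][m] for d in range(D)) / D
             for m in range(n_mics)] for k in range(2 * order + 1)]

# Hypothetical example: order 2 (5 coefficients per microphone), 3 microphones
order, n_mics = 2, 3
C = [[complex(k + 1, m - k) / 10 for m in range(n_mics)] for k in range(2 * order + 1)]
grid = [2 * math.pi * d / 72 for d in range(72)]       # 5-degree grid
H_grid = synthesize_tf(phasor_matrix(grid, order), C)  # 72 x 3 TF values
C_hat = fit_coefficients(grid, H_grid, order)          # recovers the 5 x 3 coefficients
# The coefficients can be evaluated at an off-grid direction such as 7.3 degrees
H_off = synthesize_tf(phasor_matrix([math.radians(7.3)], order), C_hat)
```

Because the TF at 7.3 degrees is synthesized from the coefficients rather than looked up, no TF for that direction has to be stored.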

(1) First, sound source localization is performed with the current TF set. Because the TF set is updated at the localized direction, localization accuracy is critical for the update.

We therefore applied linear smoothing and outlier removal to keep the localization error as small as possible.
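The source does not specify how the linear smoothing and outlier removal are implemented. The following sketch shows one plausible reading, assuming per-frame direction-of-arrival (DOA) estimates in degrees: within a trailing window, estimates far from the window median are discarded, and a least-squares line through the remaining estimates gives the smoothed DOA for the current frame. The function name `smooth_doa` and the `window` and `max_dev` parameters are hypothetical.

```python
import statistics

def smooth_doa(doas, window=5, max_dev=15.0):
    """Hypothetical DOA post-processing (degrees): median-based outlier removal
    followed by a least-squares linear fit over a trailing window.
    Assumes the angles in a window do not straddle the +/-180 degree wrap."""
    out = []
    for t in range(len(doas)):
        lo = max(0, t - window + 1)
        frames = list(range(lo, t + 1))
        vals = doas[lo:t + 1]
        med = statistics.median(vals)
        # Outlier removal: keep only estimates close to the window median
        kept = [(f, v) for f, v in zip(frames, vals) if abs(v - med) <= max_dev]
        if len(kept) < 2:
            out.append(med)  # too few points for a line; fall back to the median
            continue
        # Linear smoothing: fit v = a + b * f and evaluate at the current frame t
        fm = sum(f for f, _ in kept) / len(kept)
        vm = sum(v for _, v in kept) / len(kept)
        sxx = sum((f - fm) ** 2 for f, _ in kept)
        sxy = sum((f - fm) * (v - vm) for f, v in kept)
        out.append(vm + (sxy / sxx) * (t - fm))
    return out

smoothed = smooth_doa([30, 31, 32, 90, 34, 35])
```

In this example the spurious 90-degree estimate is rejected, and the line fit keeps following the source as it moves steadily from 30 degrees.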


(2) Next, a normalized TF H is estimated from the observed signal of the localized sound source, using the paper's estimation equation.
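The paper's estimation equation is not reproduced in this text. As a stand-in, the sketch below uses a common relative-TF estimator, which is an assumption rather than the authors' formula: for one frequency bin, cross-spectra with a reference microphone are averaged over the frames in which the localized source is active, and the result is normalized so the reference microphone's TF equals 1. The name `estimate_normalized_tf` is hypothetical.

```python
def estimate_normalized_tf(frames, ref=0, eps=1e-12):
    """frames: STFT snapshots at one frequency bin, each a list of complex values
    (one per microphone). Returns a TF normalized to the reference microphone:
    h_m = sum_t X_m(t) X_ref(t)^* / sum_t |X_ref(t)|^2."""
    n_mics = len(frames[0])
    ref_power = sum(abs(x[ref]) ** 2 for x in frames)
    return [sum(x[m] * x[ref].conjugate() for x in frames) / (ref_power + eps)
            for m in range(n_mics)]

# Hypothetical noiseless example: X_m(t) = h_m * s(t) with h_ref = 1
h_true = [1, 0.5j, -0.8]
source = [1 + 1j, 2, -0.5j]
frames = [[h * s for h in h_true] for s in source]
h_est = estimate_normalized_tf(frames)
```

With noiseless observations the estimate matches `h_true` up to the small `eps` regularizer; with real recordings, averaging over more frames reduces the influence of noise and interference.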


(3) Then, the current TF set is updated to reduce its difference from the estimated TF at the localized direction.

However, since the Fourier-based TF model represents the TF set by Fourier coefficients rather than by a TF for each direction, the difference is computed for the corresponding Fourier coefficients C.
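One way to map a TF difference at a single direction θ onto the Fourier coefficients, assuming the model H(θ) = s(θ) C with s(θ) = [e^{jnθ}] for n = -N..N, is the minimum-norm solution ΔC = s(θ)^H ΔH / (2N+1). Because every phasor has unit magnitude, s(θ) ΔC = ΔH exactly. This particular rule is an illustrative assumption; the paper's derivation may differ, and the helper names are hypothetical.

```python
import cmath

def phasors(theta, order):
    # s(theta) = [exp(j * n * theta) for n = -order..order]
    return [cmath.exp(1j * n * theta) for n in range(-order, order + 1)]

def coefficient_difference(C, theta, H_est, order):
    """Hypothetical mapping of the TF error at direction theta onto C
    (rows: 2*order+1 harmonics, columns: microphones), using the minimum-norm
    solution dC = s^H dH / (2*order + 1)."""
    s = phasors(theta, order)
    n_coef, n_mics = len(C), len(C[0])
    # Current model TF at the localized direction
    H_cur = [sum(s[k] * C[k][m] for k in range(n_coef)) for m in range(n_mics)]
    dH = [H_est[m] - H_cur[m] for m in range(n_mics)]
    return [[s[k].conjugate() * dH[m] / n_coef for m in range(n_mics)]
            for k in range(n_coef)]
```

Spreading the correction evenly across all harmonics (the minimum-norm choice) changes the model most strongly near θ and only gradually at other directions.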


(4) Finally, the current Fourier coefficients C are updated with an adaptation rate.
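Assuming the update takes the form C ← C + α ΔC with an adaptation rate α in (0, 1] and the minimum-norm ΔC described for step (3), each step shrinks the model error at the localized direction by a factor of (1 − α). The sketch below is self-contained and illustrative; `adapt_step`, `model_tf`, and the chosen α are hypothetical.

```python
import cmath

def model_tf(C, theta):
    # H(theta) = s(theta) C with s(theta) = [exp(j*n*theta)], n = -order..order
    order = (len(C) - 1) // 2
    s = [cmath.exp(1j * n * theta) for n in range(-order, order + 1)]
    return [sum(s[k] * C[k][m] for k in range(len(C))) for m in range(len(C[0]))]

def adapt_step(C, theta, H_est, alpha):
    """One hypothetical adaptation step C <- C + alpha * dC, where
    dC = s^H (H_est - H_cur) / (2*order + 1) is the minimum-norm correction."""
    order = (len(C) - 1) // 2
    s = [cmath.exp(1j * n * theta) for n in range(-order, order + 1)]
    H_cur = model_tf(C, theta)
    n_coef = len(C)
    return [[C[k][m] + alpha * s[k].conjugate() * (H_est[m] - H_cur[m]) / n_coef
             for m in range(len(C[0]))] for k in range(n_coef)]

# Repeated observations from one direction: the error decays as (1 - alpha)^k
C = [[0j] for _ in range(5)]  # order 2, one microphone
target = [1.0 + 0j]
for _ in range(10):
    C = adapt_step(C, 1.2, target, alpha=0.3)
residual = abs(target[0] - model_tf(C, 1.2)[0])
```

A smaller α makes the TF set more robust to occasional localization errors but slower to track changes in the environment; a larger α does the opposite.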


First, we compared the amplitude spectra of the TF sets after adaptation in a real environment.

Without adaptation, the geometrically computed TF set remained unchanged, whereas the proposed method adapted correctly to the current environment, as did the conventional TF-model-based adaptation method.

This table shows the evaluation results for the localization task.

Regardless of the TF model, adaptation significantly reduced the localization error.

The proposed Fourier-based adaptation method achieved a smaller localization error than the conventional adaptation method.

Additionally, this table shows the separation results.

As in the localization task, adaptation significantly improved the signal-to-distortion ratio (SDR).

Furthermore, the proposed Fourier-based adaptation method achieved a significantly higher SDR than the conventional adaptation method.

This may be because the proposed method can use an arbitrary angular resolution, whereas the performance of the conventional TF model is limited by its predetermined angular resolution.

In addition, reducing the Fourier order reduced the TF size with only a slight performance degradation.
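The memory trade-off can be illustrated with simple arithmetic, assuming a conventional model stores one complex TF value per direction, microphone, and frequency bin, while the Fourier model stores 2N+1 complex coefficients per microphone and frequency bin. The configuration below (72 directions, 8 microphones, 257 bins) is hypothetical and not taken from the paper.

```python
def conventional_size(n_dirs, n_mics, n_bins):
    # One complex TF value per (direction, microphone, frequency bin)
    return n_dirs * n_mics * n_bins

def fourier_size(order, n_mics, n_bins):
    # 2*order + 1 complex coefficients per (microphone, frequency bin),
    # independent of how many directions are evaluated later
    return (2 * order + 1) * n_mics * n_bins

# Hypothetical configuration: 5-degree grid (72 directions), 8 mics, 257 bins
baseline = conventional_size(72, 8, 257)
for order in (16, 8, 4):
    size = fourier_size(order, 8, 257)
    print(f"order {order}: {size} coefficients vs {baseline} TF values "
          f"({baseline / size:.1f}x smaller)")
```

Lowering the order also truncates the higher angular harmonics, which smooths the modeled directional dependence; this is one plausible source of the slight performance degradation.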

International conference / Peer-reviewed conference paper