Research

Joint Acoustic Dereverberation and Beamforming With Blind Estimation of the Shape Parameter of the Desired Source Prior 

Dereverberation and acoustic beamforming are jointly used to capture the speech of a desired speaker in the presence of interfering speakers in a reverberant room using an array of microphones. Traditionally, for these two tasks, the desired speech is modelled in the time-frequency domain using a complex Gaussian (CG) prior with time-varying variances, and the shape parameter of the prior distribution is fixed at the same value for all time-frequency bins. In this work, we proposed modelling the inverse of the variance (i.e., the precision) of the CG prior, which controls the shape of the distribution, as a Gamma-distributed random variable. The hyperparameters of the Gamma distribution were then estimated from the data captured by the microphones. This data-dependent blind estimation of the shape of the prior helped the proposed algorithm model the desired speech accurately and adapt to different speakers and acoustic scenarios better than algorithms with a fixed shape parameter. We used maximum likelihood techniques to estimate the multi-channel linear prediction (MCLP) dereverberation coefficients and the beamforming weights under the proposed signal model. The latent precision parameters were obtained by estimating the hyperparameters with the expectation-maximization (EM) method. For the online version of the algorithm, a recursive EM method was also proposed for real-time processing. Extensive simulation results showed improved dereverberation and interference cancellation performance of the proposed method, highlighting the importance of not choosing the shape parameter of the prior distribution manually.
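The hierarchical prior can be illustrated with a small sampling sketch (a generic illustration, not the paper's estimator; the shape and rate values below are arbitrary): drawing the precision from a Gamma distribution and then the STFT coefficient from a zero-mean complex Gaussian with that precision yields a heavy-tailed marginal whose tail weight is governed by the shape parameter.

```python
import numpy as np

def sample_cg_gamma_precision(a, b, size, rng):
    """Hierarchical draw: precision lam ~ Gamma(shape=a, rate=b),
    then x | lam ~ CN(0, 1/lam) (circular complex Gaussian)."""
    lam = rng.gamma(shape=a, scale=1.0 / b, size=size)
    sigma = np.sqrt(0.5 / lam)  # std of real and imaginary parts
    return sigma * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

rng = np.random.default_rng(0)
# Small shape parameter -> heavy-tailed, speech-like coefficients;
# large shape with matched rate -> nearly complex Gaussian coefficients.
x_heavy = sample_cg_gamma_precision(0.5, 0.5, 100_000, rng)
x_gauss = sample_cg_gamma_precision(50.0, 50.0, 100_000, rng)

def kurt(x):
    """Normalized fourth moment of |x| (equals 2 for a complex Gaussian)."""
    m2 = np.mean(np.abs(x) ** 2)
    return np.mean(np.abs(x) ** 4) / m2 ** 2
```

Marginalizing out the Gamma precision gives a complex Student-t-type distribution; the shape hyperparameter that sets its tail weight is exactly the quantity the proposed method estimates blindly from the data rather than fixing by hand.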

Experimental setup for joint acoustic dereverberation and beamforming in the Audio Signal Processing Lab at IIT Gandhinagar.

Near-Field Acoustic Source Localization and Beamforming in the Spherical Sector Harmonics Domain

Three-dimensional arrays can localize sources anywhere in the spatial domain without ambiguity. Among these arrays, the spherical microphone array (SMA) has gained widespread usage in acoustic source localization and beamforming. However, SMAs are bulky, and in many applications with space and power constraints it is undesirable to use an SMA. To deal with this issue, arrays with microphones placed only on a sector of a sphere have been developed, along with various techniques for localizing far-field sources in the spherical sector harmonics (S2H) domain. This work addressed near-field acoustic localization and beamforming using a spherical sector microphone array. We presented the representation of spherical waves from a point source in the S2H domain using the orthonormal S2H basis functions. Using this representation, we developed an array model for a spherical sector array placed in a wavefield created by multiple near-field sources in the S2H domain. Based on the developed array model, two algorithms, NF-S2H-MUSIC and NF-S2H-MVDR, were proposed for the joint estimation of the range, elevation, and azimuth of near-field sources. Further, a near-field beamforming algorithm capable of radial and angular filtering in the S2H domain was also developed. Finally, we presented the Cramér-Rao bound (CRB) in the S2H domain for near-field sources. The performance of the proposed algorithms was assessed using extensive localization and beamforming simulations.
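The S2H-domain algorithms themselves are not reproduced here, but the underlying near-field idea is easy to illustrate in element space. The following sketch (all geometry, frequency, and source parameters are hypothetical) builds spherical-wave steering vectors with the exact 1/d amplitude decay and exp(-jkd) phase, and runs a MUSIC-style grid search jointly over range and azimuth:

```python
import numpy as np

c, f = 343.0, 2000.0                 # speed of sound (m/s), frequency (Hz)
k = 2 * np.pi * f / c                # wavenumber

# Hypothetical 6-mic circular array of radius 5 cm in the xy-plane.
M = 6
phi = 2 * np.pi * np.arange(M) / M
mics = 0.05 * np.stack([np.cos(phi), np.sin(phi)], axis=1)

def nf_steering(r, az):
    """Near-field (spherical wave) steering vector: 1/d amplitude decay and
    exp(-jkd) phase, with d the exact source-to-microphone distances."""
    src = r * np.array([np.cos(az), np.sin(az)])
    d = np.linalg.norm(mics - src, axis=1)
    a = np.exp(-1j * k * d) / d
    return a / np.linalg.norm(a)

# Simulate one near-field source at range 0.5 m, azimuth 40 degrees.
rng = np.random.default_rng(1)
a0 = nf_steering(0.5, np.deg2rad(40.0))
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
noise = 0.01 * (rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200)))
X = np.outer(a0, s) + noise
R = X @ X.conj().T / X.shape[1]

# MUSIC-style joint (range, azimuth) search using the noise subspace.
En = np.linalg.eigh(R)[1][:, :-1]    # one source assumed
ranges = np.linspace(0.2, 1.0, 41)
azims = np.deg2rad(np.arange(0.0, 360.0, 2.0))
P = np.array([[1.0 / np.linalg.norm(En.conj().T @ nf_steering(r, az)) ** 2
               for az in azims] for r in ranges])
i, j = np.unravel_index(np.argmax(P), P.shape)
r_hat, az_hat = ranges[i], np.degrees(azims[j])
```

Unlike the far-field case, the wavefront curvature and amplitude decay make the steering vector depend on range as well as angle, which is what makes the joint (range, elevation, azimuth) estimation of the proposed methods possible.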

Experimental setup for near-field acoustic localization using an open spherical sector microphone array in the ASP Lab at IIT Gandhinagar.

Sparse Distortionless Modal Beamforming for Spherical Microphone Arrays 

Phase-mode array processing, which utilizes the spherical harmonics decomposition, offers a useful framework for spherical microphone arrays. In the modal domain, one of the major applications of spherical arrays is acoustic beamforming. Usually, beamforming is performed by minimizing the power of the beamformer output under a distortionless constraint towards the look direction. Moreover, most beamformers model the spectral coefficients of the target speech using a Gaussian distribution. In this work, a beamforming method was proposed that minimizes the L0-norm of the beamformer output in the spherical harmonics domain under a distortionless constraint. The proposed method assumed a super-Gaussian prior for the target speech and sparsified the beamformer output, which was shown to reduce the residual components of the interfering speech signals in the beamformer output, leading to superior spatial separation. The formulation of the proposed beamformer in the spherical harmonics signal model was developed, along with an algorithm to solve it. Simulation results demonstrated, using a number of objective measures, the effectiveness of the proposed beamformer in spatially filtering the desired speaker signal from the interfering signals.
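A generic element-space sketch of the idea (not the paper's spherical-harmonics-domain formulation, and all array and source parameters below are hypothetical): the non-convex L0-like objective is approximated by iteratively reweighted least squares, so each iteration reduces to a weighted MVDR-type problem with the distortionless constraint enforced in closed form.

```python
import numpy as np

def sparse_distortionless_bf(X, a, p=0.5, n_iter=10, eps=1e-3):
    """Approximately minimize the Lp-norm (p << 2, an L0 surrogate) of the
    beamformer output y = w^H X subject to w^H a = 1, via IRLS: each
    iteration solves min_w w^H R_w w s.t. w^H a = 1, where R_w is a
    reweighted sample covariance emphasizing low-magnitude output frames."""
    M, T = X.shape
    w = a / (a.conj() @ a)                       # delay-and-sum start
    for _ in range(n_iter):
        y = w.conj() @ X
        lam = 1.0 / (np.abs(y) ** (2.0 - p) + eps)   # IRLS weights
        Rw = (X * lam) @ X.conj().T / T + 1e-9 * np.eye(M)
        Ria = np.linalg.solve(Rw, a)
        w = Ria / (a.conj() @ Ria)               # distortionless normalization
    return w

# Hypothetical demo: 8-mic half-wavelength ULA, target at 0 deg, interferer at 30 deg.
M, T = 8, 400
steer = lambda th: np.exp(-1j * np.pi * np.arange(M) * np.sin(np.deg2rad(th)))
rng = np.random.default_rng(2)
cn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
X = np.outer(steer(0.0), cn(T)) + np.outer(steer(30.0), cn(T)) + 0.05 * cn(M, T)
w = sparse_distortionless_bf(X, steer(0.0))
```

The closed-form normalization w = R⁻¹a / (aᴴR⁻¹a) guarantees a unit response towards the look direction at every iteration, while the reweighting drives the interferer residuals towards zero.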

Performance evaluation of an active headrest system

In an active headrest system, virtual sensing transfers the spatial zone of quiet from the residual error microphone to the ear canal. In this work, an auxiliary-filter-based virtual sensing scheme integrated with the filtered-x least mean square/fourth algorithm was developed for an active headrest. The performance of the proposed method was evaluated experimentally using periodic and band-limited white noise, and improved noise control performance was observed for both periodic and broadband noise. The effect of the causality constraint on the performance of the algorithm was also tested.
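The filtered-x LMS core of such a system can be sketched in a few lines. This is a plain single-channel FxLMS simulation with made-up primary and secondary paths and a perfectly known secondary-path model, not the auxiliary-filter virtual-sensing scheme itself:

```python
import numpy as np

fs = 8000
n = np.arange(8000)
x = np.sin(2 * np.pi * 200 * n / fs)      # reference: 200 Hz periodic noise
P = np.array([0.0, 0.0, 0.9, 0.4])        # hypothetical primary path
S = np.array([0.0, 0.6, 0.3])             # hypothetical secondary path
d = np.convolve(x, P)[:len(x)]            # disturbance at the error mic

L, mu = 16, 0.05
w = np.zeros(L)                           # adaptive control filter
xf = np.convolve(x, S)[:len(x)]           # filtered reference (S assumed known)
xbuf = np.zeros(L)                        # reference history for w
fbuf = np.zeros(L)                        # filtered-reference history
ybuf = np.zeros(len(S))                   # control-signal history for S
e = np.zeros(len(x))
for i in range(len(x)):
    xbuf = np.roll(xbuf, 1); xbuf[0] = x[i]
    fbuf = np.roll(fbuf, 1); fbuf[0] = xf[i]
    ybuf = np.roll(ybuf, 1); ybuf[0] = w @ xbuf   # control output y(i)
    e[i] = d[i] - S @ ybuf                # residual: disturbance minus anti-noise
    w += mu * e[i] * fbuf                 # FxLMS weight update
```

For the periodic disturbance, the residual error power decays by orders of magnitude once the filter converges; in the actual system, the virtual-sensing scheme moves this cancellation point from the physical error microphone to the ear canal.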

Experimental setup for performance evaluation of an active headrest system in ASP Lab at IIT Gandhinagar.  

Underdetermined DOA Estimation Using Arbitrary Planar Arrays via Coarray Manifold Separation

Conventional direction-of-arrival (DOA) estimation algorithms like MUSIC can only localize fewer sources than the number of physical sensors. In this work, underdetermined azimuth localization (localizing more sources than sensors) using arbitrary planar arrays was proposed, using only second-order statistics of the received data. To achieve this, we utilized the difference coarray of the actual array and expressed the elements of the array covariance matrix as the signal received by the virtual sensors of the coarray. We explored the structure and geometry of the difference coarray of an N-element planar array and showed that the coarray can provide increased degrees-of-freedom (DOF) of O(N^2), which enables underdetermined localization. Then, we extended the manifold separation (MS) technique to the coarray to express the coarray steering matrix in terms of a Vandermonde-structured matrix by designing a signal-independent coarray characteristic matrix. As the signal model of a coarray is a single-snapshot model, the Vandermonde structure enabled us to perform a spatial-smoothing-type operation to restore the rank of the coarray covariance matrix. This allowed us to propose a novel subspace-based algorithm, which we called coarrayMS-MUSIC, to perform underdetermined source localization using arbitrary planar arrays. We also introduced the polynomial rooting version of our algorithm, called coarrayMS-rootMUSIC. Finally, we conducted extensive numerical simulations to verify the effectiveness and usefulness of the proposed methods.
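The coarray construction itself is simple to demonstrate (the array geometry below is made up): collecting all pairwise sensor-position differences yields up to N(N-1)+1 distinct virtual sensors, the O(N^2) degrees-of-freedom that the proposed algorithms exploit.

```python
import numpy as np

def difference_coarray_2d(pos):
    """Return the unique virtual-sensor positions of the difference coarray
    {p_i - p_j} of a planar array with integer grid positions pos (N, 2)."""
    diffs = pos[:, None, :] - pos[None, :, :]   # all pairwise differences
    return np.unique(diffs.reshape(-1, 2), axis=0)

# Hypothetical 6-element planar array on a unit grid (arbitrary geometry).
pos = np.array([[0, 0], [1, 0], [3, 0], [0, 1], [0, 3], [2, 2]])
coarray = difference_coarray_2d(pos)
# N physical sensors yield up to N*(N-1)+1 distinct virtual sensors here:
# 6 sensors -> as many as 31 coarray positions, i.e. O(N^2) DOF.
```

For this geometry all 30 nonzero pairwise differences are distinct, so the 6-element array produces a 31-element virtual coarray, which is what allows more sources than physical sensors to be localized from second-order statistics.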

Demonstration of Time Difference of Arrival (TDOA) Localization Technique

TDOA is one of the techniques for direction-of-arrival (DOA) estimation; it relies on measuring the differences in the times at which a signal arrives at the different sensors of an array.

Here's a basic explanation of the TDOA setup:

Two-microphone linear array

PreSonus Audio Box

Media1.mp4

Changing the direction of the sound source (to be played simultaneously with the adjacent video)

Media2.mp4

The arrow indicates the estimated direction of the source.
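The delay-to-direction computation behind the demo can be sketched as follows. This is a minimal cross-correlation estimator with illustrative sampling rate, spacing, and delay values; a real setup would typically use a more robust variant such as GCC-PHAT:

```python
import numpy as np

def estimate_tdoa(x1, x2, fs):
    """Estimate the delay of x2 relative to x1 (in seconds) as the lag of
    the cross-correlation peak."""
    corr = np.correlate(x2, x1, mode="full")
    lag = np.argmax(corr) - (len(x1) - 1)    # peak lag in samples
    return lag / fs

def doa_from_tdoa(tau, d, c=343.0):
    """Far-field DOA (degrees from broadside) for a two-mic array with
    spacing d, from the time difference tau."""
    s = np.clip(c * tau / d, -1.0, 1.0)
    return np.degrees(np.arcsin(s))

# Illustrative example: white noise arriving 3 samples later at mic 2.
fs, d = 16000, 0.1
rng = np.random.default_rng(0)
sig = rng.standard_normal(4096)
x1 = sig
x2 = np.roll(sig, 3)          # circular shift as a stand-in for a pure delay
tau = estimate_tdoa(x1, x2, fs)
theta = doa_from_tdoa(tau, d)
```

With a single microphone pair the angle is ambiguous between front and back of the array axis; the two-microphone demo above resolves direction only within that half-plane.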

Coarray Manifold Separation In The Spherical Harmonics Domain For Enhanced Source Localization

The order of a three-dimensional wavefield captured by a spherical array is limited by the number of sampling points, i.e., the number of sensors in the array. This restricts the source localization performance of existing techniques for a spherical array. In this work, we introduced the concept of the difference coarray to spherical arrays and proposed an algorithm which utilised the increased degrees-of-freedom (DOF) provided by the virtual coarray sensors to perform enhanced source localization. We made use of coarray manifold separation in the spherical harmonics domain to generate a Vandermonde-structured coarray manifold matrix, which allowed us to propose a novel subspace-based algorithm, which we called coarraySH-MUSIC. We also introduced a polynomial rooting version of our algorithm which does not rely on extensive grid searches. The proposed algorithms were evaluated using various simulated source localization experiments.

Coarray MUSIC-Group Delay: High-Resolution Source Localization Using Non-Uniform Arrays 

Non-uniform linear arrays (NULAs) operate in the difference coarray domain to localize more sources than the number of physical sensors. Usually, the spectral magnitude of the subspace-based coarray MUSIC algorithm is used for this task of underdetermined source localization. However, the phase spectrum of coarray MUSIC, which has not yet been utilized, exhibits sharp transitions at the directions of arrival (DOAs) of the respective sources. As a result, the negative differential of this phase, called the coarray group delay, shows peaks at the DOAs. In this work, a new source localization technique called coarray MUSIC-Group delay (coarray MGD), defined as the product of the coarray MUSIC magnitude and the coarray group delay function, was introduced. It was proved that as long as the difference coarray of the NULA is a uniform linear array (ULA), the group delay function displays an additive property in the coarray spatial domain. Owing to this property, the proposed coarray MGD technique was able to resolve more closely spaced sources than the coarray MUSIC magnitude method. Numerical simulations were performed to validate the high-resolution property of the proposed method.

Third-Order Tensor Decomposition Based Multichannel Linear Prediction for Robust Dereverberation

Reverberation is one of the major causes of speech degradation. The popular weighted prediction error (WPE) technique performs dereverberation by estimating the late room reflections using a multi-channel prediction filter. However, the length of the prediction filter in each short-time Fourier transform (STFT) band must be sufficiently long to model the late reverberation component accurately. This requires inverting a large matrix in every frequency bin, making the WPE method computationally expensive. The WPE method is also vulnerable to additive noise. To tackle these issues, we developed a computationally efficient dereverberation technique in this work. We decomposed the long prediction filter into three smaller sub-filters using third-order tensor decomposition. One sub-filter acted as a spatial filter, while the other two acted as temporal prediction filters. We then developed an iterative algorithm to obtain optimal solutions for all three sub-filters. The spatial filter was optimized as a weighted distortionless beamformer to deal with noise, while the temporal filters were optimized as weighted Wiener filters. Since the sub-filters are shorter, the respective covariance matrices were computationally easier to invert, leading to an efficient algorithm. Simulation results showed that the proposed algorithm was robust to noise and outperformed current WPE-based algorithms in terms of dereverberation.
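The tensor-decomposed sub-filters are not reproduced here, but the WPE baseline the work builds on can be sketched per frequency bin. The sketch below (synthetic single-echo data; the filter length, prediction delay, and echo coefficients are illustrative) alternates between estimating the time-varying variance and solving the weighted least-squares prediction problem:

```python
import numpy as np

def wpe_one_bin(X, K=10, delay=2, n_iter=3, eps=1e-8):
    """Minimal per-bin WPE: alternately estimate the time-varying variance
    of the desired signal and solve a weighted least-squares problem for a
    multichannel linear-prediction filter; the prediction (an estimate of
    the late reverberation) is subtracted from the reference channel.
    X: (M, T) STFT coefficients of one frequency bin."""
    M, T = X.shape
    d = X[0].copy()                              # initial desired estimate
    Xt = np.zeros((M * K, T), dtype=complex)     # stacked delayed frames
    for kk in range(K):
        tau = delay + kk
        Xt[kk * M:(kk + 1) * M, tau:] = X[:, :T - tau]
    for _ in range(n_iter):
        lam = np.maximum(np.abs(d) ** 2, eps)    # desired-signal variance
        Xw = Xt / lam                            # variance-weighted frames
        R = Xw @ Xt.conj().T                     # weighted covariance
        p = Xw @ X[0].conj()
        g = np.linalg.solve(R + eps * np.eye(M * K), p)
        d = X[0] - g.conj() @ Xt                 # prediction residual
    return d

# Synthetic bin: clean signal with a slowly varying envelope plus one echo.
rng = np.random.default_rng(3)
T = 2000
env = 0.5 + np.abs(np.sin(np.arange(T) / 50.0))
s = env * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
x1 = s.copy(); x1[3:] += 0.8 * s[:-3]
x2 = s.copy(); x2[4:] += 0.5 * s[:-4]
d = wpe_one_bin(np.stack([x1, x2]))
```

The (M*K x M*K) weighted covariance solved here per bin is exactly the large matrix inversion the proposed tensor decomposition avoids, by splitting the long filter into one short spatial and two short temporal sub-filters.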