Cocktail Party Problem

Members

Devanshu Gupta
Utkarsha Suman
Reshma
Fatwir
Jagadish
Anshuman

Abstract

Cocktail Party Problem Implementation involves blind separation of signal from a single mixed channel. We aim to separate the independent signals in a given audio file and store them in wav files for further processing. In this project we have designed an algorithm which takes an audio input, delays it, performs FastIca algorithm on the delayed signals the output of which is then clustered to find the impulse response of the FIR filters. The filters are then used to isolate and extract the independent signals that share the same audio channel simultaneously. This program was studied with many hyper parameters that makes up the program in order to improve the results.

KEYWORDS:MATLAB, FIR filters,audio signal, K-means cluster , signal delay,single channel, ICA(SCICA), Independent Component Analysis,FastIca algorithm, Blind Source Separation(BSS),FFT

Introduction

Audio signal processing is a branch of signal processing which deals with manipulation of audio signals. Now a days, analysis and processing of audio signals has become potential interest of engineers. The inputs are provided in audio to get a certain result. In this way Independent Component Analysis via the FastIca is used for blind source separation of signal from mixed channels of audio signal. For the implementation of BSS from a single channel mixes, the traditional FastIca algorithm cannot be used. Hence we need to develop a complementary algorithm known as the Single Channel Independent Component Analysis(SCICA).

In this program we are provided with a single audio signal, where we are unaware of the total number of independent signals that make up the channel. In order to find them , the input vector is delayed N times which is then given as input to the traditional FastIca algorithm. The output from the FastICA are then clustered after finding its FFT in order to find the impulse response of the FIR filters so as to recover the initial independent signals.

Audio Signal Acquisition

Single channel audio with mixed signals is acquired from a microphone with preferably higher sampling rate which is 44.1kHz in our experiment. It is then brought into the programming environment which is MATLAB in our experiment after converting into WAV file. The FFT of original signal is plotted and fundamental frequency is observed.

Independent Component Analysis

The independent component analysis model is constructed by the following:

S=Ax (1)

Here S corresponds to the mixed signals (which forms the input in our case), x corresponds to the vector of independent signals (which makes up the input signal) and A corresponds to the mixing matrix(which is used to mix the various independent signals so as to get the input signals). The basic concept of ICA lies in finding the W matrix which can give us the independent signals from the input mixed signals using the following model:

x=WS (2)

Where W matrix corresponds to the inverse of the A matrix. Independent component analysis is a method which aims at calculating the mixing matrix A and its inverse by maximising the non gaussianity of the signal. The SCICA is an extension of the ICA algorithm in order to find the source signals from only one mixed signal.

Single Channel Independent Component analysis

The first step in SCICA is to delay the signal by N times in order to find the mixing matrix A. The delay is done by the following model:

X = [Xn, Xn-1 ..... Xn-N+1]^T (3)

Where n is the length of the initial vector. Once we have the X matrix with N delays, we proceed with the FastICA algorithm in order to get the A matrix and its inverse. The total number of delays that is N is a hyperparameter which is decided by trial and error. In our experiment, the best and similar results were found when N was kept in the range of 20-30. We have arbitrarily chosen 25 as the number of delays N for this paper. The fourier transform of these delayed signals form the basis of the reconstructional FIR filters (which will be used for the recovery of the independent signals).

The above figure shows that the FFT of the various delayed signals are not all different but there is some kind of similarity between them. Therefore it is necessary to group the similar waveforms together for the formation of the filters

Clustering

The output of the FastICA algorithm is then fourier transformed using the FFT function of matlab. The next step is to cluster these waveforms using the K-means cluster algorithm which is an algorithm used to cluster waveforms that are similar to each other. The total number of clusters(which is a hyperparameter) was decided using the Davies Bouldin index which can easily be found using a predefined command in MATLAB. The Davies Bouldin index is used to assess the quality of the clusters by calculating the distance between the various clusters and by the inter cluster relationship.

From the above figure is used as a basis to select the total number of clusters(which is a hyperparameter hence has to be chosen via trial and error) . In our case as we can see from the graph, the number of clusters equal to 4 will give us the best result.

Future Work

Future work involves the reorganisation of the mixing matrix A and its inverse in order to form the reconstructional filters to find the independent components that makes up the original input signal. The proposed algorithm involves the reorganisation of the matrices into submatrices on the basis of the clusters formed using the K-Means algorithm. The impulse response for each of the reconstructional FIR filters can be found using the following equation:

fi=1/N*(∑ Ai(-t)*Wi(t)) (4)

Here , Ai and Wi are the columns and rows of the mixing matrix and its inverse respectively of a particular cluster. Ai(-t) represents that the column of the A matrix needs to be flipped before the convolution.The above equation is used to find the impulse response for all the clusters. Therefore,the total number of filters is equal to the total number of clusters found. Being FIR filters, they will have a phase delay associated with them which we need to remove in order to have zero phase filters. The formed filters are then be convolved with the original input signal in order to find the independent signals that make up the original signals.

Results

The given model was implemented with various audio samples. Delaying of signal followed by feeding it as the input matrix to the FastICA , finding the Davies Bouldin index and clustering have been successfully implemented (for N=25) for different audio samples and were found to give the expected results. For all the audio samples N in the range of 20-30 was found to give the best results.

Discussion on results

It is necessary to carefully select the hyperparameters which are the number of delays N and the number of clusters in our case. The number of delays determines the order of the FIR filters which are the reconstruction filters in our case. Higher order filters are preferred as we are working in a close range of frequency (i.e 20 Hz to 20kHz). It is found via trial and error that the best results are obtained when N is limited between 20 to 30. The other hyperparameter which is the total number of delays is chosen via the Davies Bouldin index. A careful selection of these hyperparameters are necessary in order to form reconstructional filters that are closest to the frequency response of the original independent signals.

Conclusion

The limitation of the given model is that the signals have to be non gaussian signals otherwise the proposed model will be observed to fail. Also the proposed algorithm can be used to obtain only those signals that do not overlap in the frequency domain hence they can be used for very specific situations of signal isolation and its recovery.

As the audible audio range is limited between 20Hz to 20KHz hence it is necessary to have FIR filters with higher orders in order to distinguish between different frequencies in this close range. Since the order of filters are directly related to the number of delays hence N should be selected greater than or equal to 20 .

The final output of the proposed algorithm is expected to have reduced amplitude and some sort of phase delay being associated with it as the original signal is passed through filter. Therefore this algorithm does not preserve the amplitude and the phase of the original signal.

References

[2] Hyvärinen and E. Oja. Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4-5):411-430, (2000).