Research

Machine Learning for Healthcare Informatics

Care for critically ill patients is provided in the intensive care unit (ICU), where interventions are largely guided by abnormal vital signs. Patients in the ICU are monitored continuously so that physiological deviations are detected as early as possible, helping to stabilize them. Patients who survive the ICU stay are discharged to a general ward, where monitoring continues. Earlier detection of physiological deterioration in these wards has been shown to reduce ICU readmission rates. Deterioration is typically flagged by a score computed continuously from the vital sign data recorded in electronic health records (EHRs). However, vital sign data in EHRs are noisy, sparse and incomplete, especially in general non-critical wards. We are developing a Bayesian neural network framework to estimate the missing values in EHR data, which can in turn support better patient outcomes.
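As a deliberately simplified illustration of the core idea, Bayesian imputation with explicit uncertainty, the sketch below fills a missing vital-sign reading using a conjugate normal model. The function name, prior values, and the model itself are illustrative stand-ins, not the Bayesian neural network framework described above.

```python
import numpy as np

def bayesian_impute(observed, prior_mean, prior_var, noise_var):
    """Posterior predictive for a missing vital-sign reading under a
    Normal-Normal conjugate model with known noise variance -- a toy
    stand-in for the Bayesian neural network described above.

    Returns the imputed value together with its predictive variance,
    so every filled-in reading carries an explicit uncertainty."""
    n = len(observed)
    # Posterior over the patient-specific mean:
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + np.sum(observed) / noise_var)
    # Predicting a new reading adds the observation noise back in.
    return post_mean, post_var + noise_var

# Example: heart-rate readings with a gap; population prior near 75 bpm.
readings = np.array([82.0, 85.0, 81.0])
mean, var = bayesian_impute(readings, prior_mean=75.0, prior_var=100.0,
                            noise_var=25.0)
```

The point of the Bayesian treatment is the second return value: an imputed reading that conflicts with the prior widens the predictive variance instead of being silently trusted.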

Deep Sparse Representation for Speech Recognition

Features derived using sparse representation (SR) based approaches have been shown to yield promising results for speech recognition tasks. In most approaches, the SR corresponding to a speech signal is estimated using a dictionary, which can be either exemplar based or learned. However, a single-level decomposition may not be suitable for speech, as the signal contains complex hierarchical information about various hidden attributes. In this work, we propose a multi-level decomposition (having multiple layers), also known as deep sparse representation (DSR), to derive a feature representation for speech recognition. Instead of a series of sparse layers, the proposed framework employs a dense layer between two sparse layers, which enables efficient implementation. Our studies reveal that the representations obtained at different sparse layers of the proposed DSR model carry complementary information.
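The sparse-dense-sparse structure can be sketched as follows. The greedy top-k coding, the tanh nonlinearity in the dense layer, and the random dictionaries are all illustrative assumptions, not the exact algorithm or learned parameters of the DSR model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_layer(x, D, k):
    """k-sparse code of x over dictionary D: select the k atoms with the
    largest correlations, then least-squares refit on that support."""
    support = np.argsort(np.abs(D.T @ x))[-k:]
    coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

def dsr_features(x, D1, W, D2, k1, k2):
    """Sparse layer -> dense layer -> sparse layer, mirroring the
    multi-level decomposition described above."""
    a1 = sparse_layer(x, D1, k1)   # first-layer sparse code
    h = np.tanh(W @ a1)            # intervening dense layer (assumed tanh)
    a2 = sparse_layer(h, D2, k2)   # second-layer sparse code
    return a1, a2

# Random dictionaries and weights stand in for learned ones.
D1 = rng.standard_normal((64, 128)); D1 /= np.linalg.norm(D1, axis=0)
W = rng.standard_normal((32, 128))
D2 = rng.standard_normal((32, 96)); D2 /= np.linalg.norm(D2, axis=0)
x = rng.standard_normal(64)
a1, a2 = dsr_features(x, D1, W, D2, k1=8, k2=6)
```

Codes from the two sparse layers (`a1`, `a2`) can then be concatenated or used separately as features, reflecting the observation that the layers carry complementary information.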

Illustration of the proposed DSR model

3-D t-SNE representation corresponding to the SR obtained at the first three sparse layers of the proposed DSR model, for the E-set test set. The frames corresponding to different letter pairs become progressively less overlapping from layer one to layer three.

Relevant Publication

  • P. Sharma, V. Abrol, and A. K. Sao, "Deep sparse representation based features for speech recognition", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, pp. 2162-2175, 2017. PDF

Sparse Representation for Speech Units Classification

In this work, we propose sparse representation based features for speech unit classification tasks. In order to effectively capture the variations within a speech unit, the proposed method employs multiple class-specific dictionaries. Here, the training data belonging to each class is clustered into multiple clusters, and a principal component analysis (PCA) based dictionary is learnt for each cluster. It has been observed that coefficients corresponding to the middle principal components can effectively discriminate among different speech units. Exploiting this observation, we propose a transformation known as the weighted decomposition (WD) of principal components, which emphasizes the discriminative information present in the PCA-based dictionary.
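A minimal sketch of the pipeline is given below on synthetic data. For brevity it uses one cluster per class (omitting the k-means clustering step), and the bell-shaped WD weighting that emphasizes middle components is an illustrative choice, not necessarily the paper's exact function.

```python
import numpy as np

rng = np.random.default_rng(1)

def pca_dictionary(X, r):
    """PCA-based dictionary for one class/cluster: the mean plus the
    top-r principal directions of the class data (samples in rows)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:r].T

def wd_weights(n_comp, sigma=None):
    """Weighted-decomposition profile: a bell curve over the component
    index that emphasizes the middle principal components."""
    idx = np.arange(n_comp)
    center = (n_comp - 1) / 2.0
    sigma = sigma or n_comp / 4.0
    return np.exp(-0.5 * ((idx - center) / sigma) ** 2)

def classify(x, dicts):
    """Assign x to the class whose dictionary reconstructs it best."""
    errs = [np.linalg.norm(x - (mu + D @ (D.T @ (x - mu)))) for mu, D in dicts]
    return int(np.argmin(errs))

# Two synthetic "speech unit" classes (stand-ins for real speech frames).
X0 = rng.standard_normal((200, 20))
X1 = rng.standard_normal((200, 20)) + 3.0
dicts = [pca_dictionary(X0, r=5), pca_dictionary(X1, r=5)]

# WD features for one sample: weighted PCA coefficients over its class.
weighted = wd_weights(5) * (dicts[0][1].T @ (X0[0] - dicts[0][0]))
```

Weighting the coefficients rather than truncating them keeps all components available while down-weighting the leading ones, which mostly capture within-class variation.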

Illustration of proposed (a) dictionary learning, and (b) dictionary selection method.

Two-dimensional t-SNE visualization of data along with cluster centroids for the (a) /ba/, (b) /bA/ and (c) /no/ classes. These visualizations correspond to 400-dimensional raw speech samples.

Relevant Publications

  • P. Sharma, V. Abrol, A. D. Dileep and A. K. Sao, "Sparse coding based features for speech units classification", Computer Speech and Language, Elsevier, vol. 47, pp. 333-350, 2018. PDF
  • P. Sharma, V. Abrol, A. D. Dileep and A. K. Sao, "Sparse coding based features for speech units classification", in Proc. ISCA 16th Annual Conference (INTERSPEECH), 2015, Dresden, Germany. PDF

Sparse Representation for Speech Enhancement

Supervised approaches for speech enhancement require models to be learned for different noise environments, a requirement that is difficult to meet in practical scenarios. In this work, a sparse representation (SR)/compressed sensing (CS) based supervised speech enhancement approach is proposed in which the model (dictionary) for noise is derived from the noisy speech signal itself. It exploits the observation that unvoiced/silence regions of a noisy speech signal are predominantly noise, and proposes a method to identify such regions, thus eliminating the pre-training of a noise model. The proposed method is particularly effective when the noise type is unknown a priori. Experimental results validate that the proposed approach can be an alternative to existing speech enhancement approaches.
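The key observation can be sketched as follows: low-energy (unvoiced/silence) frames of the noisy signal supply the noise dictionary, so no pre-trained noise model is needed. The ridge-regularised decomposition is a stand-in for a proper sparse solver, and the exemplar speech dictionary and toy sinusoid signal are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def frame_signal(x, flen):
    """Split a 1-D signal into non-overlapping frames (columns)."""
    n = len(x) // flen
    return x[:n * flen].reshape(n, flen).T

def noise_dictionary(frames, q=0.2):
    """Exemplar noise dictionary built from the lowest-energy frames of
    the noisy signal itself -- the key idea above: unvoiced/silence
    regions are predominantly noise."""
    energy = np.sum(frames ** 2, axis=0)
    atoms = frames[:, energy <= np.quantile(energy, q)]
    return atoms / (np.linalg.norm(atoms, axis=0) + 1e-12)

def enhance(frames, Ds, Dn, lam=0.1):
    """Decompose each noisy frame over the joint dictionary [Ds, Dn]
    (ridge least squares as a stand-in for a sparse solver) and keep
    only the speech component."""
    D = np.hstack([Ds, Dn])
    G = D.T @ D + lam * np.eye(D.shape[1])
    A = np.linalg.solve(G, D.T @ frames)
    return Ds @ A[:Ds.shape[1], :]

# Toy data: "speech" is a sinusoid preceded by a noise-only silence region.
flen = 32
t = np.arange(flen)
s = np.sin(2 * np.pi * 4 * t / flen)                    # one clean frame
clean = np.hstack([np.zeros(10 * flen), np.tile(s, 40)])
noisy = clean + 0.3 * rng.standard_normal(clean.size)

frames = frame_signal(noisy, flen)
Ds = np.tile((s / np.linalg.norm(s))[:, None], (1, 10))  # exemplar speech atoms
Dn = noise_dictionary(frames, q=0.2)
enhanced = enhance(frames, Ds, Dn)
```

Discarding the noise-dictionary coefficients at reconstruction time is what removes the noise: the silence-derived atoms soak up the noise energy, leaving the speech component to the speech atoms.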

Clean speech, noisy speech (corrupted with babble noise at 0 dB), speech enhanced using an existing unsupervised CS based method, and speech enhanced using the proposed method (when the noise dictionary Ψn is learned from the noisy speech) are shown in (a), (c), (e) and (g), respectively, with their corresponding spectrograms shown in (b), (d), (f) and (h).

Relevant Publications

  • P. Sharma, V. Abrol and A. K. Sao, "Supervised speech enhancement using compressed sensing", in Proc. 21st IEEE National Conference on Communication (NCC), 2015, Mumbai, India. PDF
  • V. Abrol, P. Sharma and A. K. Sao, "Speech enhancement using compressed sensing", in Proc. ISCA 14th Annual Conference (INTERSPEECH), August, 2013, Lyon, France. PDF

SR/CS for footprint reduction of unit selection based text-to-speech systems

In this work, we have explored the framework of compressed sensing (CS) and sparse representation (SR) to reduce the footprint of a unit selection based speech synthesis (USS) system. In the CS based framework, footprint reduction is achieved by storing either the CS measurements or the signs of the CS measurements, instead of the raw speech waveforms. For efficient reconstruction from CS measurements, the speech signal should have a sparse representation over a predefined basis/dictionary. Hence, we have also studied the effectiveness of sparse representation for compressing the speech waveform. To further increase compression in the SR based framework, the significant coefficients of the sparse vector are selected adaptively, based on the type of speech segment (e.g., voiced, unvoiced). Experimental studies on two Indian languages suggest that the CS/SR based footprint reduction methods can be used as an alternative to the existing compression methods employed in USS systems.
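A minimal sketch of the measurement-storing variant (storing m << n random measurements per unit, then recovering the waveform by sparse reconstruction at synthesis time) is given below. The random orthonormal sparsifying basis, the exactly sparse synthetic unit, and the plain OMP solver are illustrative assumptions, not the system's actual dictionary or recovery algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

def omp(y, A, k):
    """Orthogonal matching pursuit: greedily grow the support until k
    atoms are selected, refitting by least squares at every step."""
    support = []
    x = np.zeros(A.shape[1])
    r = y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ r)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(A.shape[1])
        x[support] = coef
        r = y - A @ x
    return x

# Store m << n random measurements of each speech unit.
n, m, k = 128, 64, 5
Psi = np.linalg.qr(rng.standard_normal((n, n)))[0]   # sparsifying basis (stand-in)
alpha = np.zeros(n)
idx = rng.choice(n, k, replace=False)
alpha[idx] = rng.uniform(1.0, 2.0, k) * rng.choice([-1, 1], size=k)
x = Psi @ alpha                                      # unit, k-sparse in Psi
Phi = rng.standard_normal((m, n)) / np.sqrt(m)       # measurement matrix
y = Phi @ x                                          # stored footprint (m values)

# At synthesis time: recover the sparse vector from y, then the waveform.
alpha_hat = omp(y, Phi @ Psi, k)
x_hat = Psi @ alpha_hat
```

Only `y` (and the shared `Phi`/`Psi`) needs to be stored, halving the per-unit footprint in this toy configuration; real speech is only approximately sparse, so reconstruction quality degrades gracefully rather than being exact.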

Illustration of proposed footprint reduction methods. Compression is achieved by storing (a) reduced number of measurements (FRCS), (b) sign of measurements corresponding to estimated sparse vector (FRCS1), and (c) significant coefficients of estimated sparse vector (FRSV).

Relevant Publications

  • P. Sharma, V. Abrol, Nivedita and A. K. Sao, "Reducing footprint in unit selection based text-to-speech system using compressed sensing and sparse representation", Computer Speech and Language, Elsevier, 2018. PDF
  • P. Sharma, V. Abrol and A. K. Sao, "Compressed sensing for unit selection based Speech Synthesis", in Proc. 23rd European Signal Processing Conference (EUSIPCO), 2015, Nice, France. PDF
  • P. Sharma, V. Abrol and A. K. Sao, "Learned dictionaries for sparse representation based unit selection speech synthesis", in Proc. 22nd IEEE National Conference on Communication (NCC), 2016, Guwahati, India. PDF