DNN Node Pruning

Our approach uses an entropy-based measure to quantify how significant each node of a DNN classifier is for a given attribute (factor), such as phone, speaker or noise. In preliminary experiments, we observed that nodes with a low entropy measure computed over phone attributes can be removed from the DNN without much loss in ASR performance. The framework is general and can be extended to other attributes such as speaker and noise. (Kaldi Code)
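
The text above does not define the entropy measure, so the following is only a minimal sketch under stated assumptions. It assumes non-negative hidden activations (for example sigmoid outputs), frame-level phone labels, and a score defined as the entropy of each node's normalised per-phone mean activation. The function names, the score definition and the `keep_fraction` parameter are all hypothetical and are not taken from the Kaldi code.

```python
import numpy as np

def node_entropy(activations, labels, num_classes, eps=1e-12):
    """Entropy of each node's per-class mean activation profile.

    activations: (frames, nodes) array of non-negative hidden outputs.
    labels:      (frames,) integer attribute labels, e.g. phone ids.
    The score definition is an assumption made for illustration.
    """
    num_nodes = activations.shape[1]
    class_means = np.zeros((num_classes, num_nodes))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            class_means[c] = activations[mask].mean(axis=0)
    # Normalise each node's profile so it sums to one over the classes.
    profile = class_means / (class_means.sum(axis=0, keepdims=True) + eps)
    return -(profile * np.log(profile + eps)).sum(axis=0)

def prune_low_entropy_nodes(w_in, b_in, w_out, entropy, keep_fraction):
    """Drop the lowest-entropy nodes of one hidden layer.

    w_in: (inputs, nodes), b_in: (nodes,), w_out: (nodes, outputs).
    Returns the reduced parameters and the indices of the kept nodes.
    """
    num_keep = max(1, int(round(keep_fraction * entropy.size)))
    keep = np.sort(np.argsort(entropy)[-num_keep:])
    return w_in[:, keep], b_in[keep], w_out[keep, :], keep
```

Removing a node means deleting its column in the incoming weight matrix, its bias, and its row in the outgoing weight matrix, which is why all three arrays are sliced with the same index set.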

Non-segmental DTW

One of the contributions of my thesis is an investigation of non-segmental DTW, a variant of the DTW algorithm, and an analysis of its search performance using Gaussian posteriorgrams of acoustic features (such as MFCC and FDLP) and model-based features (such as probabilistic and bottleneck features). I built query-by-example spoken term detection systems for the MediaEval challenge and participated in it from 2011 to 2014. (GPU Code)
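
As an illustration of query-by-example search over posteriorgrams, here is a minimal sketch of a generic subsequence DTW in which the query may start at any reference frame. It is not the thesis implementation or the GPU code. The negative-log dot-product distance, the three local steps and the normalisation by query length are assumptions chosen for brevity, and the function names are hypothetical.

```python
import numpy as np

def posterior_distance(query, reference, eps=1e-10):
    """Pairwise -log(dot product) between posterior vectors (assumed distance)."""
    similarity = query @ reference.T
    return -np.log(np.clip(similarity, eps, None))

def subsequence_dtw(query, reference):
    """Align the whole query against any region of the reference.

    query: (n, k) posteriorgram, reference: (m, k) posteriorgram.
    Returns per-end-frame scores, and the start and end frame of the best match.
    """
    dist = posterior_distance(query, reference)
    n, m = dist.shape
    cost = np.full((n, m), np.inf)
    start = np.zeros((n, m), dtype=int)
    # Free start: the first query frame may align to any reference frame.
    cost[0, :] = dist[0, :]
    start[0, :] = np.arange(m)
    for i in range(1, n):
        for j in range(m):
            # Allowed predecessors: vertical, diagonal and horizontal steps.
            candidates = [(cost[i - 1, j], start[i - 1, j])]
            if j > 0:
                candidates.append((cost[i - 1, j - 1], start[i - 1, j - 1]))
                candidates.append((cost[i, j - 1], start[i, j - 1]))
            best_cost, best_start = min(candidates, key=lambda c: c[0])
            cost[i, j] = dist[i, j] + best_cost
            start[i, j] = best_start
    # Normalise by query length (a simplification of path-length normalisation).
    scores = cost[-1, :] / n
    end = int(np.argmin(scores))
    return scores, int(start[-1, end]), end
```

The double loop costs O(nm) per query and reference pair, which is the part a GPU implementation would typically parallelise.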


Information retrieval-based DTW (IR-DTW) is a dynamic programming algorithm that searches reference data (spoken audio) for sequences similar to a query series of real-valued vectors. Tree IR-DTW uses a hierarchical k-means indexing algorithm to retrieve spoken audio data. It was used as a baseline system for the MediaEval 2013 challenge. (Python Code)
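
The indexing idea can be sketched as follows. This is a simplified illustration, not the Tree IR-DTW baseline: it builds a small hierarchical k-means tree over reference frames, looks up the leaf for each query frame, and lets every retrieved reference frame vote for a time offset (reference index minus query index). The voting step stands in for the dynamic programming stage, and all function names and parameters (`k`, `leaf_size`, `seed`) are hypothetical.

```python
from collections import Counter

import numpy as np

def nearest(points, centroids):
    """Index of the closest centroid (squared Euclidean) for each point."""
    diff = points[:, None, :] - centroids[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd iterations; empty clusters keep their previous centroid."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(points, centroids)
        for c in range(k):
            members = points[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids

def build_tree(points, indices, k=2, leaf_size=4, seed=0):
    """Recursively split the frames listed in `indices` with k-means."""
    if len(indices) <= max(leaf_size, k):
        return {"leaf": indices}
    centroids = kmeans(points[indices], k, seed=seed)
    assign = nearest(points[indices], centroids)
    used = [c for c in range(k) if np.any(assign == c)]
    if len(used) < 2:  # no useful split, e.g. duplicated frames
        return {"leaf": indices}
    children = [build_tree(points, indices[assign == c], k, leaf_size, seed)
                for c in used]
    return {"centroids": centroids[used], "children": children}

def lookup(tree, frame):
    """Descend to the leaf whose centroids are closest to `frame`."""
    while "leaf" not in tree:
        child = nearest(frame[None, :], tree["centroids"])[0]
        tree = tree["children"][child]
    return tree["leaf"]

def offset_votes(tree, query):
    """Count, per time offset, how many query frames retrieved a match."""
    votes = Counter()
    for i, frame in enumerate(query):
        for j in lookup(tree, frame):
            votes[int(j) - i] += 1
    return votes
```

Offsets that collect many votes mark reference regions worth scoring with a full alignment, so most of the reference never has to be compared frame by frame.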

Speech Python (SPy)

A Python package for speech processing. Its major functionalities include zero frequency filtering, identification of glottal closure instants, and detection of voiced and unvoiced regions in noisy environments. (Python Code)
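
For readers unfamiliar with zero frequency filtering, the sketch below shows one common formulation using NumPy only. It does not reproduce the SPy API: the function names, the 10 ms trend-removal window and the three trend-removal passes are assumptions. The signal is differenced, passed through two cascaded zero-frequency resonators (four cumulative sums), and detrended by repeated local-mean subtraction. Glottal closure instants are then taken as the negative-to-positive zero crossings.

```python
import numpy as np

def remove_trend(signal, half_window):
    """Subtract the local mean over a window of 2 * half_window + 1 samples."""
    length = 2 * half_window + 1
    kernel = np.ones(length) / length
    return signal - np.convolve(signal, kernel, mode="same")

def zero_frequency_filter(speech, sample_rate, window_s=0.010, passes=3):
    """Return the zero-frequency filtered signal (same length as `speech`).

    window_s should be roughly one to two average pitch periods; the
    default of 10 ms is an assumption, not a value from the SPy package.
    """
    speech = np.asarray(speech, dtype=np.float64)
    # Difference the signal to remove any DC offset.
    y = np.diff(speech, prepend=speech[0])
    # Each zero-frequency resonator is a double integrator; cascade two.
    for _ in range(4):
        y = np.cumsum(y)
    half_window = max(1, int(round(window_s * sample_rate / 2)))
    # Repeated local-mean subtraction removes the polynomial growth.
    for _ in range(passes):
        y = remove_trend(y, half_window)
    return y

def glottal_closure_instants(zff):
    """Sample indices where the filtered signal crosses zero going upward."""
    return np.flatnonzero((zff[:-1] < 0) & (zff[1:] >= 0)) + 1
```

Samples within a few window lengths of either end are affected by the zero padding in `np.convolve` and are best ignored. For long recordings the cumulative sums grow quickly, so processing in segments may be needed for numerical stability.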