Past Research Projects

Pathological voice classification:

The objective was to extract features that discriminate accurately between pathological and normal voices. A signal decomposition approach based on matching pursuit was tried and showed encouraging results for classifying pathological subgroups (adductor, keratosis, paralysis, vocal polyps and vocal folds). In addition, the conventional features used for pathological voice classification, such as jitter and shimmer, were redefined using a novel pitch detection algorithm based on the micro-canonical multi-scale formalism and compared with the same features computed by PRAAT. Features based on the new pitch detection algorithm achieved better classification scores than those from PRAAT. Along with the conventional features, three new features were developed, which achieved higher classification scores than the conventional features.
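For readers unfamiliar with the conventional perturbation features mentioned above, the following is a minimal sketch of the common textbook definitions of local jitter and local shimmer. It assumes the glottal closure instants (or pitch marks) and per-cycle peak amplitudes are already available; the function names `local_perturbation` and `jitter_shimmer` are illustrative and are not taken from the published work, and the pitch detection step itself is not shown.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_shimmer(pitch_marks, peak_amplitudes):
    """Return (local jitter, local shimmer) as ratios.

    pitch_marks: increasing time stamps of successive glottal cycles
                 (assumed to come from some pitch/GCI detector).
    peak_amplitudes: one peak amplitude per glottal cycle.
    """
    # Cycle-to-cycle periods are the gaps between successive pitch marks.
    periods = [t1 - t0 for t0, t1 in zip(pitch_marks, pitch_marks[1:])]
    jitter = local_perturbation(periods)           # period perturbation
    shimmer = local_perturbation(peak_amplitudes)  # amplitude perturbation
    return jitter, shimmer


if __name__ == "__main__":
    # Hypothetical pitch marks in milliseconds and matching peak amplitudes.
    j, s = jitter_shimmer([0.0, 10.0, 22.0, 32.0], [1.0, 0.8, 1.0])
    print(f"jitter={j:.4f} shimmer={s:.4f}")
```

Both features depend entirely on the accuracy of the pitch marks, which is why changing the underlying pitch detection algorithm changes the feature values and, in turn, the classification scores.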

Publications:

Ashwini Jaya Kumar and Khalid Daoudi, “Discrimination between pathological voice categories using Matching Pursuit,” IEEE 4th International Work Conference on Bioinspired Intelligence (IWOBI'15), Donostia, Spain, June 10-12, 2015. [pdf] [IEEE Xplore]

Khalid Daoudi and Ashwini Jaya Kumar, “Pitch-based speech perturbation measures using a novel GCI detection algorithm: Application to pathological voice classification,” Interspeech'15, Dresden, Germany, Sep, 2015. [paper]

Text-to-Speech Synthesis:

Worked on unit-selection-based text-to-speech synthesis (TTS). The main objective was to improve the overall quality of synthesized speech, from unit selection in the database to prosody modeling of the synthesized speech. In the course of this work, many signal processing techniques and language processing rules were tried to improve the quality of synthesized speech. The focus was mainly on two Indian languages: Kannada and Tamil. MILE TTS is now integrated with optical character recognition to help the visually challenged, and a couple of Braille schools are already using the first version of MILE TTS. A team of four members entered the Blizzard Challenge 2013 with TTS systems for two Indic languages.
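As a rough illustration of what unit selection means in general, the sketch below picks one database unit per target position by minimizing the sum of target costs and join costs with dynamic programming (Viterbi search). This is the generic textbook formulation, not the MILE TTS implementation; the function `select_units`, the cost functions, and the toy pitch-only units are all assumptions made for the example.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position with minimum total target + join cost.

    candidates: list (one entry per target position) of lists of candidate units.
    target_cost(i, unit): mismatch between a unit and the i-th target.
    join_cost(prev_unit, unit): cost of concatenating two adjacent units.
    Returns (chosen units, total cost).
    """
    # best[i][j] = (cheapest cumulative cost ending in candidate j, backpointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(i, u), back))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Toy units: (label, pitch in Hz); toy targets: desired pitch per position.
    targets = [100, 110]
    cands = [[("a", 100), ("a", 120)], [("b", 110), ("b", 150)]]
    path, total = select_units(
        cands,
        target_cost=lambda i, u: abs(u[1] - targets[i]),
        join_cost=lambda p, u: abs(p[1] - u[1]),
    )
    print(path, total)
```

In a real system the costs would combine several features (for example spectral distance at the join, duration and pitch mismatch), and prosody modeling would supply the targets.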

Experience MILE TTS at: http://mile.ee.iisc.ernet.in:8080/tts_demo/

Publications:

Shiva Kumar H R, Ashwini J K, Rajaram B S R, Ramakrishnan A G, “MILE TTS for Tamil and Kannada,” Blizzard Challenge 2013. [pdf]

http://www.festvox.org/blizzard/blizzard2013.html

Distant speech recognition:

An isolated, speaker-independent digit recognizer was built using the TI-Digits database, the HTK toolkit and MATLAB (2009a), with LPCC and MFCC as features and VQ and HMM as classifiers. A detailed literature review was carried out on the effects of the acoustic environment (reverberation) under six categories: signal-based, spectral enhancement, feature enhancement, LP residual enhancement, decoder-based and multi-stage approaches. A new single-channel dereverberation algorithm named long-term linear prediction (LTLP) was developed, based on linear prediction and non-linear spectral subtraction, which can enhance speech data of length less than 0.5 s. A two-channel approach with LTLP at the preprocessing stage was also tested. That approach required accurate estimation of the time delay, and hence LTLP performed well mainly for single-channel enhancement.
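Since linear prediction underlies both the LPCC features and the LTLP algorithm mentioned above, here is a minimal sketch of ordinary short-term linear prediction using the autocorrelation method and the Levinson-Durbin recursion, followed by computation of the LP residual. It is not the LTLP algorithm itself, whose details are given in the thesis and papers below; the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame x (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r.

    Returns (a, err) with a[0] == 1, such that the prediction error is
    e[n] = sum_k a[k] * x[n - k], and err is the final error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]  # update lower-order coefficients
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """Filter x with the prediction-error filter a (samples before x[0] taken as 0)."""
    p = len(a) - 1
    return [
        sum(a[k] * x[n - k] for k in range(p + 1) if n - k >= 0)
        for n in range(len(x))
    ]


if __name__ == "__main__":
    frame = [1.0, 0.5, 0.25, 0.125, 0.0625]
    a, err = levinson_durbin(autocorrelation(frame, 2), 2)
    print(a, err)
    print(lp_residual(frame, a))
```

The residual is the part of the signal the predictor cannot explain; LP-residual-based enhancement methods, one of the six categories in the review, operate on this signal rather than on the waveform directly.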

Publications:

M.Tech Thesis (the implementation was done in MATLAB; code is available on request)

Jaya Kumar Ashwini, Ramaswamy Kumaraswamy, “Speech Enhancement Techniques for Distant Speech Recognition”, Journal of Intelligent Systems. Volume 0, Issue 0, Pages 1–13, ISSN (Online) 2191-026X, ISSN (Print) 0334-1860, DOI: 10.1515/jisys-2012-0051, May 2013. [pdf]

http://www.degruyter.com/view/j/jisys.2013.22.issue-2/jisys-2012-0051/jisys-2012-0051.xml

J K Ashwini, R. Kumaraswamy, “Suppressing unwanted reflections smeared in the distant speech signal using long term linear prediction”, Proc. of International conference on Communication, VLSI and Signal Processing - 2013, pp 171-176, Feb 20-22, 2013, Tumkur, India. [pdf]