Research
I work in the field of speech processing, with an emphasis on leveraging deep learning and machine learning techniques. My core interest lies in building practical speech solutions for Indian languages, especially those that are low-resourced or underrepresented. One of the key areas I have worked in is diarization of informal multilingual conversations, focusing on identifying both the spoken languages and the speakers. This is a contemporary area of research, especially in a linguistically diverse country like India, where language and dialect continua pose unique challenges. In this context, I have also worked on dialect identification, including studies in Ao, a low-resourced language from the Nagaland state of India.
I am also enthusiastic about cross-disciplinary research, particularly at the intersection of speech and biomedical signals. I am keen to explore how speech data can be combined with EEG, ECG, and visual modalities to support applications in health monitoring and cognitive state analysis.
In multilingual societies, where multiple languages are spoken within a small geographic vicinity, informal conversations often involve a mix of languages. Existing speech technologies can struggle to extract information from such conversations, where the speech data is rich in diversity, with multiple languages and speakers. To address these challenges, we organised two editions of the DISPLACE challenge, aimed at evaluating and benchmarking speaker and language diarization systems under these demanding conditions. A real-world dataset comprising multilingual, multi-speaker, far-field conversational speech was curated and made publicly available to support the challenge. Further details can be found at the links below.
DISPLACE Challenge-2023: Click Here
DISPLACE Challenge-2024: Click Here
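Diarization systems in such challenges are typically scored with the diarization error rate (DER), which combines missed speech, false-alarm speech, and speaker confusion, normalised by the total reference speech duration. As a rough illustration only (official scoring tools such as NIST's md-eval operate on time segments and apply a forgiveness collar, not on frames), here is a simplified frame-level sketch; the function name and the `-1`-for-silence label convention are my own choices:

```python
import numpy as np
from itertools import permutations

def frame_der(ref, hyp, n_spk):
    """Simplified frame-level diarization error rate.

    ref, hyp: per-frame integer speaker labels, with -1 marking silence.
    Tries every hypothesis-to-reference speaker mapping (feasible for a
    small number of speakers) and scores under the best one, mirroring
    the optimal mapping step in standard DER scoring.
    """
    ref, hyp = np.asarray(ref), np.asarray(hyp)
    speech = ref != -1                      # frames with reference speech
    best_correct = 0
    for perm in permutations(range(n_spk)):
        # Relabel hypothesis speakers under this candidate mapping
        mapped = np.where(hyp == -1, -1, np.take(perm, np.maximum(hyp, 0)))
        best_correct = max(best_correct, int(np.sum((ref == mapped) & speech)))
    # Miss + confusion = reference speech frames not correctly attributed;
    # false alarm = hypothesis speech where the reference has silence.
    errors = np.sum(speech) - best_correct + np.sum((ref == -1) & (hyp != -1))
    return errors / max(int(np.sum(speech)), 1)
```

For example, if the hypothesis labels two speakers with swapped identities but adds one false-alarm frame, the best mapping recovers the swap and only the false alarm is penalised.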
Ao is a low-resourced Tibeto-Burman language spoken in Nagaland, India, featuring three lexical tones: high, mid, and low. It has three dialects (Chungli, Mongsen, and Changki), with Chungli serving as the standard dialect, used in all written materials. In addition, Changki and Mongsen speakers can also read and write in the standard dialect. Speech modelling and analysis across the Ao dialects are therefore extremely challenging, since resources are not available for all of the dialects.
To address this, a Dialect Identification (DID) task was conducted to distinguish among the three Ao dialects using excitation source features such as the ILPR spectrogram, along with the LP-Gammatonegram, which capture dialectal variations in both the time-frequency and perceptual domains.
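The ILPR spectrogram is built on the linear prediction (LP) residual, which approximates the excitation source by removing the vocal-tract contribution from the signal. As a rough illustration of the underlying idea only (not the exact feature pipeline used in the study), the NumPy sketch below inverse-filters each frame with autocorrelation-method LP coefficients and takes a short-time spectrum of the residual; the frame size, hop, and LP order are illustrative values for 16 kHz speech:

```python
import numpy as np

def lp_residual(frame, order=12):
    """Estimate LP coefficients via the autocorrelation method and
    return the prediction residual (an approximation of the excitation)."""
    n = len(frame)
    # Autocorrelation at lags 0 .. order
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    # Solve the Toeplitz normal equations R a = r (small ridge for stability)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-8 * np.eye(order), r[1:order + 1])
    # Inverse filter: residual = sample minus its LP prediction
    pred = np.zeros_like(frame)
    for k in range(order, n):
        pred[k] = np.dot(a, frame[k - order:k][::-1])
    return frame - pred

def residual_spectrogram(signal, frame_len=400, hop=160, order=12):
    """Short-time magnitude spectrogram of the LP residual
    (an LP-residual-based cousin of the ILPR spectrogram)."""
    window = np.hamming(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len, hop)]
    residuals = [lp_residual(f, order) for f in frames]
    return np.abs(np.fft.rfft(residuals, axis=1))
```

The LP-Gammatonegram follows the same inverse-filtering idea but analyses the residual through a perceptually motivated gammatone filterbank instead of the Fourier transform.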
Competitive speech refers to segments in conversations where multiple speakers attempt to take or maintain the speaking turn simultaneously, indicating a competitive intent. This work introduces an automatic framework to detect competitive speech in Indian TV news debates by leveraging cues from shouted and overlapping speech. A key contribution of this study is the creation of the Indian Broadcast News Debate (IBND) corpus, which is annotated specifically for shouted, overlapped, and competitive speech instances. To detect competitive speech, speech processing and deep learning techniques were employed, incorporating high-level semantic cues derived from shouted and overlapping speech. The study also proposes novel approaches, including an autoencoder-based system, emphasizing the role of excitation source and phase-based features in improving classification performance.
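The autoencoder-based system is only summarised above; one common pattern behind such detectors is to train an autoencoder on speech features and use the reconstruction error as a detection score. The toy NumPy sketch below, a single-hidden-layer autoencoder trained by plain gradient descent, illustrates that pattern only; it does not reproduce the actual system's architecture or its excitation source and phase-based features:

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyAutoencoder:
    """Single-hidden-layer autoencoder with a tanh bottleneck.
    Illustrative only: a real detector would use a deep network over
    frame-level speech features, not raw toy vectors."""

    def __init__(self, n_in, n_hid, lr=0.01):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hid))
        self.W2 = rng.normal(0.0, 0.1, (n_hid, n_in))
        self.lr = lr

    def forward(self, X):
        H = np.tanh(X @ self.W1)        # encode
        return H, H @ self.W2           # decode

    def fit(self, X, epochs=500):
        for _ in range(epochs):
            H, Xhat = self.forward(X)
            err = Xhat - X                            # reconstruction error
            gW2 = H.T @ err / len(X)
            gH = (err @ self.W2.T) * (1.0 - H ** 2)   # backprop through tanh
            gW1 = X.T @ gH / len(X)
            self.W2 -= self.lr * gW2
            self.W1 -= self.lr * gW1

    def score(self, X):
        """Per-example mean squared reconstruction error; higher scores
        flag inputs unlike the training data (e.g. candidate segments)."""
        _, Xhat = self.forward(X)
        return np.mean((Xhat - X) ** 2, axis=1)
```

In the detection setting sketched here, the model would be fitted on one class of segments and thresholded reconstruction error used as the decision score for the other.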