I am a Speech AI researcher with over 10 years of experience in speaker verification, spoken language identification, diarization, deepfake detection, and speech technologies, including automatic speech recognition (ASR) for low-resource languages. My research applies deep learning, machine learning, and signal processing to solve real-world challenges in speech technology.
I currently work as a Postdoctoral Researcher (since January 2024) in the Computational Speech Group, led by Prof. Tomi Kinnunen at the School of Computing, University of Eastern Finland. Earlier, I served as a Senior Project Officer (SPO) in the “Bhashini: Speech Technologies in Indian Languages” project of the Ministry of Electronics and Information Technology (MeitY), Government of India (April 2022 – December 2023).
I earned my Ph.D. in May 2024 from the Department of Electrical, Electronics, and Communication Engineering (EECE) at the Indian Institute of Technology Dharwad (IIT Dharwad). My doctoral thesis, titled “Implicit Systems for Spoken Language Diarization,” developed novel frameworks for language diarization in low- and zero-resource settings to enable speech technology in code-switched scenarios.
I specialize in speech signal processing, tackling research challenges that advance the development and deployment of speech-based applications such as speech recognition, speaker recognition, and voice translation. Most of my work focuses on building robust speaker and language recognition systems. I enjoy analyzing speech and biomedical signals, developing innovative speech-based systems using signal processing and deep learning, interpreting results, identifying limitations, and creating effective solutions.
Beyond research, I strive to help those in need and explore spiritual concepts related to human neurocognitive activity. I am fascinated by ancient claims that specific sequences of sounds can influence mental states and even creation itself. In the future, I aim to expand my research while exploring the scientific connections between sound, cognition, and consciousness.
नादेन विहितं सर्वं
नादः सर्वस्य कारणम् ॥
(Everything is established by Nāda (sound/vibration); Nāda is the cause of everything.)
Nāda–Bindu Upaniṣad, Atharva Veda
Research Interests
Speech Signal Processing
Speaker and Language Identification/Verification/Diarization
Explainability in Anti-Spoofing, Source Tracing
Spoofing-aware Speaker Recognition
Kinship Recognition From Speech
Spoken LLM
Automatic Speech Recognition
Development of Speech Technologies in Low- and Under-Resourced Settings
Deep Learning Framework Development Inspired by Signal Processing and Domain Knowledge
Human Neurocognitive Activity Analysis
Development of Signal Processing Tools to Analyse the Fine Structure of the Signal
Technical Skills
Languages: Python, Bash
Tools: MATLAB, PyTorch, Keras, Git, Slurm, Docker
Speech/Audio: Kaldi, librosa, pyannote
Domains: Speaker Verification, Deepfake Detection, SSL Models, Language/Speaker Diarization, ASR