I am a researcher, engineer, and part-time soccer player.
I am the CEO and founder of Whissle, which aims to enable real-time signal processing for media. Our vision is to democratize speech signal processing capabilities while reducing human-in-the-loop annotation costs.
I was a scientist at Interactions LLC (a research group from AT&T Bell Labs) from 2021 to 2023, and received my PhD from the University of Southern California, advised by Shrikanth Narayanan (SAIL lab). Before that, I completed a master's by research (including undergraduate study) in Computational Linguistics at IIIT-Hyderabad, advised by Dipti Misra Sharma.
My research spans natural language processing, the use of acoustic-prosodic cues, and, more recently, enabling speech-based methods for traditional NLP tasks.
Every summer I mentor students on AI/ML projects for processing TV news, as part of Google Summer of Code with RedHenLab.
See my Google Scholar profile for a list of recent publications.
End-to-end ASR models optimized with the CTC loss can jointly transcribe speech and infer intent, emotion, speaker identity, and more.
Here is a sample tape of a caller-agent conversation: Video-link
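One common way to get a single CTC-trained model to do all of this is to serialize the extra labels into the target transcript itself, so the model learns to emit tag tokens alongside words. A minimal sketch of that idea follows; the bracketed tag format and tag names are hypothetical illustrations, not the exact scheme used in my papers.

```python
# Sketch: serializing semantic tags into the CTC training target so one
# end-to-end ASR model learns transcription and tagging jointly.
# The [intent:...] / [emotion:...] token format here is a made-up example.

def serialize_target(words, intent, emotion):
    """Build one token sequence mixing word tokens and tag tokens."""
    return [f"[intent:{intent}]"] + words + [f"[emotion:{emotion}]"]

target = serialize_target(
    ["i", "want", "to", "check", "my", "balance"],
    intent="check_balance", emotion="neutral",
)
print(" ".join(target))
```

At training time the tag tokens are simply added to the output vocabulary, and the CTC loss is computed over the serialized sequence exactly as it would be over a plain transcript.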
Using seq2seq models to extract names from spelled-out speech. Interspeech 2022
"s as in sam, k as in kipe, i as in ina ia, b as in boy, o as in over" → "skibo" Paper-link
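For intuition, the input/output behavior can be mimicked with a rule-based baseline that picks the letter out of each "X as in WORD" fragment. The actual system is a learned seq2seq model that handles noisy, conversational spellings; this regex sketch is only an illustration.

```python
import re

def collapse_spelling(utterance):
    """Rule-based baseline: keep the letter from each 'X as in WORD'
    fragment and concatenate, e.g. 's as in sam ...' -> 'skibo'."""
    return "".join(re.findall(r"\b(\w) as in \w+", utterance))

print(collapse_spelling(
    "s as in sam k as in kipe i as in ina ia b as in boy o as in over"
))
```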
In a recent EMNLP 2023 paper, we propose theoretical extensions to the CTC loss.
We show that speech encoders can learn to generate only entity-relevant tokens.
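For background, these extensions build on the standard CTC forward (alpha) recursion, which sums the probability of every frame-level alignment that collapses to the target sequence. A minimal pure-Python sketch is below; probabilities are kept in the linear domain for clarity, whereas real implementations work in log space.

```python
def ctc_prob(probs, target, blank=0):
    """CTC probability of `target` given frame-wise distributions
    `probs` (a T x V list of lists), via the forward recursion."""
    # Interleave blanks: target [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)

    # Initialization: an alignment may start with a blank or the first label.
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                  # repeat the same symbol
            if s >= 1:
                a += alpha[s - 1]         # advance from the previous symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]         # skip the in-between blank
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the trailing blank.
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
```

With vocabulary {blank, 1, 2} and two frames, `ctc_prob([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]], [1])` sums the three alignments (blank,1), (1,blank), (1,1), matching a brute-force enumeration.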
A prototype multilingual video-to-video conversation platform.
Transcription-free behavior coding in psychotherapy.
ACL 2020 paper.
Acoustic-prosodic information can complement lexical text for improved human-behavior modeling and spoken language understanding.
GSOC Blog, Interspeech 2018 paper, ICASSP 2019 paper.
A tiny (64-dimensional) LSTM-based seq2seq chatbot trained on your own Facebook chat data. link
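Before any seq2seq training, the chat export has to be turned into (context, reply) pairs. A minimal sketch of that preprocessing step is below; the `sender_name`/`content` field names follow the layout of a Facebook message export at the time, but treat both the schema and the helper name as assumptions.

```python
def to_pairs(messages, me):
    """Turn a chronological message list (oldest first) into
    (context, reply) training pairs where `me` wrote the reply.
    Each message is a dict with 'sender_name' and 'content' keys."""
    pairs = []
    for prev, cur in zip(messages, messages[1:]):
        if cur["sender_name"] == me and prev["sender_name"] != me:
            pairs.append((prev["content"], cur["content"]))
    return pairs

chat = [
    {"sender_name": "Alice", "content": "hey, free tonight?"},
    {"sender_name": "Me", "content": "yes! dinner at 7?"},
    {"sender_name": "Alice", "content": "perfect"},
]
print(to_pairs(chat, "Me"))
```

Each pair then becomes one encoder input and decoder target for the LSTM seq2seq model.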
Master's thesis. link