I am a researcher, engineer, and part-time soccer player.

I am the CEO and Founder of Whissle, which aims to enable real-time signal processing for media. Our vision is to democratize speech signal processing capabilities while reducing human-in-the-loop annotation costs.

I was a scientist at Interactions LLC (a research group from AT&T Bell Labs) from 2021 to 2023, and received my PhD from the University of Southern California, advised by Shrikanth Narayanan (SAIL lab). Before that, I completed a master's by research (including undergraduate study) in Computational Linguistics at IIIT Hyderabad, advised by Dipti Misra Sharma.

My research lies in natural language processing, the use of acoustic-prosodic cues, and, more recently, enabling speech-based methods for traditional NLP tasks.

Every summer, I mentor students on AI/ML projects for processing TV news as part of Google Summer of Code with RedHenLab.

See my Google Scholar profile for a list of recent publications.

Highlights

End-to-end ASR models optimized with CTC loss can infer intent and transcribe in a single step. Paper-link
Here is a sample tape of a caller-agent conversation. Video-link
 


Using seq-2-seq models to extract names from spelled-out speech. Interspeech 2022
"s as in sam, k as in kipe, i as in ina, b as in boy, o as in over" --> skibo. Paper-link

In a recent EMNLP 2023 paper, we propose theoretical extensions to the CTC loss and show that speech encoders can learn to generate only entity-relevant tokens. Paper-link


Prototype of a multilingual video-to-video conversation platform. Demo / Blog.

Text-based translation of TV news. git


PhD Thesis Link

Transcription-free behavior coding in psychotherapy.
ACL 2020 paper.

Acoustic-prosodic information can complement lexical text for improved human behavior and spoken language understanding.
GSOC Blog, Interspeech 2018 paper, ICASSP 2019 paper.


Went to Danker lake in 2014. Picture here.


A tiny (64-dimensional) LSTM-based seq-2-seq chatbot trained on your own Facebook chat data. link


Master's thesis link.


karan@whissle.ai