Karan Singla

I'm a researcher, engineer, and part-time soccer player passionate about advancing speech signal processing.

I am the CEO and Founder of Whissle, where we focus on real-time methods for modular and cost-agnostic multi-modal understanding. I earned my PhD from USC under Shrikanth Narayanan and have contributed to research at AT&T Bell Labs’ Interactions LLC.

My work spans NLP, acoustic-prosodic cues, and speech-based methods, and I enjoy mentoring students on AI/ML projects through initiatives like Google Summer of Code and RedHenLab.

Check Google Scholar for a list of recent research work.

Highlights

Visual information can improve audio understanding. Paper-link

E2E ASR models optimized with CTC loss can infer intent and transcribe in one step. Paper-link
Here is a sample recording of a caller-agent conversation. Video-link

Using seq-2-seq for extracting names. Interspeech 2022
s as in sam k as in kipe i as in ina ia b as in boy o as in over --> skibo Paper-link

^^ In a recent EMNLP 2023 paper, we propose theoretical extensions to the CTC loss.
We show that speech encoders can learn to generate only entity-relevant tokens. Paper-link

Prototype of a multilingual Video-2-Video conversation platform.

Demo / Blog. Text-based translation of TV news. git

PhD Thesis Link

Transcription-free behavior coding in psychotherapy.
ACL 2020 paper.

Acoustic-prosodic information can complement lexical text to improve human behavior and spoken language understanding.
GSOC Blog, Interspeech 2018 paper, ICASSP 2019 paper.

Went to Danker Lake in 2014. Picture here.

Tiny (64-dimensional) LSTM-based seq-2-seq chatbot trained on your own Facebook chat data. link

Master's thesis link.

Resume

ksingla@whissle.ai
