Check Google Scholar for a list of recent research works.
Highlights
Visual information can improve audio understanding. Paper-link
E2E ASR models optimized with CTC loss can infer intent and transcribe in one step. Paper-link
Here is a sample tape of a caller-agent conversation. Video-link
Using seq-2-seq for extracting names. Interspeech 2022
s as in sam k as in kipe i as in ina ia b as in boy o as in over --> skibo Paper-link
^^ In a recent EMNLP 2023 paper, we propose theoretical extensions to the CTC loss.
We show that speech encoders can learn to generate only entity-relevant tokens. Paper-link
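
To make the spelled-name example above concrete, here is a minimal rule-based sketch of the input-to-output mapping. It is only a regex baseline for illustration, not the seq-2-seq model from the papers; the helper name `decode_spelled_name` and the pattern are assumptions introduced here.

```python
import re

# Matches one "<letter> as in <word>" unit; the cue word itself is ignored.
# Stray tokens that do not fit the pattern (e.g. "ia") are simply skipped.
SPELL_UNIT = re.compile(r"\b([a-z]) as in \w+")


def decode_spelled_name(transcript: str) -> str:
    """Join the spelled letters found in a transcript (hypothetical helper)."""
    return "".join(SPELL_UNIT.findall(transcript.lower()))


sample = "s as in sam k as in kipe i as in ina ia b as in boy o as in over"
print(decode_spelled_name(sample))  # prints: skibo
```

A fixed pattern like this only covers the "X as in Y" template; other spelling conventions or heavier recognition noise would need a learned model.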
Prototype of a multilingual Video-2-Video conversation platform.
Demo / Blog. Text-based translation of TV news. git
PhD Thesis Link
Transcription-free behavior coding in psychotherapy.
ACL 2020 paper.
Acoustic-prosodic information can complement lexical text to improve human behavior understanding and spoken language understanding.
GSOC Blog, Interspeech 2018 paper, ICASSP 2019 paper.
Went to Danker lake in 2014. Picture here.
Tiny (64-dimensional) LSTM-based seq-2-seq chatbot using your own Facebook chat data. link
Master's thesis link.