Research Interests
My research focuses primarily on Computer Vision and Multimodal Signal Processing, along with the design and application of Machine Learning techniques for tasks such as classification, synthesis, retrieval, and reasoning. As part of this work, I have also explored Acoustic Signal Processing and Natural Language Processing techniques for recognition tasks.
The application use-cases for my research span a wide range: achieving a better understanding of human communication dynamics, designing efficient search systems, assisting people with physical impairments, and more.
On the Computer Vision front, my research has focused on:
Designing models for rendering 3D scenes from novel views.
Designing embodied agents that can navigate in an environment using visual cues.
Developing effective ranking functions for image and video retrieval.
Re-identification of humans from videos captured in the “wild”.
Automatic facial expression recognition or synthesis.
In Multimodal Machine Learning, my work has centered around:
Synthesizing audio or visual signals from other modalities.
Developing techniques for combining cues from multiple modalities, namely audio, video, and text, for recognition and synthesis.
Capturing physiological signals, such as heart rate, through non-invasive techniques.
Understanding the effect of psychological distress on human communication dynamics.
Analyzing speaker traits of online movie reviewers.
In Acoustic Signal Processing and Natural Language Processing, I've primarily worked on:
Performing effective speech recognition for analyzing speaker traits of online movie reviewers.
Developing Language Models and methods to keep the system tractable, e.g., by constraining dictionary size.