Siddarth Chandrasekar

Post-Bacc @ RBCDSAI, IIT Madras



I am a Post Baccalaureate at the Robert Bosch Centre for Data Science and Artificial Intelligence at the Indian Institute of Technology - Madras. I work on Reinforcement Learning (RL) under the supervision of Prof. Ravindran. My long-term goal is to build and deploy autonomous agents that interact with and learn from the real world.


My current research interests lie at the intersection of RL and representation learning. At the moment, I am working on making Goal-Conditioned Hierarchical RL (HRL) more sample-efficient.


Before joining RBCDSAI, I worked in the SPIRE lab at IISc Bangalore, where I focused on speech processing, particularly Acoustic to Articulatory Inversion (AAI). I also worked at the intersection of federated learning and NLP at IIT Patna.

I completed my undergrad at IIIT-DM in Electronics and Communication Engineering.

Publications

Conferences

Automatic detection of consumers’ complaints about items or services they buy can be critical for organizations and online merchants. Previous studies on complaint identification are limited to text. Images along with the reviews can provide cues to identify complaints better, thus emphasizing the importance of incorporating multi-modal inputs into the process. Generally, the customer’s emotional state significantly impacts the complaint expression; thus, the effect of emotion and sentiment on complaint identification must also be investigated. Furthermore, different organizations are usually not allowed to share their privacy-sensitive records due to data security and privacy concerns. Due to these issues, traditional models find it hard to understand and identify complaint patterns, particularly in the financial and healthcare sectors. In this work, we created a new dataset - Multi-modal Complaint Dataset (MCD), a collection of reviews and images of the products posted on the website of the retail giant Amazon. We propose a federated meta-learning-based multi-modal multi-task framework for identifying complaints considering emotion recognition and sentiment analysis as two auxiliary tasks. Experimental results indicate that the proposed approach outperforms the baselines and the state-of-the-art approaches in centralized and federated meta-learning settings.

Traffic signal control is an important problem in urban mobility with significant potential for economic and environmental impact. While there is growing interest in Reinforcement Learning (RL) for traffic signal control, the work so far has focused on learning through simulations, which can lead to inaccuracies due to simplifying assumptions. Instead, real experience data on traffic is available and can be exploited at minimal cost. Recent progress in offline or batch RL has enabled just that. Model-based offline RL methods, in particular, have been shown to generalize from the experience data much better than others.

We build a model-based learning framework that infers a Markov Decision Process (MDP) from a dataset collected using a cyclic traffic signal control policy that is both commonplace and easy to gather. The MDP is built with pessimistic costs to manage out-of-distribution scenarios using an adaptive shaping of rewards which is shown to provide better regularization compared to the prior related work in addition to being PAC-optimal. Our model is evaluated on a complex signalised roundabout and a large multi-intersection environment, demonstrating that highly performant traffic control policies can be built in a data-efficient manner.
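The pessimistic-cost idea above can be illustrated with a count-based sketch: transitions that appear rarely in the offline dataset have their estimated reward penalized, steering the learned policy away from out-of-distribution state-action pairs. The function and parameter names (`pessimistic_rewards`, `kappa`) are illustrative assumptions, not the paper's adaptive shaping scheme.

```python
import numpy as np

def pessimistic_rewards(reward_hat, counts, kappa=0.5):
    """Shape estimated rewards with a count-based uncertainty penalty.

    reward_hat : estimated rewards for (state, action) pairs
    counts     : visitation counts of those pairs in the offline dataset
    kappa      : penalty scale (hypothetical hyperparameter)
    """
    # penalty shrinks as 1/sqrt(n): well-covered pairs are barely changed
    penalty = kappa / np.sqrt(np.maximum(counts, 1))
    # pairs never seen in the data receive the maximum penalty
    penalty = np.where(counts == 0, kappa, penalty)
    return reward_hat - penalty

reward_hat = np.array([1.0, 1.0, 1.0])
counts = np.array([100, 4, 0])  # frequent, rare, unseen
shaped = pessimistic_rewards(reward_hat, counts)
```

The frequently visited pair keeps almost all of its reward, while the unseen pair is penalized most heavily, which is the qualitative behavior pessimistic model-based offline RL relies on.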

This study analyzes formant transitions in six English stop-consonants in vowel-consonant-vowel (VCV) sequences. We investigate whether natural speech preserves formant patterns, and if not, how it affects stop-consonant perception and automatic classification. We specifically ask three questions: 1) To what extent are these formant transition patterns preserved in naturally produced VCV sequences? 2) If not preserved, does it have any effect on the perception of the stop-consonant? 3) How does the classification of stop-consonants by automatic classifiers change when formant transition patterns are not preserved? We found that 33.56% of the corpus deviates from the formant transition pattern. The perception test reveals an Unweighted Average Recall (UAR) of 91.97% in identifying the stop-consonants in the VCV sequences when the pattern is not preserved, compared to 93.54% when it is preserved. The best UAR from an automatic classifier is 68.35% and 77.5% in these two cases, respectively.
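The UAR metric reported above averages recall per class, so rare stop-consonants count as much as frequent ones. A minimal sketch of the metric, using toy labels of my own (not data from the study):

```python
import numpy as np

def uar(y_true, y_pred):
    """Unweighted Average Recall: mean of per-class recalls."""
    classes = np.unique(y_true)
    # recall for class c = fraction of true-c items predicted as c
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

y_true = np.array(["p", "p", "p", "b", "b", "t"])
y_pred = np.array(["p", "p", "b", "b", "t", "t"])
# per-class recall: p = 2/3, b = 1/2, t = 1
score = uar(y_true, y_pred)
```

Unlike plain accuracy, UAR is unaffected by class imbalance, which matters when some stop-consonants are under-represented in the corpus.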

In this work, we investigate the effectiveness of pretrained Self-Supervised Learning (SSL) features for learning the mapping for acoustic to articulatory inversion (AAI). Signal processing-based acoustic features such as MFCCs have been predominantly used for the AAI task with deep neural networks. With SSL features working well for various other speech tasks such as speech recognition, emotion classification, etc., we experiment with its efficacy for AAI. We train on SSL features with transformer neural networks-based AAI models of 3 different model complexities and compare its performance with MFCCs in subject-specific (SS), pooled and fine-tuned (FT) configurations with data from 10 subjects, and evaluate with correlation coefficient (CC) score on the unseen sentence test set. We find that acoustic feature reconstruction objective-based SSL features such as TERA and DeCoAR work well for AAI, with SS CCs of these SSL features reaching close to the best FT CCs of MFCC. We also find the results consistent across different model sizes.
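The correlation coefficient (CC) used to evaluate AAI above is the Pearson correlation between predicted and ground-truth articulatory trajectories, averaged across articulator dimensions. A minimal sketch of that metric; the variable names and toy trajectories are my own:

```python
import numpy as np

def cc_score(pred, target):
    """Mean Pearson correlation over articulator dimensions.

    pred, target : arrays of shape (time_steps, num_articulators)
    """
    pred_c = pred - pred.mean(axis=0)      # center each trajectory
    tgt_c = target - target.mean(axis=0)
    num = (pred_c * tgt_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (tgt_c ** 2).sum(axis=0))
    return float(np.mean(num / den))

t = np.linspace(0, 1, 50)
target = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
pred = target + 0.05  # a constant offset does not change the correlation
score = cc_score(pred, target)
```

Because CC is invariant to per-dimension offset and scale, it measures whether the predicted trajectory follows the shape of the articulator movement rather than its absolute position.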

We present a web interface to visualize the midsagittal plane of the human mouth during speech. Given an articulated sentence, we estimate the corresponding articulatory trajectories and visualize the same. This web interface provides a comprehensive view of the articulators’ trajectories and could serve as an important tool for speech training.

Journals

Prior work on automatically identifying complaints on social media has relied on extensive feature engineering in centralized settings, with no consideration for the decentralized, non-independent and identically distributed (non-IID), and privacy-conscious nature of complaints, which can hinder data collection, distribution, and learning. In this work, we propose a Graph Attention Network (GAT) based multi-task framework that learns two closely related tasks, complaint detection (primary task) and sentiment classification (auxiliary task), simultaneously in federated-learning scenarios. We propose the Federated Combination (FedComb) algorithm, a two-sided adaptive optimization technique that simultaneously optimizes global and local models. The proposed methodology outperforms several baselines for the intended task of recognizing complaints in decentralized settings, according to quantitative and qualitative studies on two benchmark datasets. The resources are available at https://github.com/appy1608/IEEE-TAI_FedCI_GAT.
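A two-sided adaptive scheme of the kind FedComb describes can be sketched as a server that applies its own momentum/RMS-scaled update to the averaged client deltas (with clients free to run their own adaptive optimizer locally). All names and hyperparameters below are assumptions for illustration, not the published algorithm.

```python
import numpy as np

def server_round(global_w, client_deltas, state, lr=0.1, beta=0.9, eps=1e-8):
    """One server-side adaptive update over the mean client delta.

    client_deltas : list of (local_w - global_w) arrays from participating clients
    state         : server optimizer state {"m": momentum, "v": second moment}
    """
    delta = np.mean(client_deltas, axis=0)              # aggregate local progress
    state["m"] = beta * state["m"] + (1 - beta) * delta          # momentum
    state["v"] = beta * state["v"] + (1 - beta) * delta ** 2     # RMS scaling
    new_w = global_w + lr * state["m"] / (np.sqrt(state["v"]) + eps)
    return new_w, state

w = np.zeros(3)
state = {"m": np.zeros(3), "v": np.zeros(3)}
deltas = [np.array([1.0, 0.5, -0.2]), np.array([0.8, 0.7, -0.4])]
w, state = server_round(w, deltas, state)
```

Keeping adaptive state on both sides lets the server damp noisy, non-IID client updates while clients still adapt quickly to their local data, which is the motivation for two-sided optimization in federated settings.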