Nithish M. Selvaraj
Research Associate, ROSE Lab,
Nanyang Technological University (NTU), Singapore.
ms <dot> nithish [at] ntu <dot> edu <dot> sg
Towards Trustworthy Generative AI (Current Work)
In this project, I focus on building interpretable AI models that can be explained with natural-language concepts, and on understanding the causes of "multimodal hallucinations" in Large Vision-Language Models (LVLMs). I also develop algorithms to detect and mitigate hallucinations in LVLMs and construct benchmark datasets to evaluate these models.
DigiSup: Mobile Robot based Visual Inspection and Progress Monitoring System
In this project, we developed a semi-autonomous mobile robot to monitor the progress of “installable” components (like lights, switches, etc.) in HDB flats in Singapore. The robot navigates unit-by-unit, scans the surroundings with a 360-degree camera, and uses object detectors to estimate the progress. The project also involves developing visual inspection algorithms for defect analysis and workplace safety checks.
We investigate the efficacy of VLM concept scores and find that CLIP models struggle to correctly associate concepts.
We propose a novel Contrastive Semi-Supervised (CSS) method to improve concept alignment in Vision-Language Concept Bottleneck Models (VL-CBM).
We also introduce a class-level intervention procedure for fine-grained classification problems.
Muthuchamy Selvaraj, N., Guo, X., Kong, A., Shen, B., Kot, A. (2023) Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers. Proc. INTERSPEECH 2023, 909-913, doi: 10.21437/Interspeech.2023-1189
We propose "Convolutional Adapters" for Task Incremental Continual Learning (TICL) of Audio Spectrogram Transformers (AST).
We also introduce a novel attention mechanism for AST called Frequency-Time factorized Attention (FTA).
*Xiaobao Guo, *Nithish Muthuchamy Selvaraj, Zitong Yu, Adams Wai-Kin Kong, Bingquan Shen, Alex Kot; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 22135-22145
We release "DOLOS" - the largest gameshow-based Multimodal Deception Detection dataset.
We propose Parameter Efficient Crossmodal Learning (PECL) with Uniform Temporal Adapter (UTA) for a Wav2Vec2-ViT multimodal model.
X Zhao, Y Jin, NM Selvaraj, M Ilyas, CC Cheah - Automation in Construction, 2023
This work integrates various robotic platforms (mobile-robots, quadrupeds, drones), object detectors, and BIM (Building Information Model) under one roof for progress monitoring in commercial and residential complexes.
M Ilyas, HY Khaw, NM Selvaraj, Y Jin, X Zhao… - IEEE/ASME Transactions on Mechatronics, 2021
We propose a robotic system to help construction supervisors remotely identify construction materials, detect component installations and defects, and generate reports of their status and location.
Patent for the DigiSup (Digital Supervisor) project.