Nithish M. Selvaraj
Research Associate, ROSE Lab,
Nanyang Technological University (NTU), Singapore.
I'm an academic researcher from Nanyang Technological University, Singapore. My core area of research is multimodal machine learning, specializing in vision, audio, and language modalities. My current research focuses on "Trustworthy Generative AI", with emphasis on Explainability (developing interpretable models with natural language) and Hallucinations (detection and mitigation strategies). Previously, I worked on a construction robotics project, where I developed vision systems and algorithms for progress-monitoring mobile robots.
Research
Towards Trustworthy Generative AI (Current Work)
In this project, I focus on building interpretable AI models that are explainable with natural language concepts, and on understanding the causes of "multimodal hallucinations" in Large Vision-Language Models (LVLMs). I also develop algorithms to detect and mitigate hallucinations in LVLMs, and construct benchmark datasets to evaluate these models.
DigiSup: Mobile Robot based Visual Inspection and Progress Monitoring System
In this project, we developed a semi-autonomous mobile robot to monitor the progress of “installable” components (like lights, switches, etc.) in HDB flats in Singapore. The robot navigates unit-by-unit, scans the surroundings with a 360-degree camera, and uses object detectors to estimate the progress. The project also involves developing visual inspection algorithms for defect analysis and workplace safety checks.
New!! Improving Concept Alignment in Vision-Language Concept Bottleneck Models
Paper Page | Code | Arxiv
We investigate the efficacy of VLM concept scores and find that CLIP models struggle to correctly associate concepts.
We propose a novel Contrastive Semi-Supervised (CSS) method to improve concept alignment in Vision-Language Concept Bottleneck Models (VL-CBM).
We also introduce a class-level intervention procedure for fine-grained classification problems.
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
Muthuchamy Selvaraj, N., Guo, X., Kong, A., Shen, B., Kot, A. (2023) Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers. Proc. INTERSPEECH 2023, 909-913, doi: 10.21437/Interspeech.2023-1189
We propose "Convolutional Adapters" for Task Incremental Continual Learning (TICL) of Audio Spectrogram Transformers (AST).
We also introduce a novel attention mechanism for AST called Frequency-Time factorized Attention (FTA).
Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning
*Xiaobao Guo, *Nithish Muthuchamy Selvaraj, Zitong Yu, Adams Wai-Kin Kong, Bingquan Shen, Alex Kot; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 22135-22145
We release "DOLOS" - the largest gameshow-based Multimodal Deception Detection dataset.
We propose Parameter Efficient Crossmodal Learning (PECL) with Uniform Temporal Adapter (UTA) for a Wav2Vec2-ViT multimodal model.
Platform-independent visual installation progress monitoring for construction automation
X Zhao, Y Jin, NM Selvaraj, M Ilyas, CC Cheah - Automation in Construction, 2023
This work integrates various robotic platforms (mobile robots, quadrupeds, drones), object detectors, and BIM (Building Information Model) under one roof for progress monitoring in commercial and residential complexes.
Robot-assisted object detection for construction automation: data and information-driven approach
M Ilyas, HY Khaw, NM Selvaraj, Y Jin, X Zhao… - IEEE/ASME Transactions on Mechatronics, 2021
We propose a robotic system to help construction supervisors remotely identify construction materials, detect component installations and defects, and generate reports on their status and location.
Method and system for inspecting a building construction site using a mobile robotic system.
Patent for the DigiSup (Digital Supervisor) project.