Publications
2 x {EMNLP}, 1 x {ACL, ICRA, ICASSP}
1 x {ICML-w, ICLR-w, CVPR-w, NeurIPS-w} (w - workshop), * denotes equal contribution of authors
Evaluating Compound AI/LLM Systems through Behaviors, Not Benchmarks
EMNLP (findings) 2025
TL;DR: A behavior-driven evaluation framework that auto-generates diverse, usage-aligned test specs for compound AI systems (LLM agents) and executes them via graph-based pipelines, uncovering failure modes that benchmarks miss and revealing ~2× higher failure rates.
Bandit Router Framework: Our router takes three inputs: (i) a user query, (ii) cost constraints, and (iii) a model pool. It learns and adapts automatically from user feedback, optimizing LLM selection over time. The router is initialized with pretrained weights learned from offline human-preference data.
Adaptive LLM Routing Under Budget Constraints
EMNLP (findings) 2025
TL;DR: Picking which LLM should answer each request is expensive and brittle: teams either run many models (costly and slow) or rely on static rules that break as queries change. You also rarely have a complete map of which model is best for which query.
We therefore frame routing as an adaptive contextual bandit: we learn a shared query-model embedding from human preferences and refine it online with PILOT (a cost-aware LinUCB) plus a knapsack-style cost policy. The result: the right model is picked per request without exhaustive trials, cutting latency and inference costs while answer quality continues to improve.
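The bandit view above can be sketched in a few lines. This is a minimal, illustrative cost-aware LinUCB (one linear arm per LLM, UCB score minus a cost penalty); the paper's actual PILOT algorithm, its preference-based initialization, and the knapsack-style budget policy are more involved, and the embedding dimension, costs, and reward model here are made up for the toy example.

```python
import numpy as np

class CostAwareLinUCB:
    """Minimal cost-aware LinUCB router sketch: one linear bandit arm per LLM.

    `lambda_cost` trades off predicted answer quality against per-model cost.
    """

    def __init__(self, n_models, dim, alpha=1.0, lambda_cost=0.1, costs=None):
        self.alpha = alpha
        self.lambda_cost = lambda_cost
        self.costs = np.zeros(n_models) if costs is None else np.asarray(costs)
        # Per-arm ridge-regression state: A = I + sum x x^T, b = sum r x
        self.A = [np.eye(dim) for _ in range(n_models)]
        self.b = [np.zeros(dim) for _ in range(n_models)]

    def select(self, x):
        """Pick the model with the highest optimistic score minus its cost penalty."""
        scores = []
        for A, b, c in zip(self.A, self.b, self.costs):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                                  # current reward estimate
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb - self.lambda_cost * c)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold in user feedback (e.g. thumbs up/down) for the chosen model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy usage: 3 models, 4-d query embeddings; model i happens to do well
# whenever feature i of the query embedding is positive.
rng = np.random.default_rng(0)
router = CostAwareLinUCB(n_models=3, dim=4, costs=[1.0, 0.3, 0.1])
for _ in range(500):
    x = rng.normal(size=4)
    arm = router.select(x)
    router.update(arm, x, reward=float(x[arm] > 0))
```

Because each query is routed to exactly one model, only that arm's statistics are updated, which is what makes the approach cheap compared with running every model on every request.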
Multi-Dimensional Improvements: Our method (with GPT-4 as the reader LLM) achieves state-of-the-art results on several complex (multi-hop) QA datasets across multiple metrics. EM: exact match with the gold answer; SelfAware EM: confidence-aware EM; BERTScore: semantic similarity between the predicted and gold answers; Query Info Efficiency: how efficiently query-relevant information in the supporting documents is represented, inversely proportional to the reader LLM's input token count.
HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs
ACL (main) 2024
TL;DR: A simple method that crawls through unstructured (chaotic) text and derives a context-aware, query-dependent Knowledge Graph (KG). This distilled KG lets LLMs significantly boost their complex-QA performance. HOLMES achieves this while using up to 67% fewer tokens to represent the query-relevant information in the supporting documents, improving efficiency.
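The distillation idea can be illustrated with a toy: given triples already extracted from text, keep only the k-hop neighborhood of the query's entities before handing the graph to a reader LLM. This is a simplified stand-in; HOLMES builds hyper-relational graphs from raw documents, and the triple extraction step is omitted here. All entity names below are invented.

```python
from collections import defaultdict, deque

def query_subgraph(triples, query_entities, hops=2):
    """Prune (head, relation, tail) triples to the k-hop neighborhood
    of the query entities: a toy version of distilling a query-dependent KG."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append(t)
        adj[t].append(h)
    seen = set(query_entities)
    frontier = deque((e, 0) for e in query_entities)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    # Keep only triples whose endpoints both survive the pruning.
    return [(h, r, t) for h, r, t in triples if h in seen and t in seen]

triples = [
    ("Ada Lovelace", "collaborated_with", "Charles Babbage"),
    ("Charles Babbage", "designed", "Analytical Engine"),
    ("Alan Turing", "born_in", "London"),
]
kg = query_subgraph(triples, {"Ada Lovelace"}, hops=2)
# Keeps the Lovelace-Babbage-Engine chain; drops the unrelated Turing fact.
```

Dropping triples unreachable from the query is one simple way token savings of the kind reported above can arise: the reader LLM never sees irrelevant facts.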
Performance on a challenging frame from the NPS Drones dataset. Green boxes are ground truth; red boxes are model predictions. (a) Traditional methods uniformly scan the entire frame for drones, wasting effort and missing detections in complex scenarios. (b) Our method precisely localizes drones using a coarse-to-fine detection approach.
C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks
ICRA 2024
TL;DR: A vision-based drone-to-drone detection system is vital for tasks like avoiding collisions, countering hostile drones, and search-and-rescue missions. Existing methods face challenges like small object sizes and real-time processing needs. The proposed approach utilizes a novel coarse-to-fine detection strategy based on vision transformers, enhancing F1 scores by up to 7% on three challenging datasets. Furthermore, the model demonstrates real-time processing capabilities when deployed on edge-computing devices.
[In Feature Attribution Setting] Explanations generated by FW-Shapley for 3 randomly selected CIFAR10 images. Each column in the figure corresponds to a CIFAR10 class, and the model's prediction (as a probability) for that class is provided below each image.
FW-Shapley: Real-Time Estimation of Weighted Shapley Values
ICASSP 2024
TL;DR: Fair credit assignment is crucial in machine learning (ML), and weighted Shapley values are a valuable tool for it. However, they are computationally expensive for high-dimensional datasets. The proposed Fast Weighted Shapley (FW-Shapley) addresses this with an efficient learned estimator, making weighted Shapley values practical to compute without requiring ground-truth values during training.
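For context, this is the exact quantity such an amortized estimator is trained to approximate: a coalition-size-weighted sum of marginal contributions, of which the standard Shapley value is the special case below. Exact enumeration is O(2^n) per player, which is why a learned one-forward-pass estimator matters; this sketch is only the brute-force target, not the paper's method.

```python
from itertools import combinations
from math import factorial

def weighted_marginal_sum(v, n, weight):
    """phi_i = sum over coalitions S (not containing i) of
    weight(|S|) * [v(S + i) - v(S)], computed by exact enumeration."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for s in range(n):
            for S in combinations(others, s):
                phi[i] += weight(s) * (v(set(S) | {i}) - v(set(S)))
    return phi

def shapley_weight(s, n):
    # Standard Shapley kernel: s! (n - 1 - s)! / n!
    return factorial(s) * factorial(n - 1 - s) / factorial(n)

# Toy additive game: a coalition's value is the sum of its members' worths,
# so the Shapley values must recover the individual worths exactly.
worths = [3.0, 1.0, 2.0]
v = lambda S: sum(worths[i] for i in S)
n = len(worths)
phi = weighted_marginal_sum(v, n, lambda s: shapley_weight(s, n))
```

Swapping `shapley_weight` for another size-dependent kernel yields other weighted variants; the expense of the triple loop is exactly what the learned estimator amortizes away.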
Note: The two axes represent two features, the red and green half-spaces denote the true posterior distribution, and the blue line represents our base model h (e.g., a max-margin classifier). Circles denote data samples; white circles are points classified correctly by h and violet circles are misclassified points.
Interpretable Model Drift Detection
CODS-COMAD 2024, ICML 2022 Workshop
TL;DR: Not all posterior shifts in the data distribution degrade model performance (see Fig). The ones that do are called model drift. In this work, we take a principled approach to interpretable model drift detection from a model-risk perspective, using a feature-interaction-aware hypothesis testing framework that enjoys guarantees on test power.
Feature-Interpretable Real Concept Drift Detection
Workshop on Trustworthy ML, ICLR 2023
TL;DR: Real-concept drift refers to a shift in the posterior distribution (label distribution conditioned on the covariates). Detecting these shifts helps in: (i) retaining classifier performance over time in a non-stationary environment; (ii) understanding the shift in the relationship between covariates and output labels. To achieve both objectives, we propose a first-principles interpretable method that uses gradients of a classifier in a feature-wise hypothesis testing framework.
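A simplified sketch of the feature-wise testing idea: collect per-sample classifier gradients on a reference window and a current window, then run a per-feature two-sample permutation test with a Bonferroni correction. The statistic (mean gradient difference) and the correction are illustrative stand-ins for the paper's framework, and the toy data below is synthetic.

```python
import numpy as np

def featurewise_drift_test(grad_ref, grad_cur, n_perm=1000, alpha=0.05, seed=0):
    """Per-feature permutation two-sample test on classifier gradients.

    grad_ref, grad_cur: (n, d) arrays of per-sample gradients d(loss)/d(x)
    on a reference window and a current window. Returns a boolean array:
    True = drift flagged for that feature (alpha Bonferroni-corrected).
    """
    rng = np.random.default_rng(seed)
    d = grad_ref.shape[1]
    obs = np.abs(grad_ref.mean(0) - grad_cur.mean(0))   # observed statistic
    pooled = np.vstack([grad_ref, grad_cur])
    n_ref = len(grad_ref)
    exceed = np.zeros(d)
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        a, b = pooled[perm[:n_ref]], pooled[perm[n_ref:]]
        exceed += np.abs(a.mean(0) - b.mean(0)) >= obs
    pvals = (exceed + 1) / (n_perm + 1)
    return pvals < alpha / d

# Toy gradients: feature 0 shifts between windows, feature 1 does not.
rng = np.random.default_rng(1)
g_ref = rng.normal(0.0, 1.0, size=(400, 2))
g_cur = np.column_stack([rng.normal(1.5, 1.0, 400), rng.normal(0.0, 1.0, 400)])
flags = featurewise_drift_test(g_ref, g_cur)
```

Reporting which features are flagged, rather than a single global drift alarm, is what gives the detector its interpretability: it points at where the covariate-label relationship moved.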
Instance-wise Causal Feature Selection for Model Interpretation
Causality in Vision Workshop, CVPR 2021
TL;DR: Instance-wise subset selection is a recent paradigm introduced to explain deep models, by selecting features most responsible for their output. We introduce a causal perspective to this problem and propose a method that selects features that are most causally responsible for a prediction.
Estimating boreal forest ground cover vegetation composition from nadir photographs using deep convolutional neural networks
Ecological Informatics, Elsevier 2022 [Impact factor = 7.3]
[Also selected at Tackling Climate Change Workshop, NeurIPS 2020]
TL;DR: This work investigates the use of nadir photographs taken with an iPhone 7 for estimating ground cover and biomass loading in boreal forests. Traditional visual estimation methods are prone to error and bias, so the study explores the use of deep convolutional neural networks (DCNN) for automated image segmentation. The DCNN achieved 95% accuracy in segmenting photos, providing a cost-effective and efficient alternative to manual segmentation.
Blending of Learning-based Tracking and Object Detection for Monocular Camera-based Target Following
IFAC-PapersOnLine, Elsevier 2021 [Impact factor = 1.13]
TL;DR: This work introduces a real-time approach for monocular camera-based target following by fusing learning-based tracking and object detection. The method enhances the performance of Convolutional Recurrent Neural Network-based object trackers, particularly when dealing with familiar objects that may be occluded or affected by motion blur. The proposed approach incorporates a target re-identification module to recover tracking after losses. The system achieves a high frame rate of 85-90 FPS while demonstrating competitive results on challenging benchmarks for robotic applications.
Image Dehazing via Joint Estimation of Transmittance Map and Environmental Illumination
IEEE International Conference on Advances in Pattern Recognition (ICAPR) 2017
TL;DR: This work addresses the challenge of haze in outdoor images and presents an end-to-end system for image dehazing. The method utilizes a multi-scale Convolutional Neural Network to learn the mapping between a hazy image and its transmittance map, as well as the environmental illumination. Unlike many existing methods, the proposed approach emphasizes accurate estimation of the environmental illumination, considering its impact on the color of the dehazed image.