Abstract: Our goal is to develop novel strategies for data selection, drawing inspiration from mathematically principled approaches in interpretability and robustness. Specific use cases include instruction tuning for LLMs, quantifying prediction consistency in Tabular LLMs, and mitigating spurious correlations.
Select Publication(s):
Y. Fu, F. Hamman, S. Dutta, "T-Shirt: Token-Selective Hierarchical Data Selection for Instruction Tuning" [Full-Paper]
D. Egea, B. Halder, and S. Dutta, “VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation” [Full-Paper]
B. Halder, F. Hamman, P. Dissanayake, Q. Zhang, I. Sucholutsky, and S. Dutta, "Quantifying Spuriousness of Biased Datasets Using Partial Information Decomposition," ICML Workshop on Data-centric Machine Learning Research: Datasets for Foundation Models (ICML-DMLR Workshop 2024).
F. Hamman, P. Dissanayake, S. Mishra, F. Lecue, S. Dutta, "Quantifying Prediction Consistency Under Fine-tuning Multiplicity in Tabular LLMs," International Conference on Machine Learning (ICML 2025).
Abstract: With the growing scale and resource consumption of complex machine learning models, it is of utmost interest to develop simpler models that can be efficiently deployed in resource-constrained environments and on edge devices. This project aims to develop novel strategies for reconstructing simpler, interpretable, and efficient models from larger complex models by specifically integrating explainability into the reconstruction process. Our research uses tools from polytope theory, optimization, statistics, and information theory to advance the foundations of trust and transparency, paving the way for democratized access to machine learning models.
Select Publication(s):
P. Dissanayake, F. Hamman, B. Halder, Q. Zhang, I. Sucholutsky, S. Dutta, "Quantifying Knowledge Distillation Using Partial Information Decomposition," International Conference on Artificial Intelligence and Statistics (AISTATS 2025).
P. Dissanayake and S. Dutta, "Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory," Advances in Neural Information Processing Systems (NeurIPS 2024).
Abstract: Counterfactual explanations inform ways to achieve a desired outcome from a machine learning model. However, such explanations are not robust to certain real-world changes in the underlying model (e.g., retraining the model, changing hyperparameters), calling into question their reliability in several applications, e.g., credit lending. We note that models can change substantially in the parameter space when retrained on very similar data. We first introduce a novel metric -- that we call Counterfactual Stability -- that quantifies how robust a counterfactual is to model changes under retraining and comes with desirable theoretical properties. We also provide probabilistic guarantees on robustness under an abstraction of naturally-occurring model change. We also compare the performance of our strategies with popular counterfactual generation methods across benchmark datasets. The results demonstrate that our strategy generates counterfactuals that are significantly more robust (nearly 100% validity after actual model changes) and more realistic (in terms of local outlier factor) than existing state-of-the-art methods. Our work has interesting connections with model multiplicity, also known as the Rashomon Effect.
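The minimal sketch below conveys the underlying notion of robustness to retraining. It is only an illustrative validity check, not the Counterfactual Stability metric itself, and the helper name `empirical_validity` is hypothetical: a counterfactual is treated as robust if it remains valid for most models retrained on bootstrap resamples of very similar data.

```python
# Minimal sketch (illustrative only, not the paper's Counterfactual Stability metric):
# estimate how often a candidate counterfactual x_cf stays valid when the model is
# retrained on bootstrap resamples of very similar data.
import numpy as np
from sklearn.neural_network import MLPClassifier

def empirical_validity(x_cf, X, y, desired_class=1, n_retrains=20, seed=0):
    """Fraction of retrained models that still assign x_cf the desired class.
    X, y are numpy arrays holding the (very similar) training data."""
    rng = np.random.default_rng(seed)
    valid = 0
    for _ in range(n_retrains):
        idx = rng.integers(0, len(X), size=len(X))      # bootstrap resample
        model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                              random_state=int(rng.integers(1_000_000)))
        model.fit(X[idx], y[idx])
        valid += int(model.predict(x_cf.reshape(1, -1))[0] == desired_class)
    return valid / n_retrains

# A counterfactual with validity close to 1.0 is robust to this kind of retraining.
```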
Select Publication(s):
E. Noorani, P. Dissanayake, F. Hamman, S. Dutta, "Counterfactual Explanations for Model Ensembles Using Entropic Risk Measures," International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2025).
F. Hamman, E. Noorani, S. Mishra, D. Magazzeni, and S. Dutta, "Robust Algorithmic Recourse Under Model Multiplicity with Probabilistic Guarantees,” Journal on Selected Areas in Information Theory: Information-Theoretic Methods for Trustworthy Machine Learning (JSAIT 2024).
F. Hamman, E. Noorani, S. Mishra, D. Magazzeni, and S. Dutta, "Robust Counterfactual Explanations for Neural Networks with Probabilistic Guarantees,” International Conference on Machine Learning (ICML 2023).
S. Dutta, J. Long, S. Mishra, C. Tilli, D. Magazzeni, "Robust Counterfactual Explanations for Tree-Based Ensembles," International Conference on Machine Learning (ICML 2022). [Full Paper]
S. Mishra, S. Dutta, J. Long, D. Magazzeni, "A Survey on the Robustness of Feature Importance and Counterfactual Explanations," Explainable AI in Finance (XAI-FIN21). [Full Paper]
Abstract: Machine learning is becoming increasingly prevalent in various aspects of our lives, including several high-stakes applications, such as finance, education, and employment. These machine learning models have shown remarkable success at learning patterns present in historical data. However, indiscriminate learning of all patterns can sometimes lead to unintended consequences, such as perpetuating disparities based on gender, race, and other protected attributes, which can adversely affect certain groups of people. This project seeks to advance the foundations of ethical and socially-responsible machine learning by empowering users to systematically identify, explain, and mitigate the sources of disparity. Rethinking the traditional paradigm of separately addressing fairness and explainability, this research project will jointly examine fairness and explainability through a unified information-theoretic lens. In particular, we use a body of work in information theory called Partial Information Decomposition.
The research project provides a novel information-theoretic view of responsible machine learning by leveraging a body of work in information theory called Partial Information Decomposition (PID). PID is closely tethered to the principles of Blackwell sufficiency in statistical decision theory and provides a formal way of quantifying when one random variable is "more informative" than another with respect to a target variable. Combined with estimation and optimization techniques, PID enables us to disentangle the joint information content that several random variables share about a target variable, e.g., a protected attribute such as gender, race, age, or nationality.
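For readers unfamiliar with PID, the decomposition can be summarized as follows (stated in generic notation, not tied to any single publication below). The joint mutual information that two sources X_1 and X_2 carry about a target Z splits into unique, redundant, and synergistic components:
\[
I(Z; (X_1, X_2)) = \mathrm{Uni}(Z : X_1 \setminus X_2) + \mathrm{Uni}(Z : X_2 \setminus X_1) + \mathrm{Red}(Z : X_1, X_2) + \mathrm{Syn}(Z : X_1, X_2),
\]
with the individual mutual information recovered as \( I(Z; X_1) = \mathrm{Uni}(Z : X_1 \setminus X_2) + \mathrm{Red}(Z : X_1, X_2) \), and symmetrically for X_2.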
Select Publication(s):
F. Hamman and S. Dutta, "Demystifying Local & Global Fairness Trade-offs in Federated Learning Using Partial Information Decomposition,” International Conference on Learning Representations (ICLR 2024).
F. Hamman and S. Dutta, "A Unified View of Group Fairness Tradeoffs Using Partial Information Decomposition," IEEE International Symposium on Information Theory (ISIT 2024).
S. Sharma, S. Dutta, E. Albini, F. Lecue, D. Magazzeni and M. Veloso, "REFRESH: Responsible and Efficient Feature Reselection guided by SHAP values," AAAI/ACM Conference on AI, Ethics, and Society (AIES 2023).
S. Dutta, F. Hamman, "A Review of Partial Information Decomposition in Algorithmic Fairness and Explainability," Entropy 2023.
S. Dutta, P. Venkatesh, P. Mardziel, A. Datta and P. Grover, "An Information-Theoretic Quantification of Discrimination with Exempt Features," AAAI Conference on Artificial Intelligence (AAAI 2020, ORAL). [Full Paper] [Poster]
S. Dutta, P. Venkatesh, P. Mardziel, A. Datta and P. Grover, "Fairness Under Feature Exemptions: Counterfactual and Observational Measures," IEEE Transactions on Information Theory 2021. [Full Paper]
C. Jiang*, B. Wu*, S. Dutta and P. Grover, "Bursting the Bubbles: Debiasing Recommendation Systems While Allowing for Chosen Category Exemptions," BIAS Workshop at ECIR 2021.
Abstract: In this work, using a tool from information theory called Chernoff information, we derive fundamental limits on the trade-off between fairness and accuracy that explain why the accuracy on a given dataset often decreases as fairness increases. Novel to this work, we examine the problem of fair classification through the lens of mismatched hypothesis testing: we are trying to find a classifier that distinguishes between two "ideal" distributions, but we are instead given two mismatched distributions that are biased. Based on this perspective, we contend that measuring accuracy with respect to the given (possibly biased) dataset is a problematic measure of performance. Instead, one should also consider accuracy with respect to an ideal dataset that is unbiased. We formulate an optimization to find such ideal distributions and show that the optimization is feasible. Lastly, when the Chernoff information for one group is strictly less than that of another in the given dataset, we derive the information-theoretic criterion under which collecting more features can actually improve the Chernoff information and achieve fairness without compromising accuracy on the available data.
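For reference, the Chernoff information between two distributions P and Q over a finite alphabet is
\[
C(P, Q) = -\min_{0 \le \lambda \le 1} \log \sum_{x} P(x)^{\lambda}\, Q(x)^{1-\lambda},
\]
which characterizes the best achievable exponent of decay of the error probability in binary hypothesis testing between P and Q; the group-wise instantiation used to analyze the fairness-accuracy trade-off is as described in the paper below.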
Work featured in New Scientist.
Select Publication(s):
S. Dutta, D. Wei, H. Yueksel, P. Y. Chen, S. Liu and K. R. Varshney, "Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing," International Conference on Machine Learning (ICML 2020). [Full Paper]
Abstract: Existing regulations often prohibit model developers from accessing protected attributes (gender, race, etc.) during training. This leads to scenarios where fairness assessments might need to be done on populations without knowing their memberships in protected groups. In such scenarios, institutions often adopt a separation between the model developers (who train their models with no access to the protected attributes) and a compliance team (who may have access to the entire dataset solely for auditing purposes). However, the model developers might be allowed to test their models for disparity by querying the compliance team for group fairness metrics. In this paper, we first demonstrate that simply querying for fairness metrics, such as statistical parity and equalized odds, can leak the protected attributes of individuals to the model developers. We demonstrate that there always exist strategies by which the model developers can identify the protected attribute of a targeted individual in the test dataset from just a single query. Furthermore, we show that one can reconstruct the protected attributes of all the individuals using techniques from compressed sensing. Our results pose an interesting debate in algorithmic fairness: Should querying for fairness metrics be viewed as a neutral-valued solution to ensure compliance with regulations? Or does it constitute a violation of regulations and privacy if the number of queries answered is enough for the model developers to identify the protected attributes of specific individuals? To address this potential violation of regulations and privacy, we also propose Attribute-Conceal, a novel technique that achieves differential privacy by calibrating noise to the smooth sensitivity of our bias query function, outperforming naive techniques such as the Laplace mechanism. We also include experimental results on the Adult dataset and a synthetic dataset.
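The following simplified sketch conveys the basic leakage mechanism. It is a two-query toy version for intuition only (the paper shows that a single query suffices), and it is neither the paper's attack nor the Attribute-Conceal defense; the function name `compliance_query` is hypothetical.

```python
# Simplified two-query illustration of the leakage (assumes exact metric values
# are returned; not the paper's attack or the Attribute-Conceal defense).
import numpy as np

def compliance_query(preds, groups):
    """Statistical parity gap, computed by the compliance team who holds `groups`."""
    return preds[groups == 1].mean() - preds[groups == 0].mean()

rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=100)      # protected attribute: hidden from the developer
preds = rng.integers(0, 2, size=100)       # developer's model predictions on the test set

target = 17                                # individual whose attribute we try to infer
base = compliance_query(preds, groups)

flipped = preds.copy()
flipped[target] = 1 - flipped[target]      # flip only the target's prediction
probe = compliance_query(flipped, groups)

# Flipping one prediction changes exactly one group's positive rate, so the sign
# of the change in the parity gap (together with the known direction of the flip)
# reveals the target's group without ever observing `groups`.
delta = probe - base
inferred = 1 if np.sign(delta) == np.sign(1 - 2 * preds[target]) else 0
print(inferred == groups[target])          # True
```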
Select Publication(s):
F. Hamman, J. Chen, and S. Dutta, "Can Querying for Bias Leak Protected Attributes? Achieving Privacy With Smooth Sensitivity," ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT 2023).
Abstract: Faced with the saturation of Moore's law and the increasing dimensionality of data, system designers have increasingly resorted to parallel and distributed computing. However, distributed computing is often bottlenecked by a small fraction of slow processors called "stragglers" that reduce the speed of computation because the fusion node has to wait for all processors to finish. This project proposes novel coding-theory-based solutions for speeding up distributed machine learning and large-scale computing on unreliable nodes prone to faults or stragglers.
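As a toy illustration of the coded-computing idea (a generic (3, 2) MDS-style sketch under assumed exact arithmetic, not the Short-Dot, MatDot, or PolyDot constructions from the publications below):

```python
# Toy sketch of coded matrix-vector multiplication with a (3, 2) MDS-style code:
# three workers each get a smaller task, and A @ x is recoverable from ANY two
# of them, so the fusion node never has to wait for a single straggler.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
x = rng.standard_normal(4)

A1, A2 = A[:3], A[3:]                    # systematic halves of A
tasks = {1: A1, 2: A2, 3: A1 + A2}       # worker 3 computes the parity block

# Suppose worker 2 is the straggler; decode from workers 1 and 3 alone.
r1 = tasks[1] @ x
r3 = tasks[3] @ x
r2_decoded = r3 - r1                     # recover worker 2's share
assert np.allclose(np.concatenate([r1, r2_decoded]), A @ x)
```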
Select Publication(s):
S. Dutta*, M. Fahim*, H. Jeong*, F. Haddadpour*, V. Cadambe and P. Grover, "On the Optimal Recovery Threshold of Coded Matrix Multiplication," IEEE Transactions on Information Theory, Jan 2020. [Full Paper]
S. Dutta, V. Cadambe and P. Grover, "Short-Dot: Computing Large Linear Transforms Distributedly using Coded Short Dot Products," Advances in Neural Information Processing Systems (NeurIPS 2016). [Conference Paper] [Full Paper]
(Extended version at IEEE Transactions on Information Theory, Oct 2019)
S. Dutta, Z. Bai, T. M. Low and P. Grover, "CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors," Coding Theory For Large-scale Machine Learning Workshop at ICML (CodML Workshop, ICML 2019, Spotlight). [Short Workshop Paper] [Full Paper]
U. Sheth, S. Dutta, M. Chaudhari, H. Jeong, Y. Yang, J. Kohonen, T. Roos, and P. Grover, "An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation,” IEEE International Conference on Big Data (IEEE BigData 2018). [Conference Paper] [Full Paper]
S. Dutta, Z. Bai, H. Jeong, T. M. Low and P. Grover, "A Unified Coded Deep Neural Network Training Strategy based on Generalized PolyDot Codes," IEEE International Symposium on Information Theory (ISIT 2018). [Full Paper]
S. Dutta, V. Cadambe and P. Grover, "Coded Convolution for parallel and distributed computing within a deadline," IEEE International Symposium on Information Theory (ISIT 2017). [Conference Paper] [Full Paper]
S. Dutta, Y. Yang, N. Wang, E. Pop, V. Cadambe and P. Grover, “Reliable Matrix Multiplication using Error-prone Dot-product Nanofunctions with an application to logistic regression” (SRC Techcon, 2016).
S. Dutta*, H. Jeong*, Y. Yang*, V. Cadambe, T. M. Low and P. Grover, “Addressing Unreliability in Emerging Devices and Non-von Neumann Architectures Using Coded Computing,” Proceedings of the IEEE, 2020. *Equal Contribution
P. Grover, H. Jeong, Y. Yang, S. Dutta, Z. Bai, T. M. Low, M. Fahim, F. Haddadpour, V. Cadambe, "Coded computation strategies for distributed matrix-matrix and matrix-vector products,” Patent Application 16588990. (US Patent)
Abstract: Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in waiting for the slowest learners (stragglers). Asynchronous methods can alleviate stragglers but cause gradient staleness that can adversely affect convergence. In this work, we present a novel theoretical characterization of the speed-up offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time). The novelty of our work is that our runtime analysis considers random straggler delays, which helps us design and compare distributed SGD algorithms that strike a balance between stragglers and staleness. We also present a new convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions.
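The runtime side of this trade-off can be illustrated with a small simulation (assuming i.i.d. exponential worker delays and exploiting memorylessness for the asynchronous case; this is only an illustration, not the paper's error-runtime analysis):

```python
# Toy runtime comparison under an assumed i.i.d. exponential delay model.
import numpy as np

rng = np.random.default_rng(0)
P, rounds = 8, 100_000
delays = rng.exponential(scale=1.0, size=(rounds, P))

# Synchronous SGD: every update waits for the slowest of the P workers.
sync_time_per_update = delays.max(axis=1).mean()     # ~ H_P = 1 + 1/2 + ... + 1/P

# Fully asynchronous SGD: an update fires whenever any worker finishes, so the
# expected gap between updates is that of the fastest worker (memorylessness).
async_time_per_update = delays.min(axis=1).mean()    # ~ 1/P

print(f"sync:  {sync_time_per_update:.2f} time units per update (P fresh gradients)")
print(f"async: {async_time_per_update:.2f} time units per update (1 stale gradient)")
```

Synchronous SGD pays roughly the maximum of P delays per update, while fully asynchronous SGD updates as soon as any worker finishes, but each asynchronous update applies a single, possibly stale gradient; this is exactly the error-runtime tension analyzed in the papers below.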
Select Publication(s):
S. Dutta, G. Joshi, P. Dube, S. Ghosh and P. Nagpurkar, "Slow and stale gradients can win the race: Error-Runtime trade-offs in Distributed SGD," International Conference on Artificial Intelligence and Statistics (AISTATS 2018). [Conference Paper] [Full Paper]
S. Dutta, J. Wang and G. Joshi, "Slow and stale gradients can win the race," IEEE Journal on Selected Areas in Information Theory (JSAIT 2021).
P. Dube, S. Dutta, G. Joshi and P. Nagpurkar, "Adaptive Learning Rate Schedule In Distributed Stochastic Gradient Descent," Patent Application: 20190303787. (US Patent) [Online]
Select Publication(s):
S. Dutta, L. Ma, T. K. Saha, D. Liu, J. Tetreault, and A. Jaimes, "GTN-ED: Event Detection Using Graph Transformer Networks," TextGraphs Workshop at NAACL 2021.
Select Publication(s):
P. Venkatesh, S. Dutta*, N. Mehta*, P. Grover, "Can Information Flows Suggest Targets for Interventions in Neural Circuits?," Advances in Neural Information Processing Systems (NeurIPS 2021). [Full Paper]
P. Venkatesh, S. Dutta and P. Grover, "How else should we define Information Flow in Neural Circuits,” IEEE International Symposium on Information Theory (ISIT 2020).
P. Venkatesh, S. Dutta and P. Grover, "How should we define Information Flow in Neural Circuits,” IEEE International Symposium on Information Theory (ISIT 2019). [Conference Paper] [Full Paper]
P. Venkatesh, S. Dutta and P. Grover, "Information Flow in Computational Systems,” IEEE Transactions on Information Theory, April 2020. [Full Paper]
Select Publication(s):
S. Dutta and P. Grover, "Adaptivity provably helps: Information-theoretic limits on l0 cost of non-adaptive sensing," IEEE International Symposium on Information Theory (ISIT 2016). [Conference Paper] [Full Paper]
S. Dutta and A. De, "Sparse UltraWideBand Radar Imaging in a Locally Adapting Matching Pursuit (LAMP) Framework," IEEE International Radar Conference, 2015 (RADAR 2015). [Conference Paper] [Full Paper]
(Received Best Undergraduate Thesis Award for this work)