“With Big Data Comes Big Responsibility.”
The goal of my research is to enable artificial intelligence (AI) to truly benefit society, by responsibly overcoming the emerging challenges of fairness, explainability, and privacy.
Counterfactual Explanations: Robust Recourse Under Model Multiplicity
Abstract: Counterfactual explanations inform ways to achieve a desired outcome from a machine learning model. However, such explanations are not robust to certain real-world changes in the underlying model (e.g., retraining, changes to hyperparameters), calling into question their reliability in applications such as credit lending. We note that models can change significantly in the parameter space when retrained on very similar data. We first introduce a novel metric -- that we call Counterfactual Stability -- which quantifies how robust a counterfactual is to model changes under retraining and comes with desirable theoretical properties. We also provide probabilistic guarantees on robustness under an abstraction of naturally occurring model change, and compare the performance of our strategies with popular counterfactual generation methods across benchmark datasets. The results demonstrate that our strategy generates counterfactuals that are significantly more robust (nearly 100% validity after actual model changes) and more realistic (in terms of local outlier factor) than existing state-of-the-art methods. Our work has interesting connections with model multiplicity, also known as the Rashomon Effect.
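As a minimal sketch of this idea (not the paper's exact construction), a counterfactual can be scored by the mean minus the standard deviation of the model's outputs over Gaussian-perturbed copies of it; the `model_prob` interface and the hyperparameters below are illustrative assumptions:

```python
import numpy as np

def counterfactual_stability(model_prob, x_cf, sigma=0.1, n_samples=1000, rng=None):
    """Score a counterfactual x_cf by how reliably points in its Gaussian
    neighborhood stay on the desired side of the model's decision.

    model_prob: callable mapping an (n, d) array to predicted probabilities
                of the desired outcome (illustrative interface).
    Returns mean minus standard deviation of those probabilities; higher
    values suggest the counterfactual is more likely to remain valid after
    small, naturally-occurring model changes.
    """
    rng = np.random.default_rng(rng)
    x_cf = np.asarray(x_cf, dtype=float)
    neighbors = x_cf + sigma * rng.standard_normal((n_samples, x_cf.size))
    probs = model_prob(neighbors)
    return probs.mean() - probs.std()
```

Under this score, a counterfactual deep inside the favorable region ranks above one that barely crosses the decision boundary, which is the intuition behind preferring robust recourse.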
Select Publication(s):
F. Hamman, E. Noorani, S. Mishra, D. Magazzeni, and S. Dutta, "Robust Counterfactual Explanations for Neural Networks with Probabilistic Guarantees," International Conference on Machine Learning (ICML 2023).
S. Dutta, J. Long, S. Mishra, C. Tilli, D. Magazzeni, "Robust Counterfactual Explanations for Tree-Based Ensembles," International Conference on Machine Learning (ICML 2022). [Full Paper]
S. Mishra, S. Dutta, J. Long, D. Magazzeni, "A Survey on the Robustness of Feature Importance and Counterfactual Explanations," Explainable AI in Finance (XAI-FIN21). [Full Paper]
Fairness & Explainability
Abstract: With the growing use of AI in high-stakes applications such as hiring and lending, it is of utmost importance that the decisions do not propagate biases and stereotypes. How do we identify which features are propagating biases and stereotypes? For instance, consider hiring a software engineer for a safety-critical application. Critical features, like coding test scores, might be weighed strongly in the decision even if biased, whereas aptitude tests and reference letters may be used only to the extent that they do not discriminate. We propose a novel information-theoretic decomposition of the total disparity into a non-exempt component, which quantifies the part of the disparity that cannot be accounted for by the critical features, and an exempt component, which quantifies the remaining disparity. Our decomposition enables explainability as well as selective removal of the non-exempt component if desired. We arrive at this decomposition through examples and counterexamples that enable us to first obtain a set of desirable properties that any measure of non-exempt disparity should satisfy; we then demonstrate that our proposed quantification satisfies all of them. The decomposition leverages a body of work from information theory called Partial Information Decomposition (PID). We also explore applications such as feature selection and debiasing filter bubbles.
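To make the notion of total disparity concrete, the sketch below estimates it as the empirical mutual information between a protected attribute and the model's decisions; the further split into exempt and non-exempt parts via Partial Information Decomposition involves the critical features as a third variable and is beyond this sketch:

```python
import numpy as np

def total_disparity_mi(z, yhat):
    """Estimate total disparity as the mutual information I(Z; Yhat)
    between a protected attribute z and model decisions yhat, computed
    from empirical counts (in nats). Zero means the decisions carry no
    information about group membership; larger values mean more disparity.
    """
    z = np.asarray(z)
    yhat = np.asarray(yhat)
    mi = 0.0
    for zv in np.unique(z):
        for yv in np.unique(yhat):
            p_joint = np.mean((z == zv) & (yhat == yv))
            if p_joint > 0:
                p_z = np.mean(z == zv)
                p_y = np.mean(yhat == yv)
                mi += p_joint * np.log(p_joint / (p_z * p_y))
    return mi
```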
Select Publication(s):
S. Sharma, S. Dutta, E. Albini, F. Lecue, D. Magazzeni and M. Veloso, "REFRESH: Responsible and Efficient Feature Reselection guided by SHAP values," AAAI/ACM Conference on AI, Ethics, and Society (AIES 2023).
S. Dutta, P. Venkatesh, P. Mardziel, A. Datta and P. Grover, "An Information-Theoretic Quantification of Discrimination with Exempt Features," AAAI Conference on Artificial Intelligence (AAAI 2020, ORAL). [Full Paper] [Poster]
S. Dutta, P. Venkatesh, P. Mardziel, A. Datta and P. Grover, "Fairness Under Feature Exemptions: Counterfactual and Observational Measures," IEEE Transactions on Information Theory 2021. [Full Paper]
C. Jiang*, B. Wu*, S. Dutta and P. Grover, "Bursting the Bubbles: Debiasing Recommendation Systems While Allowing for Chosen Category Exemptions," BIAS Workshop at ECIR 2021.
Fairness-Accuracy Tradeoffs: A Perspective from Mismatched Hypothesis Testing
Abstract: In this work, using a tool from information theory called Chernoff information, we derive fundamental limits on the relationship between fairness and accuracy that explain why the accuracy on a given dataset often decreases as fairness increases. Novel to this work, we examine the problem of fair classification through the lens of mismatched hypothesis testing, i.e., we are trying to find a classifier that distinguishes between two "ideal" distributions but are instead given two mismatched distributions that are biased. Based on this perspective, we contend that measuring accuracy with respect to the given (possibly biased) dataset is a problematic measure of performance; instead, one should also consider accuracy with respect to an ideal dataset that is unbiased. We formulate an optimization to find such ideal distributions and show that it is feasible. Lastly, when the Chernoff information for one group is strictly less than that of another in the given dataset, we derive the information-theoretic criterion under which collecting more features can actually improve the Chernoff information and achieve fairness without compromising accuracy on the available data.
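The quantity at the heart of this analysis can be computed directly. A minimal sketch, assuming discrete distributions with full support, evaluates the Chernoff information C(p, q) = max over s in [0, 1] of -log(sum_x p(x)^s q(x)^(1-s)) by a simple grid search:

```python
import numpy as np

def chernoff_information(p, q, grid=1001):
    """Chernoff information between two discrete distributions p and q
    (assumed strictly positive on a common support). It is the optimal
    error exponent for hypothesis testing between p and q, so a group
    with smaller Chernoff information is inherently harder to classify.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    s = np.linspace(0.0, 1.0, grid)[:, None]          # exponents to search
    coeff = np.sum(p**s * q**(1 - s), axis=1)          # Chernoff coefficient
    return float(np.max(-np.log(coeff)))
```

Identical distributions give zero (no distinguishing power), and the measure is symmetric in its two arguments, both of which follow directly from the definition.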
Work featured in New Scientist.
Select Publication(s):
S. Dutta, D. Wei, H. Yueksel, P. Y. Chen, S. Liu and K. R. Varshney, "Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing," International Conference on Machine Learning (ICML 2020). [Full Paper]
Fairness & Privacy: Can Querying for Bias Leak Protected Attributes?
Abstract: Existing regulations often prohibit model developers from accessing protected attributes (gender, race, etc.) during training. This leads to scenarios where fairness assessments might need to be done on populations without knowing their memberships in protected groups. In such scenarios, institutions often adopt a separation between the model developers (who train their models with no access to the protected attributes) and a compliance team (who may have access to the entire dataset solely for auditing purposes). However, the model developers might be allowed to test their models for disparity by querying the compliance team for group fairness metrics. In this paper, we first demonstrate that simply querying for fairness metrics, such as statistical parity and equalized odds, can leak the protected attributes of individuals to the model developers. We demonstrate that there always exist strategies by which the model developers can identify the protected attribute of a targeted individual in the test dataset from just a single query. Furthermore, we show that one can reconstruct the protected attributes of all the individuals using techniques from compressed sensing. Our results pose an interesting debate in algorithmic fairness: Should querying for fairness metrics be viewed as a neutral-valued solution to ensure compliance with regulations? Or, does it constitute a violation of regulations and privacy if the number of queries answered is enough for the model developers to identify the protected attributes of specific individuals? To address this potential violation, we also propose Attribute-Conceal, a novel technique that achieves differential privacy by calibrating noise to the smooth sensitivity of our bias query function, outperforming naive techniques such as the Laplace mechanism. We also include experimental results on the Adult dataset and a synthetic dataset.
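For intuition, the sketch below answers a statistical-parity query with the naive Laplace-mechanism baseline that the paper improves upon; the sensitivity bound used here is a crude illustrative assumption, not the paper's smooth-sensitivity calibration:

```python
import numpy as np

def noisy_parity_gap(yhat, z, epsilon, rng=None):
    """Answer a statistical-parity query under differential privacy with
    the naive Laplace mechanism. Changing one individual's decision moves
    the gap by at most about 1/min(n0, n1), so noise is calibrated to that
    crude worst-case bound. (Attribute-Conceal instead calibrates noise to
    the smooth sensitivity of the query, yielding less noise for the same
    privacy budget; that construction is not reproduced here.)
    """
    rng = np.random.default_rng(rng)
    yhat = np.asarray(yhat)
    z = np.asarray(z)
    n0, n1 = np.sum(z == 0), np.sum(z == 1)
    gap = abs(yhat[z == 0].mean() - yhat[z == 1].mean())
    sensitivity = 1.0 / min(n0, n1)   # illustrative global-sensitivity bound
    return gap + rng.laplace(scale=sensitivity / epsilon)
```

With many individuals per group the added noise is small, so the compliance team can still report a useful disparity estimate while blunting single-query attacks on any targeted individual.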
Select Publication(s):
F. Hamman, J. Chen, and S. Dutta, "Can Querying for Bias Leak Protected Attributes? Achieving Privacy With Smooth Sensitivity," ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT 2023).
Coded Computing: Fundamental Limits and Practical Strategies
Abstract: Faced with the saturation of Moore's law and the increasing dimensionality of data, system designers have increasingly resorted to parallel and distributed computing. However, distributed computing is often bottlenecked by a small fraction of slow processors called "stragglers" that reduce the speed of computation because the fusion node has to wait for all processors to finish. My work proposes novel coding-theory-based solutions for speeding up distributed machine learning and large-scale computing on unreliable nodes prone to faults or stragglers.
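A toy instance of the idea, assuming a matrix-vector product split across three workers: encode the two halves of A with one redundant coded task so that any two of the three results suffice, tolerating one straggler (the papers develop far more general constructions such as MatDot and PolyDot codes):

```python
import numpy as np

def encode_tasks(A):
    """Split A (even number of rows) row-wise into halves A1 and A2, and
    add one coded task A1 + A2. Each worker i multiplies tasks[i] by x,
    and the product Ax is recoverable from ANY two of the three results:
    a toy (3, 2) MDS code for coded matrix-vector multiplication.
    """
    A1, A2 = np.array_split(A, 2, axis=0)
    return [A1, A2, A1 + A2]

def decode(products):
    """Recover Ax from any two of {0: A1 x, 1: A2 x, 2: (A1 + A2) x}."""
    if 0 in products and 1 in products:
        top, bottom = products[0], products[1]
    elif 0 in products:
        top, bottom = products[0], products[2] - products[0]
    else:
        top, bottom = products[2] - products[1], products[1]
    return np.concatenate([top, bottom])
```

The fusion node can therefore return the result as soon as the two fastest workers finish, instead of waiting for the slowest one.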
Select Publication(s):
S. Dutta*, M. Fahim*, H. Jeong*, F. Haddadpour*, V. Cadambe and P. Grover, "On the Optimal Recovery Threshold of Coded Matrix Multiplication," IEEE Transactions on Information Theory, Jan 2020. [Full Paper]
S. Dutta, V. Cadambe and P. Grover, "Short-Dot: Computing Large Linear Transforms Distributedly using Coded Short Dot Products," Advances in Neural Information Processing Systems (NeurIPS 2016). [Conference Paper] [Full Paper]
(Extended version at IEEE Transactions on Information Theory, Oct 2019)
S. Dutta, Z. Bai, T. M. Low and P. Grover, "CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors," Coding Theory For Large-scale Machine Learning Workshop at ICML (CodML Workshop, ICML 2019, Spotlight). [Short Workshop Paper] [Full Paper]
U. Sheth, S. Dutta, M. Chaudhari, H. Jeong, Y. Yang, J. Kohonen, T. Roos, and P. Grover, "An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation," IEEE International Conference on Big Data (IEEE BigData 2018). [Conference Paper] [Full Paper]
S. Dutta, Z. Bai, H. Jeong, T. M. Low and P. Grover, "A Unified Coded Deep Neural Network Training Strategy based on Generalized PolyDot Codes," IEEE International Symposium on Information Theory (ISIT 2018). [Full Paper]
S. Dutta, V. Cadambe and P. Grover, "Coded Convolution for parallel and distributed computing within a deadline," IEEE International Symposium on Information Theory (ISIT 2017). [Conference Paper] [Full Paper]
S. Dutta, Y. Yang, N. Wang, E. Pop, V. Cadambe and P. Grover, "Reliable Matrix Multiplication using Error-prone Dot-product Nanofunctions with an application to logistic regression" (SRC Techcon, 2016).
S. Dutta*, H. Jeong*, Y. Yang*, V. Cadambe, T. M. Low and P. Grover, "Addressing Unreliability in Emerging Devices and Non-von Neumann Architectures Using Coded Computing," Proceedings of the IEEE, 2020. *Equal Contribution
P. Grover, H. Jeong, Y. Yang, S. Dutta, Z. Bai, T. M. Low, M. Fahim, F. Haddadpour, V. Cadambe, "Coded computation strategies for distributed matrix-matrix and matrix-vector products," Patent Application 16588990. (US Patent)
Slow and Stale Gradients can win the Race: Error-Runtime Tradeoffs in Distributed SGD
Abstract: Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in waiting for the slowest learners (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect convergence. In this work, we present a novel theoretical characterization of the speed-up offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time). The novelty in our work is that our runtime analysis considers random straggler delays, which helps us design and compare distributed SGD algorithms that strike a balance between stragglers and staleness. We also present a new convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions.
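A toy illustration of gradient staleness, assuming a fixed delay rather than the random straggler delays analyzed in the paper: each update applies the gradient evaluated at an iterate from a few steps earlier, mimicking asynchronous workers whose gradients arrive late, and convergence still holds when the learning rate is small relative to the delay:

```python
import numpy as np

def stale_sgd(grad, w0, lr=0.05, staleness=3, steps=200):
    """Gradient descent where each update uses the gradient evaluated at
    the iterate from `staleness` steps ago. With a fixed delay and a small
    enough learning rate, the iterates still converge, illustrating why
    asynchronous SGD can tolerate some staleness in exchange for not
    waiting on stragglers.
    """
    history = [np.asarray(w0, dtype=float)]
    for t in range(steps):
        w_stale = history[max(0, t - staleness)]   # delayed iterate
        history.append(history[-1] - lr * grad(w_stale))
    return history[-1]
```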
Select Publication(s):
S. Dutta, G. Joshi, P. Dube, S. Ghosh and P. Nagpurkar, "Slow and stale gradients can win the race: Error-Runtime trade-offs in Distributed SGD," International Conference on Artificial Intelligence and Statistics (AISTATS 2018). [Conference Paper] [Full Paper]
S. Dutta, J. Wang and G. Joshi, "Slow and stale gradients can win the race," IEEE Journal on Selected Areas in Information Theory (JSAIT 2021).
P. Dube, S. Dutta, G. Joshi and P. Nagpurkar, "Adaptive Learning Rate Schedule In Distributed Stochastic Gradient Descent," Patent Application: 20190303787. (US Patent) [Online]
Event Extraction for Natural Language Processing Using Graph Neural Networks
Select Publication(s):
S. Dutta, L. Ma, T. K. Saha, D. Liu, J. Tetreault, and A. Jaimes, "GTN-ED: Event Detection Using Graph Transformer Networks," TextGraphs Workshop at NAACL 2021.
Information Flow in Computational Systems
Select Publication(s):
P. Venkatesh, S. Dutta and P. Grover, "How else should we define Information Flow in Neural Circuits," IEEE International Symposium on Information Theory (ISIT 2020).
P. Venkatesh, S. Dutta and P. Grover, "How should we define Information Flow in Neural Circuits," IEEE International Symposium on Information Theory (ISIT 2019). [Conference Paper] [Full Paper]
P. Venkatesh, S. Dutta and P. Grover, "Information Flow in Computational Systems," IEEE Transactions on Information Theory, April 2020. [Full Paper]
Compressed Sensing & Sparse Reconstruction
Select Publication(s):
S. Dutta and P. Grover, "Adaptivity provably helps: Information-theoretic limits on l0 cost of non-adaptive sensing," IEEE International Symposium on Information Theory (ISIT 2016). [Conference Paper] [Full Paper]
S. Dutta and A. De, "Sparse UltraWideBand Radar Imaging in a Locally Adapting Matching Pursuit (LAMP) Framework," IEEE International Radar Conference, 2015 (RADAR 2015). [Conference Paper] [Full Paper]
(Received Best Undergraduate Thesis Award for this work)