Robustness & Influence Functions Workshop
Date: 2024 July 31, Location: Stata Center, MIT
Influence functions, a classic idea from robust statistics, estimate the effect of infinitesimally upweighting a training instance on the model’s parameters. While traditionally used as a theoretical tool for establishing favorable properties of statistical estimators, modern automatic differentiation tools have transformed them into practical tools for data analysis by allowing practitioners to easily and automatically compute them. As a result, they are increasingly being studied, adapted, and built upon by computer scientists, statisticians, econometricians, and others. In this workshop, we explore modern uses and analyses of influence functions for the purposes of robustness quantification, as well as connections to broader themes in robust statistics and machine learning.
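For concreteness, here is a minimal illustrative sketch (not taken from any of the talks below) of the classical parameter influence function for an M-estimator, computed with automatic differentiation in JAX. The linear model, squared-error loss, and function names are assumptions made only for this sketch:

```python
import jax
import jax.numpy as jnp

def point_loss(theta, x, y):
    # squared-error loss for a single observation of a linear model
    return 0.5 * (jnp.dot(x, theta) - y) ** 2

def empirical_risk(theta, X, Y):
    return jnp.mean(jax.vmap(point_loss, in_axes=(None, 0, 0))(theta, X, Y))

def influence_on_params(theta_hat, X, Y, x_z, y_z):
    # classical influence function: the derivative of the fitted parameters with
    # respect to infinitesimally upweighting observation z is
    #   -H^{-1} grad_theta point_loss(z),  H = Hessian of the empirical risk
    H = jax.hessian(empirical_risk)(theta_hat, X, Y)
    g = jax.grad(point_loss)(theta_hat, x_z, y_z)
    return -jnp.linalg.solve(H, g)

# toy usage on synthetic data (for OLS the fit has a closed form)
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
X = jax.random.normal(k1, (200, 3))
Y = X @ jnp.array([1.0, -2.0, 0.5]) + 0.1 * jax.random.normal(k2, (200,))
theta_hat, *_ = jnp.linalg.lstsq(X, Y)
print(influence_on_params(theta_hat, X, Y, X[0], Y[0]))
```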
The workshop will take place on Wednesday 2024 July 31 at the Stata Center at MIT in the Patil/Kiva Seminar Room (32-G449).
Link to register: https://forms.gle/brUiCT9p9z53E9Pj7 (Please register by 2024 July 23 if at all possible to help us plan coffee and food. If you’re late, though, please register anyway!)
Questions? Contact: samhop@mit.edu
Streaming: We will stream talks via Zoom, with a link posted on this website the morning of the workshop. We strongly encourage workshop participants who can do so to come in person.
ZOOM LINK: https://mit.zoom.us/j/97473726708
UPDATE: 10:25am welcoming remarks
10am: Logan Engstrom -- Exact influence functions (and more) at scale
10:30am: Allen Liu -- Robustness Against Overfitting to Modeling Assumptions Through the Lens of Semi-Random Models
11am: coffee break
11:30am: Jenny Huang -- Gradient-based approximations to data dropping: unmasking failure modes
12noon: Omri Lev -- Faster Machine Unlearning via Natural Gradient Descent
12:30pm: lunch break
2:30pm: Ryan Giordano -- Approximate data deletion and replication with the Bayesian influence function
3:00pm: Dhruv Rohatgi -- Can We Provably Audit the Stability of Ordinary Least Squares?
3:30pm: coffee break
4:00pm: Ittai Rubinstein -- Practical Robustness Auditing for Ordinary Least Squares: To Singularity and Beyond
4:30pm: Vinith Suriyakumar -- Algorithms that Approximate Data Removal: New Results and Limitations
5:00pm: informal chatting
Logan Engstrom -- Exact influence functions (and more) at scale
Calculating the influence function (i.e., the effect of infinitesimally up-weighting a training sample) is difficult in large-scale, non-convex ML settings. Existing methods instead compute *approximations* to influences (e.g., via the infinitesimal jackknife) that are often not predictive in practice. In this work, we introduce a method for calculating influences that (almost) perfectly predict how different data weightings change large-scale model behavior. We first introduce an efficient algorithm that *exactly* calculates large-scale model influences. Applying it naïvely, we find that influences do not predict model behavior. We then go back to (optimization) first principles and identify both the problem---standard large-scale training is too chaotic---and a solution: “smooth” training. After calculating (predictive) influences for smooth large-scale models, we apply our framework to two other related (and previously intractable) tasks.
Allen Liu -- Robustness Against Overfitting to Modeling Assumptions Through the Lens of Semi-Random Models
We study two of the most fundamental linear inverse problems -- sparse regression and matrix completion. While these problems are hard in the worst case, standard formulations assume that the data is drawn from a well-behaved generative model. In these cases, both problems are solvable efficiently. For both problems, there are two main classes of approaches: general convex programming techniques and more practical, fast (nearly-linear time) iterative methods. While these fast algorithms provably work under the generative model, they are often brittle in various real-world settings. We investigate this phenomenon through the lens of semi-random models, where an adversary may add arbitrary "helpful" data to the data drawn from the generative model. Existing fast algorithms break in this semi-random setting, suggesting that the fast algorithms may overfit to the assumptions on the generative model. For both sparse regression and matrix completion, we develop fast iterative algorithms that are robust in the semi-random setting. Rather than trying to preprocess the data by filtering or preconditioning, our algorithms are based on a new framework for computing local reweightings that guarantee progress in each step of the iterative method. We hope that our approach opens the door to new robust, efficient algorithms for natural statistical inverse problems.
Jenny Huang -- Gradient-based approximations to data dropping: unmasking failure modes
A data analyst might worry about generalization if dropping a very small fraction of data points from a study could change its substantive conclusions. Finding the worst-case data subset to drop poses a combinatorial optimization problem. To overcome this intractability, Broderick et al. [2023] propose a gradient-based approximation that takes derivatives in an instantiated weight space. Building on work where the dropped data subset is known in advance, we propose an alternative approximation using derivatives in parameter space. Both approximations treat the impact of a collection of data points as the sum of their individual contributions. We focus on a subset of linear regression analyses, where our proposed approximation reduces to that of Kuschnig et al. [2021]. We identify failure modes of both approximations. Many of our examples reflect "masking", where data points may meaningfully affect a conclusion jointly but not individually. Finally, we suggest directions for future improvements.
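To spell out the additive approximation in a familiar setting, the following numpy sketch (illustrative only, not the speakers' code) uses the classic closed form for single-point deletion in OLS and sums those individual effects over a set S; "masking" corresponds to sets for which this sum and the exact joint effect of dropping S disagree:

```python
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def single_deletion_effects(X, y):
    # exact change in the OLS coefficients from deleting each single point i:
    #   beta_{-i} - beta_hat = -(X^T X)^{-1} x_i e_i / (1 - h_ii)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    e = y - X @ beta_hat                           # residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverages h_ii
    return -((XtX_inv @ X.T) * (e / (1.0 - h))).T  # row i = effect of dropping i

def additive_drop_approx(X, y, S):
    # additive approximation: the effect of dropping the set S is taken to be
    # the sum of the individual single-point deletion effects
    return single_deletion_effects(X, y)[list(S)].sum(axis=0)

def exact_drop(X, y, S):
    keep = np.setdiff1d(np.arange(len(y)), list(S))
    return ols(X[keep], y[keep]) - ols(X, y)
```

Comparing additive_drop_approx with exact_drop on a given analysis is one simple way to see masking at work: points can matter jointly even when each matters little on its own.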
Omri Lev -- Faster Machine Unlearning via Natural Gradient Descent
In this talk, we tackle the challenge of efficiently and reliably deleting data from machine learning models trained via Empirical Risk Minimization (ERM), a process known as machine unlearning. Instead of retraining models from scratch, we introduce a novel algorithm that leverages Natural Gradient Descent (NGD). Our theoretical framework provides strong privacy guarantees for convex models, while we develop a practical Min/Max optimization algorithm for non-convex models. Comprehensive evaluations demonstrate significant enhancements in privacy, computational efficiency, and generalization over state-of-the-art methods, pushing forward both the theoretical and practical dimensions of machine unlearning. Additionally, we explore the connections between our proposed framework and influence functions, presenting computationally efficient methods to calculate influence functions in various scenarios.
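For background, a single natural-gradient step for logistic regression looks like the sketch below, with the Fisher information replaced by a damped empirical Fisher. This is only the generic NGD building block, not the unlearning algorithm presented in the talk, and all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ngd_step(theta, X, y, lr=0.5, damping=1e-3):
    # one natural-gradient step: theta <- theta - lr * F^{-1} grad
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) / len(y)                  # gradient of the mean log-loss
    per_example = X * (p - y)[:, None]             # per-example gradients
    fisher = per_example.T @ per_example / len(y)  # empirical Fisher approximation
    fisher += damping * np.eye(len(theta))         # damping keeps it invertible
    return theta - lr * np.linalg.solve(fisher, grad)
```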
Ryan Giordano -- Approximate data deletion and replication with the Bayesian influence function
Many model-agnostic statistical diagnostics are based on repeatedly re-fitting a model with some observations deleted or replicated. Cross-validation, the non-parametric bootstrap, and outlier detection via case deletion are examples of this technique. However, for Bayesian statistical procedures based on Markov Chain Monte Carlo (MCMC), re-computing posteriors for many slightly different datasets can be computationally prohibitive. Instead of exactly re-fitting, one might use the entire dataset and a single MCMC run to form a linear approximation to the effect of re-weighting observations. In the robust statistics literature, the leading term of this linear approximation is known as the influence function. We show that, for Bayesian posteriors, the influence function takes the form of a set of easily-estimated posterior covariances, and that the error of the linear approximation vanishes asymptotically for finite-dimensional posteriors under standard regularity conditions. However, in models for which the number of parameters grows with the size of the data, N, we show that the error of the linear approximation based on the influence function does not vanish, even for finite-dimensional subsets of the parameters whose posterior does concentrate at a sqrt(N) rate. We discuss the implications for infinitesimal jackknife covariances, the bootstrap, and approximate cross-validation, as well as what is implicitly meant by "exchangeability" when using the influence function in this way.
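The covariance form of the approximation can be estimated from a single set of MCMC draws: to first order, down-weighting observation n shifts a posterior expectation E[g(theta)] by -Cov(g(theta), log p(y_n | theta)) under the posterior. A rough sketch follows (array shapes and names are illustrative; as the abstract notes, this linear approximation can fail in high-dimensional regimes):

```python
import numpy as np

def deletion_approx(g_draws, loglik_draws):
    """First-order approximation to E_{-n}[g] - E[g] for each observation n.

    g_draws:      shape (S,), values of g(theta) at S posterior draws
    loglik_draws: shape (S, N), per-observation log-likelihoods at the same draws
    """
    g_centered = g_draws - g_draws.mean()
    ll_centered = loglik_draws - loglik_draws.mean(axis=0)
    cov = ll_centered.T @ g_centered / (len(g_draws) - 1)  # Cov(g, ell_n) for each n
    return -cov  # approximate change in E[g] if observation n is deleted
```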
Dhruv Rohatgi -- Can We Provably Audit the Stability of Ordinary Least Squares?
Measuring the stability of conclusions derived from Ordinary Least Squares linear regression is critically important, but most metrics either only measure local stability (i.e. against infinitesimal changes in the data), or are only interpretable under statistical assumptions. Recent work proposed a simple, global, finite-sample stability metric: the minimum number of samples that need to be removed so that rerunning the analysis overturns the conclusion, specifically meaning that the sign of a particular coefficient of the estimated regressor changes. However, besides exhaustive search over all subsets of data, the only prior approach for computing this metric was a greedy heuristic that lacks provable guarantees under reasonable, verifiable assumptions; the heuristic provides a loose upper bound on the stability and also cannot certify lower bounds on it.
In this talk, we pose the algorithmic task of auditing regressions via certifying lower bounds on stability. We show that in the low-dimensional regime where the number of covariates is a constant (but the number of samples is large), there are efficient and provably correct algorithms for estimating (a fractional version of) stability. Moreover, we give a computational lower bound in high dimensions. We empirically demonstrate the utility of these algorithms, and discuss some directions for future research. Based on joint work with Ankur Moitra: "Provably Auditing Ordinary Least Squares in Low Dimensions", ICLR 2023.
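To make the stability metric concrete, here is a brute-force sketch (illustrative code, not the authors'): it searches over removal sets of increasing size until the sign of a chosen OLS coefficient flips. The search is exponential in the removal budget, which is exactly why efficient algorithms with provable guarantees are of interest:

```python
import numpy as np
from itertools import combinations

def ols_coef(X, y, j):
    return np.linalg.lstsq(X, y, rcond=None)[0][j]

def min_removals_to_flip_sign(X, y, j, max_k=5):
    # smallest set of samples whose removal flips the sign of coefficient j
    base_sign = np.sign(ols_coef(X, y, j))
    n = len(y)
    for k in range(1, max_k + 1):
        for S in combinations(range(n), k):
            keep = np.setdiff1d(np.arange(n), S)
            if np.sign(ols_coef(X[keep], y[keep], j)) != base_sign:
                return k, S        # found a flipping set of size k
    return None                    # no flip found within the budget max_k
```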
Ittai Rubinstein -- Practical Robustness Auditing for Ordinary Least Squares: To Singularity and Beyond
It has recently been discovered that the conclusions of many highly influential econometrics studies can be overturned by removing a very small fraction of their samples (often less than 0.5%). These conclusions are typically based on the results of one or more Ordinary Least Squares (OLS) regressions, raising the question: given a dataset, can we certify the robustness of an OLS fit on this dataset to the removal of a given number of samples?
Brute-force techniques quickly break down even on small datasets. Existing approaches that go beyond brute force either can only find candidate small subsets to remove (but cannot certify their non-existence) [BGM20, KZC21], are computationally intractable beyond low-dimensional settings [MR22], or require very strong hypercontractivity assumptions on the data distribution and too many samples to give reasonable bounds in practice [BP21, FH23].
We present an efficient algorithm for certifying the robustness of linear regressions to removals of samples. We implement our algorithm and run it on several landmark econometrics datasets with hundreds of dimensions and tens of thousands of samples, giving the first non-trivial certificates of robustness to sample removal for datasets of dimension 4 or greater. We prove that under distributional assumptions on a dataset, the bounds produced by our algorithm are tight up to a 1 + o(1) multiplicative factor.
Vinith Suriyakumar -- Algorithms that Approximate Data Removal: New Results and Limitations
In this talk, we present the problem of deleting user data from machine learning models trained using empirical risk minimization. Leveraging the infinitesimal jackknife, we develop an online unlearning algorithm that is both computationally and memory efficient. Unlike prior memory-efficient unlearning algorithms, we target models that minimize objectives with non-smooth regularizers. We also provide generalization, deletion capacity, and unlearning guarantees that are consistent with state-of-the-art methods. Across a variety of benchmark datasets, our algorithm empirically improves upon the runtime of prior methods while maintaining the same memory requirements and test accuracy. Finally, we open a new direction of inquiry by proving that all approximate unlearning algorithms introduced so far fail to unlearn in problem settings where common hyperparameter tuning methods, such as cross-validation, have been used to select models. This is joint work with Ashia Wilson, accepted at NeurIPS 2022.