What if distribution shift is random?
If distribution shift is random, we should:
compute confidence intervals that account for distributional perturbations (e.g., in replication studies)
use methods for effect generalization that are justified under random shifts (AIPW won't work, but synthetic-control-style data set weighting does)
select models that are robust under random shifts (standard cross-validation would be too optimistic)
exploit random shifts when prioritizing data collection
build diagnostic tools to detect whether two data sets differ by a random shift, a deterministic shift, or a hybrid of the two
Random distribution shift can lead to overlap violations and shifts in Y|X, so if you observe either in real-world data, there is a chance the shift is "just random", which is less threatening than a worst-case shift.
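A minimal simulation can make this concrete. The sketch below (a toy discrete population; the setup and all names are illustrative, not taken from any of the papers above) perturbs every cell of a joint distribution over (X, Y) by independent random noise, i.e., a dense, random shift, and checks what happens to p(Y=1|X) and to the smallest stratum mass:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint pmf over (X, Y): rows index X in {0,1,2}, columns index Y in {0,1}.
p = np.array([[0.15, 0.15],
              [0.20, 0.10],
              [0.10, 0.30]])  # sums to 1

def random_shift(p, scale=1.0, rng=rng):
    """Perturb each cell of the joint pmf by independent multiplicative
    log-normal noise, then renormalize -- a dense, random shift."""
    w = np.exp(scale * rng.normal(size=p.shape))
    q = p * w
    return q / q.sum()

q = random_shift(p)

# The conditional p(Y=1 | X) before and after the shift:
py1_before = p[:, 1] / p.sum(axis=1)
py1_after = q[:, 1] / q.sum(axis=1)
print("p(Y=1|X) before:", py1_before.round(3))
print("p(Y=1|X) after: ", py1_after.round(3))
# Because the noise hits the joint distribution, the conditionals move
# too: random shift induces Y|X shift even without any "mechanism" change.

# With larger perturbations, some strata may nearly vanish,
# eroding overlap between the two data sets:
q_big = random_shift(p, scale=4.0)
print("smallest stratum mass after a large shift:", q_big.min())
```

The same construction underlies diagnostics for random shift: under dense random perturbations, the differences in p(Y=1|X) across strata behave like noise rather than a systematic, adversarial change.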
Stay tuned!
Dominik Rothenhäusler
Assistant Professor of Statistics, Stanford University
David Huntington Faculty Scholar
Chamber Fellow
Hi, I'm Dominik.
My research centers on distribution shift, causal inference, and replicability. I completed my Ph.D. under the supervision of Nicolai Meinshausen and Peter Bühlmann at ETH Zürich in summer 2018. Prior to joining the Stanford faculty, I completed a postdoc with Bin Yu at UC Berkeley.
I gratefully acknowledge support from the Dieter Schwarz Foundation, the Chamber Foundation, and the David Huntington Fellowship.
News:
Many existing approaches for estimating parameters in settings with distributional shifts operate under an invariance assumption. For example, under covariate shift, it is assumed that p(y|x) remains invariant. We refer to such distribution shifts as sparse, since they may be substantial but affect only part of the data-generating system. In contrast, in real-world settings, shifts might be dense. That is, they arise through numerous small and random changes in the population and environment. We discuss out-of-distribution generalization under such random, dense shifts [Preprint].
Many researchers have identified distribution shift as a likely contributor to the replication crisis. We built a set of tools to diagnose the role of observable distribution shifts in scientific replications. Surprisingly, we find little evidence that distribution shift in observed covariates contributes appreciably to non-replicability. [Preprint, Data, R-package, Shiny app]
We developed a modular framework for statistical inference in linear models. At a high level, our method follows a three-step routine: (i) decompose the regression task into several sub-tasks, (ii) fit the sub-task models, and (iii) use the sub-task models to provide an improved estimate for the original regression problem. Our paper got accepted at JMLR! [Preprint]
Statistical inference can be fragile. How stable is your statistical model under distribution shift? Our paper got accepted at NeurIPS 2023! [GitHub] [Preprint]
Do you want to conduct statistical inference for a fixed set of units, such as the current customers of a company? Targeting statistical inference to the units of interest can improve precision by more than 50%! Our manuscript got accepted at Biometrika [Link].
Is randomization inference after Mahalanobis matching justified? Our manuscript got accepted at Biometrika. [Paper]
Do you want to integrate causal evidence from different experiments, instrumental variables, and regression adjustments to get a more complete causal picture? Our manuscript got accepted at JMLR. [Link]
Many researchers evaluate the stability of a statistical finding by running multiple analyses with differently specified models. Can we give rigorous guarantees for this practice? [GitHub] [Preprint]
I am thrilled to be awarded the David Cox Research Prize by the Royal Statistical Society! Thank you!
Recent and forthcoming talks
Talk at ICSDS 2024
Talk at JSM 2024
Talk at the "Shanghai Workshop on Robustness meets Causality" (Summer 2024)
Talk at the Warwick workshop on heterogeneous and distributed data (Summer 2024)
Talk at the Veridical Data Science Workshop at Berkeley (Spring 2024)
Talk at the University of Washington, Department of Biostatistics (Spring 2024)
Talk at the Berkeley Department of Political Science
Talk at Stanford's AFT Laboratory (Spring 2024)
Talk at the Berkeley-Stanford Joint Colloquium 2024
Talk at ACIC 2024
Talk at ICSDS 2023
Talk at CMStatistics 2023
Talk at INFORMS
Talk at JSM 2023
Talk at Copenhagen University
Talk at Nordstat 2023
Talk at the New England Statistics Symposium 2023
Talk at the Statistics Department at UC Davis
Talk at the Seminar for Statistics at ETH Zurich
Talk at the MSRI workshop on "Foundations of Stable, Generalizable and Transferable Statistical Learning"
Talk at the Stanford Statistics Seminar
Talk at the Statistics Seminar of KAUST
Talk at the Online Causal Inference Seminar
Talk at the CSLI Workshop
Talk at the Simons workshop "Statistics in the Big Data Era"
Email: rdominik {at} stanford.edu