NeurIPS 2022 Workshop on Distribution Shifts (DistShift)
Connecting Methods and Applications
Saturday, December 3rd, 2022, New Orleans, USA
New Orleans Convention Center Room 388 - 390
This workshop brings together domain experts and ML researchers working on mitigating distribution shifts in real-world applications.
Distribution shifts—where a model is deployed on a data distribution different from what it was trained on—pose significant robustness challenges in real-world ML applications. Such shifts are often unavoidable in the wild and have been shown to substantially degrade model performance in applications such as biomedicine, wildlife conservation, sustainable development, robotics, education, and criminal justice. For example, models can systematically fail when tested on patients from different hospitals or people from different demographics.
This workshop aims to convene a diverse set of domain experts and methods-oriented researchers working on distribution shifts. We are broadly interested in methods, evaluations and benchmarks, and theory for distribution shifts, and we are especially interested in work on distribution shifts that arise naturally in real-world application contexts. Examples of relevant topics include, but are not limited to:
Examples of real-world distribution shifts in various application areas. We especially welcome applications that are not widely discussed in the ML research community, e.g., education, sustainable development, and conservation. We encourage submissions that characterize distribution shifts and their effects in real-world applications; it is not at all necessary to propose a solution that is algorithmically novel.
Methods for improving robustness to distribution shifts. Relevant settings include domain generalization, domain adaptation, and subpopulation shifts, and we are interested in a wide range of approaches, from uncertainty estimation to causal inference to active data collection. We welcome methods that can work across a variety of shifts, as well as more domain-specific methods that incorporate prior knowledge about the types of shifts we wish to be robust to. We encourage evaluating these methods on real-world distribution shifts.
Empirical and theoretical characterization of distribution shifts. Distribution shifts can vary widely in the way in which the data distribution changes, as well as the empirical trends they exhibit. What empirical trends do we observe? What empirical or theoretical frameworks can we use to characterize these different types of shifts and their effects? What kinds of theoretical settings capture useful components of real-world distribution shifts?
Benchmarks and evaluations. We especially welcome contributions for subpopulation shifts, as they are underrepresented in current ML benchmarks. We are also interested in evaluation protocols that move beyond the standard assumption of fixed training and test splits. For which applications would we need to consider other forms of shifts, such as streams of continually changing data or feedback loops between models and data?
See the virtual NeurIPS website for the livestream, papers, and videos.
If you have any questions, please contact us at distshift-workshop-2022@googlegroups.com.
[9:10 – 9:35] Domain Adaptation: Theory, Algorithms, and Open Library
Mingsheng Long, Tsinghua University
Mingsheng Long is an Associate Professor with tenure in the School of Software at Tsinghua University. He received his BE and PhD degrees from Tsinghua University in 2008 and 2014, respectively, and worked as a researcher at UC Berkeley from 2014 to 2015. His research is dedicated to machine learning theory, algorithms, and applications, with special interests in transfer learning and domain adaptation, foundation models and deep learning, and informed learning with scientific knowledge. His work on transfer learning won the Test of Time Award of IJCAI FTL (2021) and has received more than 20,000 citations on Google Scholar. He serves as an Associate Editor of IEEE TPAMI and TMLR, and regularly as an Area Chair of ICML, NeurIPS, and ICLR.
[9:35 – 10:00] Machine-learning, distribution shifts and extrapolation in the Earth System
Markus Reichstein, Max Planck Institute for Biogeochemistry
Markus Reichstein is Director at the Max Planck Institute for Biogeochemistry and Professor for Global Ecology at the University of Jena. He is founding co-director of the ELLIS program “Machine Learning for Earth and Climate Science” and of the recently established ELLIS Unit Jena within the Michael-Stifel-Center Jena for Data-driven and Simulation Science. He has served as a lead author for the IPCC, as a member of the German National Committee Future Earth on Sustainability Research, and on the Thuringian Panel on Climate, which advises the state on climate protection and adaptation.
Markus’s main research interests revolve around the response and feedback of ecosystems (vegetation and soils) to climatic variability from an Earth system perspective. Of specific interest is the interplay of climate extremes with ecosystem and societal resilience. He addresses these topics with a combination of artificial intelligence and classical modelling approaches to exploit the wealth of experimental, ground-based, and satellite-based Earth observations together with theoretical knowledge. Recent awards for his research include the Piers J. Sellers Mid-Career Award from the American Geophysical Union (2018) and the Gottfried Wilhelm Leibniz Prize from the German Research Foundation (DFG) (2020).
He is also Principal Investigator in the European Research Council Synergy Grant USMILE dedicated to the development and application of machine learning for a better Earth system understanding and modelling. Furthermore, Markus is chairing the Global Research Program and Knowledge-Action Network “Emergent Risks and Extreme Events – Reducing Disaster Risks under Environmental Change” (www.risk-kan.org).
Markus is excited about linking systems thinking with data-driven science and artificial intelligence for understanding complex systems, such as the climate-environmental-societal system, and believes that such approaches can help societies become more resilient and sustainable.
[10:00 – 10:30] Coffee break
[10:30 – 10:55] The promises and pitfalls of CVaR
Pradeep Ravikumar, Carnegie Mellon University
Pradeep Ravikumar is a Professor in the Machine Learning Department, School of Computer Science at Carnegie Mellon University. He was previously an Associate Director at the Center for Big Data Analytics at the University of Texas at Austin. His thesis received honorable mentions for the ACM SIGKDD Dissertation Award and the CMU School of Computer Science Distinguished Dissertation Award. He is a Sloan Fellow, a Siebel Scholar, a recipient of the NSF CAREER Award, and was Program Chair for the International Conference on Artificial Intelligence and Statistics (AISTATS) in 2013. He is Associate Editor-in-Chief for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and an action editor for the Machine Learning journal and the Journal of Machine Learning Research.
Dr. Ravikumar's research group at CMU works on the foundations of statistical machine learning, with a recent focus on "next generation" machine learning systems that are explainable, robust to train- and test-time corruptions, resilient to distribution shifts, and learned under resource constraints by leveraging or discovering various notions of "structure" and domain knowledge.
[11:00 – 11:45] Panel discussion
Panelists: Behnam Neyshabur, Erin Hartman, David Sontag, Pradeep Ravikumar
Moderated by Hongseok Namkoong, Columbia University
Behnam Neyshabur, Google Research
Behnam Neyshabur is a senior staff research scientist at Google Research. Before that, he was a postdoctoral researcher at New York University and a member of the Theoretical Machine Learning program at the Institute for Advanced Study (IAS) in Princeton. In summer 2017, he received a PhD in computer science from TTI-Chicago. Behnam's primary interest is the reasoning and algorithmic capabilities of giant language models, but he has not lost his interest in the science of deep learning.
Erin Hartman, UC Berkeley
Erin Hartman is an Assistant Professor of Political Science at the University of California, Berkeley. Her research sits at the intersection of the social sciences and statistics. Her mission is to create a body of research that bridges these two worlds, with an emphasis on answering causal questions, and within which experts from both worlds can engage in dialogue and foster beneficial collaborations. In particular, she develops methods that allow researchers to generalize experimental results beyond the units, treatments, outcomes, and contexts in which the experiments were conducted. She also works on survey design and analysis, informed by her time running the Analytics team's polling operation for President Obama's 2012 re-election campaign and at her start-up BlueLabs. Her research in survey design and analysis focuses on statistical methods that leverage detailed individual-level data to overcome the non-random nature of modern survey data.
David Sontag, MIT
David Sontag is a Professor of Electrical Engineering and Computer Science at MIT, part of the Institute for Medical Engineering & Science, the Computer Science and Artificial Intelligence Laboratory, and the J-Clinic for Machine Learning in Health. His research focuses on advancing machine learning and artificial intelligence, and using these to transform health care. Previously, he was an Assistant Professor of Computer Science and Data Science at New York University.
Pradeep Ravikumar, Carnegie Mellon University
[11:45 – 13:00] Lunch break
[13:00 – 14:30] Poster session
Links to the 96 accepted papers are on the virtual site and OpenReview.
For remote presenters, we will have a virtual poster session over Zoom between 13:00 and 14:30 CST.
[14:30 – 15:30] Spotlight talks
First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains
Kefan Dong, Tengyu Ma
Learning Invariant Representations under General Interventions on the Response
Kang Du, Yu Xiang
CAREER: Economic Prediction of Labor Sequence Data Under Distribution Shift
Keyon Vafa, Emil Palikot, Tianyu Du, Ayush Kanodia, Susan Athey, David Blei
Tackling Distribution Shifts in Federated Learning with Superquantile Aggregation
Krishna Pillutla, Yassine Laguel, Jerome Malick, Zaid Harchaoui
Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization
Elan Rosenfeld, Pradeep Kumar Ravikumar, Andrej Risteski
Data Feedback Loops: Model-driven Amplification of Dataset Biases
Rohan Taori, Tatsunori Hashimoto
[15:30 – 15:45] Coffee break
[15:45 – 16:10] External Validity: Framework, Design, and Analysis
Erin Hartman, UC Berkeley
[16:10 – 16:35] Bringing real-world data to bear in addressing distribution shifts: a sociolinguistically-informed analysis of ASR errors
Alicia Wassink, University of Washington
Prof. Wassink is the Director of the Sociolinguistics Laboratory, and professor of Linguistics in the Department of Linguistics at the University of Washington. Her research interests lie in sociolinguistics (the study of language in its various social contexts, the relationships between language and social network structure, language attitudes and the outcomes of language and dialect contact) and phonetics (the study of the acoustic properties of spoken language, perception, and physiological aspects of human speech). One of Prof. Wassink’s principal languages of study is Jamaican Creole. When she is in Jamaica conducting fieldwork, her home base is the University of the West Indies.
[16:35 – 17:00] Geospatial Distribution Shifts in Ecology: Mapping the Urban Forest
Sara Beery, Google & MIT
Sara Beery will join MIT as an assistant professor in the Faculty of Artificial Intelligence and Decision-Making in EECS in September 2023 and is currently a visiting researcher at Google working on urban tree mapping across North America. She received her PhD in computing and mathematical sciences at Caltech in 2022, where she was advised by Pietro Perona. Her research focuses on building computer vision methods that enable global-scale environmental and biodiversity monitoring across data modalities, tackling real-world challenges including strong spatiotemporal correlations, imperfect data quality, fine-grained categories, and long-tailed distributions. She partners with nongovernmental organizations and government agencies to deploy her methods in the wild worldwide and works toward increasing the diversity and accessibility of academic research in artificial intelligence through interdisciplinary capacity building and education.
Organizers
Google Brain
Stanford University
ETH Zurich
Columbia University
RIKEN & University of Tokyo
University of Copenhagen
University of Washington & Google
Program Committee
Alexander Mangulad Christgau
A. Tuan Nguyen
Alex Xijie Lu
Alexander Robey
Alexandru Tifrea
Allan Zhou
Amartya Sanyal
Amita Kamath
Ananya Kumar
Andrew Ilyas
Anqi Liu
Austin J. Brockmeier
Byol Kim
Christopher Clark
Colin Wei
David Lopez-Paz
David Madras
Deyi Liu
Dustin Tran
Elan Rosenfeld
Elliot Creager
Eric Wallace
Erik Jones
Evgenia Rusak
Haoran Zhang
Haotian Ye
Huaxiu Yao
Ikko Yamane
Irena Gao
Ishaan Gulrajani
Jacob Clarysse
Jean Feng
Jeremiah Zhe Liu
Karthyek Murthy
Kibok Lee
Kuan-Hao Huang
Kuniaki Saito
Maksym Andriushchenko
Malte Londschien
Marvin Mengxin Zhang
Meng-Jiun Chiou
Michael Aerni
Michael Oberst
Michael Zhang
Mingsheng Long
Mitchell Wortsman
Mucong Ding
Neil Band
Nicholas R Galbraith
Nicola Gnecco
Nicolò Ruggeri
Nikola Konstantinov
Olivia Wiles
Peng Zhao
Polina Kirichenko
Prithvijit Chattopadhyay
Robert Geirhos
Robin Jia
Rohan Taori
Saeid Asgari
Samarth Mishra
Saminul Haque
Sang Michael Xie
Saurabh Garg
Shalmali Joshi
Siddharth Mysore
Steffen Schneider
Stephan Rabanser
Stephen Mussmann
Suyash Gupta
Takafumi Kanamori
Tan Minh Nguyen
Thao Nguyen
Tianle Cai
Tianyi Zhang
Tim G. J. Rudner
Tongtong Fang
Wataru Kumagai
Wieland Brendel
Xiang Lisa Li
Xiaochuang Han
Xinyang Chen
Yang Li
Yann Dubois
Yaodong Yu
Yifan Zhang
Yining Chen
Yoav Wald
Yu-Jie Zhang
Yujin Jeong
Zhongyi Pei