NeurIPS 2022 Workshop on Distribution Shifts (DistShift)

Connecting Methods and Applications

Saturday, December 3rd, 2022, New Orleans, USA

New Orleans Convention Center Room 388 - 390



This workshop brings together domain experts and ML researchers working on mitigating distribution shifts in real-world applications.

Distribution shifts, where a model is deployed on a data distribution different from what it was trained on, pose significant robustness challenges in real-world ML applications. Such shifts are often unavoidable in the wild and have been shown to substantially degrade model performance in applications such as biomedicine, wildlife conservation, sustainable development, robotics, education, and criminal justice. For example, models can systematically fail when tested on patients from different hospitals or people from different demographics.

This workshop aims to convene a diverse set of domain experts and methods-oriented researchers working on distribution shifts. We are broadly interested in methods, evaluations and benchmarks, and theory for distribution shifts, and we are especially interested in work on distribution shifts that arise naturally in real-world application contexts. Examples of relevant topics include, but are not limited to:

  • Examples of real-world distribution shifts in various application areas. We especially welcome applications that are not widely discussed in the ML research community, e.g., education, sustainable development, and conservation. We encourage submissions that characterize distribution shifts and their effects in real-world applications; it is not at all necessary to propose a solution that is algorithmically novel.

  • Methods for improving robustness to distribution shifts. Relevant settings include domain generalization, domain adaptation, and subpopulation shifts, and we are interested in a wide range of approaches, from uncertainty estimation to causal inference to active data collection. We welcome methods that can work across a variety of shifts, as well as more domain-specific methods that incorporate prior knowledge on the types of shifts we wish to be robust to. We encourage evaluating these methods on real-world distribution shifts.

  • Empirical and theoretical characterization of distribution shifts. Distribution shifts can vary widely in the way in which the data distribution changes, as well as the empirical trends they exhibit. What empirical trends do we observe? What empirical or theoretical frameworks can we use to characterize these different types of shifts and their effects? What kinds of theoretical settings capture useful components of real-world distribution shifts?

  • Benchmarks and evaluations. We especially welcome contributions for subpopulation shifts, as they are underrepresented in current ML benchmarks. We are also interested in evaluation protocols that move beyond the standard assumption of fixed training and test splits -- for which applications would we need to consider other forms of shifts, such as streams of continually-changing data or feedback loops between models and data?


See the virtual NeurIPS website for the livestream, papers, and videos.
If you have any questions, please contact us at distshift-workshop-2022@googlegroups.com.

[9:10 – 9:35] Domain Adaptation: Theory, Algorithms, and Open Library

Mingsheng Long, Tsinghua University

Mingsheng Long is an Associate Professor with tenure in the School of Software at Tsinghua University. He earned the BE and PhD degrees from Tsinghua University in 2008 and 2014 respectively, and worked as a researcher at UC Berkeley from 2014 to 2015. His research is dedicated to machine learning theory, algorithms, and applications, with special interests in transfer learning and domain adaptation, foundation models and deep learning, and informed learning with scientific knowledge. His work on transfer learning won the Test of Time Award of IJCAI FTL (2021) and has received more than 20,000 citations in Google Scholar. He serves as an Associate Editor of IEEE TPAMI and TMLR, and regularly as an Area Chair of ICML, NeurIPS, and ICLR.

[9:35 – 10:00] Machine-learning, distribution shifts and extrapolation in the Earth System

Markus Reichstein, Max Planck Institute for Biogeochemistry

Markus Reichstein is Director at the Max-Planck-Institute for Biogeochemistry, and Professor for Global Ecology at the University of Jena. He is founding co-director of the ELLIS program “Machine Learning for Earth and Climate Science” and the recently established ELLIS Unit Jena within the Michael-Stifel-Center Jena for Data-driven and Simulation Science. He has been serving as a lead author for the IPCC, and as a member of the German Committee Future Earth on Sustainability Research and of the Thuringian Panel on Climate, which advises the state on climate protection and adaptation.

Markus’s main research interests revolve around the response and feedback of ecosystems (vegetation and soils) to climatic variability with an Earth system perspective. Of specific interest is the interplay of climate extremes with ecosystem and societal resilience. He is addressing these topics with a combination of artificial intelligence and classical modelling approaches to exploit the wealth of experimental, ground- and satellite-based Earth observations together with theoretical knowledge. Recent awards for his research include the Piers J. Sellers Mid-Career Award by the American Geophysical Union (2018), and the Gottfried Wilhelm Leibniz Preis by the German Science Foundation (2020).

He is also Principal Investigator in the European Research Council Synergy Grant USMILE dedicated to the development and application of machine learning for a better Earth system understanding and modelling. Furthermore, Markus is chairing the Global Research Program and Knowledge-Action Network “Emergent Risks and Extreme Events – Reducing Disaster Risks under Environmental Change” (www.risk-kan.org).

Markus is excited about linking system thinking with data-driven science and artificial intelligence for understanding complex systems, such as the climate-environmental-societal system, and believes that such approaches can help societies become more resilient and sustainable.

[10:00 – 10:30] Coffee break

[10:30 – 10:55] The promises and pitfalls of CVAR

Pradeep Ravikumar, Carnegie Mellon University

Pradeep Ravikumar is a Professor in the Machine Learning Department, School of Computer Science at Carnegie Mellon University. He was previously an Associate Director at the Center for Big Data Analytics at the University of Texas at Austin. His thesis received honorable mentions in the ACM SIGKDD Dissertation award and the CMU School of Computer Science Distinguished Dissertation award. He is a Sloan Fellow, a Siebel Scholar, a recipient of the NSF CAREER Award, and was Program Chair for the International Conference on Artificial Intelligence and Statistics (AISTATS) in 2013. He is Associate Editor-in-Chief for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), and an action editor for the Machine Learning journal and the Journal of Machine Learning Research.

Dr. Ravikumar's research group at CMU works on the foundations of statistical machine learning, with a recent focus on "next generation" machine learning systems that are explainable, robust to train- and test-time corruptions, resilient to distribution shifts, and learnt under resource constraints by leveraging or discovering various notions of "structure" and domain knowledge.

[11:00 – 11:45] Panel discussion

Panelists: Behnam Neyshabur, Erin Hartman, David Sontag, Pradeep Ravikumar

Moderated by Hongseok Namkoong, Columbia University

Behnam Neyshabur, Google Research

Behnam Neyshabur is a senior staff research scientist at Google Research. Before that, he was a postdoctoral researcher at New York University and a member of the Theoretical Machine Learning program at the Institute for Advanced Study (IAS) in Princeton. In summer 2017, he received a PhD in computer science from TTI-Chicago. Behnam's primary interest is the reasoning and algorithmic capabilities of giant language models, but he has not lost his interest in the science of deep learning.

Erin Hartman, UC Berkeley

Erin Hartman is an Assistant Professor of Political Science at the University of California, Berkeley. Her research sits at the intersection of the social sciences and statistics. Her mission is to create a body of research that bridges these two worlds — with an emphasis on answering causal questions — within which experts from both worlds can have dialogue with one another and foster beneficial collaborations. In particular, she focuses on developing methods that allow researchers to generalize experimental results beyond the units, treatments, outcomes, and contexts upon which they were conducted. She also conducts work on survey design and analysis, informed by her time running the Analytics team's polling operation for President Obama's 2012 re-election campaign, and at her start-up BlueLabs. Erin's research in survey design and analysis focuses on statistical methods for leveraging detailed individual level data to overcome the non-random nature of modern survey data.

David Sontag, MIT

David Sontag is a Professor of Electrical Engineering and Computer Science at MIT, part of the Institute for Medical Engineering & Science, the Computer Science and Artificial Intelligence Laboratory, and the J-Clinic for Machine Learning in Health. His research focuses on advancing machine learning and artificial intelligence, and using these to transform health care. Previously, he was an Assistant Professor of Computer Science and Data Science at New York University.

Pradeep Ravikumar, Carnegie Mellon University


[11:45 – 13:00] Lunch break

[13:00 – 14:30] Poster session

Links to the 96 accepted papers are on the virtual site and OpenReview.

For remote presenters, we will have a virtual poster session over Zoom from 13:00 to 14:30 CST.

[15:30 – 15:45] Coffee break

[15:45 – 16:10] External Validity: Framework, Design, and Analysis

Erin Hartman, UC Berkeley


[16:10 – 16:35] Bringing real-world data to bear in addressing distribution shifts: a sociolinguistically-informed analysis of ASR errors

Alicia Wassink, University of Washington

Prof. Wassink is the Director of the Sociolinguistics Laboratory, and professor of Linguistics in the Department of Linguistics at the University of Washington. Her research interests lie in sociolinguistics (the study of language in its various social contexts, the relationships between language and social network structure, language attitudes and the outcomes of language and dialect contact) and phonetics (the study of the acoustic properties of spoken language, perception, and physiological aspects of human speech). One of Prof. Wassink’s principal languages of study is Jamaican Creole. When she is in Jamaica conducting fieldwork, her home base is the University of the West Indies.

[16:35 – 17:00] Geospatial Distribution Shifts in Ecology: Mapping the Urban Forest

Sara Beery, Google & MIT

Sara Beery will join MIT as an assistant professor in the Faculty of Artificial Intelligence and Decision-Making in EECS in September 2023 and is currently a visiting researcher at Google working on urban tree mapping across North America. She received her PhD in computing and mathematical sciences at Caltech in 2022, where she was advised by Pietro Perona. Her research focuses on building computer vision methods that enable global-scale environmental and biodiversity monitoring across data modalities, tackling real-world challenges including strong spatiotemporal correlations, imperfect data quality, fine-grained categories, and long-tailed distributions. She partners with nongovernmental organizations and government agencies to deploy her methods in the wild worldwide and works toward increasing the diversity and accessibility of academic research in artificial intelligence through interdisciplinary capacity building and education.

Organizers

Google Brain

Stanford University

ETH Zurich

Columbia University

RIKEN & University of Tokyo

University of Copenhagen

University of Washington & Google

Stanford University

Stanford University

Program Committee

Alexander Mangulad Christgau

A. Tuan Nguyen

Alex Xijie Lu

Alexander Robey

Alexandru Tifrea

Allan Zhou

Amartya Sanyal

Amita Kamath

Ananya Kumar

Andrew Ilyas

Anqi Liu

Austin J. Brockmeier

Byol Kim

Christopher Clark

Colin Wei

David Lopez-Paz

David Madras

Deyi Liu

Dustin Tran

Elan Rosenfeld

Elliot Creager

Eric Wallace

Erik Jones

Evgenia Rusak

Haoran Zhang

Haotian Ye

Huaxiu Yao

Ikko Yamane

Irena Gao

Ishaan Gulrajani

Jacob Clarysse

Jean Feng

Jeremiah Zhe Liu

Karthyek Murthy

Kibok Lee

Kuan-Hao Huang

Kuniaki Saito

Maksym Andriushchenko

Malte Londschien

Marvin Mengxin Zhang

Meng-Jiun Chiou

Michael Aerni

Michael Oberst

Michael Zhang

Mingsheng Long

Mitchell Wortsman

Mucong Ding

Neil Band

Nicholas R Galbraith

Nicola Gnecco

Nicolò Ruggeri

Nikola Konstantinov

Olivia Wiles

Peng Zhao

Polina Kirichenko

Prithvijit Chattopadhyay

Robert Geirhos

Robin Jia

Rohan Taori

Saeid Asgari

Samarth Mishra

Saminul Haque

Sang Michael Xie

Saurabh Garg

Shalmali Joshi

Siddharth Mysore

Steffen Schneider

Stephan Rabanser

Stephen Mussmann

Suyash Gupta

Takafumi Kanamori

Tan Minh Nguyen

Thao Nguyen

Tianle Cai

Tianyi Zhang

Tim G. J. Rudner

Tongtong Fang

Wataru Kumagai

Wieland Brendel

Xiang Lisa Li

Xiaochuang Han

Xinyang Chen

Yang Li

Yann Dubois

Yaodong Yu

Yifan Zhang

Yining Chen

Yoav Wald

Yu-Jie Zhang

Yujin Jeong

Zhongyi Pei