NeurIPS 2023 Workshop on Distribution Shifts (DistShift)
New Frontiers with Foundation Models
Friday, December 15th, 2023, New Orleans, USA
New Orleans Convention Center, Room R06-R09 (level 2)
Accepted Papers | Livestream
This workshop focuses on distribution shifts in the context of foundation models.
Distribution shifts—where a model is deployed on a data distribution different from what it was trained on—pose significant robustness challenges in real-world ML applications. Such shifts are often unavoidable in the wild and have been shown to substantially degrade model performance in applications such as biomedicine, wildlife conservation, sustainable development, robotics, education, and criminal justice. For example, models can systematically fail when tested on patients from different hospitals or people from different demographics. Training models that are robust to such distribution shifts is a rapidly growing area of interest in the ML community, and the goal of our workshop is to foster discussions and further research on distribution shifts.
In recent years, foundation models—large pretrained models that can be adapted for a wide range of tasks—have achieved unprecedented performance on a broad variety of discriminative and generative tasks, including in distribution shift scenarios. Foundation models open up an exciting new frontier in the study of distribution shifts, raising many open research questions:
Empirical trends. Foundation models can perform well under distribution shift—for instance, fine-tuned foundation models hold the state of the art on several datasets in the WILDS benchmark of distribution shifts, although substantial gaps remain between in-distribution and out-of-distribution performance. What aspects of foundation models (e.g., pretraining data diversity, model scale, etc.) are driving this robustness? On what kinds of distribution shifts do these performance gains hold—for example, are there shifts on which larger models perform worse?
Pretraining. Foundation models are pretrained on diverse corpora that typically do not reflect the data distribution of a downstream task, and this shift is particularly drastic for specialized applications (e.g., medical NLP). How does this pretraining distribution shift affect performance on downstream tasks? How can we mitigate it when pretraining foundation models?
Adaptation. For specialized tasks with poor few-shot performance, current foundation models must be adapted, e.g., by fine-tuning on a specialized dataset that differs significantly from the large pretraining dataset. However, prior work has shown that such fine-tuning can reduce the gains in distributional robustness that come from using foundation models, and these fine-tuned models incur substantial performance drops under distribution shift. What causes these phenomena, and how can we adapt models to downstream tasks without sacrificing robustness? One mitigation studied in prior work is sketched below, after these questions.
Generation. Distribution shifts have been largely studied in discriminative settings, but many foundation models have unprecedented generative capabilities. How do distribution shifts affect generative settings, e.g., if a model is used with prompts that are under-represented in the training data? How do we generate samples from a distribution of interest that differs from the pretraining distribution? How can we measure the effects of such shifts and mitigate them? And how can we leverage these generative capabilities to address distribution shifts in discriminative settings, e.g., through data augmentation?
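Returning to the adaptation question above, here is a minimal sketch of one robustness-preserving strategy explored in prior work: interpolating in weight space between the pretrained (zero-shot) model and its fine-tuned counterpart. The PyTorch code below is illustrative only; the two models, the interpolation coefficient, and how the result is evaluated are assumptions on the reader's side, not a prescription from the workshop.

```python
import copy
import torch

def interpolate_weights(pretrained_model, finetuned_model, alpha=0.5):
    """Return a model whose parameters are a convex combination of the
    pretrained (zero-shot) weights and the fine-tuned weights.

    alpha = 0.0 recovers the pretrained model, alpha = 1.0 the fine-tuned
    one; intermediate values often trade some in-distribution accuracy for
    better out-of-distribution robustness.
    """
    pre_state = pretrained_model.state_dict()
    ft_state = finetuned_model.state_dict()
    merged = copy.deepcopy(pretrained_model)
    merged_state = {
        key: ((1 - alpha) * pre_state[key] + alpha * ft_state[key]
              if pre_state[key].is_floating_point()
              else ft_state[key])  # leave integer buffers (e.g., BatchNorm counters) untouched
        for key in pre_state
    }
    merged.load_state_dict(merged_state)
    return merged
```

In practice one would sweep alpha and pick the value that best balances accuracy on an in-distribution validation set against accuracy under the shift of interest; both evaluations depend on the reader's own benchmark and splits.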
Many of these questions of distribution shift are also key challenges for developing better foundation models. For example, foundation models are often adapted to be instruction-following and harmless using methods such as reinforcement learning from human feedback; such methods can be viewed as attempts to address the pretraining-to-downstream shift in a generative setting. Moreover, since today's foundation models are typically trained on data scraped from the Internet, adapting them to a broader set of real-world applications (e.g., in biomedicine, conservation and sustainability, law, etc.) also requires grappling with the pretraining shift.
To this end, our workshop focuses on distribution shifts in the context of foundation models. We are broadly interested in methods, evaluations and benchmarks, and theory for distribution shifts, and we are especially interested in work that involves foundation models. If you have any questions, please contact us at distshift-workshop-2023@googlegroups.com.
[9:10-9:35] Towards Out-of-Distribution Generalization: Causality, Heterogeneity and Evaluation
Peng Cui is an Associate Professor with tenure at Tsinghua University. He is interested in research on stable prediction, decision-making based on causal principles, and large-scale network representation learning. Since 2016, he has been exploring how to combine causal statistics with machine learning methods, and he developed a theoretical framework for stable learning inspired by causality. His research results have been widely used in industrial domains such as intelligent health care and the Internet economy. He has published more than 100 papers in top artificial intelligence conferences and received 7 paper awards from international conferences and journals. He is an associate editor of international journals such as IEEE TKDE, ACM TOMM, ACM TIST, IEEE TBD, and KAIS, and has served as area chair or senior PC member of top conferences such as NeurIPS, ICML, and UAI. He has won the second prize of the National Natural Science Award in China, the first prize of the Natural Science Award of the Ministry of Education in China, and the CCF-IEEE CS Young Scientist Award, and he is a Distinguished Scientist of the ACM.
[9:35-10:00] Generalization in the Age of Foundation Models
Kate Saenko is an AI Research Scientist at FAIR, Meta, and a Full Professor of Computer Science at Boston University (currently on leave), where she leads the Computer Vision and Learning Group. Kate received a PhD in EECS from MIT and did postdoctoral training at UC Berkeley and Harvard. Her research interests are in artificial intelligence with a focus on out-of-distribution learning, dataset bias, domain adaptation, vision and language understanding, and other topics in deep learning.
[10:00-10:30] Coffee Break
[10:30-12:00] Poster Session
Links to the accepted papers are at the NeurIPS website and OpenReview.
[12:00-13:15] Lunch Break
[13:15-14:15] Spotlight Talks
TiC-CLIP: Continual Training of CLIP Models
Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri
LLM Routing with Benchmark Datasets
Tal Shnitzer, Anthony Ou, Mírian Silva, Kate Soule, Yuekai Sun, Justin Solomon, Neil Thompson, Mikhail Yurochkin
Does CLIP’s generalization performance mainly stem from high train-test similarity?
Prasanna Mayilvahanan, Thaddäus Wiedemer, Evgenia Rusak, Matthias Bethge, Wieland Brendel
Domain constraints improve risk prediction when outcome data is missing
Sidhika Balachandar, Nikhil Garg, Emma Pierson
OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection
Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Yixuan Li, Ziwei Liu, Yiran Chen, Hai Li
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min, Suchin Gururangan, Eric Wallace, Weijia Shi, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
[14:15-14:40] Robust Machine Learning With Foundation Models
Aditi Raghunathan is an Assistant Professor at Carnegie Mellon University. She is interested in building robust ML systems with guarantees for trustworthy real-world deployment. Previously, she was a postdoctoral researcher at Berkeley AI Research, and received her PhD from Stanford University in 2021. Her research has been recognized by awards such as Forbes 30 under 30, the Schmidt AI2050 Early Career Fellowship, the Arthur Samuel Best Thesis Award at Stanford, a Google PhD fellowship in machine learning, and an Open Philanthropy AI fellowship.
[14:40-15:05] Advancing Health at the Speed of AI
Hoifung Poon is General Manager at Health Futures in Microsoft Research and an affiliated faculty member at the University of Washington Medical School. He leads biomedical AI research and incubation, with the overarching goal of structuring medical data to optimize delivery and accelerate discovery for precision health. His team and collaborators were among the first to explore large language models (LLMs) in health applications. His research has produced popular open-source foundation models such as PubMedBERT, BioGPT, BiomedCLIP, and LLaVA-Med. He has led successful research partnerships with large health providers and life science companies, creating AI systems in daily use for applications such as molecular tumor boards and clinical trial matching. He has given tutorials on these topics at top AI conferences such as ACL, AAAI, and KDD, and his prior work has been recognized with Best Paper Awards from premier AI venues such as NAACL, EMNLP, and UAI. He received his PhD in Computer Science and Engineering from the University of Washington, specializing in machine learning and NLP.
[15:05-15:30] Coffee Break
[15:30-16:00] Invited Talk by Ludwig Schmidt
Ludwig Schmidt is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Ludwig’s research interests revolve around the empirical foundations of machine learning, often with a focus on datasets, reliable generalization, and large models. Recently, Ludwig’s research group contributed to open-source machine learning by creating OpenCLIP, OpenFlamingo, and the LAION-5B dataset. Ludwig completed his PhD at MIT and was a postdoc at UC Berkeley. Ludwig’s research has received a New Horizons Award at EAAMO and best paper awards at ICML and NeurIPS, was a best paper finalist at CVPR, and earned the Sprowls Dissertation Award from MIT.
[16:00-16:50] Panel Discussion
Panelists: Kate Saenko, Aditi Raghunathan, Hoifung Poon, Ludwig Schmidt
Organizers
Google Brain
ETH Zurich
Columbia University
RIKEN & University of Tokyo
University of Washington & Google
Stanford University
Stanford University
Stanford University