ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (DPFM)

Saturday, May 11, 2024 [ATTN: Submission deadline extended to Feb 11 (AoE)]

Messe Wien Exhibition Congress Center, Vienna, Austria + Zoom | OpenReview

Cover image generated with DALL·E. Prompt: "This creative portrayal shows an artist painting a canvas with streams of data, symbolizing a data scientist transforming raw data into meaningful patterns. The background is a modern laboratory, blending art, science, and technology in machine learning."

OVERVIEW

Foundation Models (FMs), such as GPT-3/4, LLaMA, DALL-E, and Stable Diffusion, have demonstrated unprecedented performance across a wide range of downstream tasks. As researchers work to keep pace with this rapid evolution and to understand the capabilities, limitations, and implications of FMs, attention is shifting to the emerging notion of data-centric AI.

Curation of training data is crucial to the performance and reliability of FMs, and a wealth of recent work shows that data-perspective research offers a promising direction for addressing critical issues such as safety, alignment, efficiency, security, privacy, and interpretability.

To move forward, this workshop aims to discuss and build a better understanding of this new paradigm for research on data problems for foundation models.


We look forward to meeting communities and researchers working on data problems (e.g., data-centric AI, dataset/data curation, data markets) and foundation models (alignment, safety/trustworthiness, fairness/ethics), as well as practitioners of downstream applications, tech companies providing innovative solutions, and beyond! We strive to build a community around this essential topic and to provide a platform to connect, share ideas, seek consensus, and create collaboration opportunities.


Our technical agenda is composed of four modules.

TECHNICAL AGENDA

[Module A] Data Quality, Dataset Curation, and Data Generation

[Module B] A Data Perspective to Efficiency, Interpretability, and Alignment

[Module C] A Data Perspective to Safety and Ethics: Risks, Limitations, and Opportunities

[Module D] Copyright, Legal Issues, and Data Economy: A Broader Landscape

IMPORTANT DATES

Submission deadline: extended to Feb 11 (AoE)

 

Confirmed SPEAKERS and PANELISTS

(alphabetical order)

This list is being actively updated as we confirm more speakers and panelists.

Google 

Hanna Hajishirzi

U Washington & AI2 

Mike Lewis

Meta FAIR 

Haifeng Xu

University of Chicago

U Washington & Meta 

See the Schedule page for introductions of the speakers. We will also post the topic and abstract of each talk there as soon as they become available. Stay tuned!

 

ORGANIZERS

Ruoxi Jia 

Assistant Professor

Virginia Tech 

Tatsunori Hashimoto

Assistant Professor

Stanford University 

Pang Wei Koh

Assistant Professor

University of Washington 

Jerone Andrews 

Research Scientist

Sony AI 

Sang Michael Xie

PhD Student

Stanford University 

Lingjiao Chen

PhD Student

Stanford University 

Myeongseob Ko 

PhD Student

Virginia Tech 

Feiyang Kang

PhD Student

Virginia Tech 

 

QUESTIONS?

If you have any questions, feel free to contact us at dpfm-workshop-iclr24@googlegroups.com.