ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (DPFM)

May 11, 2024, Saturday https://iclr.cc/virtual/2024/workshop/20585

Messe Wien Exhibition Congress Center, Vienna, Austria + Zoom | OpenReview

 [Updated May-3: Check out the ICLR Event Page for the up-to-date program and schedules 🔗]

 [Updated May-9: Check out our Workshop Schedule Page 📅⏰]

 [Room: Stolz 0 🍪☕] Looking forward



Cover image generated with DALL·E. Prompt: "This creative portrayal shows an artist painting a canvas with streams of data, symbolizing a data scientist transforming raw data into meaningful patterns. The background is a modern laboratory, blending art, science, and technology in machine learning."

OVERVIEW

Foundation Models (FMs, e.g., GPT-3/4, LLaMA, DALL-E, Stable Diffusion) have demonstrated unprecedented performance across a wide range of downstream tasks. As researchers work to keep pace with this rapid evolution and to understand the capabilities, limitations, and implications of FMs, attention is now shifting to the emerging notion of data-centric AI.

Curation of training data is crucially important for the performance and reliability of FMs, and a wealth of recent work demonstrates that research from a data perspective offers a promising direction for addressing critical issues such as safety, alignment, efficiency, security, privacy, and interpretability.

To move forward, this workshop aims to discuss and develop a better understanding of this new paradigm of research on data problems for foundation models.


We look forward to meeting communities and researchers working on data problems (e.g., data-centric AI, dataset/data curation, data markets) and foundation models (alignment, safety/trustworthiness, fairness/ethics), practitioners of downstream applications, tech companies providing innovative solutions, and beyond! We strive to build a community around this essential topic and to provide a platform to connect, share ideas, seek consensus, and create collaboration opportunities.


Our technical agenda is composed of four modules.

TECHNICAL AGENDA

[Module A] Data Quality, Dataset Curation, and Data Generation

[Module B] A Data Perspective to Efficiency, Interpretability, and Alignment

[Module C] A Data Perspective to Safety and Ethics: Risks, Limitations, and Opportunities

[Module D] Copyright, Legal Issues, and Data Economy: A Broader Landscape

Program 

Please check out the Schedule page for the topics and introductions of the Invited Talks and Best Paper Presentations, as well as the speakers' biographies.

The up-to-date program and schedule can be found on the ICLR Event Page: https://iclr.cc/virtual/2024/workshop/20585

Accepted Papers

The list of all accepted works, with manuscripts and posters, can be found at https://iclr.cc/virtual/2024/workshop/20585

 

SPEAKERS and PANELISTS

(alphabetical order)

Google 

Hanna Hajishirzi

U Washington & AI2 

Mike Lewis

Meta FAIR 

Ludwig Schmidt

U Washington 

Eric Wallace

OpenAI 

U Washington & Meta 

 See the Schedule page for introductions of the speakers and panelists.

 

ORGANIZERS

Ruoxi Jia 

Assistant Professor

Virginia Tech 

Tatsunori Hashimoto

Assistant Professor

Stanford University 

Pang Wei Koh

Assistant Professor

University of Washington 

Jerone Andrews 

Research Scientist

Sony AI 

Sang Michael Xie

PhD Student

Stanford University 

Lingjiao Chen

PhD Student

Stanford University 

Myeongseob Ko 

PhD Student

Virginia Tech 

Feiyang Kang

PhD Student

Virginia Tech 

 

QUESTIONS?

If you have any questions, feel free to contact us at dpfm-workshop-iclr24@googlegroups.com