The Schedule

Saturday, December 14th, 2024

Morning Session

9:20 to 12:00

9:20 - 9:30

Opening Remarks

Organizing Committee

9:30 - 10:15

Invited talk #1: Bernhard Schölkopf

10:15 - 11:00

Unstructured Time

Attendees can meet and converse.

11:00 - 11:15

Coffee Break

11:15 - 12:00

Invited talk #2: Mihaela van der Schaar

12:00-12:45

Poster session #1

Lunch Break

12:45 to 14:00

Afternoon session

14:00 to 16:50

14:00-14:45

Invited talk #3: Weijie Su

Title: Statistical Insights for LLMs: Two Examples in Alignment and Watermarking

Large language models (LLMs) have rapidly emerged as a transformative innovation in machine learning. However, their increasing influence on human decision-making processes raises critical societal questions. In this talk, we will demonstrate how statistics can help address two key challenges: ensuring fairness for minority groups through alignment and combating misinformation through watermarking. First, we tackle the challenge of creating fair LLMs that equitably represent and serve diverse populations. We derive a regularization term that is both necessary and sufficient for aligning LLMs with human preferences, ensuring equitable outcomes across different demographics. Second, we introduce a general statistical framework to analyze the efficiency of watermarking schemes for LLMs. We develop optimal detection rules for an important watermarking scheme recently developed at OpenAI and empirically demonstrate its superiority over the existing detection method. Throughout the talk, we will showcase how statistical insights can not only address pressing challenges posed by LLMs but also unlock substantial opportunities for the field of statistics to drive responsible generative AI development. This talk is based on arXiv:2405.16455, 2404.01245, and 2411.13868

14:45-15:00

Coffee Break #2

15:00-15:45

Invited talk #4: Virginia Smith

Title: Getting Lost in ML Safety Vibes

Machine learning applications are increasingly reliant on black-box pretrained models. To ensure safe use of these models, techniques such as unlearning, guardrails, and watermarking have been proposed to curb model behavior and audit usage. Unfortunately, while these post-hoc approaches give positive safety ‘vibes’ when evaluated in isolation, our work shows that existing techniques are quite brittle when deployed as part of larger systems. In a series of recent works, we show that: (a) small amounts of auxiliary data can be used to 'jog' the memory of unlearned models; (b) current unlearning benchmarks obscure deficiencies in both finetuning and guardrail-based approaches; and (c) simple, scalable attacks erode existing LLM watermarking systems and reveal fundamental trade-offs in watermark design. Taken together, these results highlight major deficiencies in the practical use of post-hoc ML safety methods. We end by discussing promising alternatives to ML safety, which instead aim to ensure safety by design during the development of ML systems.

15:45-16:30

Poster session #2

16:30-16:45

Closing Remarks

Page updated

Report abuse