Statistics has historically been the tool of choice for understanding and mitigating the operational risks of engineering deployments. However, foundation models such as LLMs are massive black-box prediction and generation systems whose inner workings are often inaccessible behind an API endpoint. This has caused substantial concern regarding reliability, auditing, privacy, safety, and more, since traditional statistical tools often require knowledge of the model.
This workshop is motivated by the need for new statistical tools in the era of black-box models. It aims to bring together a diverse group of attendees to discuss the opportunities and challenges that foundation models pose for statistical methods. Newcomers to these topics are welcome. The goal of the workshop is to foster an inclusive and forward-looking dialogue on areas of recent development, including but not limited to:
Benchmarks. How can we overcome the limitations of traditional static benchmarks and evaluate the performance of foundation models, for example against human preferences, in a principled way?
Bias. What statistical tools are best brought to bear in measuring and correcting biases in language models?
Automatic evaluation. What are statistically principled scoring methods and algorithms for using LLMs as evaluative agents (e.g., LLM-as-a-judge)?
Watermarking. Can we design watermarking schemes and associated methods for identifying AI-generated content that enable rigorous theoretical treatment, as opposed to relying on heuristics?
Conformal prediction and black-box uncertainty quantification. How can methods for achieving principled uncertainty and risk quantification under minimal assumptions best be brought to bear on problems like hallucination and uncertainty communication with LLMs? What risks do these approaches carry? (A minimal illustrative sketch of split conformal prediction appears after this list.)
Privacy and data rights. How can we protect users from data misuse and properly compensate them for the value of their data?
Auditing, safety, and risk analysis. How do we build procedures for testing the failure cases and safety risks of models?
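As a concrete illustration of the black-box uncertainty quantification referenced in the conformal prediction topic above, the following is a minimal sketch of split conformal prediction applied to scores from an arbitrary black-box model. It is illustrative only: the scores, the miscoverage level alpha, and all variable names are hypothetical placeholders rather than part of any particular method discussed at the workshop.

```python
# Minimal sketch of split conformal prediction (illustrative only).
# Assumes nonconformity scores from a held-out calibration set produced by
# some black-box model; the scores and alpha below are synthetic placeholders.
import numpy as np

def conformal_quantile(cal_scores: np.ndarray, alpha: float) -> float:
    """Return the finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    # Finite-sample correction: ceil((n + 1) * (1 - alpha)) / n, clipped to 1.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

# Synthetic example (placeholders for real model outputs).
rng = np.random.default_rng(0)
cal_scores = rng.uniform(size=1000)       # nonconformity scores on calibration data
test_scores = rng.uniform(size=(200, 5))  # scores for 5 candidate outputs per test point

qhat = conformal_quantile(cal_scores, alpha=0.1)
# Prediction set: all candidates whose score does not exceed the threshold qhat.
prediction_sets = [np.nonzero(s <= qhat)[0] for s in test_scores]
```

Under exchangeability of calibration and test data, such sets cover the true output with probability at least 1 - alpha, which is the kind of assumption-light guarantee the topic above asks about.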
Submission link: https://openreview.net/group?id=NeurIPS.cc/2024/Workshop/SFLLM
Submissions may be of any length and in any format. Everyone is welcome to submit; early-stage work and short papers are welcome.
Submissions will be reviewed based on whether workshop attendees will find them interesting (a combination of relevance and quality).
Dual-submissions to the NeurIPS main conference and the Workshop are permitted and encouraged. This is a non-archival venue with no proceedings.
Accepted papers will be presented as a talk or a poster at the workshop, and their abstracts will appear on the NeurIPS website. Authors can elect to link the abstract to a version of their paper hosted elsewhere (e.g., arXiv or a personal server).
Accepted papers will not have a synchronous virtual presentation option.
Important Dates:
Submission deadline: September 27th, AoE (extended).
Acceptance notification: October 9th.
Camera-ready deadline and spotlight talk video recording submission: December 10th.
Poster printing information: https://neurips.cc/Conferences/2024/PosterInstructions
For questions, contact sfllmworkshop@gmail.com