We invite researchers working on the evaluation of large language models to submit their work for consideration to the LLM-eval workshop. We welcome submissions that make contributions on topics including, but not limited to:
Analysis of existing evaluation metrics or new metric proposals
Holistic evaluation
Evaluating potential risks of LLMs
Benchmarking and standardization of evaluation protocols
Pre-training, supervised fine-tuning, and post-training evaluations, including RLHF and human-in-the-loop assessments
Interrelations and dependencies between different evaluation stages
Scaling laws
Emergent abilities
Data contamination
Memorization
Authors should upload a short paper of up to four pages in NeurIPS format, with unlimited pages for references and supplementary material. Note that reviewers are not required to read the appendices, so the paper's claims should be supported by material in the main four-page body. Please submit a single PDF that includes both the main paper and the supplementary material. We welcome submissions presenting work that is unpublished or currently under submission. We will also consider recently published papers (i.e., published in 2025).
All submissions will be reviewed in a double-blind process and evaluated on the basis of their technical content and relevance to the workshop. Accepted papers will be presented either in a poster session or as a contributed talk. The workshop is non-archival, and papers may also be submitted to other venues. Accepted papers will be made publicly available on OpenReview before the start of the workshop.
Submit your papers at OpenReview.
Full Paper Submission Deadline: September 4th, 2025, 11:59 pm AoE
Accept/Reject Notification Date: September 21st, 2025, 11:59 pm AoE
Workshop Date: December 7th, 2025 (San Diego)
If you would like to volunteer as a reviewer, please fill out this form.