The increasing computational demands of modern ML create a critical challenge: thorough experimentation becomes prohibitively expensive precisely when we most need to understand and steer model behavior. Small-scale experiments offer a powerful approach for systematic investigation, enabling both scientific understanding and practical advances. Recent work demonstrates the wealth of opportunities at this scale, including diagnoses and mitigations of training pathologies, minimalistic replications of modern pipelines, elementary synthetic tasks that "stress test" architectures and motivate new designs, and the discovery of intriguing phenomena. This workshop highlights how methods and opportunities at small scale can unlock new insights and drive progress.
Time: July 19th, 2025 (Sat)
Location: West Ballroom B, Vancouver Convention Center.
(9:00--9:10) Opening Remarks
(9:10--10:45) Invited Talks
Aditi Raghunathan -- Title: Beyond benchmarks: the case for spherical cows in LLM research
Tri Dao -- Title: Designing Efficient Attention: Insights from an Inference Perspective
(10:45--11:45) Poster Session 1 (List of accepted papers)
(11:45--12:30) Contributed Talks and Demos (3 x 15min)
Title: Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning (OpenReview link)
Authors: Xinyi Wang, Shawn Tan, Mingyu Jin, William Yang Wang, Rameswar Panda, Yikang Shen
Title: Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought (OpenReview link)
Authors: Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian
Title: Stats or Facts: Decomposing Generalization in Language Models with Small-Scale Models (OpenReview link)
Authors: Tina Behnia, Puneesh Deora, Christos Thrampoulidis
(12:30--13:30) Lunch break
(13:30--15:00) Invited Talks
Eric Wong -- Title: How Jailbreaking 1-Layer Transformers Taught us how to Steer LLMs
Yejin Choi -- Title: The Art of Artificial Reasoning for Small Language Models
(15:00--15:45) Contributed Talks and Demos (3 x 15min)
Title: Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge (OpenReview link)
Authors: Freya Behrens, Lenka Zdeborova
Title: In-Context Occam's Razor: How Transformers Prefer Simpler Hypotheses on the Fly (OpenReview link)
Authors: Puneesh Deora, Bhavya Vasudeva, Tina Behnia, Christos Thrampoulidis
Title: Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models (OpenReview link)
Authors: Lillian Sun, Martin Pawelczyk, Zhenting Qi, Aounon Kumar, Himabindu Lakkaraju
(15:45--16:30) Panel Discussion
With Misha Belkin, Stella Biderman, Nimit Kalra, Yejin Choi, Christos Thrampoulidis, and Aditi Raghunathan.
(16:30--17:15) Poster Session 2 (List of accepted papers)
Contact: icml2025-moss-workshop [at] googlegroups [dot] com