Mar 3-5, 2025

Workshop on Theoretical Perspectives on LLMs 

An EnCORE Institute Workshop

The EnCORE Workshop on Theoretical Perspectives on Large Language Models (LLMs) explores the foundational theories and frameworks underlying the architecture, learning mechanisms, and capabilities of LLMs. The workshop brings together researchers to discuss recent advances, theoretical challenges, and emerging concepts in understanding and predicting LLM behavior, efficiency, generalization, and alignment with human intent. Topics include mathematical modeling, interpretability, limitations, and new theoretical tools for deepening insight into LLM capabilities and constraints. Sponsored by the NSF and Google AI.

Confirmed Participants

Josh Alman

Columbia University

Ahmad Beirami

Google DeepMind

Misha Belkin

University of California San Diego

Xiang Cheng 

University of California, Berkeley

Tatsunori Hashimoto 

Stanford University

Hamed Hassani 

University of Pennsylvania

Daniel Hsu 

Columbia University

Adel Javanmard 

University of Southern California

Kangwook Lee 

University of Wisconsin-Madison

Zhiyuan Li 

Toyota Technological Institute at Chicago (TTIC)

Tengyu Ma 

Stanford University

Ankur Moitra 

Massachusetts Institute of Technology

Preetum Nakkiran 

Apple

Omer Reingold 

Stanford University

Atri Rudra 

University at Buffalo

Anant Sahai

University of California, Berkeley

Sujay Sanghavi

University of Texas at Austin

Vatsal Sharan 

University of Southern California

Christos Thrampoulidis 

University of British Columbia

Yu-Xiang Wang 

University of California San Diego

Schedule (Mar 3-5, 2025)


All times are in Pacific Time (GMT-8). Location: EnCORE Institute, Atkinson Hall, 4th Floor


Mar 3


7:30-8:15 am Breakfast in Hotel


8:45 am  Opening Remarks 


Morning Session: Chair: Sanjoy Dasgupta


9:00 am Kangwook Lee: Beyond Decoder-Only Next Token Prediction


9:45 am Ankur Moitra: Model Stealing for Low Rank Language Models


10:30 - 10:45 am Break


10:45 am Sujay Sanghavi: Mitigating catastrophic forgetting in the data-oblivious setting


11:30 am Tatsunori Hashimoto: Statistical perspectives on LLM pretraining data


12:00 - 2:00 pm Lunch


Afternoon Session: Chair: David Woodruff


2:00 pm Daniel Hsu: Transformers, parallelism, and the role of depth


2:45 pm Preetum Nakkiran: What Algorithms can Transformers Learn? A Study in Length Generalization


3:30 pm Hamed Hassani: How to Optimally Quantify Uncertainty for Risk-Averse Agents?


4:00 - 4:15 pm Break


Student Lightning Talks: Chair: Rina Panigrahy


4:15 - 5:00 pm 




Mar 4


7:30-8:30 am Breakfast in Hotel


Morning Session: Chair: Rina Panigrahy


9:00 am Atri Rudra: An Arithmetic Circuit Lens on Deep Learning Architectures


9:45 am Vatsal Sharan: Using Algorithms to Understand Transformers (and Using Transformers to Understand Algorithms)


10:30 - 10:45 am Break


10:45 am Rajesh Jayaram: Multi-Vector Representations and Embedding-Based Nearest Neighbor Search


11:30 am Christos Thrampoulidis: Implicit Geometry of Next-token Prediction


12:00 - 2:00 pm Lunch


Afternoon Session: Chair: David Woodruff


2:00 pm Josh Alman: Fine-Grained Complexity and the Pursuit of Fast Attention


2:45 pm Tengyu Ma: STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving


3:30 pm Xiang Cheng: Graph Transformers Dream of Electric Flow


4:00 - 6:00 pm Hiking/Excursions



Mar 5


7:30-8:30 am Breakfast in Hotel


Morning Session: Chair: Sanjoy Dasgupta


9:00 am Misha Belkin: The linear representation hypothesis for controlling and steering LLMs


9:45 am Anant Sahai: A Toy Model For Asymptotic Weak to Strong Generalization Leveraging Benign-Overfitting/Harmless-Interpolation Ideas


10:30 - 11:00 am Break


11:00 am Yu-Xiang Wang: Flatness, Sparsity and Generalization by Large Learning Rate


11:30 am Zhiyuan Li: Weak-to-Strong Generalization Even in Random Feature Networks, Provably


12:00 - 2:00 pm Lunch


Afternoon Session: Chair: Arya Mazumdar


2:00 pm Ankit Singh Rawat: A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs


2:30 pm Adel Javanmard: DeepCrossAttention: Supercharging Transformer Residual Connections


3:00 pm Ahmad Beirami: Language Model Alignment: Theory & Practice


3:30 - 3:45 pm Break


3:45 pm Open Problems Session: Chair: Arya Mazumdar






Logistics

Location

We are hosted by the Institute for Emerging CORE Methods in Data Science (EnCORE) at the University of California San Diego, on the 4th floor of Atkinson Hall. The address is:

3235 Voigt Dr, La Jolla, CA 92093

The recommended rideshare drop-off location is Parking Lot P503. Please do not park in this lot, as you will likely be ticketed and/or towed.

Parking

UCSD offers both metered spaces and permit-only parking lots. Parking can be limited, so we encourage the use of public transportation when possible.

The Hopkins Parking Structure is located at 10100 Hopkins Dr., La Jolla, CA 92093. Please allow 20-30 minutes to park and walk to the event venue.

Lodging

Attendees can book a nearby Airbnb; the closest hotels are:

Organizers

Sanjoy Dasgupta 

University of California San Diego

Arya Mazumdar

University of California San Diego

Rina Panigrahy

Google Research

Barna Saha

University of California San Diego

David Woodruff

Carnegie Mellon University