The EnCORE Workshop on Theoretical Perspectives on Large Language Models (LLMs) explores foundational theories and frameworks underlying the architecture, learning mechanisms, and capabilities of large language models. The workshop brings together researchers to discuss recent advances, theoretical challenges, and emerging concepts for understanding and predicting LLM behavior, efficiency, generalization, and alignment with human intent. Topics include mathematical modeling, interpretability, limitations, and new theoretical tools for deepening insight into LLM capabilities and constraints. The workshop is sponsored by the NSF and Google AI.
Confirmed Participants
Columbia University
Google DeepMind
University of California San Diego
University of California, Berkeley
Stanford University
University of Pennsylvania
Columbia University
University of Southern California
University of Wisconsin-Madison
Toyota Technological Institute at Chicago (TTIC)
Stanford University
UCLA
Massachusetts Institute of Technology
Research Scientist at Apple
Stanford University
University at Buffalo
University of Texas at Austin
University of California, Berkeley
University of Southern California
University of British Columbia
University of California San Diego
Schedule (Mar 3-5, 2025)
All times are in Pacific Time (GMT-8). Location: EnCORE Institute, Atkinson Hall, 4th Floor
Mar 3
7:30-8:15 am Breakfast in Hotel
8:45 am Opening Remarks
Morning Session: Chair: Sanjoy Dasgupta
9:00 am Kangwook Lee: Beyond Decoder-Only Next Token Prediction
9:45 am Ankur Moitra: Model Stealing for Low Rank Language Models
10:30 - 10:45 am Break
10:45 am Sujay Sanghavi: Mitigating catastrophic forgetting in the data-oblivious setting
11:30 am Tatsu Hashimoto: Statistical perspectives on LLM pretraining data
12:00 - 2:00 pm Lunch
Afternoon Session: Chair: David Woodruff
2:00 pm Daniel Hsu: Transformers, parallelism, and the role of depth
2:45 pm Preetum Nakkiran: What Algorithms can Transformers Learn? A Study in Length Generalization
3:30 pm Hamed Hassani: How to Optimally Quantify Uncertainty for Risk-Averse Agents?
4:00 - 4:15 pm Break
Student Lightning Talks: Chair: Rina Panigrahy
4:15 - 5:00 pm
Bhavya Vasudeva: Transformers Learn Low Sensitivity Functions: Investigations and Implications
Ali Kavis: Understanding Self-supervised Learning via Gaussian Mixture Models
Themistoklis Haris: Compression Barriers in Autoregressive Transformers
Yilan Chen: Tight Generalization Bound of Gradient Flow through Training Trajectory
Christopher Ye: I/O Complexity of Attention, or How Optimal is FlashAttention?
Tuomas Oikarinen: Towards Automated Mechanistic Interpretability
Ge Yan: VLG-CBM: Learning Faithful Concept Bottleneck Models Beyond LLM Assistance
Chung-En Sun: Adv-LLM: Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities
Mahdi Sabbaghi: Adversarial Reasoning at Jailbreaking Time
Mar 4
7:30-8:30 am Breakfast in Hotel
Morning Session: Chair: Rina Panigrahy
9:00 am Atri Rudra: An Arithmetic Circuit Lens on Deep Learning Architectures
9:45 am Vatsal Sharan: Using Algorithms to Understand Transformers (and Using Transformers to Understand Algorithms)
10:30 - 10:45 am Break
10:45 am Rajesh Jayaram: Multi-Vector Representations and Embedding-Based Nearest Neighbor Search
11:30 am Christos Thrampoulidis: Implicit Geometry of Next-token Prediction
12:00 - 2:00 pm Lunch
Afternoon Session: Chair: David Woodruff
2:00 pm Josh Alman: Fine-Grained Complexity and the Pursuit of Fast Attention
2:45 pm Tengyu Ma: STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
3:30 pm Xiang Cheng: Graph Transformers Dream of Electric Flow
4:00 pm - 6:00 pm Hiking/Excursions
Mar 5
7:30-8:30 am Breakfast in Hotel
Morning Session: Chair: Sanjoy Dasgupta
9:00 am Misha Belkin: The linear representation hypothesis for controlling and steering LLMs
9:45 am Anant Sahai: A Toy Model For Asymptotic Weak to Strong Generalization Leveraging Benign-Overfitting/Harmless-Interpolation Ideas
10:30 - 11:00 am Break
11:00 am Yu-Xiang Wang: Flatness, Sparsity and Generalization by Large-Learning Rate
11:30 am Zhiyuan Li: Weak-to-Strong Generalization Even in Random Feature Networks, Provably
12:00 - 2:00 pm Lunch
Afternoon Session: Chair: Arya Mazumdar
2:00 pm Ankit Singh Rawat: A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
2:30 pm Adel Javanmard: DeepCrossAttention: Supercharging Transformer Residual Connections
3:00 pm Ahmad Beirami: Language Model Alignment: Theory & Practice
3:30 - 3:45 pm Break
3:45 pm Open Problems Session: Chair: Arya Mazumdar
Logistics
Location
We are hosted by the Institute for Emerging CORE Methods in Data Science (EnCORE) at University of California San Diego, on the 4th floor of Atkinson Hall. The address is:
3235 Voigt Dr, La Jolla, CA 92093
The recommended rideshare drop-off location is Parking Lot P503. Please do not park in this lot, as you will likely be ticketed and/or towed.
Parking
UCSD offers both metered spaces and permit-only parking lots. Parking can be limited, so we encourage the use of public transportation when possible.
Visitor Parking Information (how to pay)
Accessible parking (for visitors with a Disabled Person placard or license plate)
The Hopkins Parking Structure is located at 10100 Hopkins Dr., La Jolla, CA 92093. Please allow 20-30 minutes to park and walk to the event venue.
Lodging
Attendees can book a nearby Airbnb; the closest hotels are:
Organizers
University of California San Diego
University of California San Diego
Google Research
University of California San Diego
Carnegie Mellon University