Deep learning systems have revolutionized one field after another, achieving unprecedented empirical performance. Yet their intricate structure has led most practitioners and researchers to regard them as black boxes whose inner workings can hardly be understood. In this course, we will review experimental and theoretical works that aim to improve our understanding of modern deep learning systems.
[20%]: Attendance and participation
[40%]: Tiny PyTorch coding exercises
[40%]: Final project on paper of your choosing
Theories on Architectures
Transformers as Dynamical Systems (paper)
Safety and Alignment
Jailbreaking (paper)
Choose a paper
The paper should be on the theoretical or empirical investigation of deep learning.
You can consult me about your choice during office hours, via email, or through other communication channels.
Submit a two-page report
PDF format, 1-inch margins, 10pt font, preferably typeset in LaTeX (a minimal skeleton is sketched after this list).
1 page summarizing the paper.
1 page proving a novel theoretical result or proposing and implementing a novel experiment. You are encouraged to build your experiment on open-source implementations, if available.
The deadline is the last day of the semester.
Present the report in the final lectures
Using slides (Keynote, Google Slides, PowerPoint, Beamer, or other similar tools).
A 5-minute presentation of your paper summary and novel contribution, followed by 1 minute of questions.
Pairs:
A report twice as long.
A 10-minute presentation followed by 2 minutes of questions.
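For reference, here is a minimal LaTeX skeleton that satisfies the report formatting requirements above. It is one reasonable setup rather than a mandated template; the document class, geometry package settings, and section names are illustrative choices.

\documentclass[10pt]{article}          % 10pt font size
\usepackage[margin=1in]{geometry}      % 1-inch margins on all sides
\usepackage{amsmath, amssymb}          % standard math packages

\title{Final Project Report}
\author{Your Name}

\begin{document}
\maketitle

\section{Paper Summary}
% One page summarizing the chosen paper.

\section{Novel Contribution}
% One page proving a novel theoretical result, or
% proposing and implementing a novel experiment.

\end{document}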
Research papers by Anthropic:
Other papers:
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Generalization in Diffusion Models Arises From Geometry-Adaptive Harmonic Representations
LoRA Training in the NTK Regime has No Spurious Local Minima
Transformers Learn In-Context by Gradient Descent
Locating and Editing Factual Associations in GPT
A Kernel-Based View of Language Model Fine-Tuning
Average gradient outer product as a mechanism for deep neural collapse
The Remarkable Robustness of LLMs: Stages of Inference?
Do Language Models Use Their Depth Efficiently?
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
The Low-Rank Simplicity Bias in Deep Networks
Understanding Transformer from the Perspective of Associative Memory
Transformers need glasses! Information over-squashing in language tasks
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Superposition Yields Robust Neural Scaling
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness
Transformers Struggle to Learn to Search
Layer by Layer: Uncovering Hidden Representations in Language Models
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
Investigating the Catastrophic Forgetting in Multimodal Large Language Models
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
Layers at Similar Depths Generate Similar Activations Across LLM Architectures
A Theory of Learning with Autoregressive Chain of Thought
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
The Super Weight in Large Language Models
Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
Understanding How Nonlinear Layers Create Linearly Separable Features for Low-Dimensional Data
Mind the Gap: a Spectral Analysis of Rank Collapse and Signal Propagation in Attention Layers
s1: Simple test-time scaling