Paper presentations and projects: schedule & sign up sheet
Deadlines:
Oct 6 - project proposals due
TBA - final project submission due
Lecturer: Irina Rish
Topic: Intro and Overview: A brief history of AI at Scale (slides, video)
Papers: The Bitter Lesson, GPT-3 paper: Language Models are Few-Shot Learners
Topic: Intro and Overview: Continual Learning at Scale (slides, video)
Lecturer: Irina Rish
Topic: Overview of Papers to Present and Some Project Topics (video-part1, video-part2)
Class materials: some of the previous Topics & Papers (focus on: Continual Learning at Scale; Alignment and Safety; Emergence, Phase Transitions, and Statistical Physics of ML), some previous large-scale projects, Towards Time-Series Foundation Models
Topic: Scaling Laws for Neural Language Models (slides, video)
Also covered: Training Compute-Optimal Large Language Models (Chinchilla explained: video); Emergent Abilities of Large Language Models; Are Emergent Abilities of LLMs a Mirage? Additional materials: Neural Scaling Laws and GPT-3 (video); an overview of the history of scaling laws: Scaling Laws for LLMs: from GPT-3 to o3
Topic: An Empirical Model of Large-Batch Training (slides, video)
Topic: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (slides, video - part 2, starting at 01:04:29)
Topic: LoRA: Low-Rank Adaptation of Large Language Models (slides, video)
Topic: Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts (slides, video)
Topic: Persona Vectors: Monitoring and Controlling Character Traits in Language Models (slides, video)
Topic: VinePPO: Refining Credit Assignment in RL Training of LLMs (slides, video)
Topic: Effect of scale on catastrophic forgetting in neural networks (slides, video)
Topic: VGGT: Visual Geometry Grounded Transformer (slides, video)
Topic: Scaling Laws for Transfer (slides, video)
Topic: On the Biology of a Large Language Model (slides, video)
Topic: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (slides, video)
Topic: Simple and Scalable Strategies to Continually Pre-train Large Language Models (slides, video)
Topic: The Platonic Representation Hypothesis (slides, video)
Topic: The Ultra-Scale Playbook: Training LLMs on GPU Clusters (slides, video)
Topic: Evaluating Large Language Models Trained on Code (slides, video)
Topic: Hierarchical Reasoning Model (slides, video)
Topic: Training Compute-Optimal Protein Language Models (slides, video)
Topic: K2-Think: A Parameter-Efficient Reasoning System (slides, video)
Topic: Muon is Scalable for LLM Training (slides, video - part 2, from 1:51:54)
Topic: Scaling Laws For Dense Retrieval (slides, video)
Topic: Alignment faking in large language models (slides, video)
Topic: Subliminal Learning: Language models transmit behavioral traits via hidden signals in data (slides, video)
Topic: On the Theoretical Limitations of Embedding-Based Retrieval (slides, video)