Paper presentations and projects: schedule & sign up sheet
Deadlines:
Oct 6, 2025 - Project proposal submission due
Dec 5, 2025, at 11:59 PM - Final project submission due
Poster session: Monday, November 24, 2025, 2 pm - 5 pm
Lecturer: Irina Rish
Topic: Intro and Overview: A brief history of AI at Scale (slides, video)
Papers: The Bitter Lesson, GPT-3 paper: Language Models are Few-Shot Learners
Topic: Intro and Overview: Continual Learning at Scale (slides, video)
Lecturer: Irina Rish
Topic: Overview of Papers to Present and Some Project Topics (video - part 1, video - part 2)
Class materials: some of the previous Topics & Papers (focus on: Continual Learning at Scale; Alignment and Safety; Emergence, Phase Transitions and Stat Physics of ML), some previous large-scale projects, and Towards Time-Series Foundation Models
Topic: Scaling Laws for Neural Language Models (slides, video)
Also covered: Training Compute-Optimal Large Language Models (Chinchilla explained: video), Emergent Abilities of Large Language Models, and Are Emergent Abilities of Large Language Models a Mirage? Additional materials: Neural Scaling Laws and GPT-3 (video); a nice overview of the history of scaling laws: Scaling Laws for LLMs: from GPT-3 to o3
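As a quick reference for the scaling-law readings above, the Chinchilla paper (Training Compute-Optimal Large Language Models) fits a parametric loss in the number of parameters $N$ and training tokens $D$; the sketch below states the functional form only, without the paper's fitted constants:

```latex
% Chinchilla-style parametric loss:
%   E      - irreducible loss of the data distribution
%   A, B   - fitted coefficients; \alpha, \beta - fitted exponents
\[
  L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
\]
% Minimizing L subject to a fixed compute budget C \approx 6ND gives
% power-law optima in C:
\[
  N_{\mathrm{opt}} \propto C^{\,a}, \qquad
  D_{\mathrm{opt}} \propto C^{\,b}, \qquad
  a = \frac{\beta}{\alpha+\beta}, \quad
  b = \frac{\alpha}{\alpha+\beta}
\]
```

Because the fitted exponents $\alpha$ and $\beta$ come out roughly equal, $a \approx b \approx 0.5$: the compute-optimal recipe scales model size and training tokens together, which is the paper's headline correction to earlier scaling-law prescriptions.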
Topic: An Empirical Model of Large-Batch Training (slides, video)
Topic: DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning (slides, video - part 2, starting at 01:04:29)
Topic: LoRA: Low-Rank Adaptation of Large Language Models (slides, video)
Topic: Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts (slides, video)
Topic: Persona Vectors: Monitoring and Controlling Character Traits in Language Models (slides, video)
Topic: VinePPO: Refining Credit Assignment in RL Training of LLMs (slides, video)
Topic: Effect of scale on catastrophic forgetting in neural networks (slides, video)
Topic: VGGT: Visual Geometry Grounded Transformer (slides, video)
Topic: Scaling Laws for Transfer (slides, video)
Topic: On the Biology of a Large Language Model (slides, video)
Topic: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (slides, video)
Topic: Simple and Scalable Strategies to Continually Pre-train Large Language Models (slides, video)
Topic: The Platonic Representation Hypothesis (slides, video)
Topic: The Ultra-Scale Playbook: Training LLMs on GPU Clusters (slides, video)
Topic: Evaluating Large Language Models Trained on Code (slides, video)
Topic: Hierarchical Reasoning Model (slides, video)
Topic: Training Compute-Optimal Protein Language Models (slides, video)
Topic: K2-Think: A Parameter-Efficient Reasoning System (slides, video)
Topic: Muon is Scalable for LLM Training (slides, video - part 2, from 1:51:54)
Topic: Scaling Laws For Dense Retrieval (slides, video)
Topic: Alignment faking in large language models (slides, video)
Topic: Subliminal Learning: Language models transmit behavioral traits via hidden signals in data (slides, video)
Topic: On the Theoretical Limitations of Embedding-Based Retrieval (slides, video)
Topic: The Superposition of Diffusion Models Using the Itô Density Estimator (slides, video)
Topic: xLSTM: Extended Long Short-Term Memory (slides, video)
Topic: Why Language Models Hallucinate (slides, video)
Topic: Large Language Diffusion Models (slides, video)
Topics: The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search & AlphaEvolve: A coding agent for scientific and algorithmic discovery (slides, video)
Topic: Less is More: Recursive Reasoning with Tiny Networks (slides, video)
Topic: Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws (slides, video)
Topic: Surprising Effectiveness of Pretraining Ternary Language Models at Scale (slides, video)
Topic: DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers (slides, video)
Topic: LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning (slides, video)
Topic: Distillation Scaling Laws (slides, video)
Topic: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (slides, video)
Topic: Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis (slides, video)
Topic: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (slides, video)