Scaling Laws & Emergent Behaviors
Call-in link Mon @ 2:00 PM EST (Mila Calendar) Topics & Papers
Relevant papers on Emergence, Phase Transitions and Stat Physics of ML
To be presented this semester
Chapter 2 overview: Phase Transitions In Machine Learning (book)
Jacob Steinhardt's blog: Future ML Systems Will Be Qualitatively Different - LessWrong
Exploring Neuron Interactions and Emergence in LLMs: From the Multifractal Analysis Perspective
The role of long-term power-law memory in controlling large-scale dynamical networks
2pm-3pm, Mon, June 24, 2024
Speakers: Paul Bogdan's group
A perspective on flocking in DNNs
2pm-3pm, Mon, June 10, 2024
Speakers: Andrei Mircea and Ekaterina Lobacheva
Gradient dissent in language model pretraining slides / video
2pm-3pm, Mon, June 3, 2024
An overview of ongoing work by several groups
Speakers: Paul Bogdan, Pascal Notsawo, Darshil Doshi
2pm-3pm, Mon, Apr 22, 2024
Speaker: Parviz
Talk: Grokking as Compression
Paper: Grokking as Compression
YouTube: Grokking as Compression: A Nonlinear Complexity Perspective
2pm-3pm, Mon, March 11, 2024
Discussion, continued:
2pm-3pm, Mon, March 4, 2024
Paper to discuss:
Related work/papers mentioned in today's discussion:
A Mathematical Framework for Transformer Circuits
Chinchilla scaling laws etc: Go smol or go home, Training Compute-Optimal Large Language Models, chinchilla's wild implications — LessWrong, Chinchilla Explained: video
Fall 2023
3pm-4pm, Mon, Nov 6, 2023
Speaker: Pascal Jr. Tikeng Notsawo (University of Montreal/Mila)
Talk: Is grokking predictable? video
Paper: Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
3pm-4pm, Mon, Oct 23, 2023
Open discussion. Working document
3pm-4pm, Mon, Oct 9, 2023
Speaker: Andrey Gromov (University of Maryland)
Talk: A solvable model for grokking modular arithmetic
Paper: Grokking modular arithmetic
3pm-4pm, Mon, Oct 2, 2023
Open discussion. Working document
Summer 2023
Winter 2023
Class 18 (11am-noon, Tue, March 14, 2023)
Lecturer: Niki Howe
Topic: Adversarial Policies Beat Superhuman Go AIs (slides, video)
Class 15 (11am-noon, Tue, March 7, 2023)
Lecturers: Kyle Roth, Alex Fulleringer
Topic: Artificial Intelligence, Values, and Alignment (slides)
Date/time: Tue Feb 2, 11:00 am EST
Topic: Open discussion on chatbot behaviors and the history of emergence and transitions in AI systems. Readings: Jacob Steinhardt's blog posts (Future ML Systems Will Be Qualitatively Different, Emergent Deception and Emergent Optimization), as well as Broken Neural Scaling Laws and Emergent Abilities of Large Language Models.
Speaker: Irina Rish (video)
Topic: Tutorial and Q&A on Phase Transitions (video, Feb 2023)
Date/time: Tue Feb 7, 11:00 am EST
extended version of the talk given at the 2nd Workshop on Neural Scaling Laws
Speaker: Guillaume Dumas
Papers mentioned: Problems in Physics with Many Scales of Length, Quantifying causal emergence shows that macro can beat micro, How critical is brain criticality?, Multilevel development of cognitive abilities in an artificial neural network, Why Deep Learning Works II: the Renormalization Group
Related: Scaling course, Class 5
Topic: Phase Transitions in AI (and Emergent Behaviors in Large-Scale models) (slides, video)
Papers on "phase transitions" in AI: Hard and Easy Distributions of SAT Problems, Every Monotone Graph Property Has a Sharp Threshold, Approximability of probability distributions, Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets, GPT-3 paper: Language Models are Few-Shot Learners, A universal law of robustness via isoperimetry.
Topic: Planning paper discussions for winter trimester
Date/time: Tue Jan 24, 11:00 am EST
Speaker: Irina Rish