Scaling Laws & Emergent Behaviors
Call-in link Mon @ 2:00 PM EST (Mila Calendar) Topics & Papers
Relevant papers on Emergence, Phase Transitions and Stat Physics of ML
Chapter 2 overview: Phase Transitions In Machine Learning (book)
Jacob Steinhardt's blog: Future ML Systems Will Be Qualitatively Different - LessWrong
Exploring Neuron Interactions and Emergence in LLMs: From the Multifractal Analysis Perspective
The role of long-term power-law memory in controlling large-scale dynamical networks
Speakers: Paul Bogdan's group
A perspective on flocking in DNNs
Speakers: Andrei Mircea and Ekaterina Lobacheva
Gradient dissent in language model pretraining slides / video
An overview of ongoing work by several groups
Speakers: Paul Bogdan, Pascal Notsawo, Darshil Doshi
Speaker: Parviz
Talk: Grokking as Compression
Paper: Grokking as Compression
YouTube: Grokking as Compression: A Nonlinear Complexity Perspective
Discussion, continued:
Paper to discuss:
Related work/papers mentioned in today's discussion:
A Mathematical Framework for Transformer Circuits
Chinchilla scaling laws etc: Go smol or go home, Training Compute-Optimal Large Language Models, chinchilla's wild implications — LessWrong, Chinchilla Explained: video
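For quick reference when reading the Chinchilla material above, here is a minimal Python sketch of the parametric loss fit L(N, D) = E + A/N^alpha + B/D^beta from Training Compute-Optimal Large Language Models; the constants are approximately the coefficients reported in the paper's fit, and the helper name and example model sizes are illustrative only, not an official implementation.

```python
# Minimal sketch of the Chinchilla parametric loss fit,
#   L(N, D) = E + A / N**alpha + B / D**beta,
# using roughly the coefficients reported in
# "Training Compute-Optimal Large Language Models".
# Constants and the helper below are illustrative, not an official implementation.

E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Example: a 70B-parameter model on 1.4T tokens (the Chinchilla setting)
# vs. a 280B-parameter model on 300B tokens (a Gopher-like allocation),
# two allocations of comparable compute C ~ 6 * N * D.
print(chinchilla_loss(70e9, 1.4e12))   # smaller model, more tokens
print(chinchilla_loss(280e9, 300e9))   # larger model, fewer tokens
```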
Speaker: Pascal Jr. Tikeng Notsawo (University of Montreal/Mila)
Talk: Is grokking predictable? video
Paper: Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Open discussion. Working document
Speaker: Andrey Gromov (University of Maryland)
Talk: A solvable model for grokking modular arithmetic
Paper: Grokking modular arithmetic
Open discussion. Working document
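As background for the modular-arithmetic grokking setups in the talks above, here is a minimal sketch of the standard (a, b) -> (a + b) mod p dataset with a random train/test split; the prime p = 97 and the 50% train fraction are illustrative defaults, not the exact configuration used in either paper.

```python
import numpy as np

# Illustrative sketch of the modular-addition dataset commonly used in
# grokking experiments: learn (a, b) -> (a + b) mod p from a subset of all pairs.
# p = 97 and the 50% train fraction are example values, not a specific paper's setup.

def make_modular_addition_data(p: int = 97, train_frac: float = 0.5, seed: int = 0):
    rng = np.random.default_rng(seed)
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    pairs = np.stack([a.ravel(), b.ravel()], axis=1)   # all p*p input pairs
    labels = (pairs[:, 0] + pairs[:, 1]) % p            # targets (a + b) mod p
    perm = rng.permutation(len(pairs))
    n_train = int(train_frac * len(pairs))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return (pairs[train_idx], labels[train_idx]), (pairs[test_idx], labels[test_idx])

(train_x, train_y), (test_x, test_y) = make_modular_addition_data()
print(train_x.shape, test_x.shape)  # (4704, 2) (4705, 2) for p = 97, 50% split
```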
Lecturer: Niki Howe
Topic: Adversarial Policies Beat Superhuman Go AIs (slides, video)
Lecturers: Kyle Roth, Alex Fulleringer
Topic: Artificial Intelligence, Values, and Alignment (slides)
Date/time: Tue Feb 2 11:00 am EST
Topic: open discussion on chatbot behaviors; history of emergence and transitions in AI systems; Jacob Steinhardt's blog posts Future ML Systems Will Be Qualitatively Different and Emergent Deception and Emergent Optimization; as well as Broken Neural Scaling Laws and Emergent Abilities of Large Language Models.
Speaker: Irina Rish (video)
Topic: Tutorial and Q&A on Phase Transitions (video Feb 2023)
Date/time: Tue Feb 7, 11:00 am EST
Extended version of the talk given at the 2nd Workshop on Neural Scaling Laws
Speaker: Guillaume Dumas
Papers mentioned: Problems in Physics with Many Scales of Length, Quantifying causal emergence shows that macro can beat micro, How critical is brain criticality?, Multilevel development of cognitive abilities in an artificial neural network, Why Deep Learning Works II: the Renormalization Group
Related: Scaling course, Class 5
Topic: Phase Transitions in AI (and Emergent Behaviors in Large-Scale models) (slides, video)
Papers on "phase transitions" in AI: Hard and Easy Distributions of SAT Problems, Every Monotone Graph Property Has a Sharp Threshold, Approximability of probability distributions, Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets, GPT-3 paper: Language Models are Few-Shot Learners, A universal law of robustness via isoperimetry.
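As a toy illustration of the "sharp threshold" phenomenon behind several of these papers, the sketch below estimates the fraction of satisfiable random 3-SAT formulas as the clause-to-variable ratio crosses the empirically observed threshold near 4.27; the brute-force solver, instance sizes, and trial counts are illustrative choices kept small enough to run quickly, and finite-size effects smooth the transition at this tiny scale.

```python
import itertools, random

# Illustrative sketch of the random 3-SAT satisfiability phase transition
# discussed in "Hard and Easy Distributions of SAT Problems": as the
# clause-to-variable ratio alpha = m/n grows past roughly 4.27, the fraction
# of satisfiable random formulas drops sharply. Brute force, so n is kept tiny,
# which smooths the transition relative to large-n behavior.

def random_3sat(n_vars: int, n_clauses: int, rng: random.Random):
    """Each clause uses 3 distinct variables; each literal is negated with prob 1/2."""
    return [
        [(v + 1) * rng.choice([-1, 1]) for v in rng.sample(range(n_vars), 3)]
        for _ in range(n_clauses)
    ]

def is_satisfiable(formula, n_vars: int) -> bool:
    """Exhaustively check all 2**n_vars assignments."""
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in formula):
            return True
    return False

rng = random.Random(0)
n_vars, n_trials = 10, 30
for alpha in (2.0, 4.0, 4.3, 6.0):
    n_clauses = int(alpha * n_vars)
    sat = sum(
        is_satisfiable(random_3sat(n_vars, n_clauses, rng), n_vars)
        for _ in range(n_trials)
    )
    print(f"alpha={alpha:.1f}  satisfiable fraction ~= {sat / n_trials:.2f}")
```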
Topic: Planning paper discussions for winter trimester
Date/time: Tue Jan 24, 11:00 am EST
Speaker: Irina Rish