Neural Scaling Laws
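For orientation before the year-by-year lists: several entries below, including Scaling Laws for Neural Language Models (Kaplan et al.) and the Chinchilla paper, fit held-out loss with a saturating power law in parameter count N and training tokens D. A minimal sketch of that parametric form, with E, A, B, \alpha, \beta as fitted constants:

L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Compute-optimal training in the Chinchilla sense then minimizes L(N, D) under the approximate budget constraint C \approx 6ND.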


2024

Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit


Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Resolving discrepancies in compute-optimal scaling of language models

Power scheduler: A batch size and token number agnostic learning rate scheduler

A Practitioner's Guide to Continual Multimodal Pretraining

A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models

Better schedules for low precision training of deep neural networks

Learning with random learning rates

Learning to learn learning-rate schedules



Investigating Continual Pretraining in Large Language Models 



Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling

Scaling Laws in Linear Regression: Compute, Parameters, and Data

How Feature Learning Can Improve Neural Scaling Laws

Random matrix methods for high-dimensional machine learning models

How predictable is language model benchmark performance?

The Effect of Intrinsic Dataset Properties on Generalization: Unraveling Learning Differences Between Natural and Medical Images

2023

Extrapolating performance in language modeling benchmarks

DataComp: In search of the next generation of multimodal datasets

Emergent and predictable memorization in large language models

LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models

Uncovering Neural Scaling Laws in Molecular Representation Learning

Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension

Unmonitorability of Artificial Intelligence


Scaling Data-Constrained Language Models


Are emergent abilities of Large Language Models a mirage?

Neural scaling of deep chemical models

Data efficient neural scaling law via model reusing

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws

Scaling laws for single-agent reinforcement learning 

Broken Neural Scaling Laws


Scaling Laws for Generative Mixed-Modal Language Models

Training Trajectories of Language Models Across Scales




2022

Holistic Evaluation of Language Models (HELM) - leaderboard

Reproducible scaling laws for contrastive language-image learning (LAION CLIP)

Scaling Laws Beyond Backpropagation

Beyond neural scaling laws: beating power law scaling via data pruning

Training compute-optimal large language models ("Chinchilla")

What Language Model to Train if You Have One Million GPU Hours?

  • Revisiting neural scaling laws in language and vision

  • A Solvable Model of Neural Scaling Laws

  • Transcending scaling laws with 0.1% extra compute

  • Unified Scaling Laws for Routed Language Models - scaling laws for MoEs (mixture-of-experts models)

  • Scaling laws and persistence in human brain activity

  • Scaling Scaling Laws with Board Games - Scaling laws for AlphaZero on Hex

  • Scaling Laws for Neural Language Models (Kaplan et al., the original scaling-laws paper)

  • Scaling Laws for Autoregressive Generative Modeling

  • Scaling Laws for Transfer

  • Explaining Neural Scaling Laws

  • A Neural Scaling Law from the Dimension of the Data Manifold

  • Scaling vision transformers 

  • Deep Learning Scaling is Predictable, Empirically

  • Learning Curve Theory

  • On Power Laws in Deep Ensembles

  • A constructive prediction of the generalization error across scales  

  • Jonathan Rosenfeld's PhD thesis on Scaling Laws for Deep Learning

  • Learning Curves: Asymptotic Values and Rate of Convergence


Scaling and Reinforcement Learning

  • Human-Timescale Adaptation in an Open-Ended Task Space

  • Online Decision Transformer 

  • Can Wikipedia Help Offline Reinforcement Learning? 

  • Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

  • Training language models to follow instructions with human feedback

  • Offline Pre-trained Multi-Agent Decision Transformer

  • Fine-Tuning Language Models from Human Preferences  

  • Learning to summarize from human feedback 

  • Recursively Summarizing Books with Human Feedback  



Miscellaneous

Synergy and symmetry in deep learning: Interactions between the data, model, and inference algorithm


Surveys:

EpochAI Scaling Laws Literature Review and A database of papers on scaling laws

The efficiency spectrum of large language models: An algorithmic survey


History of Scaling Laws:
Learning Curves: Asymptotic Values and Rate of Convergence (Cortes et al., 1994)



BNSL (Broken Neural Scaling Laws):

Multiply broken power-law densities as survival functions
