Scaling Laws and Foundation Models
IFT 6760B & 6167 Winter 2022, Université de Montréal / Mila - Quebec AI Institute
Here is a suggested list of topics and papers - still UNDER CONSTRUCTION.
If you would like to suggest a relevant paper not on the list, please contact the instructor and/or the TAs (contact info is on the course description page). Here is the paper presentation schedule & sign-up sheet.
Other Relevant Courses
2022 AI Safety Fundamentals course at Cambridge
Jacob Hilton's deep learning curriculum: https://github.com/jacobhilton/deep_learning_curriculum
Alternative points of view and criticism of large-scale models
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
Colin Raffel: talks
A few possibly controversial opinions about large language models at Carnegie Mellon University Language Technologies Topical Seminar, 2021.
The Sweet Lesson at SustaiNLP Workshop, 2021.
What do language models learn from language modeling? at Stanford University CS 330 Lecture, 2021.
How and why should(n't) we scale machine learning? at IBM AI Hardware Forum Keynote, 2021.
A better way to get language models to do what you ask at AKBC 2021 Unstructured and Structured Knowledge Bases Workshop and Cohere.ai, 2021.
Scaling up Models and Data at CIFAR Deep Learning and Reinforcement Learning Summer School, Nepal Winter School in AI, and Advanced Language Processing Winter School, 2021.
Videos: Talks, Tutorials, Demos
GPT-3: Language Models are Few-Shot Learners (Paper Explained) - by Yannic Kilcher
Recent Large-Scale Pretrained Models (a.k.a. Foundation Models)
On The Opportunities and Risks of Foundation Models
A report by Stanford's Center for Research on Foundation Models (CRFM)
AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing
CLIP paper: Learning Transferable Visual Models From Natural Language Supervision (OpenAI blog) (see the zero-shot sketch after this list)
DALL-E paper: Zero-Shot Text-to-Image Generation (OpenAI blog)
Training language models to follow instructions with human feedback
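Since CLIP-style zero-shot classification comes up in several papers above, here is a minimal sketch of the mechanism: embed the image and a set of text prompts into a shared space, then pick the most similar prompt. The random vectors below are hypothetical stand-ins for CLIP's image and text encoders, not OpenAI's actual model or API.

import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)                     # stand-in for an image-encoder output
prompts = ["a photo of a cat", "a photo of a dog"]
text_embs = [rng.normal(size=512) for _ in prompts]  # stand-ins for text-encoder outputs

# Zero-shot prediction = the prompt whose embedding is closest to the image embedding.
scores = [cosine_sim(image_emb, t) for t in text_embs]
print("predicted:", prompts[int(np.argmax(scores))])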
Foundation Models, Scaling and Reinforcement Learning
Can Wikipedia Help Offline Reinforcement Learning?
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
Training language models to follow instructions with human feedback
Offline Pre-trained Multi-Agent Decision Transformer
Fine-Tuning Language Models from Human Preferences
AI, Philosophy & Ethics
Scaling Laws in Natural and Artificial Systems
Broken Power Laws (the generic functional form is sketched after these examples):
astrophysics: https://www.aanda.org/articles/aa/olm/2011/02/aa15581-10/aa15581-10.html
materials science: https://tel.archives-ouvertes.fr/tel-01037944/document
socio-economics: https://www.sciencedirect.com/science/article/abs/pii/S0378437119317935
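For reference (our notation, not taken from any single paper above), a two-segment broken power law takes the form
f(x) = A x^{-\alpha_1} for x \le x_b, and
f(x) = A x_b^{\alpha_2 - \alpha_1} x^{-\alpha_2} for x > x_b,
where the prefactor of the second branch is chosen so that the two segments meet continuously at the break point x_b; the papers above study this kind of regime change in different domains.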
Scale invariance in natural and artificial collective systems: a review
New studies reveal how autism might alter synapse formation, pruning (“It’s like there’s this universal scaling law of cortical maturation.”): https://www.spectrumnews.org/news/new-studies-reveal-how-autism-might-alter-synapse-formation-pruning/
More on criticality:
Beggs, J. M. (2008). The criticality hypothesis: how local cortical networks might optimize information processing. Philos. Trans. A Math. Phys. Eng. Sci. 366, 329–343. doi: 10.1098/rsta.2007.2092
Shew, W. L., and Plenz, D. (2013). The functional benefits of criticality in the cortex. Neuroscientist 19, 88–100. doi: 10.1177/1073858412445487
Shew, W. L., Yang, H., Petermann, T., Roy, R., and Plenz, D. (2009). Neuronal avalanches imply maximum dynamic range in cortical networks at criticality. J. Neurosci. 29, 15595–15600. doi: 10.1523/JNEUROSCI.3864-09.2009
Shew, W. L., Yang, H., Yu, S., Roy, R., and Plenz, D. (2011). Information capacity and transmission are maximized in balanced cortical networks with neuronal avalanches. J. Neurosci. 31, 55–63. doi: 10.1523/JNEUROSCI.4637-10.2011
MultiModal Transformers
The Illustrated Transformer (tutorial; see the attention sketch after this list): https://jalammar.github.io/illustrated-transformer/
A Survey on Visual Transformer
Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
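As background for the attention-based models listed in this section, here is a minimal NumPy sketch of scaled dot-product attention, the core operation walked through in the Illustrated Transformer (a toy illustration, not any specific paper's implementation):

import numpy as np

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                                       # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4, 8))  # three (4 tokens x 8 dims) matrices
print(attention(Q, K, V).shape)       # (4, 8)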
Time-series Transformers: a Survey
Multivariate Time Series Forecasting with Latent Graph Inference
SCINet (https://arxiv.org/pdf/2106.09305.pdf)
ETSformer (https://arxiv.org/abs/2202.01381)
Pyraformer (https://openreview.net/pdf?id=0EXmFzUn5I)
Informer (https://arxiv.org/abs/2012.07436)
Reformer (https://arxiv.org/pdf/2001.04451.pdf)
N-HiTS (https://arxiv.org/pdf/2201.12886.pdf)
Autoformer (https://arxiv.org/pdf/2106.13008.pdf)
LogTrans (https://arxiv.org/pdf/1907.00235.pdf)
GLR local global ts representations (https://arxiv.org/pdf/2202.02262.pdf)
TACTiS (https://arxiv.org/pdf/2202.03528.pdf)
MQTransformer (https://arxiv.org/pdf/2009.14799.pdf)
ProTran (https://proceedings.neurips.cc/paper/2021/file/c68bd9055776bf38d8fc43c0ed283Paper.pdf)
Preformer (https://arxiv.org/pdf/2202.11356.pdf)
Spacetimeformer (https://arxiv.org/pdf/2109.12218.pdf)
Neural Scaling Laws
Beyond neural scaling laws: beating power law scaling via data pruning
A Neural Scaling Law from the Dimension of the Data Manifold
A constructive prediction of the generalization error across scales
Jonathan Rosenfeld's PhD thesis on Scaling Laws for Deep Learning
Deconstructing Distributions: A Pointwise Framework of Learning
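The papers above typically summarize empirical results with a saturating power law such as L(N) = a N^{-\alpha} + c (loss vs. scale N, with c the irreducible loss). As a toy sketch of how such a curve is fit in practice (synthetic data and parameter values are ours, not taken from any paper above):

import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a, alpha, c):
    # Saturating power law: L(n) = a * n^(-alpha) + c
    return a * n ** (-alpha) + c

# Synthetic "loss vs. parameter count" points with mild noise (illustrative only).
rng = np.random.default_rng(0)
n = np.logspace(5, 9, 20)
loss = scaling_law(n, 50.0, 0.3, 1.8) + rng.normal(0.0, 0.02, n.size)

# Recover the exponent from the noisy points; bounds keep alpha in a sane range.
(a, alpha, c), _ = curve_fit(scaling_law, n, loss, p0=(10.0, 0.5, 1.0),
                             bounds=([0.0, 0.0, 0.0], [np.inf, 2.0, np.inf]))
print(f"fitted: a={a:.1f}, alpha={alpha:.3f}, c={c:.2f}")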
Generalization (In- and Out-of-Distribution)
Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
Bias and Generalization in Deep Generative Models: An Empirical Study
Generalizing to Unseen Domains: A Survey on Domain Generalization
Towards a Theoretical Framework of Out-of-Distribution Generalization
OoD-Bench: Benchmarking and Understanding Out-of-Distribution Generalization Datasets and Algorithms
More at https://sites.google.com/site/irinarish/ood_generalization (a subset to be selected)
Other possible candidates:
https://arxiv.org/abs/2109.03795
https://arxiv.org/abs/2007.01434
https://arxiv.org/abs/2102.11436
https://arxiv.org/abs/2107.12580
https://arxiv.org/abs/2108.12284?context=cs.AI
http://proceedings.mlr.press/v119/sastry20a.html
http://arxiv.org/abs/2106.03721
Continual- and Meta-Learning
Scaling and Continual Learning
Don’t Stop Learning: Towards Continual Learning for the CLIP Model
Effect of scale on catastrophic forgetting in neural networks
Effects of Model and Prior Learning Scale on Catastrophic Forgetting
Embracing Change: Continual Learning in Deep Neural Networks
Towards Continual Reinforcement Learning: A Review and Perspectives
Book (1st book on the topic): Lifelong Machine Learning
Continual learning: A comparative study on how to defy forgetting in classification tasks
Class-incremental learning: survey and performance evaluation
Never-Ending Learning (tutorial by Tom Mitchell and Partha Talukdar, ICML 2019)
Continual Learning with Deep Architectures (tutorial by Irina Rish and Vincenzo Lomonaco, ICML 2021)
Continual Lifelong Learning in Natural Language Processing: A Survey
Drinking from a Firehose: Continual Learning with Web-scale Natural Language
Pretrained Language Model in Continual Learning: A Comparative Study
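As background for the continual learning papers above: catastrophic forgetting is commonly quantified with metrics along the following lines (a standard convention, summarized here in our notation). Let a_{t,i} denote accuracy on task i after training on task t, with T tasks in total:
ACC = \frac{1}{T} \sum_{i=1}^{T} a_{T,i}   (average final accuracy across all tasks)
F = \frac{1}{T-1} \sum_{i=1}^{T-1} \left( \max_{t < T} a_{t,i} - a_{T,i} \right)   (average forgetting: each earlier task's peak accuracy minus its final accuracy)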