Neural Scaling Laws and Foundation Models

IFT 6760B & 6167 Winter 2022, Université de Montréal / Mila - Quebec AI Institute


Lecture 1 (10/01/2022) 

Lecturer: Irina Rish

Slides: Introduction to Neural Scaling Laws and Foundation Models

Video: here (chapter 1 - open discussion, chapter 2 - presentation)

Lecture 2 (13/01/2022)

Part 1:   Lecturer: Irina Rish 

Slides: The Bitter Lesson, Scaling and GPT-3

Video: here (parts 1 and 2)

Online video: GPT 3 Demo and Explanation - An AI revolution from OpenAI

Papers: The Bitter Lesson; GPT-3 paper: Language Models are Few-Shot Learners

_______________________________________________________________________________________________________________________

Part  2:  Lecturer: Irina Rish 

Slides: Introduction to Continual Learning

Video: here (part 3)

Homework: 

Lecture 3 (17/01/2022) 

Lecturer: Irina Rish

Slides: Introduction to Continual Learning  (continued from lecture 2)

Video: here 

Lecture 4 (21/01/2022)

Part 1   Lecturer: Irina Rish

Slides: Continual Reinforcement  Learning  (15min)

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Irina Rish 

Slides: Neural Scaling Laws workshop series - overview; Phase Transitions in AI (Phase Transitions in Machine Learning book)

Video: here (part 2)

Lecture 5 (24/01/2022)   - Tue 6:00pm

Part 1   Lecturer: Irina Rish 

Slides: Multimodal Foundation Models   

Paper: OpenAI blog on CLIP, OpenAI blog on DALL-E

References:  laion.ai

Video: here  (part 1)

Brief (impromptu) discussion on alignment and AI safety (video: part 2)

References: AI and the paperclip problem; AI alignment research links

 ________________________________________________________________________________________________________________

Overview of Scaling Laws Papers   Resource: A comprehensive bibliography by Gwern

Papers: Deep Learning Scaling is Predictable, Empirically; Scaling Laws for Neural Language Models; Scaling Laws for Autoregressive Generative Modeling; Scaling Laws for Transfer

Video: to be recorded/posted (ran out of time)

Lecture 5 (24/01/2022)

Part 1   Lecturer:  Tianyu Zhang & Yusong Wu 

Slides: here

Paper: On Power Laws in Deep Ensembles

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2   Lecturer: Tianyu Zhang & Yusong Wu 

Slides: here

Paper: Zero-Shot Text-to-Image Generation 

Video: here (part 2)

Lecture 6 (31/01/2022)

Part 1   Lecturer: Jean-Charles Layoun & Tom Marty

Slides: here

Paper:  A continual learning survey: Defying forgetting in classification tasks 

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Mahta Ramezanian & Timothy Nest 

Slides: here

Paper: A Connectomic Hypothesis for the Hominization of the Brain

Video: here (parts 2, 3) 

Lecture 7 (03/02/2022)

Part 1   Lecturer:  Christoph Schuhmann

Slides: Overview of LAION (laion.ai)

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer:  Balaji Balasubramanian & Eshwanth Baskaran

Slides:  here

Paper: Learning Transferable Visual Models From Natural Language Supervision

Video: here (part 2)

Lecture 8 (08/02/2022) - NOTE: moved to Tue 6pm

Part 1   Lecturer: Irina Rish

Paper: Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot 

Video: here (part 1)

_________________________________________________________________________________________________________________

Paper:  MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning 

Video: here  (part 2)

_________________________________________________________________________________________________________________

Paper: Perceiver: General Perception with Iterative Attention    

Video: YouTube Summary

Lecture 9 (10/02/2022)

Part 1   Lecturer:  Muawiz Chaudhary and Athul Sreemathy Raj

Slides: here

Paper: Parallel Training of Deep Networks with Local Updates

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Irina Rish

Slides:  Some Ideas for Class Projects

Papers (also linked from the slides): Neurogenesis-Inspired Dictionary Learning; Beyond Backprop: Online Alternating Minimization with Auxiliary Variables; Towards Scaling Difference Target Propagation by Learning Backprop Targets

Video: here (part 2)

Lecture 10 (14/02/2022)

Part 1   Lecturer:  Abhinav Moudgil and Venkatesh Ramesh

Slides: here

Paper: Masked Autoencoders are Scalable Vision Learners

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer:  Kyle Roth and Artem Ploujnikov

Slides: here

Paper: Deep Learning Scaling is Predictable, Empirically

Video: here (part 2)

Lecture 11 (17/02/2022)

Part 1   Lecturer:  Pranshu Malviya and Arjun Vaithilingam Sudhakar

Slides: here

Paper: Contrastive Syn-to-Real Generalization

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer:  Balaji Balasubramanian and Eshwanth Baskaran

Slides: here

Paper:  Scaling Laws for Transfer

Video: here (part 2)



Lecture 12 (24/02/2022)

Part 1   Lecturer:  Andrei Romascanu and Nicole Fitzgerald

Slides: here

Paper: Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Gopeshh Subbaraj and Johan Obando

Slides:  here

Paper: Can Wikipedia Help Offline Reinforcement Learning?

Video: here (part 2)


Guest talk: EleutherAI meets Mila (02/03/2022)

Guest talk: Aleph Alpha meets Mila (03/03/2022)

Lecture 13 (07/03/2022)

Part 1   Lecturer:  Naga Karthik and Jingwei Xie

Slides: here

Paper: Continual Learning of Longitudinal Health Records

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Alexis Roger and Tom Marty

Slides: here

Paper: Concrete Problems in AI Safety

Video: here (part 2)

Lecture 14 (10/03/2022)

Part 1   Lecturer:  Abhinav Moudgil and Venkatesh Ramesh

Slides: here 

Paper: Attention Bottlenecks for Multimodal Fusion

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Arnav Kumar Jain and Hasti Nafisi

Slides: here

Paper: A ConvNet for the 2020s

Video: here (part 2)


Lecture 15 (14/03/2022)

Part 1   Lecturer:  Leo Gagnon

Slides: here

Paper: Unified scaling laws for routed language models 

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer:  Kyle Roth and Artem Ploujnikov 

Slides: here

Paper: The Effect of Model Size on Worst-Group Generalization 

Video: here (part 2)


Lecture 16 (17/03/2022)

Part 1   Lecturer:  Jean-Charles Layoun and Alexis Roger

Slides: here

Paper:  Unsolved Problems in ML Safety

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Nicolas Bent and Siddhika Arunachalam

Slides: here

Paper: In Search of Lost Domain Generalization 

Video: here (part 2)


Lecture 17 (21/03/2022)

Part 1   Lecturer:  Reza Bayat and Diganta Misra  - RESCHEDULED

Slides: TBA

Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Video: TBA

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Bhagya C and Ilyas Ahmed

Slides: here

Paper: Continual lifelong learning with neural networks: A review

Video: here


Lecture 18 (24/03/2022)

Part 1   Lecturer:  Gopeshh Subbaraj and Johan Obando

Slides: here

Paper: The Role of Pretrained Representations for the OOD Generalization of RL Agents 

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Mahta Ramezanian and Chris Emezue

Slides: here

Paper: Brains and algorithms partially converge in natural language processing 

Video: here (part 2)


Lecture 19 (28/03/2022)

Part 1   Lecturer:  Nishka Katoch

Slides: here

Paper: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language 

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Andrei Romascanu and Rafael Hernandez Garcia

Slides: here

Papers: Training Larger Networks for Deep Reinforcement Learning; Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Video: here (part 2)


Lecture 20 (31/03/2022)

Part 1   Lecturer:  Jingwei Xie and Naga Karthik

Slides: here

Paper: What is Wrong with Continual Learning in Medical Image Segmentation?

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Leo Gagnon

Slides: here

Paper: Towards Out-Of-Distribution Generalization: A Survey

Video: here (part 2)


Lecture 21 (04/04/2022)

Part 1   Lecturer:  Muawiz Chaudhary and Athul Sreemathy Raj

Slides: here

Paper: Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Pravish Sainath 

Slides: here

Paper: Pretrained Transformers as Universal Computation Engines

Video: here (part 2)


Lecture 22 (07/04/2022)

Lecturer: Diganta Misra and Reza Bayat

Slides: here

Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Video: here


Lecture 23 (11/04/2022)

Part 1   Lecturer:  Kshitij Gupta and Marc-Antoine Provost

Slides: here

Paper: Chain of Thought Prompting Elicits Reasoning in Large Language Models

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer: Diganta Misra and Reza Bayat

Slides: here

Paper: Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

Video: here (part 2)


Lecture 24 (14/04/2022)

Part 1   Lecturer: Arnav Jain and Hasti Nafisi (STARTS AT 15h45)

Slides: here

Paper: Mastering Atari with Discrete World Models

Video: here (part 1)

________________________________________________________________________

Part 2   Lecturer:  Nicolas Bent and Siddhika Arunachalam 

Slides: here

Paper: Rethinking Bias-Variance Trade-off for Generalization of Neural Networks 

Video: here (part 2)

 _________________________________________________________________________________________________________________

Part 3   Lecturer:   Kshitij Gupta and  Ethan Caballero

Slides: here

Paper: Effect of scale on catastrophic forgetting in neural networks

Video: here (part 3)


Lecture 25 (18/04/2022)

Part 1   Lecturer:  Ethan Caballero

Slides: here

Paper: Scaling Laws for Neural Language Models

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer:   Nizar Islah and Rafael Hernandez Garcia

Slides: here

Paper: Explaining Neural Scaling Laws

Video: here (part 2)


Reading Group (20/04/2022, 3pm)

Part 1   Lecturers:  Arjun Vaithilingam Sudhakar and Pranshu Malviya

Slides: here

Paper: Vision Models Are More Robust And Fair When Pretrained On Uncurated Images

Video: here (part 1)

Lecture 26 (21/04/2022, 3:00pm) - invited talk by Jared Kaplan

Part 1   Guest Talk: Jared Kaplan   

See  Invited Talks

 _________________________________________________________________________________________________________________

Part 2   Lecturer:  Nizar Islah and Tim Nest

Slides: here

Paper: Supermasks in Superposition

Video: TBA


Lecture 27 (25/04/2022)

Part 1  Lecturer:  Nicole Fitzgerald 

Slides: here

Paper: Deep Double Descent: Where Bigger Models and More Data Hurt 

Video: here (part 1)

 _________________________________________________________________________________________________________________

Part 2   Lecturer:  Bhagya C and Ilyas Ahmed

Slides: TBA

Paper: Generalizing to Unseen Domains: A Survey on Domain Generalization

Video: here (part 2)

 _________________________________________________________________________________________________________________

Part 3   Lecturer:   Marc-Antoine Provost

Slides: here

Paper: Player of Games

Video: here (part 3)


Final Project Presentations

Final Project Presentations (28/04/2022, 4:30pm, Agora)  

Presenter: Diganta Misra

Project title: APP: Anytime Progressive Pruning

Slides: here

Video: here

Final Project Presentations (02/05/2022, 4:30pm, Agora)    

Presenters: Tom Marty, Siddhika Arunachalam, Rafael Hernandez, and Rodwell Nicolas Bent

Project title: Scaling Law Robustness

Slides: here

Video: here (part 1)

________________________________________________________

Presenters: Jean-Charles Layoun and Alexis Roger

Project title: Aligning MAGMA by finetuning and few-shot learning

Slides: here

Video: here (part 2)

Final Project Presentations (05/05/2022)

Presenter: Léo Gagnon

Project title: Architectural Approaches in CL from First Principles

Slides: here

Video: here (part 1)

_________________

Presenters: Kyle Roth, Nicole Fitzgerald, and Ilyas Ahmed

Project title: Scaling (and other methods) for worst-group generalization

Slides: here

Video: here (part 2)

_________________

Presenters: Balaji Balasubramanian and Eshwanth Baskaran

Project title: Scaling Laws for Image Captioning

Slides: here

Video:  here (part 3)

Final Project Presentations (09/05/2022)

Presenters: Venkatesh Ramesh and Naga Karthik

Project title: On Layer Normalization for Vision Transformers

Slides: here

Video: here (part 1)

_________________________________________________________________________

Presenters: Nishka Katoch and Artem Ploujnikov

Project title: SpeechBrain scaling study

Slides: here

Video:  here (part 2)

_______________________________________________________________________________________________________

Presenter: Jingwei Xie

Project title: Using Pretrained Large-Scale Language Models for Universal General-Purpose Sentence Representation

Slides: here

Video:  here (part 3)

Final Project Presentations (11/05/2022, 3pm - 4:00pm, Agora)

Presenters: Arjun Vaithilingam Sudhakar and Pranshu Malviya

Project title: Diversity in Self-Supervised Representation

Slides: here

Video: here (part 1)

_______________________________________________________________________________________________________

Presenters: Gopeshh Subbaraj and Johan Obando

Project title: Understanding Impact of Scaling in Deep RL 

Slides: here

Video: here (part 2)

Final Project Presentations (12/05/2022, noon to 3pm, Agora)

Presenters: Marc-Antoine Provost and Athul Sreemathy Raj

Project title: Federated Learning benchmark for domain generalization

Slides: here

Video: here (part 1)

_________________________________________________________________________

Presenters: Andrei Mircea and Bhagya C

Project title: Curriculum learning of arithmetic, compositionally

Slides: here

Video: here (part 2)

_________________________________________________________________________

Presenter: Ethan Caballero

Project title: Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers

Slides: here

Video: here (part 3)

Final Project Presentations (13/05/2022, 10am - 1:30pm, Agora)

Presenter: Muawiz Chaudhary

Project title: Effect of Pretraining on Medical Imaging

Slides: here

Video: here (part 1)

_________________________________________________________________________

Presenters: Mahta Ramezanian and Nizar Islah

Project title: Hopfield scaling

Slides: here

Video: here (part 2)

_________________________________________________________________________

Presenters: Yusong Wu and Tianyu Zhang

Project title: Contrastive Language–Audio Pre-training

Slides: here

Video: here (part 3)

Final Project Presentations (16/05/2022, 10am - 1:30pm, Agora)

Presenters: Arnav Kumar Jain and Hasti Nafisi

Project title: Scaling Laws for Reinforcement Learning

Slides: here

Video:

_________________________________________________________________________

Presenter: Kshitij Gupta

Project title: Scaling laws for transfer between language and RL

Slides: TBA

Video:  here (part 2)

_________________________________________________________________________

Presenters: Abhinav Moudgil and Timothy Nest

Project title: Scaling backprop alternatives

Slides: here

Video: here (part 3)

_________________________________________________________________________

Presenter: Reza Bayat

Project title: Scaling Laws for Adversarial Robustness

Slides: here

Video: here (part 4)

_________________________________________________________________________

Presenter: Pravish Sainath

Project title: TBA

Slides: TBA

Video: here (part 5)