Neural Scaling Laws and Foundation Models
IFT 6760B & 6167 Winter 2022, Université de Montréal / Mila - Quebec AI Institute
Lecture 1 (10/01/2022)
Lecturer: Irina Rish
Slides: Introduction to Neural Scaling Laws and Foundation Models
Video: here (chapter 1 - open discussion, chapter 2 - presentation)
Lecture 2 (13/01/2022)
Part 1: Lecturer: Irina Rish
Slides: The Bitter Lesson, Scaling and GPT-3
Video: here (parts 1 and 2)
Online video: GPT 3 Demo and Explanation - An AI revolution from OpenAI
Papers: The Bitter Lesson; GPT-3 paper: Language Models are Few-Shot Learners
_______________________________________________________________________________________________________________________
Part 2: Lecturer: Irina Rish
Slides: Introduction to Continual Learning
Video: here (part 3)
Homework:
Listen to GPT-3: Language Models are Few-Shot Learners (Paper Explained)
Experiment with GPT-3 API: OpenAI GPT-3 Now Open to Public [FREE]
Lecture 3 (17/01/2022)
Lecturer: Irina Rish
Slides: Introduction to Continual Learning (continued from lecture 2)
Video: here
Lecture 4 (21/01/2022)
Part 1: Lecturer: Irina Rish
Slides: Continual Reinforcement Learning (15min)
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Irina Rish
Slides: Neural Scaling Laws workshop series - overview
Phase Transitions in AI (Phase Transitions in Machine Learning book)
Video: here (part 2)
Lecture 5 (24/01/2022) - Tue 6:00pm
Part 1 Lecturer: Irina Rish
Slides: Multimodal Foundation Models
Paper: OpenAI blog on CLIP, OpenAI blog on DALL-E
References: laion.ai
Video: here (part 1)
Brief (impromptu) discussion on alignment and AI safety (video: part 2)
References: AI and the paperclip problem, AI alignment research links
________________________________________________________________________________________________________________
Overview of Scaling Laws Papers
Resource: A comprehensive bibliography by Gwern
Papers: Deep Learning Scaling is Predictable, Empirically, Scaling Laws for Neural Language Models, Scaling Laws for Autoregressive Generative Modeling, Scaling Laws for Transfer
Video: to be recorded/posted (ran out of time)
Lecture 5 (24/01/2022)
Part 1 Lecturer: Tianyu Zhang & Yusong Wu
Slides: here
Paper: On Power Laws in Deep Ensembles
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Tianyu Zhang & Yusong Wu
Slides: here
Paper: Zero-Shot Text-to-Image Generation
Video: here (part 2)
Lecture 6 (31/01/2022)
Part 1 Lecturer: Jean-Charles Layoun & Tom Marty
Slides: here
Paper: A continual learning survey: Defying forgetting in classification tasks
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Mahta Ramezanian & Timothy Nest
Slides: here
Paper: A Connectomic Hypothesis for the Hominization of the Brain
Video: here (parts 2, 3)
Lecture 7 (03/02/2022)
Part 1 Lecturer: Christoph Schuhmann
Slides: Overview of LAION (laion.ai)
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Balaji Balasubramanian & Eshwanth Baskaran
Slides: here
Paper: Learning Transferable Visual Models From Natural Language Supervision
Video: here (part 2)
Lecture 8 (08/02/2022) - NOTE: moved to Tue 6pm
Part 1 Lecturer: Irina Rish
Paper: Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot
Video: here (part 1)
_________________________________________________________________________________________________________________
Paper: MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning
Video: here (part 2)
_________________________________________________________________________________________________________________
Paper: Perceiver: General Perception with Iterative Attention
Video: YouTube Summary
Lecture 9 (10/02/2022)
Part 1 Lecturer: Muawiz Chaudhary and Athul Sreemathy Raj
Slides: here
Paper: Parallel Training of Deep Networks with Local Updates
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Irina Rish
Slides: Some Ideas for Class Projects
Papers (also linked from the slides):
Neurogenesis-Inspired Dictionary Learning, Beyond Backprop: Online Alternating Minimization with Auxiliary Variables, Towards Scaling Difference Target Propagation by Learning Backprop Targets
Video: here (part 2)
Lecture 10 (14/02/2022)
Part 1 Lecturer: Abhinav Moudgil and Venkatesh Ramesh
Slides: here
Paper: Masked Autoencoders are Scalable Vision Learners
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Kyle Roth and Artem Ploujnikov
Slides: here
Paper: Deep Learning Scaling is Predictable, Empirically
Video: here (part 2)
Lecture 11 (17/02/2022)
Part 1 Lecturer: Pranshu Malviya and Arjun Vaithilingam Sudhakar
Slides: here
Paper: Contrastive Syn-to-Real Generalization
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Balaji Balasubramanian and Eshwanth Baskaran
Slides: here
Paper: Scaling Laws for Transfer
Video: here (part 2)
Lecture 12 (24/02/2022)
Part 1 Lecturer: Andrei Romascanu and Nicole Fitzgerald
Slides: here
Paper: Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Gopeshh Subbaraj and Johan Obando
Slides: here
Paper: Can Wikipedia Help Offline Reinforcement Learning?
Video: here (part 2)
Guest talk: EleutherAI meets Mila (02/03/2022)
See Invited Talks
Guest talk: Aleph Alpha meets Mila (03/03/2022)
See Invited Talks
Lecture 13 (07/03/2022)
Part 1 Lecturer: Naga Karthik and Jingwei Xie
Slides: here
Paper: Continual Learning of Longitudinal Health Records
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Alexis Roger and Tom Marty
Slides: here
Paper: Concrete Problems in AI Safety
Video: here (part 2)
Lecture 14 (10/03/2022)
Part 1 Lecturer: Abhinav Moudgil and Venkatesh Ramesh
Slides: here
Paper: Attention Bottlenecks for Multimodal Fusion
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Arnav Kumar Jain and Hasti Nafisi
Slides: here
Paper: A ConvNet for the 2020s
Video: here (part 2)
Lecture 15 (14/03/2022)
Part 1 Lecturer: Leo Gagnon
Slides: here
Paper: Unified scaling laws for routed language models
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Kyle Roth and Artem Ploujnikov
Slides: here
Paper: The Effect of Model Size on Worst-Group Generalization
Video: here (part 2)
Lecture 16 (17/03/2022)
Part 1 Lecturer: Jean-Charles Layoun and Alexis Roger
Slides: here
Paper: Unsolved Problems in ML Safety
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Nicolas Bent and Siddhika Arunachalam
Slides: here
Paper: In Search of Lost Domain Generalization
Video: here (part 2)
Lecture 17 (21/03/2022)
Part 1 Lecturer: Reza Bayat and Diganta Misra - RESCHEDULED
Slides: TBA
Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Video: TBA
_________________________________________________________________________________________________________________
Part 2 Lecturer: Bhagya C and Ilyas Ahmed
Slides: here
Paper: Continual lifelong learning with neural networks: A review
Video: here
Lecture 18 (24/03/2022)
Part 1 Lecturer: Gopeshh Subbaraj and Johan Obando
Slides: here
Paper: The Role of Pretrained Representations for the OOD Generalization of RL Agents
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Mahta Ramezanian and Chris Emezue
Slides: here
Paper: Brains and algorithms partially converge in natural language processing
Video: here (part 2)
Lecture 19 (28/03/2022)
Part 1 Lecturer: Nishka Katoch
Slides: here
Paper: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Andrei Romascanu and Rafael Hernandez Garcia
Slides: here
Papers: Training Larger Networks for Deep Reinforcement Learning; Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Video: here (part 2)
Lecture 20 (31/03/2022)
Part 1 Lecturer: Jingwei Xie and Naga Karthik
Slides: here
Paper: What is Wrong with Continual Learning in Medical Image Segmentation?
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Leo Gagnon
Slides: here
Paper: Towards Out-Of-Distribution Generalization: A Survey
Video: here (part 2)
Lecture 21 (04/04/2022)
Part 1 Lecturer: Muawiz Chaudhary and Athul Sreemathy Raj
Slides: here
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Pravish Sainath
Slides: here
Paper: Pretrained Transformers as Universal Computation Engines
Video: here (part 2)
Lecture 22 (07/04/2022)
Lecturer: Diganta Misra and Reza Bayat
Slides: here
Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Video: here
Lecture 23 (11/04/2022)
Part 1 Lecturer: Kshitij Gupta and Marc-Antoine Provost
Slides: here
Paper: Chain of Thought Prompting Elicits Reasoning in Large Language Models
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Diganta Misra and Reza Bayat
Slides: here
Paper: Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
Video: here (part 2)
Lecture 24 (14/04/2022)
Part 1 Lecturer: Arnav Jain and Hasti Nafisi (starts at 15h45)
Slides: here
Paper: Mastering Atari with Discrete World Models
Video: here (part 1)
________________________________________________________________________
Part 2 Lecturer: Nicolas Bent and Siddhika Arunachalam
Slides: here
Paper: Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
Video: here (part 2)
_________________________________________________________________________________________________________________
Part 3 Lecturer: Kshitij Gupta and Ethan Caballero
Slides: here
Paper: Effect of scale on catastrophic forgetting in neural networks
Video: here (part 3)
Lecture 25 (18/04/2022)
Part 1 Lecturer: Ethan Caballero
Slides: here
Paper: Scaling Laws for Neural Language Models
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Nizar Islah and Rafael Hernandez Garcia
Slides: here
Paper: Explaining Neural Scaling Laws
Video: here (part 2)
Reading Group (20/04/2022) 3pm
Part 1 Lecturers: Arjun Vaithilingam Sudhakar and Pranshu Malviya
Slides: here
Paper: Vision Models Are More Robust And Fair When Pretrained On Uncurated Images
Video: here (part 1)
Lecture 26 (21/04/2022) 3:00pm - invited talk by Jared Kaplan
Part 1 Guest Talk: Jared Kaplan
See Invited Talks
_________________________________________________________________________________________________________________
Part 2 Lecturer: Nizar Islah and Tim Nest
Slides: here
Paper: Supermasks in Superposition
Video: TBA
Lecture 27 (25/04/2022)
Part 1 Lecturer: Nicole Fitzgerald
Slides: here
Paper: Deep Double Descent: Where Bigger Models and More Data Hurt
Video: here (part 1)
_________________________________________________________________________________________________________________
Part 2 Lecturer: Bhagya C and Ilyas Ahmed
Slides: TBA
Paper: Generalizing to Unseen Domains: A Survey on Domain Generalization
Video: here (part 2)
_________________________________________________________________________________________________________________
Part 3 Lecturer: Marc-Antoine Provost
Slides: here
Paper: Player of Games
Video: here (part 3)
Final Project Presentations
Final Project Presentations (02/05/2022, 4:30pm, Agora)
Presenters: Tom Marty, Siddhika Arunachalam, Rafael Hernandez, and Rodwell Nicolas Bent
Project title: Scaling Law Robustness
Slides: here
Video: here (part 1)
________________________________________________________
Presenters: Jean-Charles Layoun and Alexis Roger
Project title: Aligning MAGMA by finetuning and few-shot learning
Slides: here
Video: here (part 2)
Final Project Presentations (05/05/2022)
Presenter: Léo Gagnon
Project title: Architectural Approaches in CL from First Principles
Slides: here
Video: here (part 1)
_________________
Presenters: Kyle Roth, Nicole Fitzgerald, and Ilyas Ahmed
Project title: Scaling (and other methods) for worst-group generalization
Slides: here
Video: here (part 2)
_________________
Presenters: Balaji Balasubramanian and Eshwanth Baskaran
Project title: Scaling Laws for Image Captioning
Slides: here
Video: here (part 3)
Final Project Presentations (09/05/2022)
Presenters: Venkatesh Ramesh and Naga Karthik
Project title: On Layer Normalization for Vision Transformers
Slides: here
Video: here (part 1)
_________________________________________________________________________
Presenters: Nishka Katoch and Artem Ploujnikov
Project title: SpeechBrain scaling study
Slides: here
Video: here (part 2)
_______________________________________________________________________________________________________
Presenter: Jingwei Xie
Project title: Using Pretrained Large-Scale Language Models for Universal General-Purpose Sentence Representation
Slides: here
Video: here (part 3)
Final Project Presentations (11/05/2022) - 3pm - 4:00pm Agora
Presenters: Arjun Vaithilingam Sudhakar and Pranshu Malviya
Project title: Diversity in Self Supervised representation
Slides: here
Video: here (part 1)
_______________________________________________________________________________________________________
Presenters: Gopeshh Subbaraj and Johan Obando
Project title: Understanding Impact of Scaling in Deep RL
Slides: here
Video: here (part 2)
Final Project Presentations (12/05/2022) - noon to 3pm, Agora
Presenters: Marc-Antoine Provost and Athul Sreemathy Raj
Project title: Federated Learning benchmark for domain generalization
Slides: here
Video: here (part 1)
_________________________________________________________________________
Presenters: Andrei Mircea and Bhagya C
Project title: Curriculum learning of arithmetic, compositionally
Slides: here
Video: here (part 2)
_________________________________________________________________________
Presenter: Ethan Caballero
Project title: Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers
Slides: here
Video: here (part 3)
Final Project Presentations (13/05/2022) - 10am - 1:30pm Agora
Presenter: Muawiz Chaudhary
Project title: Effect of Pretraining on Medical Imaging
Slides: here
Video: here (part 1)
_________________________________________________________________________
Presenters: Mahta Ramezanian and Nizar Islah
Project title: Hopfield scaling
Slides: here
Video: here (part 2)
_________________________________________________________________________
Presenters: Yusong Wu and Tianyu Zhang
Project title: Contrastive Language–Audio Pre-training
Slides: here
Video: here (part 3)
Final Project Presentations (16/05/2022) - 10am - 1:30pm Agora
Presenters: Arnav Kumar Jain and Hasti Nafisi
Project title: Scaling Laws for Reinforcement Learning
Slides: here
Video:
_________________________________________________________________________
Presenter: Kshitij Gupta
Project title: Scaling laws for transfer between language and RL
Slides: TBA
Video: here (part 2)
_________________________________________________________________________
Presenters: Abhinav Moudgil and Timothy Nest
Project title: Scaling backprop alternatives
Slides: here
Video: here (part 3)
_________________________________________________________________________
Presenter: Reza Bayat
Project title: Scaling Laws for Adversarial Robustness
Slides: here
Video: here (part 4)
_________________________________________________________________________
Presenter: Pravish Sainath
Project title: TBA
Slides: TBA
Video: here (part 5)