Neural Scaling Laws and Foundation Models

IFT 6760B & 6167 Winter 2022, Université de Montréal / Mila - Quebec AI Institute


Paper presentations and final projects: schedule & sign-up sheet

Lecture 1 (10/01/2022)

Lecturer: Irina Rish

Slides: Introduction to Neural Scaling Laws and Foundation Models

Video: here (chapter 1: open discussion; chapter 2: presentation)

Lecture 2 (13/01/2022)

Part 1: Lecturer: Irina Rish

Slides: The Bitter Lesson, Scaling and GPT-3

Video: here (parts 1 and 2)

Online video: GPT 3 Demo and Explanation - An AI revolution from OpenAI

Papers: The Bitter Lesson; GPT-3 paper: Language Models are Few-Shot Learners

_______________________________________________________________________________________________________________________

Part 2: Lecturer: Irina Rish

Slides: Introduction to Continual Learning

Video: here (part 3)

Homework:

Listen to GPT-3 Language Models are Few-Shot Learners (Paper Explained)
Experiment with GPT-3 API: OpenAI GPT-3 Now Open to Public [FREE]

Lecture 3 (17/01/2022)

Lecturer: Irina Rish

Slides: Introduction to Continual Learning (continued from lecture 2)

Video: here

Lecture 4 (21/01/2022)

Part 1: Lecturer: Irina Rish

Slides: Continual Reinforcement Learning (15min)

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Irina Rish

Slides: Neural Scaling Laws workshop series - overview
Phase Transitions in AI (Phase Transitions in Machine Learning book)

Video: here (part 2)

Lecture 5 (24/01/2022) - Tue 6:00pm

Part 1 Lecturer: Irina Rish

Slides: Multimodal Foundation Models

Paper: OpenAI blog on CLIP, OpenAI blog on DALL-E

References: laion.ai

Video: here (part 1)

Brief (impromptu) discussion on alignment and AI safety (video: part 2)

References: AI and the paperclip problem, AI alignment research links

________________________________________________________________________________________________________________

Overview of Scaling Laws Papers

Resource: A comprehensive bibliography by Gwern

Papers: Deep Learning Scaling is Predictable, Empirically, Scaling Laws for Neural Language Models, Scaling Laws for Autoregressive Generative Modeling, Scaling Laws for Transfer

Video: to be recorded/posted (ran out of time)

Lecture 5 (24/01/2022)

Part 1 Lecturer: Tianyu Zhang & Yusong Wu

Slides: here

Paper: On Power Laws in Deep Ensembles

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Tianyu Zhang & Yusong Wu

Slides: here

Paper: Zero-Shot Text-to-Image Generation

Video: here (part 2)

Lecture 6 (31/01/2022)

Part 1 Lecturer: Jean-Charles Layoun & Tom Marty

Slides: here

Paper: A continual learning survey: Defying forgetting in classification tasks

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Mahta Ramezanian & Timothy Nest

Slides: here

Paper: A Connectomic Hypothesis for the Hominization of the Brain

Video: here (parts 2, 3)

Lecture 7 (03/02/2022)

Part 1 Lecturer: Christoph Schuhmann

Slides: Overview of LAION (laion.ai)

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Balaji Balasubramanian & Eshwanth Baskaran

Slides: here

Paper: Learning Transferable Visual Models From Natural Language Supervision

Video: here (part 2)

Lecture 8 (08/02/2022) - NOTE: moved to Tue 6pm

Part 1 Lecturer: Irina Rish

Paper: Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot

Video: here (part 1)

_________________________________________________________________________________________________________________

Paper: MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning

Video: here (part 2)

_________________________________________________________________________________________________________________

Paper: Perceiver: General Perception with Iterative Attention

Video: YouTube summary

Lecture 9 (10/02/2022)

Part 1 Lecturer: Muawiz Chaudhary and Athul Sreemathy Raj

Slides: here

Paper: Parallel Training of Deep Networks with Local Updates

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Irina Rish

Slides: Some Ideas for Class Projects

Papers: also linked to from the slides

Neurogenesis-Inspired Dictionary Learning, Beyond Backprop: Online Alternating Minimization with Auxiliary Variables, Towards Scaling Difference Target Propagation by Learning Backprop Targets

Video: here (part 2)

Lecture 10 (14/02/2022)

Part 1 Lecturer: Abhinav Moudgil and Venkatesh Ramesh

Slides: here

Paper: Masked Autoencoders are Scalable Vision Learners

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Kyle Roth and Artem Ploujnikov

Slides: here

Paper: Deep Learning Scaling is Predictable, Empirically

Video: here (part 2)

Lecture 11 (17/02/2022)

Part 1 Lecturer: Pranshu Malviya and Arjun Vaithilingam Sudhakar

Slides: here

Paper: Contrastive Syn-to-Real Generalization

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Balaji Balasubramanian and Eshwanth Baskaran

Slides: here

Paper: Scaling Laws for Transfer

Video: here (part 2)

Lecture 12 (24/02/2022)

Part 1 Lecturer: Andrei Romascanu and Nicole Fitzgerald

Slides: here

Paper: Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Gopeshh Subbaraj and Johan Obando

Slides: here

Paper: Can Wikipedia Help Offline Reinforcement Learning?

Video: here (part 2)

Guest talk: EleutherAI meets Mila (02/03/2022)

See Invited Talks

Papers: Training Larger Networks for Deep Reinforcement Learning; Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Video: here (part 2)

Lecture 20 (31/03/2022)

Part 1 Lecturer: Jingwei Xie and Naga Karthik

Slides: here

Paper: What is Wrong with Continual Learning in Medical Image Segmentation?

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Leo Gagnon

Slides: here

Paper: Towards Out-Of-Distribution Generalization: A Survey

Video: here (part 2)

Lecture 21 (04/04/2022)

Part 1 Lecturer: Muawiz Chaudhary and Athul Sreemathy Raj

Slides: here

Paper: Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Pravish Sainath

Slides: here

Paper: Pretrained Transformers as Universal Computation Engines

Video: here (part 2)

Lecture 22 (07/04/2022)

Lecturer: Diganta Misra and Reza Bayat

Slides: here

Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Video: here

Lecture 23 (11/04/2022)

Part 1 Lecturer: Kshitij Gupta and Marc-Antoine Provost

Slides: here

Paper: Chain of Thought Prompting Elicits Reasoning in Large Language Models

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Diganta Misra and Reza Bayat

Slides: here

Paper: Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

Video: here (part 2)

Lecture 24 (14/04/2022)

Part 1 Lecturer: Arnav Jain and Hasti Nafisi (starts at 15h45)

Slides: here

Paper: Mastering Atari with Discrete World Models

Video: here (part 1)

________________________________________________________________________

Part 2 Lecturer: Nicolas Bent and Siddhika Arunachalam

Slides: here

Paper: Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

Video: here (part 2)

_________________________________________________________________________________________________________________

Part 3 Lecturer: Kshitij Gupta and Ethan Caballero

Slides: here

Paper: Effect of scale on catastrophic forgetting in neural networks

Video: here (part 3)

Lecture 25 (18/04/2022)

Part 1 Lecturer: Ethan Caballero

Slides: here

Paper: Scaling Laws for Neural Language Models

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Nizar Islah and Rafael Hernandez Garcia

Slides: here

Paper: Explaining Neural Scaling Laws

Video: here (part 2)

Reading Group (20/04/2022) 3pm

Part 1 Lecturers: Arjun Vaithilingam Sudhakar and Pranshu Malviya

Slides: here

Paper: Vision Models Are More Robust And Fair When Pretrained On Uncurated Images

Video: here (part 1)

Lecture 26 (21/04/2022) 3:00pm - invited talk by Jared Kaplan

Part 1 Guest Talk: Jared Kaplan

See Invited Talks

_________________________________________________________________________________________________________________

Part 2 Lecturer: Nizar Islah and Tim Nest

Slides: here

Paper: Supermasks in Superposition

Video: TBA

Lecture 27 (25/04/2022)

Part 1 Lecturer: Nicole Fitzgerald

Slides: here

Paper: Deep Double Descent: Where Bigger Models and More Data Hurt

Video: here (part 1)

_________________________________________________________________________________________________________________

Part 2 Lecturer: Bhagya C and Ilyas Ahmed

Slides: TBA

Paper: Generalizing to Unseen Domains: A Survey on Domain Generalization

Video: here (part 2)

_________________________________________________________________________________________________________________

Part 3 Lecturer: Marc-Antoine Provost

Slides: here

Paper: Player of Games

Video: here (part 3)

Final Project Presentations

Final Project Presentations (28/04/2022, 4:30pm, Agora)

Presenter: Diganta Misra

Project title: APP: Anytime Progressive Pruning

Slides: here

Video: here

Final Project Presentations (02/05/2022, 4:30pm, Agora)

Presenters: Tom Marty, Siddhika Arunachalam, Rafael Hernandez and Rodwell Nicolas Bent

Project title: Scaling Law Robustness

Slides: here

Video: here (part 1)

________________________________________________________

Presenters: Jean-Charles Layoun and Alexis Roger

Project title: Aligning MAGMA by finetuning and few-shot learning

Slides: here

Video: here (part 2)

Final Project Presentations (05/05/2022)

Presenter: Léo Gagnon

Project title: Architectural Approaches in CL from First Principles

Slides: here

Video: here (part 1)

_________________

Presenters: Kyle Roth, Nicole Fitzgerald and Ilyas Ahmed

Project title: Scaling (and other methods) for worst-group generalization

Slides: here

Video: here (part 2)

_________________

Presenters: Balaji Balasubramanian and Eshwanth Baskaran

Project title: Scaling Laws for Image Captioning

Slides: here

Video: here (part 3)

Final Project Presentations (09/05/2022)

Presenters: Venkatesh Ramesh and Naga Karthik

Project title: On Layer Normalization for Vision Transformers

Slides: here

Video: here (part 1)

_________________________________________________________________________

Presenters: Nishka Katoch and Artem Ploujnikov

Project title: SpeechBrain scaling study

Slides: here

Video: here (part 2)

_______________________________________________________________________________________________________

Presenter: Jingwei Xie

Project title: Using Pretrained Large-Scale Language Models for Universal General-Purpose Sentence Representation

Slides: here

Video: here (part 3)

Final Project Presentations (11/05/2022) - 3pm - 4:00pm Agora

Presenters: Arjun Vaithilingam Sudhakar and Pranshu Malviya

Project title: Diversity in Self-Supervised Representation

Slides: here

Video: here (part 1)

_______________________________________________________________________________________________________

Presenters: Gopeshh Subbaraj and Johan Obando

Project title: Understanding Impact of Scaling in Deep RL

Slides: here

Video: here (part 2)

Final Project Presentations (12/05/2022) - noon to 3pm, Agora

Presenters: Marc-Antoine Provost and Athul Sreemathy Raj

Project title: Federated Learning benchmark for domain generalization

Slides: here

Video: here (part 1)

_________________________________________________________________________

Presenters: Andrei Mircea and Bhagya C

Project title: Curriculum learning of arithmetic, compositionally

Slides: here

Video: here (part 2)

_________________________________________________________________________

Presenter: Ethan Caballero

Project title: Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers

Slides: here

Video: here (part 3)

Final Project Presentations (13/05/2022) - 10am - 1:30pm Agora

Presenter: Muawiz Chaudhary

Project title: Effect of Pretraining on Medical Imaging

Slides: here

Video: here (part 1)

_________________________________________________________________________

Presenters: Mahta Ramezanian and Nizar Islah

Project title: Hopfield scaling

Slides: here

Video: here (part 2)

_________________________________________________________________________

Presenters: Yusong Wu and Tianyu Zhang

Project title: Contrastive Language–Audio Pre-training

Slides: here

Video: here (part 3)

Final Project Presentations (16/05/2022) - 10am - 1:30pm Agora

Presenters: Arnav Kumar Jain and Hasti Nafisi

Project title: Scaling Laws for Reinforcement Learning

Slides: here

Video:

_________________________________________________________________________

Presenter: Kshitij Gupta

Project title: Scaling laws for transfer between language and RL

Slides: TBA

Video: here (part 2)

_________________________________________________________________________

Presenters: Abhinav Moudgil and Timothy Nest

Project title: Scaling backprop alternatives

Slides: here

Video: here (part 3)

_________________________________________________________________________

Presenter: Reza Bayat

Project title: Scaling Laws for Adversarial Robustness

Slides: here

Video: here (part 4)

_________________________________________________________________________

Presenter: Pravish Sainath

Project title: TBA

Slides: TBA

Video: here (part 5)
