Continual Learning, Scaling & Foundation Models
Time: M & W, 4:30-6:30pm EST Location: Mila Auditorium 2 & online
IMPORTANT: all discussions take place on the AGI Discord, in channel #h2025-ift6167-general
Instructor: Irina Rish (irina.rish at mila.quebec), irina-rish.com
Related links: CERC-AAI Lab, Scaling Workshops
This course: Continual Learning, Scaling and Foundation Models (IFT6167)
Location: Auditorium 2 at Mila, 6650, boul. St-Urbain, Montréal
This seminar-style course will focus on recent advances in the rapidly developing area of "foundation models", i.e., large-scale neural network models (e.g., GPT-3, CLIP, DALL·E) pre-trained on very large, diverse datasets. Compared to their smaller-scale counterparts, such models often demonstrate significantly better few-shot generalization across a wide range of downstream tasks - what one could call a "transformation of quantity into quality", or emergent behavior. This is an important step towards the long-standing objective of artificial general intelligence (AGI). By AGI we mean a "general", i.e., broad and versatile, AI capable of quickly adapting to a wide range of situations and tasks, both novel and previously encountered - that is, in continual-learning terms, achieving a good trade-off between stability (memory) and plasticity (adaptation).

We will survey the most recent advances in large-scale pre-trained models, focusing on empirical scaling laws that describe how such systems' performance improves with increasing compute, model size, and pretraining data (power laws, phase transitions), and, in particular, on continual learning of foundation models - a rapidly growing field that aims to keep foundation models maintainable and adaptable as the amount of new data increases, without brute-force retraining of such models from scratch. We will review both "classical" continual learning and its modern advances in the context of large-scale foundation models, where scale affects the "catastrophic forgetting" vs. adaptation trade-off and introduces novel challenges.
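As a purely illustrative example of the kind of empirical scaling law discussed above, loss is often well approximated by a power law in model size, L(N) ~ (N_c / N)^alpha (in the style of Kaplan et al., 2020). The short Python sketch below fits such a law to synthetic (model size, loss) data; all numbers are made up for illustration and are not real measurements.

    # Illustrative sketch: fitting a power-law scaling curve
    # L(N) = (N_c / N)**alpha to (model size, loss) measurements.
    # All data here is synthetic; the constants are NOT real results.
    import numpy as np

    # Hypothetical (model size N, validation loss) pairs.
    sizes = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
    losses = np.array([4.0, 3.4, 2.8, 2.4, 2.0])

    # A power law is linear in log-log space:
    #   log L = alpha * log(N_c) - alpha * log(N)
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    alpha = -slope
    n_c = np.exp(intercept / alpha)
    print(f"fitted alpha = {alpha:.3f}, N_c = {n_c:.3g}")

Fitting in log-log space is a convenient choice here because a power law becomes a straight line under the log transform, so an ordinary least-squares fit recovers the exponent directly.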
Besides several introductory lectures by the instructor and invited talks by guest speakers, the course will center on surveying and presenting recent papers listed in the "Topics & Papers" section (available from the menu at the top of this page). If you have any suggestions for papers to review, please contact the instructor and/or the TAs.
Paper presentations (2 papers per student, teams of 2 students per paper): 40%
Class project (report + poster presentation): 50%
Please submit your project reports by entering the title, author names, and a shared link (shared with irina.rish@mila.quebec) in our planning document - there is a FinalReports page there.
Class participation: asking questions, participating in discussions (on Discord and in class): 10%
Note: due to time-zone differences, it may be difficult for some students to attend all classes in person; classes will be recorded, and questions regarding the papers under discussion can be submitted on the course Discord.