Towards AGI:
Continual Learning, Scaling & Foundation Models
Time: M & W, 3:30-5:30pm EST Location: Mila Auditorium 1 & online
IMPORTANT: all discussions take place on the AGI Discord, channel #fall-2025-ift6167
Instructor: Irina Rish (irina.rish at mila.quebec), irina-rish.com
Related links: CERC-AAI Lab, Scaling Workshops
This course: Continual Learning, Scaling and Foundation Models (IFT6167)
Location: Auditorium 1 at Mila, 6650, boul. St-Urbain, Montréal
This seminar-style course explores the rapidly advancing field of foundation models: large-scale neural network systems pre-trained on massive and diverse datasets. Such models have demonstrated striking emergent behaviors and few-shot generalization, with quantitative increases in compute, data, and parameters yielding qualitatively new capabilities. These developments mark an important step toward artificial general intelligence (AGI), a broad, versatile form of AI that can adapt rapidly to novel tasks while retaining knowledge from prior experience.
A central focus will be on scaling laws, the empirical regularities that describe how performance improves with model size, data, and compute, and on the role of continual learning in addressing the stability–plasticity trade-off (catastrophic forgetting vs. fast adaptation). We will discuss both classical approaches and modern methods adapted to large-scale foundation models, where brute-force retraining from scratch is impractical.
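To make "scaling law" concrete before the course dives in, here is a minimal sketch that fits a saturating power law, L(N) = a * N^(-alpha) + c, to made-up (model size, loss) measurements. The functional form, data points, and initial guesses are illustrative assumptions only, not results from any paper covered in class.

```python
# Minimal illustration of fitting a power-law scaling curve,
# L(N) = a * N**(-alpha) + c, to (model size, loss) pairs.
# All numbers below are made up for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, a, alpha, c):
    # Loss falls as a power of model size N and saturates at irreducible loss c.
    return a * N ** (-alpha) + c

# Hypothetical (parameter count, validation loss) measurements.
N = np.array([1e7, 1e8, 1e9, 1e10])
L = np.array([4.2, 3.4, 2.9, 2.6])

(a, alpha, c), _ = curve_fit(scaling_law, N, L, p0=[10.0, 0.1, 2.0], maxfev=10000)
print(f"fitted exponent alpha = {alpha:.3f}, irreducible loss c = {c:.2f}")

# Extrapolate: predicted loss for a 100B-parameter model under this fit.
print(f"predicted L(1e11) = {scaling_law(1e11, a, alpha, c):.2f}")
```

The fitted exponent is what allows extrapolation: once alpha is estimated from small models, the curve predicts (under strong assumptions) the loss of models too large to train repeatedly.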
To ground these advances, the course will also introduce high-performance computing (HPC) and distributed machine learning, covering practical foundations such as types of parallelism (data, tensor/model, pipeline, and sequence parallelism) and efficiency techniques (checkpointing, mixed-precision training, and mixture-of-experts (MoE) architectures). Students will learn how scaling is enabled in practice on large clusters, and how engineering choices interact with scientific advances in model design.
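For a small hands-on taste of one of the efficiency techniques mentioned above, here is a hedged sketch of a mixed-precision training step in PyTorch; the toy model, tensor shapes, and learning rate are placeholder assumptions rather than course code.

```python
# A minimal sketch of one mixed-precision training step in PyTorch.
# The toy model, shapes, and hyperparameters are illustrative assumptions.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# GradScaler rescales the loss so float16 gradients don't underflow;
# it is disabled on CPU, where autocast uses bfloat16 instead.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

def train_step(batch, target):
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in reduced precision where numerically safe.
    with torch.autocast(device_type=device):
        loss = nn.functional.mse_loss(model(batch), target)
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then steps
    scaler.update()                 # adjusts the scale factor for next step
    return loss.item()

x = torch.randn(8, 512, device=device)
y = torch.randn(8, 512, device=device)
print(f"loss = {train_step(x, y):.4f}")
```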
Beyond scaling, we will highlight current frontier topics in foundation models, including:
Reasoning and inference-time scaling (chain-of-thought, test-time adaptation, tool use; see the sketch after this list)
Multimodality (integration of text, images, video, time-series, tabular, and beyond)
Mechanistic interpretability (understanding internal circuits, features, and behaviors)
Synthetic data, simulation, and automated evaluation
Automating AI research & development: AI systems that help design, test, and optimize other AI systems (e.g., automated hyperparameter search, architecture search, and self-improving research loops).
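As a concrete example of the inference-time scaling idea in the first bullet above, here is a minimal sketch of self-consistency: sample several chain-of-thought completions and take a majority vote over their final answers. The sample_completion function is a hypothetical stand-in for a real language-model call, not part of the course materials.

```python
# Hedged sketch of self-consistency, a simple inference-time scaling method:
# sample several chain-of-thought completions and majority-vote the answers.
import random
from collections import Counter

def sample_completion(prompt: str) -> str:
    # Placeholder: a real implementation would call a language model with
    # temperature > 0 and parse the final answer from its chain of thought.
    return random.choice(["42", "42", "41"])

def self_consistency(prompt: str, n_samples: int = 16) -> str:
    answers = [sample_completion(prompt) for _ in range(n_samples)]
    # More samples (i.e., more inference-time compute) -> a more reliable vote.
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("Q: What is 6 * 7? Think step by step."))
```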
The course format will combine introductory lectures and invited talks with seminar-style student presentations, based on recent papers listed in the Topics & Papers section. Students are encouraged to suggest additional readings and contribute to shaping the evolving syllabus.
Paper presentations (2 papers per student, teams of 2 students per paper): 40%
Class project (report + poster presentation): 50%
Please submit your project reports by entering the title, author names, and a shared link (shared with irina.rish@mila.quebec) on the FinalReports page of our planning document.
Class participation: asking questions and participating in discussions (on Discord and in class): 10%
Note: due to time-zone differences, it may be difficult for all students to attend every class in person; classes will be recorded, and questions about the papers to be discussed can be submitted on the course Discord.