Abstracts

Jason D. Lee (Princeton University)

Title:
Foundations of Representation Learning: Gradient Descent, Supervised to Self-Supervised

Abstract:
This talk explores the foundations of representation learning, focusing on the efficacy of gradient descent and the transition from supervised to self-supervised learning. We study provable representation learning in deep learning, examining the mechanisms of supervised pre-training and self-supervised learning. These methods use a large labeled source dataset and auxiliary pretext tasks, respectively, and we prove that they learn representations that improve performance on downstream tasks. We then investigate the task of learning a single-index model, bridging the gap between upper and lower bounds on learning complexity. Next, we confront the limitations of kernel methods, showing how gradient descent on a two-layer neural network learns representations relevant to the target task and thereby enables efficient transfer learning. Our work culminates in improved sample complexity and a heuristic for transfer learning whose target sample complexity is independent of the ambient dimension d.
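As a rough numerical illustration of the single-index setting mentioned above (not part of the talk; the architecture, link function, and all hyperparameters below are arbitrary choices made for illustration), the following Python sketch runs plain gradient descent on a two-layer ReLU network fit to a target of the form y = g(<w*, x>), then checks how strongly the learned first-layer weights align with the hidden direction w* — the kind of representation learning a fixed kernel cannot perform.

import numpy as np

rng = np.random.default_rng(0)
d, n, m = 20, 2000, 64                    # input dim, samples, hidden width (arbitrary)
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)          # hidden direction of the single-index model
g = lambda z: np.maximum(z, 0.0) ** 2     # an example link function (placeholder)

X = rng.normal(size=(n, d))
y = g(X @ w_star)

# Two-layer network f(x) = a^T relu(W x); train both layers with full-batch GD.
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.normal(size=m) / np.sqrt(m)
lr = 0.05
for step in range(2000):
    H = np.maximum(X @ W.T, 0.0)          # hidden activations, shape (n, m)
    resid = H @ a - y                     # squared-loss residual
    grad_a = H.T @ resid / n
    grad_W = ((resid[:, None] * (X @ W.T > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 500 == 0:
        print(f"step {step}: loss {0.5 * np.mean(resid ** 2):.4f}")

# In feature-learning regimes, the first-layer rows tend to pick up correlation
# with w*; a random-features / kernel model would keep W fixed instead.
alignment = np.abs(W @ w_star) / np.linalg.norm(W, axis=1)
print("mean |cos(W_i, w*)|:", alignment.mean())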


Prerequisites:


Ernest Ryu will hold a pre-study seminar from 9:30 to 10:45 am on Aug. 5th and 6th in the first-floor auditorium of 상산수리과학관. This seminar will briefly introduce the following topics from learning theory: uniform convergence, Rademacher complexity, covering numbers, basic concentration inequalities, and kernel methods.






Boaz Barak (Harvard University)

Title:

Deep Learning: Foundations, Puzzles, Techniques, and Challenges


Abstract:

In this mini-course I will cover some of what we know about the foundations of deep learning and how it differs from classical learning theory. The focus will be on recent research results and insights from experiments.


LECTURE 1: Overview of classical learning theory. Some contrasts between modern deep learning and classical learning theory. Depending on time: a discussion of the transformer architecture.


LECTURE 2: Training dynamics, part 1: optimization choices and trade-offs, and their impact on computational efficiency, optimization efficiency, and generalization. Empirical phenomena such as simplicity bias, the deep bootstrap, the edge of stability, and scaling laws.


LECTURE 3: Training dynamics, part 2 (continued from part 1).


LECTURE 4: Test-time computation: test-time augmentation, chain of thought, beam search, retrieval-based models, differentiable vs. non-differentiable memory, and natural language as a universal API.
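One of the Lecture 4 topics, beam search, can be illustrated with a small standalone sketch (not material from the lectures). In the Python code below, a hard-coded toy scoring function stands in for a trained language model; the vocabulary, scoring function, beam width, and sequence length are all placeholder assumptions.

import numpy as np

VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]

def toy_log_probs(prefix):
    """Hypothetical stand-in for a language model: returns log p(token | prefix)."""
    seed = sum(VOCAB.index(tok) for tok in prefix) + 7 * len(prefix)
    rng = np.random.default_rng(seed)
    logits = rng.normal(size=len(VOCAB))
    return logits - np.log(np.exp(logits).sum())

def beam_search(beam_width=3, max_len=5):
    # Each beam item is (cumulative log-probability, token sequence).
    beams = [(0.0, [])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq and seq[-1] == "<eos>":
                candidates.append((score, seq))   # finished sequence: keep as-is
                continue
            lp = toy_log_probs(seq)
            for tok_id, tok in enumerate(VOCAB):
                candidates.append((score + lp[tok_id], seq + [tok]))
        # Keep only the top-scoring beam_width (partial) sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

for score, seq in beam_search():
    print(f"{score:7.3f}  {' '.join(seq)}")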



Prerequisites:

These lectures are aimed at students and researchers of all levels. We will assume background at the level of an undergraduate machine learning course, as well as comfort with probability and optimization. Experience with deep learning will be beneficial but not necessary.





Ernest K. Ryu (Seoul National University)

Title:
Toward a Grand Unified Theory of Accelerations in Optimization and Machine Learning

Abstract:

Momentum-based acceleration of first-order optimization methods, first introduced by Nesterov, has been foundational to the theory and practice of large-scale optimization and machine learning. However, a fundamental understanding of such acceleration remains a long-standing open problem. In the past few years, several new acceleration mechanisms, distinct from Nesterov's, have been discovered, and the similarities and dissimilarities among these new acceleration phenomena hint at a promising avenue of attack on the open problem. In this talk, we discuss the envisioned goal of developing a mathematical theory that unifies this collection of acceleration mechanisms, and the challenges that must be overcome.
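To make the object of study concrete, the Python sketch below (not part of the talk; the quadratic instance, constants, and iteration count are placeholder choices) compares plain gradient descent with Nesterov's accelerated gradient method, here the constant-momentum variant for strongly convex problems. On the same budget of gradient evaluations, the accelerated iterate typically reaches a far smaller objective value.

import numpy as np

rng = np.random.default_rng(0)
d = 50
A = rng.normal(size=(d, d))
H = A.T @ A + 0.1 * np.eye(d)             # positive definite Hessian of f(x) = 0.5 x^T H x
eigs = np.linalg.eigvalsh(H)
mu, L = eigs.min(), eigs.max()            # strong convexity and smoothness constants
beta = (np.sqrt(L / mu) - 1) / (np.sqrt(L / mu) + 1)   # Nesterov momentum coefficient

f = lambda x: 0.5 * x @ H @ x             # minimized at x = 0
grad = lambda x: H @ x

x0 = rng.normal(size=d)
x_gd = x0.copy()                          # plain gradient descent iterate
x_nes, x_nes_prev = x0.copy(), x0.copy()  # Nesterov iterates

for k in range(300):
    # Plain gradient descent with step size 1/L.
    x_gd = x_gd - grad(x_gd) / L
    # Nesterov: extrapolate with momentum, then take a gradient step at that point.
    y = x_nes + beta * (x_nes - x_nes_prev)
    x_nes_prev = x_nes
    x_nes = y - grad(y) / L

print(f"f after 300 steps, plain GD : {f(x_gd):.3e}")
print(f"f after 300 steps, Nesterov : {f(x_nes):.3e}")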

Prerequisites:

This talk assumes basic familiarity with the use of SGD in deep learning.