Instructor: Tankut Can
Email: tcan@emory.edu
Time/Location: MW 11:30am - 12:45pm, Emerson Chemistry Bldg. E101
Office Hours: MW 2-3pm, MSC N242
Communication: We will use a dedicated Slack workspace for all communication related to the course. I also encourage students to use this space to discuss course materials, either via direct message or publicly in an appropriate channel.
To join the workspace, please email me and I will send an invitation. If for any reason you are unable to join Slack, let me know and we can consider alternative accommodations.
Course Format: The course will have two parts:
Lecture-based Deep Learning primer (4-5 weeks): this part will have the structure of a traditional lecture. It will be a crash course in selected topics in deep learning theory, intended to get everyone up to speed with the background needed to understand the papers (especially those on theory) that we will cover in the rest of the course.
Paper-based review of Large Language Models: this part will be run like a seminar course, in which we cover 1-2 papers per lecture, and it will form the bulk of the course. Starting in this part, some of the lectures will be led by students and some by me, depending on total enrollment. 1-2 students will be assigned to a given lecture and will present the reading for that day. Participation of the entire class is expected. You can pair up, or go it alone. I encourage you to visit my office hours to discuss the paper before you present.
Prerequisites: While the necessary background on deep learning will be provided, students will gain the most from this course if they already have a strong command of the mathematical techniques of theoretical physics, typically taught in core graduate classes such as statistical physics and mathematical methods, including statistics, probability theory, linear algebra, and real analysis. Advanced mathematical topics such as dynamical systems and random matrix theory will be covered as needed.
Grading:
20% Presentation: presentations will be based on a paper and given in the style of a journal club. 1-2 students will be assigned to a reading and will be responsible for presenting the paper and leading the resulting discussion. The goal of a paper presentation is to give an accessible overview of the reading, discussing what it says (main results and methods), why it matters, and its limitations. Every successful presentation should also give relevant background and context, so that we can understand not only the main findings, but also why they matter and whether they are interesting.
Tips: many papers we cover will have gone through the peer review process on OpenReview, in which case you will be able to see the full history of referee reports and author responses. This can be a useful resource for understanding a paper's context. Note also that, given the unbelievable hype in this field, many of these papers have accompanying YouTube tutorials, talks, and blog posts. I will do my best to point these out when I can.
80% Participation: before class, have 1-2 questions/comments prepared to fuel discussion of the readings for that day. I will start collecting these during the second part of the course. You can get these to me in any way, but the preferred route is a direct message on Slack.
An excellent alternative to submitting questions is to reproduce the results of the paper, since many papers will have an accompanying GitHub repo containing all the necessary code.
Important: if you use an LLM to generate these questions/comments, please include the details of how you prompted the chatbot. I would be interested to see how far you can go with LLMs on this task: can they identify real flaws in a paper, or suggest meaningful and interesting new directions?
This course will have no programming component. A great thing about deep learning theory is that (usually) you can run experiments yourself and put your theory against the blade of falsification. If you'd like to explore this on your own, I can suggest the following resources:
Andrej Karpathy's tutorial Let's build GPT-2
Curated Jupyter notebooks accompanying the Mehta et al. review
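If you want a quick sense of what "running an experiment yourself" looks like before diving into those resources, here is a minimal sketch (entirely optional, and just one toy setup I am assuming for illustration, not part of the coursework): a short PyTorch script that trains a small "student" network on data generated by a random "teacher", the kind of experiment often used to check theoretical predictions about training dynamics.

```python
# Minimal, optional sketch: a toy teacher-student experiment in PyTorch.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data: targets come from a fixed random "teacher" network.
x = torch.randn(512, 10)
teacher = nn.Linear(10, 1)
with torch.no_grad():
    y = teacher(x)

# A small "student" MLP; the hidden width (128 here) is a knob you might vary
# when testing a theoretical prediction.
model = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Plain full-batch gradient descent; watch the loss to compare with theory.
for step in range(1001):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    if step % 200 == 0:
        print(f"step {step}: loss = {loss.item():.4f}")
```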
Course Schedule:
Below you will find a tentative schedule of the topics covered in this course. I say tentative because I anticipate making changes in response to student interest or need. For instance, if we need more time with signal propagation, we might add a lecture or two for this purpose. Or if a groundbreaking paper drops on arXiv in the next few months (not inconceivable, given the pace of this field), we might want to dedicate a lecture or two to studying it.
I have also intentionally left open the lecture topics for the last few weeks. We can decide these dynamically, based on general interest. They might also be used for student presentations on additional topics of their choice (subject to my approval).