This seminar course focuses on the computational foundations of online learning, online optimization, and reinforcement learning. We will explore seminal and recent papers covering topics such as stochastic bandits, adversarial bandits, and regret minimization. The course provides a rigorous and mathematical perspective on these problems. We will see how principles in reinforcement learning, e.g., exploration and exploitation, can be used rigorously to obtain provable guarantees in online learning.
The course is open to students with a strong interest in algorithm design, analysis, and machine learning. The seminar is centered on reading and discussion of research papers from top conferences related to online learning. Evaluation will be based on paper reviews, in-class presentations, and active participation. There are no exams or tests.
Course Number: CSCI 4965/6965
Credit Hours: 4.0
Semester / Year: Spring 2026
Lectures: Monday/Thursday 12:00 pm -- 1:50 pm
Room Location: Carnegie Building 201
Prerequisites: CSCI-4020: Design and Analysis of Algorithms or permission of the instructor
(In adherence to relevant data sharing protocols, these resources are restricted to RPI internal access)
Seminar Presentation Schedule: Please use this document for our meeting schedule and signing up for presentations
Slides for the introduction lecture
For both 4965 and 6965 students:
- Learning the basic concepts in online learning and multi-armed bandits
- Learning the core techniques for online learning algorithms
- Understanding the principles behind the design of online learning algorithms
For 6965 students:
- Understanding some of the state-of-the-art research in online learning
Chen Wang
Email: wangc33@rpi.edu (please only use this email address for course-related communications)
Office hours: Fridays 3:00 - 5:00 pm
Paper reviews (40%):
We have 1-2 papers to discuss every week
Students are required to write reviews for the paper
Reviews should contain: paper summary, main techniques, merits and criticisms, and personal verdict of the paper
For 4965 students: The grading criteria for undergraduate students are easier than those of graduate students
In-class presentation (40%)
Students will lead discussions for some specific papers
Each student is required to lead the discussion for 1 week in the semester
Grading is based on the depth and clarity of the discussion
For 4965 students: Depending on the final enrollment, it is possible for undergraduate students to present in teams
Attendance (20%)
For a seminar course, attending the presentation (especially the presentation of your classmates) is necessary
Optional Problem Sets (extra 10%):
We will have 1-2 problem sets that carry an extra 10% credit
Basic concepts in online learning: pure exploration and regret minimization; the median elimination algorithm
Exploration with instance-sensitive sample complexity
Exploration with a limited round of adaptivity; lower bound with information theory
Exploration in streaming bandits I: single-pass algorithm and lower bound; multi-pass algorithm
Exploration in streaming bandits II: multi-pass lower bounds for exploration
Regret minimization in bandits: the UCB algorithm; lower bound
Regret minimization in streaming bandits: single-pass algorithm and lower bound; multi-pass algorithm
The expert problem and adversarial bandits: MWU and EXP3
The expert problem with limited memory: lower bounds and algorithms in special settings
Near-optimal algorithms for the expert problem with limited memory
Algorithms for adversarial bandits with limited memory
Quantum Bandits: weak and strong quantum oracles; optimal bounds
Online learning with differential privacy
Applications of online learning algorithms
We do not use any textbook for this course.
The following course materials are related and helpful:
Theory of Multi-armed Bandits and Reinforcement Learning by Jiantao Jiao
Online learning and multi-armed bandits: Lecture Notes by Varun Kanade
Bandit Algorithms by Tor Lattimore and Csaba Szepesvari
Randomized Algorithms by Samson Zhou
The Rensselaer Handbook of Student Rights and Responsibilities and The Graduate Student Supplement define various forms of Academic Dishonesty, and you should make yourself familiar with these. In this class, all materials that are turned in for a grade must represent the student’s own work. Discussion and resorting to external resources are allowed; however, a notation should indicate your collaboration.
Submission of any course material that is in violation of this policy may result in a penalty of a direct F grade in the course and/or referral to the appropriate Dean (Dean of Students for undergraduate students or the Dean of Graduate Education for graduate students, respectively).
Large language models (LLMs) policy: You cannot use LLMs for your paper review. Please note that it is often very easy to distinguish human vs. AI-based reviews.
You are allowed to use LLMs to prepare your presentation. However, using LLMs to put everything in the slides is not the way to construct the best presentation. The format of board presentations is highly recommended.
You are allowed to use LLMs to solve the homework problems, provided that you understand the solutions. Note that copying from LLMs without understanding them usually leaves a very clear mark. If you used LLMs for the optional problem set, please also indicate such usage on your submission, and it will not affect your grades.
If you are suspected of copying from LLMs and/or other students without understanding them, you will receive a notification from me, and it is your responsibility to justify why it is not the case.
Rensselaer Polytechnic Institute strives to make all learning experiences as accessible as possible. If you anticipate or experience academic barriers based on a disability, please let me know immediately so that we can discuss your options. To establish reasonable accommodations, please register with The Office of Disability Services for Students. After registration, make arrangements with the Director of Disability Services as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion. DSS contact information: dss@rpi.edu; +1-518-276-8197; 4226 Academy Hall.