This seminar course focuses on the computational foundations of online learning, online optimization, and reinforcement learning. We will explore seminal and recent papers covering topics such as stochastic bandits, adversarial bandits, and regret minimization. The course provides a rigorous and mathematical perspective on these problems. We will see how principles in reinforcement learning, e.g., exploration and exploitation, can be used rigorously to obtain provable guarantees in online learning.
The course is open to students with a strong interest in algorithm design, analysis, and machine learning. The seminar is centered on reading and discussion of research papers from top conferences related to online learning. Evaluation will be based on paper reviews, in-class presentations, and active participation. There are no exams or tests.
Course Number: CSCI 4965/6965
Credit Hours: 4.0
Semester / Year: Spring 2026
Lectures: Monday/Thursday 12:00 pm -- 1:50 pm
Room Location: TBD
Prerequisites: CSCI-4020: Design and Analysis of Algorithms or permission of the instructor
TBD
Chen Wang
Email: wangc33@rpi.edu (please only use this email address for course-related communications)
Office hours: TBD
Paper reviews (40%):
We have 1-2 papers to discuss every week
Students are required to write reviews for the paper
Reviews should contain: paper summary, main techniques, merits and criticisms, and personal verdict of the paper
For 4965 students: The grading criteria for undergraduate students are easier than those of graduate students
In-class presentation (40%)
Students will lead discussions for some specific papers
Each student is required to lead the discussion for 1 week in the semester
Grading is based on the depth and clarity of the discussion
For 4965 students: Depending on the final enrollment, it is possible for undergraduate students to present in teams
Attendance (20%)
For a seminar course, attending the presentation (especially the presentation of your classmates) is necessary
Basic concepts in online learning: pure exploration and regret minimization; the median elimination algorithm
Exploration with instance-sensitive sample complexity
Exploration with a limited round of adaptivity; lower bound with information theory
Exploration in streaming bandits I: single-pass algorithm and lower bound; multi-pass algorithm
Exploration in streaming bandits II: multi-pass lower bounds for exploration
Regret minimization in bandits: the UCB algorithm; lower bound
Regret minimization in streaming bandits: single-pass algorithm and lower bound; multi-pass algorithm
The expert problem and adversarial bandits: MWU and EXP3
The expert problem with limited memory: lower bounds and algorithms in special settings
Near-optimal algorithms for the expert problem with limited memory
Algorithms for adversarial bandits with limited memory
Quantum Bandits: weak and strong quantum oracles; optimal bounds
Online learning with differential privacy
Applications of online learning algorithms
We do not use any textbook for this course.
The following course materials are related and helpful:
Theory of Multi-armed Bandits and Reinforcement Learning by Jiantao Jiao
Online learning and multi-armed bandits: Lecture Notes by Varun Kanade
Bandit Algorithms by Tor Lattimore and Csaba Szepesvari
Randomized Algorithms by Samson Zhou
The Rensselaer Handbook of Student Rights and Responsibilities and The Graduate Student Supplement define various forms of Academic Dishonesty, and you should make yourself familiar with these. In this class, all materials that are turned in for a grade must represent the student’s own work. Discussion and resorting to external resources are allowed; however, a notation should indicate your collaboration.
Submission of any paper review that is in violation of this policy may result in a penalty of a direct F grade in the course and/or referral to the appropriate Dean (Dean of Students for undergraduate students or the Dean of Graduate Education for graduate students, respectively).
Large language models (LLMs) policy: You cannot use LLMs for your paper review. Please note that it is often very easy to distinguish human vs. AI-based reviews. If you are suspected of copying from LLMs and/or other students without understanding them, you will receive a notification from me, and it is your responsibility to justify why it is not the case.
You are allowed to use LLMs to prepare your presentation. However, using LLMs to put everything in the slides is not the way to construct the best presentation.
Rensselaer Polytechnic Institute strives to make all learning experiences as accessible as possible. If you anticipate or experience academic barriers based on a disability, please let me know immediately so that we can discuss your options. To establish reasonable accommodations, please register with The Office of Disability Services for Students. After registration, make arrangements with the Director of Disability Services as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion. DSS contact information: dss@rpi.edu; +1-518-276-8197; 4226 Academy Hall.