I am a fifth-year Ph.D. student in Computing Science at the University of Alberta, focusing on reinforcement learning, which I believe is the most promising path to artificial general intelligence. I am honored to be supervised by Professor Rich Sutton.

My long-term research goal is to build simple, general, and scalable learning and planning algorithms for the reinforcement learning problem. I am particularly interested in designing algorithms that 1) maximize the long-term average-reward objective, 2) work with function approximation, and 3) make use of temporal abstraction.

Previously, I earned my Bachelor's degree in Electrical and Computer Engineering (ECE) from Shanghai Jiao Tong University (SJTU), where I worked in the SJTU Speech Lab, supervised by Professor Kai Yu. After that, I earned my Master's degree, also in ECE, from the University of Michigan, where I had a great experience working in the Intelligent Robotics Lab, supervised by Professor Ben Kuipers.

Email: wan6@ualberta.ca


University of Alberta

Ph.D. candidate

Computing Science

2017-2022 (expected)

University of Michigan

Master of Science in Engineering

Electrical and Computer Engineering


Shanghai Jiao Tong University

Bachelor of Science in Engineering

Electrical and Computer Engineering


Work Experience

J.P. Morgan, London, UK

AI Research Intern


Quebec Artificial Intelligence Institute (Mila), Montreal, Canada

Research Intern


Huawei Technologies, Edmonton, Canada

Research Intern


Yitu Technology, Shanghai, China

Software Engineer Intern



San Diego, US

Software Research Engineer Intern



Toward Discovering Options that Achieve Faster Planning.

Yi Wan, Richard S. Sutton (2022),

Under review at NeurIPS. Abstract accepted at RLDM.

Paper, Code, Poster.

Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods.

Yi Wan*, Ali Rahimi-Kalahroudi*, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, and Harm van Seijen (2022),


Paper, Code, Talk, Slides, Poster.

Average-Reward Learning and Planning with Options.

Yi Wan, Abhishek Naik, and Richard S. Sutton (2021),


Paper, Talk, Slides, Poster.

Average-Reward Off-Policy Policy Evaluation with Function Approximation.

Shangtong Zhang*, Yi Wan*, Richard S. Sutton, and Shimon Whiteson (2021),


Paper, Code, Talk, Slides.

Learning and Planning in Average-Reward Markov Decision Processes.

Yi Wan*, Abhishek Naik*, and Richard S. Sutton (2021),


Paper, Code, Talk, Slides, Poster.

Planning with Expectation Models for Control.

Katya Kudashkina, Yi Wan, Abhishek Naik, and Richard S. Sutton (2021),



Off-policy Maximum Entropy Reinforcement Learning: Soft Actor-Critic with Advantage Weighted Mixture Policy (SAC-AWMP).

Zhimin Hou*, Kuangen Zhang*, Yi Wan, Dongyu Li, Chenglong Fu, Haoyong Yu (2020),



Planning with Expectation Models.

Yi Wan*, Muhammad Zaheer*, Adam White, Martha White, and Richard S. Sutton (2019),


Paper, Talk, Slides.

Model-based Reinforcement Learning with Non-linear Expectation Models and Stochastic Environments.

Yi Wan*, Zaheer Abbas*, Martha White, and Richard S. Sutton (2018),

The Joint IJCAI/ECAI/AAMAS/ICML FAIM Workshop on Prediction and Generative Modeling in Reinforcement Learning.

Paper, Slides.


A Python Toolkit for Managing a Large Number of Experiments


Journal Reviewer: TMLR

Conference Reviewer: NeurIPS, ICML, ICLR, CoLLAs

Workshop Reviewer: Decision Aware RL workshop at ICML 2022, RL4RealLife workshop at ICML 2021

Organizer: Continuing (Non-Episodic) RL Problems social at ICML 2021

Volunteer: ICML 2022 session moderator


Reinforcement Learning II (CMPUT 609), 2020, 2021, 2022: Teaching Assistant. Guest lecture: A Second Tutorial on Tabular TD(λ), Slides

Reinforcement Learning I (CMPUT 366), 2018: Teaching Assistant


Winter is long, skiing is fun.

Marmot Basin, Jasper, Canada

Photo from a video filmed by Shangtong Zhang

Blackcomb Glacier Ice Cave, Whistler, Canada