Postdoctoral researcher at Meta AI Research. 

My research focuses on problems that are important to a general, reinforcement-learning-based intelligent system but are not yet well understood.

I obtained my Ph.D. in Computing Science at the University of Alberta, supervised by Professor Richard S. Sutton. Before that, I earned my Bachelor's degree in Electrical and Computer Engineering (ECE) from Shanghai Jiao Tong University (SJTU), where I worked in Professor Kai Yu's speech group. I then earned my Master's degree, also in ECE, from the University of Michigan, where I worked on applications of reinforcement learning to robotics, supervised by Professor Ben Kuipers.



Google Scholar


Loosely Consistent Emphatic Temporal-Difference Learning.

Jiamin He, Fengdi Che, Yi Wan, Rupam Mahmood (2023).



The Emphatic Approach to Average-Reward Policy Evaluation.

Jiamin He, Yi Wan, Rupam Mahmood (2022), 

In NeurIPS 2022 Workshop on DeepRL.


On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly Communicating MDPs.

Yi Wan, Richard S. Sutton (2022), 

A short version accepted by the NeurIPS Workshop on Optimization for Machine Learning.

Paper, Code.

Toward Discovering Options that Achieve Faster Planning. 

Yi Wan, Richard S. Sutton (2022), 

Abstract accepted by the Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).

Paper, Code.

Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods. 

Yi Wan,* Ali Rahimi-Kalahroudi,* Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, and Harm van Seijen (2022).


Paper, Code, Talk, Slides, Poster.

Average-Reward Learning and Planning with Options. 

Yi Wan, Abhishek Naik, and Richard S. Sutton (2021).


Paper, Talk, Slides, Poster.

Average-Reward Off-Policy Policy Evaluation with Function Approximation. 

Shangtong Zhang,* Yi Wan,* Richard S. Sutton, and Shimon Whiteson (2021).


Paper, Code, Talk, Slides.

Learning and Planning in Average-Reward Markov Decision Processes. 

Yi Wan,* Abhishek Naik,* and Richard S. Sutton (2021).


Paper, Code, Talk, Slides, Poster.

Planning with Expectation Models for Control. 

Katya Kudashkina, Yi Wan, Abhishek Naik, and Richard S. Sutton (2021).



Off-policy Maximum Entropy Reinforcement Learning: Soft Actor-Critic with Advantage Weighted Mixture Policy (SAC-AWMP). 

Zhimin Hou,* Kuangen Zhang,* Yi Wan, Dongyu Li, Chenglong Fu, Haoyong Yu (2020).



Planning with Expectation Models. 

Yi Wan,* Muhammad Zaheer,* Adam White, Martha White, and Richard S. Sutton (2019).


Paper, Talk, Slides.

Model-based Reinforcement Learning with Non-linear Expectation Models and Stochastic Environments. 

Yi Wan,* Zaheer Abbas,* Martha White, and Richard S. Sutton (2018), 

The Joint IJCAI/ECAI/AAMAS/ICML Conference Workshop on Prediction and Generative Modeling in Reinforcement Learning.

Paper, Slides.


Yi Wan and Daniel Plop (2019), A Python Toolkit for Managing a Large Number of Experiments


Journal Reviewer: TMLR (2022, 2023)

Conference Reviewer: NeurIPS (2021, 2022, 2023), ICML (2022), ICLR (2020, 2021, 2022), CoLLAs (2022, 2023), AAAI (2023)

Workshop Reviewer: Decision Aware RL workshop at ICML (2022), RL4RealLife workshop at ICML (2021), Optimization for Machine Learning workshop at NeurIPS (2022)

Organizer: Continuing (Non-Episodic) RL problems social at ICML (2021), Designing an RL system toward AGI social at ICML (2022). 

Volunteer: ICML (2022) session moderator


Reinforcement Learning II (CMPUT 609) 2020, 2021, 2022 Teaching Assistant. Guest lecture: A Second Tutorial on Tabular TD(λ), Slides

Reinforcement Learning I (CMPUT 366) 2018 Teaching Assistant


Winter is long, skiing is fun.

Marmot Basin, Jasper, Canada

Photo from a video filmed by Shangtong Zhang 

Blackcomb Glacier Ice Cave, Whistler, Canada