CSE 291 Project

Introduction

In this course, you will learn advanced topics in deep reinforcement learning (RL). Part of the learning will happen through in-class lectures and take-home assignments, but you will gain real hands-on experience with deep RL through your course project. Choose your project wisely: pick one that fits your interests and is both motivating and technically challenging.

Project Format

You will work on the project in small groups, normally 2-3 people. The project contributes 50% of the course grade:

  • 5% proposal

  • 15% milestone report

  • 20% final report

  • 10% final presentation

In the end, you will write an 8-page paper about your project.

Project Scope

Most students choose one of four types of projects:

  • Theoretical project. Analyze and prove some interesting properties of a deep RL model or a learning algorithm.

  • Algorithmic project. Develop a new learning algorithm, or a novel variant of an existing algorithm, for deep RL.

  • Modeling project. Develop a new deep neural network architecture, or a novel variant of an existing model for RL.

  • Application project. Apply a deep RL model and a learning algorithm to solve a novel application of interest more efficiently.

Many fantastic projects come from students picking either an application that they’re interested in or a subfield of deep RL that they want to explore more. So, pick something that you are passionate about! Be brave rather than timid, and do feel free to propose ambitious things that you’re excited about. (Just be sure to ask us for help if you’re uncertain how best to get started.) Alternatively, if you’re already working on a research project that deep RL might apply to, then you may already have a great project idea.

Evaluation Criteria

Your project will be evaluated using criteria similar to those for a research paper.

  • Novelty. Is this project applying a common technique to a well-studied problem, or is the problem or method relatively unexplored?

  • Significance. Did the authors choose an interesting or a “real” problem to work on, or only a small “toy” problem? Is this work likely to be useful and/or have an impact?

  • Technical Quality. Does the technical material make sense? Are the things tried reasonable? Are the proposed algorithms or applications clever and interesting? Do the authors convey novel insight about the problem and/or algorithms?

  • Presentation and Writing. Are the idea and the solution clearly conveyed? Are the figures and tables carefully crafted? Is the report well structured and well reasoned?

Hint: A very good project will be a publishable or nearly publishable piece of work. Three of the main machine learning conferences are ICML, NeurIPS, and ICLR. All papers from these machine learning conferences are available online.

You can browse some of the recent machine learning papers to get inspired.

Resources

Framework:

OpenAI Gym is a toolkit for developing and comparing RL algorithms. You can learn more about Gym through the PyTorch tutorial on deep RL. You can also learn more about RL with Ray RLlib, an open-source library for reinforcement learning.
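For reference, interacting with a Gym environment follows a simple agent-environment loop. Below is a minimal sketch using a random policy; the environment name and the classic 4-tuple step API are illustrative assumptions (newer Gymnasium releases use a slightly different API), and the random action is a placeholder for your own agent.

    # Minimal sketch of the standard Gym interaction loop with a random policy.
    import gym

    env = gym.make("CartPole-v1")      # example environment; substitute your own task
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()              # replace with your policy's action
        observation, reward, done, info = env.step(action)
        total_reward += reward
    env.close()
    print("Episode return:", total_reward)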

Compute:

Example Projects

Safe Imitation Learning

In imitation learning, an autonomous agent learns how to perform a task by observing demonstrations from an expert. However, in safety-critical applications such as health care and autonomous driving, it is imperative for the agent to measure the risk associated with the expert's demonstrations and to adjust accordingly. This project aims to develop safe imitation learning algorithms that learn a behavior policy by imitating an expert (e.g., experienced doctors or human drivers) while also weighing the risks of their demonstrations. Performance will be evaluated by the quality of the learned policy as well as how well it meets the safety requirements.
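As one hypothetical starting point (not a prescribed method), a risk-weighted behavioral cloning objective illustrates the idea of imitating an expert while discounting risky demonstrations. The per-transition risk scores and the policy interface below are assumptions for illustration only.

    # Hypothetical sketch: behavioral cloning where each expert transition is
    # down-weighted by an assumed per-transition risk score in [0, 1].
    import torch.nn as nn

    def risk_weighted_bc_loss(policy, states, expert_actions, risk):
        # states: (N, obs_dim); expert_actions: (N,) action indices; risk: (N,)
        logits = policy(states)          # policy maps observations to action logits
        nll = nn.functional.cross_entropy(logits, expert_actions, reduction="none")
        weights = 1.0 - risk             # trust low-risk demonstrations more
        return (weights * nll).mean()

    # Usage sketch: any nn.Module producing action logits can serve as the policy;
    # compute the loss on a batch of demonstrations, then backpropagate as usual.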

References:

Learning Reward Function

A reward function is central to reinforcement learning. In RL, all of what we mean by goals and purposes can be thought of as the maximization of the expected value of the cumulative sum of a scalar reward. However, specifying the reward function itself is often far from obvious. In complex decision-making scenarios, it remains an open question how to design the reward function (e.g., individual incentive vs. group reward). This project aims to explore a formalism and framework for learning the reward function. The project will be evaluated by the novelty of the formulation and its theoretical justification.
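For concreteness, the objective implied by this view can be written as maximizing the expected discounted return. A minimal LaTeX sketch, assuming a discount factor γ in [0, 1) and scalar reward r_t at step t:

    % Expected discounted return maximized by the agent under policy \pi.
    J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right]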

References:

Sample-Efficient Reinforcement Learning

Reinforcement learning algorithms, such as those used in Go and Chess, require millions of episodes to learn a policy. This significantly hinders their applicability to the real world. Model-based RL provides one strategy to improve the sample efficiency of RL by learning an environment model. However, learning the dynamics of a complex environment is difficult, and the learned environment model may not be completely faithful. This project aims to combine model-based RL with model-free RL to improve sample efficiency. The project will be evaluated by the number of episodes needed to reach an optimal policy.
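One classical way to combine the two, offered purely as an illustrative sketch rather than the intended solution, is Dyna-style planning: a learned dynamics model generates imagined transitions that supplement real experience for a model-free learner. Every function name below is a hypothetical placeholder, not a fixed API.

    # Hypothetical Dyna-style sketch: the dynamics model produces extra "imagined"
    # transitions that feed additional model-free updates.
    def dyna_training_loop(env, agent, model, num_episodes, imagined_steps=10):
        for _ in range(num_episodes):
            state, done = env.reset(), False
            while not done:
                action = agent.act(state)
                next_state, reward, done, _ = env.step(action)
                agent.q_update(state, action, reward, next_state)   # model-free update
                model.update(state, action, reward, next_state)     # fit dynamics model
                # Planning: extra updates from transitions imagined by the model.
                for _ in range(imagined_steps):
                    s, a = model.sample_seen_state_action()
                    r, s_next = model.predict(s, a)
                    agent.q_update(s, a, r, s_next)
                state = next_state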

References: