You will also do a course project on robot learning. Details of how to get started are described below.
By Feb 13 (midnight EST) submit your project proposal
By March 17 (midnight EST) submit your mid-term report
On April 27 you will present your projects in class
By May 2 (midnight EST) submit your final report
All project reports will be submitted via Gradescope
Late submissions for the class project will be penalized by a deduction of 1/3 of the points per day late (except for the final report, which has no late days)
The final report has no late days: if you submit it late, you will get 0 credit for that component. It must be submitted on time so that we can submit final grades on time
Only one submission per team is needed for each project component.
We recommend that you start forming a team as soon as possible. The project must be done in a group of 1-2 people (you can work on the project by yourself if you choose).
If you do not know anyone else in the class, turn to your neighbors before class and introduce yourself! You can also post on Piazza if you are looking for teammates.
Once you have a team, please register it on Canvas under the People->Groups section.
The goal of the project is to help you figure out what matters for RL. Thus the project has the following structure:
For the initial proposal, you should have a random agent (i.e. an agent that takes random actions) running in your environment, and report the results in your proposal
For the midterm, you should have an off-the-shelf baseline RL method running in your environment. You can use any off-the-shelf method you like.
For the final report, you must show how 2 different modifications change the performance of your baseline agent. You can modify anything you like; here are some suggestions (a few are sketched in code after this list):
Change the input domain or add another sensor (e.g. state, RGB images, depth images, point clouds, tactile sensing, force feedback, etc). If your environment does not allow you to easily add another sensor, then manually process the input in some way to create another input to the system (e.g. modify an image by extracting edges to create an "edge image", or you can try removing one of the inputs, etc)
Change the network architecture structurally (not just changing the number of layers or size of each layer but some larger structural change).
Change the action space (e.g. end-effector position control, end-effector force control, joint torques, joint angles, high-level action primitives, different gripper type, etc). If your environment does not allow you to easily do this, then define a function that manually converts the network output to the original action space and pass the network output through that function before producing the output; simple example: action = A * e^{B} - C, where A, B, and C are network outputs.
Change the reward function (e.g. add additional rewards to guide the agent to achieve the task, or sparsify the reward to make it less dense). If your environment does not allow you to easily modify the reward based on the environment structure, try modifying the reward in some other way, such as setting the reward randomly to 0 with some probability.
Modify the learning algorithm in some way
Or anything you want!
The modifications can be fairly simple or arbitrarily complex; they can be potential research ideas that could lead to a publication, or they can be fairly minor changes.
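To make these suggestions concrete, here is a minimal sketch (in Python, against the classic Gym wrapper API) of how the input, action, and reward modifications above might be implemented. All class names, shapes, and constants here are illustrative assumptions, not a required design:

import gym
import numpy as np

class EdgeImageWrapper(gym.ObservationWrapper):
    # Input modification: replace an RGB observation with a crude "edge image"
    # computed by finite differences (a stand-in for a real edge detector).
    # A full implementation would also update self.observation_space.
    def observation(self, obs):
        gray = obs.mean(axis=-1)                       # (H, W) grayscale
        dx = np.abs(np.diff(gray, axis=1, prepend=0))  # horizontal gradients
        dy = np.abs(np.diff(gray, axis=0, prepend=0))  # vertical gradients
        edges = np.clip(dx + dy, 0, 255)
        return edges[..., None].astype(obs.dtype)      # keep a channel axis

class ExpActionWrapper(gym.ActionWrapper):
    # Action-space modification: the network outputs (A, B, C) and the
    # environment receives A * e^B - C, as in the simple example above.
    # A full implementation would also update self.action_space.
    def action(self, net_out):
        A, B, C = net_out
        return np.array([A * np.exp(B) - C], dtype=np.float32)

class RandomZeroRewardWrapper(gym.RewardWrapper):
    # Reward modification: zero out the reward with some probability p.
    def __init__(self, env, p=0.5):
        super().__init__(env)
        self.p = p

    def reward(self, reward):
        return 0.0 if np.random.rand() < self.p else reward

Each wrapper is applied around the environment, e.g. env = RandomZeroRewardWrapper(gym.make("YourEnv-v0"), p=0.25), where the environment name is a placeholder.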
Feel free to discuss your project ideas with a TA or instructor.
Pick an RL environment for your project. You can use any environment that you choose! You can reuse an existing environment or create a new one. Here are some example environments that you can use to train your RL agent:
Standard gym environments: Atari, MuJoCo, Toy Text, Classic Control, Box2D (can be installed with OpenAI Gym)
Third-party environments: a long list of environments compatible with OpenAI Gym
Gym Retro - a platform for reinforcement learning research on games
You should not choose an environment that is "too simple" such as cartpole or inverted pendulum.
Your life will be much easier if the environment follows the Gym interface (you will see that most RL environments you find online use this interface); see details here: https://www.gymlibrary.ml/
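For reference, the random agent required for the proposal is just a loop that samples from the environment's action space. Below is a minimal sketch against the gym>=0.26 version of this interface (older versions return the observation alone from reset() and a single done flag from step()); the environment name is a placeholder:

import gym

env = gym.make("LunarLander-v2")  # placeholder: substitute your own environment
returns = []
for episode in range(20):
    obs, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # the "random agent"
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    returns.append(total_reward)
print(f"mean return over {len(returns)} episodes: {sum(returns) / len(returns):.1f}")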
You are allowed (and encouraged) to use existing RL libraries for your project. The idea of the project is to help you understand what types of changes have different effects on training RL agents. Here are a few libraries that you might find helpful:
https://skrl.readthedocs.io/en/latest/
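To give a sense of what "off-the-shelf" can look like, here is a sketch using Stable-Baselines3 (another widely used Gym-compatible library, named here only as an example); the environment name and step count are placeholders:

from stable_baselines3 import PPO

# Train a PPO baseline on a placeholder environment; any off-the-shelf
# algorithm (SAC, DQN, ...) from any library would serve the same role.
model = PPO("MlpPolicy", "LunarLander-v2", verbose=1)
model.learn(total_timesteps=200_000)
model.save("ppo_baseline")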
Your grade for the class project will be subdivided as follows:
Project proposal: 10%
Mid-term Report: 20%
Final Report: 50%
Final Presentation: 20%
The instructions for each of these components are as follows:
Download the latest CoRL LaTeX template and use it for your project writeups. Add your title, names of authors, affiliation and abstract according to CoRL guidelines. Make sure that the authors are not listed as anonymous so we can grade your submission!
Your proposal will be evaluated as follows (10 total points):
Environment (1 point): Describe the environment that you will use for your experiments, in detail. Please include images of your environment. You should not choose an environment that is "too simple" such as cartpole or inverted pendulum. You can change this later on in the course if you like.
Reward function (1 point): Describe your reward function. You can change this later on in the course if you like.
Method (6 points): Describe the 2 modifications that you are going to try; you will get 3 points for explaining each one. You can change these later on in the course if you like.
Results:
Includes a plot of the performance of a random agent (i.e. an agent that takes random actions) (1 point)
Link to a website showing videos of the performance of the random agent (1 point)
As before, please use the latest CoRL template. You should have implemented a baseline and completed 1 of the 2 modifications. You are welcome to change anything from the initial project proposal that you would like.
Midterm reports will be evaluated as follows (20 total points):
Environment (1 point): Describe the environment that you will use for your experiments, in detail. You should not choose an environment that is "too simple" such as cartpole or inverted pendulum. You can change this later on in the course if you like.
Reward function (1 point): Describe your reward function. You can change this later on in the course if you like.
Method (2 points): Describe the 2 modifications that you are going to try; you will get 1 point for explaining each one. You can change these later on in the course if you like.
Results (Note that the performance of the different methods should be on the same figure; see the plotting sketch after this list):
Includes a plot of the performance of a random agent (i.e. an agent that takes random actions) (1 point)
Includes a plot of the performance of a baseline agent, i.e. a learned, non-random agent trained with an off-the-shelf RL method. The baseline should work significantly better than the random agent; it does not need to solve the task perfectly or even perform particularly well. The main point is to show that your policy is learning something, by demonstrating that its performance is better than random. (10 points)
Link to a website showing videos of the performance of the different agents (updated to include the baseline agent) (5 points)
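Here is a minimal sketch of putting every agent's learning curve on the same figure with matplotlib; the arrays below are synthetic placeholders standing in for the returns you actually log during training:

import matplotlib.pyplot as plt
import numpy as np

# Placeholder data -- substitute the per-episode returns you logged.
episodes = np.arange(200)
random_returns = np.random.normal(-200, 20, size=200)  # stays flat
baseline_returns = -200 + 2.0 * episodes + np.random.normal(0, 20, size=200)

plt.plot(episodes, random_returns, label="random agent")
plt.plot(episodes, baseline_returns, label="baseline (off-the-shelf RL)")
plt.xlabel("episode")
plt.ylabel("return")
plt.legend()
plt.savefig("learning_curves.png", dpi=150)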
The final report should be at most 8 pages long, not including references, though it can be shorter if you wish. As before, please follow the CoRL template. You can also include an appendix with any additional information, which can be as long as you want. You can change anything you would like compared to the proposal or the midterm report.
Final reports will be evaluated as follows (50 total points):
Report is no more than 8 pages long and follows the CoRL template: (1 point)
Environment (1 point): Describe the environment that you will use for your experiments, in detail. You should not choose an environment that is "too simple" such as cartpole or inverted pendulum.
Reward function (1 point): Describe your reward function.
Method (6 points): Describe the 2 modifications that you have tried; you will get 3 points for explaining each one.
Results (Note that the performance of the different methods should be on the same figure):
Includes a plot of the performance of a random agent (i.e. an agent that takes random actions) (1 point)
Includes a plot of the performance of a baseline agent (2 points)
Includes a plot of the performance of both modifications (9 points per modification; 18 points total)
Link to a website showing videos of the performance of the different agents (updated to include the new agents) (3 points per agent; 6 points total)
Analysis of results (updated to analyze all results): 10 points
Conclusion and future work:
Conclusion summarizes the main takeaways from the experiments: 2 points
Future work describes some interesting future directions: 2 points
Prepare a presentation to present in front of the class that describes your completed project. Everyone in the group should present a portion of the presentation. Based on the class size, we will let you know how long the presentation should be. Please practice your talk and make sure it ends within the allotted time.
Final Presentations will be graded as follows (20 total points):
Presentation of environment and reward function: 3 points
Presentation of method: 4 points
Presentation of results: 3 points
Presentation of analysis of results: 4 points
Presentation of conclusions and future work: 3 points
Timed correctly: 3 points
Q: Why not make the course project more flexible? The best way would be to integrate it with our own research projects.
Answer: We previously ran course projects in the way that you suggest. We changed the format for the following reasons. Some students would simply submit their existing research as their course project without doing any additional work. Other students who were not doing RL research were severely disadvantaged; they would often propose an ambitious project, spend most of the semester trying to get their RL environment set up, try a novel RL idea at the end that wouldn't work, and leave feeling disappointed. Grading was also unfair, because students who submitted their existing RL research as their course project would automatically get a higher grade. To level the playing field, we created a very lightweight course project. The goals of the project are for students to get some minimal hands-on experience with RL and to discover how making some modifications affects the learning. This is very open-ended and can include many (though not all) research projects, but it is also not difficult to ramp up on for someone who is not involved in RL research.