Your grade for the class project will be subdivided as follows:
Project proposal: 20%
Midterm report: 30%
Final report: 50%
Feb 11 at 11:59 PM ET: submit your project proposal, along with a list of your team members
March 25 at 11:59 PM ET: submit your midterm report
April 28 at 11:59 PM ET: submit your final project report
There are no late days for the project. Any project component submitted late will get a score of 0.
Only one submission per team is needed for each project deliverable.
All project deliverables must be submitted on Gradescope.
We recommend that you start forming a team as soon as possible. The project must be done in a group of 2-3 students. Groups of 1 are not allowed; groups of 4 or more are also not allowed.
If you do not know anyone else in the class, turn to your neighbors before or after class and introduce yourself!
We will create a Piazza post to help people find teammates.
Once you have a team, please register it on Canvas under the People -> Groups Section.
We have enabled Self-Sign-Up on Canvas, which allows you to form a team and choose your teammates.
Please follow these steps to get registered:
On Canvas, click on the People link in the left-hand navigation menu.
Click the Groups tab at the top of the page.
You will see a list of available groups under "Course Project Teams".
To join an existing team, click Join next to it.
To create a new team, click the + Group button on the right side of the Groups tab.
Group Size: Groups are limited to a maximum of 3 members.
Coordination: If you are looking for a partner, feel free to use the Group-finding post on Piazza.
The class project does NOT need to be about robotics, as long as you meet the requirements described below.
The goal of the class project is to make sure that you can apply deep reinforcement learning to new problems that you might encounter after the class. With this goal in mind, the project is structured as follows:
Pick an RL environment for your project. You can use any environment that you choose! You can reuse an existing environment or create a new one. Here are some example environments that you can use to train your RL agent:
Standard Gymnasium environments: Ant, Half Cheetah, etc.
Robotics environments (mostly manipulation-based): Fetch, Shadow Hand, Adroit, Franka Kitchen, etc.
Third-party environments: a long list of Gymnasium-compatible environments covering autonomous vehicles and traffic management, biological/medical settings, economic/financial settings, electrical/energy systems, robotics, telecommunication systems, and more.
Your life will be much easier if the environment follows the Gymnasium interface (you will see that most RL environments you find online use this interface); see details here: https://gymnasium.farama.org/
Set up a reward function (you can reuse an existing reward function or create a new one to define a new task)
Show the performance of a random agent in this environment (e.g. an agent that takes random actions); a minimal sketch of such a baseline appears after this list
Train 3 different RL methods in this environment
On-policy Policy-gradient RL
Off-policy Q-function-based RL
Model-based RL
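To make the expected setup concrete, here is a minimal sketch of a random-agent baseline written against the Gymnasium interface. The environment name ("HalfCheetah-v4") and the number of evaluation episodes are arbitrary placeholders; substitute your own environment.

    import gymnasium as gym
    import numpy as np

    # Placeholder environment; substitute the environment you chose.
    env = gym.make("HalfCheetah-v4")

    returns = []
    for episode in range(10):
        obs, info = env.reset(seed=episode)
        done, episode_return = False, 0.0
        while not done:
            # Random agent: sample uniformly from the action space.
            action = env.action_space.sample()
            obs, reward, terminated, truncated, info = env.step(action)
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    env.close()

    print(f"Random agent: mean return {np.mean(returns):.1f} +/- {np.std(returns):.1f}")

Any environment that exposes reset() and step() in this form will plug directly into the rest of your project pipeline.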
To clarify, you are allowed to use existing RL libraries for your project. The idea of the project is to help you get used to how you might use RL for an application after the class is complete. This type of learning is meant to complement the learning that you do for the homeworks, in which you will actually implement (parts of) RL algorithms yourself. You are not required to use any existing RL libraries; if you prefer, you can implement the algorithms yourself from scratch or reuse code from the homeworks. If you choose to use RL libraries, here are a few that you might find helpful:
Stable-Baselines3 (SB3): reliable PyTorch implementations of common RL algorithms (PPO, A2C, DQN, SAC, TD3, etc.) with a consistent interface
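As a rough illustration (not a required recipe), training with SB3 usually takes only a few lines. In the sketch below, the environment, algorithm choices, and step counts are arbitrary examples: PPO stands in for an on-policy policy-gradient method, and SAC for an off-policy method that learns Q-functions.

    import gymnasium as gym
    from stable_baselines3 import PPO, SAC

    # Placeholder environment; substitute the environment you chose.
    env = gym.make("HalfCheetah-v4")

    # On-policy policy-gradient example: PPO.
    ppo_model = PPO("MlpPolicy", env, verbose=1)
    ppo_model.learn(total_timesteps=200_000)
    ppo_model.save("ppo_agent")

    # Off-policy example that learns Q-functions: SAC (continuous actions).
    sac_model = SAC("MlpPolicy", env, verbose=1)
    sac_model.learn(total_timesteps=200_000)
    sac_model.save("sac_agent")

SB3 can also log episode returns during training (for example to TensorBoard via the tensorboard_log constructor argument), which is convenient for the learning-curve plots required in the reports.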
FAQ:
Question: Can we do an open-ended project based on my research or based on my particular interests that don't follow the above format?
Answer: Unfortunately, we cannot allow completely open-ended projects, due to:
The need for TAs to grade a large number of projects quickly, which is easier when there is a fixed project format and an accompanying fixed grading rubric
The need for TAs to grade the projects fairly, which is difficult when some projects are open-ended and others are not
The pedagogical aim of the project, which is to help you practice the material that you learn in the course.
At the same time, you can probably fit your interests into the project, e.g. take your environment of interest and try using policy gradients, Q-learning, and a model-based approach. It's OK if some of the methods don't work well, as long as you have a reasonable analysis.
Next, you should propose 2 modifications. These modifications can be as big as proposing a new RL algorithm that you want to create, or they can be smaller changes. Here are some examples of modifications (though you should not feel constrained by this list - these are just examples):
Change the input domain or add another sensor (e.g. state, RGB images, depth images, point clouds, tactile sensing, force feedback, etc)
Change the network architecture
Change the action space (e.g. end-effector position control, end-effector force control, joint torques, joint angles, high-level action primitives, different gripper type, etc)
Change the reward function (e.g. add intermediate rewards to guide the agent to achieve the task); see the wrapper sketch after this list
Add an auxiliary loss function
Change the environment in some way (e.g. add more obstacles)
Compare different RL algorithms (for example, try a few different types of model-based RL algorithms and compare their performance)
Modify an existing RL algorithm in some way
Vary algorithm hyper-parameters
Create a new RL algorithm
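As one concrete illustration, a reward-function modification can often be implemented as a thin wrapper around your environment. The sketch below uses Gymnasium's RewardWrapper with a made-up constant step bonus; the environment name and the shaping term are placeholders, not a prescribed modification.

    import gymnasium as gym

    class ShapedReward(gym.RewardWrapper):
        """Example modification: add a small bonus to every step's reward."""

        def __init__(self, env, bonus=0.1):
            super().__init__(env)
            self.bonus = bonus

        def reward(self, reward):
            # Called on every step; returns the shaped reward.
            return reward + self.bonus

    # Placeholder environment; substitute the environment you chose.
    env = ShapedReward(gym.make("HalfCheetah-v4"), bonus=0.1)

The same wrapper pattern (ObservationWrapper, ActionWrapper, Wrapper) works for input-domain, action-space, and other environment modifications as well.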
Feel free to discuss your project ideas with a TA or instructor.
The instructions for each of these components are as follows:
Download the latest CoRL LaTeX template and use it for your project writeups. Add your title, author names, affiliations, and abstract according to the CoRL guidelines. Make sure that the authors are not listed as anonymous so that we can grade your submission!
Your proposal will be evaluated as follows (20 total points):
Format: CoRL template (1 point)
Page limit: 4 pages + references + appendix (you do not have to fill the 4 pages)
Environment (2 points): Describe the environment that you will use for your experiments.
Reward function (1 point): Describe your reward function. You can change this later on in the course if you like.
Method: Describe the 2 modifications that you are going to try. You can change these later on in the course if you like. Points are as follows:
Proposed modification #1 is clearly explained (4 points)
Proposed modification #2 is clearly explained (4 points)
Results:
Includes a plot of the performance of a random agent (e.g. an agent that takes random actions) (4 points)
Plots should report mean return versus training environment steps (see the plotting sketch after this rubric)
Link to a website showing videos of the performance of the random agent (e.g. an agent that takes random actions) (4 points)
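A minimal plotting sketch for this kind of figure is below. It assumes you have logged mean return at a set of training-step checkpoints; the arrays shown are placeholders rather than real results, and the same script can be reused when you overlay more methods for the midterm and final reports.

    import matplotlib.pyplot as plt
    import numpy as np

    # Placeholder data: replace with the returns you actually logged.
    steps = np.arange(0, 1_000_000, 20_000)
    logged_curves = {
        "Random agent": np.zeros_like(steps, dtype=float),
        # "PPO (on-policy policy gradient)": ...,
        # "SAC (off-policy Q-function-based)": ...,
    }

    plt.figure()
    for label, mean_return in logged_curves.items():
        plt.plot(steps, mean_return, label=label)
    plt.xlabel("Training environment steps")
    plt.ylabel("Mean return")
    plt.legend()
    plt.savefig("learning_curves.png", dpi=200)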
For the midterm report, you should have completed the experiments on "On-policy Policy-gradient RL" and "Off-policy Q-function-based RL", but you do not yet need the experiments on "Model-based RL" or your two modifications.
Midterm reports will be evaluated as follows (30 total points) - new sections are bolded in red. You are welcome to change any of the sections from the project proposal if you wish.
Report follows the CoRL template (1 point)
Page limit: 6 pages + references + appendix (you do not have to fill the 6 pages)
Environment (1 point): Describe the environment that you will use for your experiments.
Reward function (1 point): Describe your reward function. You can change this later on in the course if you like.
Method: Describe the 2 modifications that you are going to try. You can change these later on in the course if you like. Points are as follows:
Proposed modification #1 is clearly explained. (3 points)
Proposed modification #2 is clearly explained. (3 points)
Results (Note that the performance of the different methods should be on the same figure):
Includes a plot of the performance of a random agent (e.g. an agent that takes random actions) (3 points)
Includes a plot of the experiment for "On-policy Policy-gradient RL" (overlaid on the same figure as above) (3 points)
Includes a plot of the experiment for "Off-policy Q-function-based RL" (overlaid on the same figure as above) (3 points)
Plots should report mean return versus training environment steps
The methods do not have to "work" to get full credit, as long as you have a reasonable analysis
Analysis of current results: 9 points, broken down as follows:
Analysis of learning curves (2)
Analysis of success rates or other task-relevant metrics (2)
Analysis of failure cases (3)
Clear next-step plan (2)
Link to a website showing videos of the performance of the different agents (updated to include new agents) (3 points)
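One way (among several) to produce the videos for your website is Gymnasium's RecordVideo wrapper, sketched below. The environment name and output folder are placeholders, and you would substitute your trained agent's actions for the random actions used here.

    import gymnasium as gym
    from gymnasium.wrappers import RecordVideo

    # Placeholder environment; rgb_array rendering is needed for video capture.
    env = gym.make("HalfCheetah-v4", render_mode="rgb_array")
    env = RecordVideo(env, video_folder="videos", episode_trigger=lambda ep: True)

    obs, info = env.reset(seed=0)
    done = False
    while not done:
        # Replace the random action with your trained agent's action.
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        done = terminated or truncated
    env.close()  # finalizes the video files in the "videos" folder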
As before, please use the CoRL template. You should have all experiments complete.
Final reports will be evaluated as follows (total 50 points) - new sections are bolded in red. You are welcome to change any of the sections from the midterm report if you wish.
Report follows the CoRL template (1 point)
Page limit: 8 pages + references + appendix (you do not have to fill the 8 pages)
Environment (1 point): Describe the environment that you will use for your experiments.
Reward function (1 point): Describe your reward function.
Method:
Proposed modification #1 is clearly explained. (2 points)
Proposed modification #2 is clearly explained. (2 points)
Results (Note that the performance of the different methods should be on the same figure):
Includes a plot of the performance of a random agent (e.g. an agent that takes random actions) (2 points)
Includes a plot of the experiment for "On-policy Policy-gradient RL" (overlaid on the same figure as above) (2 points)
Includes a plot of the experiment for "Off-policy Q-function-based RL" (overlaid on the same figure as above) (3 points)
Includes a plot of the experiment for "Model-based RL" (overlaid on the same figure as above) (4 points)
Includes a plot of the experiment for "Modification 1" (overlaid on the same figure as above) (4 points)
Includes a plot of the experiment for "Modification 2" (overlaid on the same figure as above) (4 points)
Plots should report mean return versus training environment steps
The methods do not have to "work" to get full credit, as long as you have a reasonable analysis
Analysis of results (updated to analyze all results): 15 points, broken down as follows:
Analysis of all learning curves (2)
Analysis of success rates or other task-relevant metrics (2)
Analysis of failure cases (3)
Comparison of performance across all methods and analysis (4)
Comparison of performance across the two modifications and analysis (4)
Link to a website showing videos of the performance of the different agents (updated to include all new agents) (3 points)
Conclusion and future work:
Conclusion summarizes the main takeaways from the experiments: 3 points
Future work describes some interesting future directions: 3 points