Below we show a video of C-learning on the FetchPickAndPlace-v0 environment, in which a robotic arm must pick up a block and move it to a goal location. C-learning learns to solve this task effectively. Goals are defined as 3-dimensional coordinates. The state space is 25-dimensional, and the action space is 4-dimensional. As mentioned in the manuscript, at the end of training the success rate of C-learning is 99.6%, while that of TD3 with HER is 95.3%. Successful trajectories take on average 8.62 steps to reach the goal for C-learning, and 9.96 for HER.
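For readers who want to see how the two reported quantities relate, here is a minimal sketch of computing a success rate and the average number of steps over successful trajectories only. The exact evaluation protocol is described in the manuscript, not here, so the function name `summarize_episodes`, the input encoding (`None` for an episode that never reaches the goal), and the sample data are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def summarize_episodes(steps_to_goal: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (success_rate, mean_steps_over_successful_episodes).

    Each entry is the number of steps an episode took to reach the goal,
    or None if the goal was never reached (an assumed encoding).
    """
    if not steps_to_goal:
        raise ValueError("need at least one episode")
    # Keep only the episodes that reached the goal.
    successes = [s for s in steps_to_goal if s is not None]
    success_rate = len(successes) / len(steps_to_goal)
    # The step average is taken over successful episodes only.
    mean_steps = sum(successes) / len(successes) if successes else float("nan")
    return success_rate, mean_steps


# Hypothetical outcomes for five evaluation episodes (not data from the paper).
episodes = [8, 9, None, 7, 10]
rate, mean_steps = summarize_episodes(episodes)
print(f"success rate: {rate:.1%}, mean steps over successes: {mean_steps:.2f}")
```

Because failed episodes are excluded from the step average, the two numbers should be read together: a method can have a low step count simply because it only succeeds on easy goals.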
We also compare C-learning against TD3 with HER in the HandManipulatePenFull-v0 environment. As mentioned in the paper, C-learning achieves a 39.3% success rate, while HER obtains only 19.7%. Among the successful runs, C-learning takes on average 7.58 steps to reach the goal, while HER takes 13.9. We show some comparative examples in the video:
Here is a short description of our paper: