C-Learning

Horizon-Aware Cumulative Accessibility Estimation

Paper

Code

Abstract

Multi-goal reaching is an important problem in reinforcement learning needed to achieve algorithmic generalization. Despite recent advances in this field, current algorithms suffer from three major challenges: high sample complexity, learning only a single way of reaching the goals, and difficulties in solving complex motion planning tasks. In order to address these limitations, we introduce the concept of cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon. We show that these functions obey a recurrence relation, which enables learning from offline interactions. We also prove that optimal cumulative accessibility functions are monotonic in the planning horizon. Additionally, our method can trade off speed and reliability in goal-reaching by suggesting multiple paths to a single goal depending on the provided horizon. We evaluate our approach on a set of multi-goal discrete and continuous control tasks. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality.
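To make the recurrence and the horizon trade-off concrete, here is a minimal tabular sketch. It is not the method from the paper, which learns from offline interactions. Instead it runs dynamic programming on a known transition model, and it assumes the recurrence takes the natural form implied by the definition above: the accessibility of a goal within horizon h equals the expected best accessibility from the next state within horizon h - 1, and equals 1 once the goal is reached. The 4-state environment, the function name `optimal_accessibility`, and the array layout `P[s, a, s']` are all hypothetical choices made for illustration.

```python
import numpy as np

def optimal_accessibility(P, goal, max_horizon):
    """Tabular sketch of optimal cumulative accessibility.

    P[s, a, s2] is the probability of moving from state s to s2 under action a.
    Returns C with C[h, s, a] = probability of reaching `goal` from s within
    h steps when taking action a first and acting optimally afterwards.
    The recurrence used here is an assumed form, not copied from the paper.
    """
    n_states, n_actions, _ = P.shape
    C = np.zeros((max_horizon + 1, n_states, n_actions))
    C[:, goal, :] = 1.0  # already at the goal: accessible for every horizon
    for h in range(1, max_horizon + 1):
        best_next = C[h - 1].max(axis=1)  # best value per next state, horizon h-1
        C[h] = P @ best_next              # expectation over next states
        C[h, goal, :] = 1.0
    return C

# Hypothetical toy problem: 0 = start, 1 = detour, 2 = goal, 3 = trap.
# Action 0 at the start is a risky shortcut (60% goal, 40% trap);
# action 1 is a safe detour that needs two steps.
P = np.zeros((4, 2, 4))
P[0, 0, 2], P[0, 0, 3] = 0.6, 0.4
P[0, 1, 1] = 1.0
P[1, :, 2] = 1.0
P[2, :, 2] = 1.0  # goal is absorbing
P[3, :, 3] = 1.0  # trap is absorbing

C = optimal_accessibility(P, goal=2, max_horizon=2)
print(C[1, 0].argmax())  # horizon 1: action 0, the fast but risky shortcut
print(C[2, 0].argmax())  # horizon 2: action 1, the slower but reliable detour
```

In this toy problem, a horizon of 1 favors the shortcut because the detour cannot reach the goal in time, while a horizon of 2 favors the detour because it succeeds with certainty. The computed values are also non-decreasing in the horizon, which matches the monotonicity property stated above.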

Additional Results

Below we show a video of C-learning on the FetchPickAndPlace-v0 environment, in which a robotic arm must pick up a block and move it to a goal position. C-learning effectively learns to solve this task. Goals are defined as 3-dimensional coordinates. The state space is 25-dimensional and the action space is 4-dimensional. As mentioned in the manuscript, at the end of training the success rate of C-learning is 99.6%, compared with 95.3% for TD3 with HER. Successful trajectories take on average 8.62 steps to reach the goal for C-learning and 9.96 for HER.

We also compare C-learning against TD3 with HER in the HandManipulatePenFull-v0 environment. As mentioned in the paper, C-learning achieves a 39.3% success rate, while HER obtains only 19.7%. Among the successful runs, C-learning takes on average 7.58 steps to reach the goal, while HER takes 13.9. We show some comparative examples in the video:

Here is a short description of our paper:

[Video: C-Learning: Horizon-Aware Cumulative Accessibility Estimation]