Solving Compositional Reinforcement Learning Problems via Task Reduction
We propose a novel learning paradigm, Self-Imitation via Reduction (SIR), for solving compositional reinforcement learning problems. SIR is based on two core ideas: task reduction and self-imitation. Task reduction tackles a hard-to-solve task by actively reducing it to an easier task whose solution is already known to the RL agent. Once the original hard task has been solved through task reduction, the agent naturally obtains a self-generated solution trajectory to imitate. By continuously collecting and imitating such demonstrations, the agent progressively expands the solved subspace of the task space. Experimental results show that SIR significantly accelerates and improves learning on a variety of challenging sparse-reward continuous-control problems with compositional structure.
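The two ideas above can be illustrated with a minimal sketch. It is a toy, not the authors' implementation: the names `reduce_task` and `toy_value`, the 1-D state space, the fixed candidate list, and the use of a product to combine the two value estimates are all assumptions made for illustration. The sketch only shows the general shape: if a task looks unsolvable directly, search for an intermediate state from which the goal looks solvable, and store the resulting two-stage solution as a demonstration.

```python
from typing import Callable, Iterable, List, Optional


def reduce_task(
    start: float,
    goal: float,
    candidates: Iterable[float],
    value_fn: Callable[[float, float], float],
    threshold: float = 0.5,
) -> Optional[float]:
    """Return the best intermediate state, or None if no reduction looks solvable.

    A candidate is scored by the product of two value estimates
    (start -> candidate, candidate -> goal). The product is an
    illustrative assumption, not taken from the source.
    """
    best, best_score = None, threshold
    for mid in candidates:
        score = value_fn(start, mid) * value_fn(mid, goal)
        if score > best_score:
            best, best_score = mid, score
    return best


def toy_value(state: float, goal: float, reach: float = 6.0) -> float:
    """Hypothetical value estimate: 1.0 if the goal is within `reach`, else 0.0."""
    return 1.0 if abs(goal - state) <= reach else 0.0


# Self-generated demonstrations collected for later imitation.
demos: List[List[float]] = []

start, goal = 0.0, 10.0
if toy_value(start, goal) < 0.5:  # the task looks too hard to solve directly
    mid = reduce_task(start, goal, [2.0, 5.0, 9.0], toy_value)
    if mid is not None:
        # In a real agent this would be the concatenated state-action trajectory.
        demos.append([start, mid, goal])
```

In this toy setting the goal is out of direct reach, so the agent routes through the one candidate that is reachable from the start and from which the goal is reachable. A real agent would replace `toy_value` with a learned value function and the waypoint list with rollouts of its own policy.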
Videos of SIR agents
The robot hand is tasked with pushing the blue cubic box to the red goal position, but the brown elongated box blocks the door. Our agent first perturbs the elongated box and then pushes the cubic box to the goal.
The gripper aims to manipulate the boxes on the table so that the box with the same color as the goal spot reaches the goal position with the gripper fingers left apart. The agent accomplishes the task by stacking the non-target boxes below the goal spot.
The green particle agent is tasked with pushing the blue cubic box to the red goal position, but the doors are all blocked by elongated boxes. The learned agent strategically propels itself to clear the doors and then pushes the cubic box to the goal.