Composing Task-Agnostic Policies with Deep Reinforcement Learning

The composition of elementary behaviors to solve challenging transfer learning problems is one of the key elements in building intelligent machines. To date, there has been plenty of work on learning task-specific policies or skills but almost no focus on composing the necessary task-agnostic skills to solve new problems. In this paper, we propose a novel deep reinforcement learning-based skill transfer and composition method that uses the agent's primitive policies to solve unseen tasks. We evaluate our method in difficult cases where training a policy through standard reinforcement learning (RL) or even hierarchical RL is either infeasible or exhibits high sample complexity. We show that our method not only transfers skills to new problem settings but also solves challenging environments that require both task planning and motion control, and does so with high data efficiency.
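To make the idea of composing primitive policies concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes four fixed, hypothetical primitives (left, right, up, down) that each return a 2-D action, and it blends them with state-dependent weights. The names `PRIMITIVES` and `composite_action`, the `goal` argument, and the hand-written scoring rule are all assumptions made for illustration; in a learned composite policy the weights would instead come from a network trained with RL.

```python
import numpy as np

# Hypothetical primitive policies: each maps a state to a 2-D action.
# Here they ignore the state and always step in one fixed direction.
PRIMITIVES = {
    "left": lambda state: np.array([-1.0, 0.0]),
    "right": lambda state: np.array([1.0, 0.0]),
    "up": lambda state: np.array([0.0, 1.0]),
    "down": lambda state: np.array([0.0, -1.0]),
}


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def composite_action(state, goal, temperature=1.0):
    """Blend primitive actions using state-dependent mixture weights.

    Assumption: the weights are a hand-written stand-in that scores each
    primitive by how well its action aligns with the direction to the goal.
    A learned composite policy would replace this scoring rule.
    """
    actions = np.stack([policy(state) for policy in PRIMITIVES.values()])  # (4, 2)
    direction = np.asarray(goal, dtype=float) - np.asarray(state, dtype=float)
    scores = actions @ direction / temperature  # one score per primitive
    weights = softmax(scores)                   # non-negative, sums to 1
    return weights @ actions, weights           # blended action, mixture weights


if __name__ == "__main__":
    action, weights = composite_action(state=[0.0, 0.0], goal=[3.0, 0.0])
    for name, w in zip(PRIMITIVES, weights):
        print(f"{name}: {w:.3f}")
    print("blended action:", action)
```

With the goal directly to the right of the agent, the "right" primitive receives the largest weight, while the "up" and "down" contributions cancel. The point of the sketch is only that the composite policy outputs weights over existing skills rather than learning the low-level motion from scratch.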

HalfCheetah-Hurdle Environment: The composite policy combines the jumping and running policies.

Ant Cross Maze: The composite policy combines the left, right, up, and down motion policies.

Pusher: The composite policy combines the push-to-bottom and push-to-left policies to reach the bottom-left target.

Ant Environments: The composite policy combines the left, right, up, and down motion policies.

Ant Maze


Ant Push


Ant Fall