Composing Task-Agnostic Policies with Deep Reinforcement Learning

Ahmed H. Qureshi, Jacob J. Johnson, Yuzhe Qin, Taylor Henderson, Byron Boots, Michael C. Yip

The composition of elementary behaviors to solve challenging transfer learning problems is one of the key elements in building intelligent machines. To date, there has been plenty of work on learning task-specific policies or skills, but almost no focus on composing task-agnostic skills to find solutions to new problems. In this paper, we propose a novel deep reinforcement learning-based skill transfer and composition method that composes the agent's primitive policies to solve unseen tasks. We evaluate our method in difficult cases where training a policy through standard reinforcement learning (RL), or even hierarchical RL, is either infeasible or exhibits high sample complexity. We show that our method not only transfers skills to new problem settings but also solves challenging environments that require both task planning and motion control, with high data efficiency.
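To make the idea of policy composition concrete, below is a minimal, generic sketch of one way to combine fixed primitive policies: a state-dependent softmax mixture over their actions. The primitives, feature scoring, and weighting scheme here are illustrative assumptions for exposition only, not the method proposed in the paper.

```python
import math

# Hypothetical primitive policies for a 2-D point agent (not from the paper).
def primitive_left(state):
    return [-1.0, 0.0]   # move left

def primitive_right(state):
    return [1.0, 0.0]    # move right

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def composite_action(state, primitives, score_fn):
    """Blend primitive actions using state-dependent softmax weights.

    In a learned composer, score_fn would be a trained network; here it
    is a hand-written stand-in so the sketch is self-contained.
    """
    weights = softmax([score_fn(state, i) for i in range(len(primitives))])
    actions = [p(state) for p in primitives]
    # Weighted sum of primitive actions, dimension by dimension.
    return [sum(w * a[d] for w, a in zip(weights, actions))
            for d in range(len(actions[0]))]

# Toy score: favor whichever primitive moves the agent toward x = 0.
def score(state, i):
    x = state[0]
    return -x if i == 1 else x   # prefer "right" when x < 0, "left" when x > 0

state = [-2.0, 0.0]  # agent is left of the goal
action = composite_action(state, [primitive_left, primitive_right], score)
```

With the agent at x = -2, the softmax weights concentrate on the "right" primitive, so the blended action points toward the goal. Any practical composer would learn the scoring function from reward rather than hand-code it.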

HalfCheetah-Hurdle Environment: The composite policy combines jumping and running policies.

Ant Cross Maze: The composite policy combines left, right, up, and down motion policies.

Pusher: The composite policy combines push-to-bottom and push-to-left policies to reach the bottom-left target.

Ant Environments: The composite policy combines left, right, up, and down motion policies.

Ant Maze


Ant Push


Ant Fall


Bibliography

@inproceedings{Qureshi2020Composing,
  title     = {Composing Task-Agnostic Policies with Deep Reinforcement Learning},
  author    = {Ahmed H. Qureshi and Jacob J. Johnson and Yuzhe Qin and Taylor Henderson and Byron Boots and Michael C. Yip},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://openreview.net/forum?id=H1ezFREtwH}
}