Multi-Task Learning via Iterated Single-Task Transfer

Abstract

One approach to acquiring many skills efficiently is multi-task reinforcement learning (RL), typically by training a single policy to solve all tasks at once. In this work, we investigate the feasibility of instead training separate policies, one task at a time, transferring from a task only once its policy has finished training. We describe a method for finding near-optimal sequences of transfers in this setting, and use it to show that performing the best such sequence of transfers is competitive with other multi-task RL methods on the MetaWorld MT10 benchmark.
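
The paper's actual search procedure is described in the body of the work; purely as an illustration of the setting, the sketch below shows one plausible greedy heuristic for ordering single-task transfers. The pairwise `gain` matrix, the `scratch` baseline, and the greedy selection rule are all assumptions introduced here for exposition, not the authors' method.

```python
import numpy as np


def greedy_transfer_sequence(gain, scratch):
    """Greedily order tasks for iterated single-task transfer.

    gain[i, j] : estimated final performance on task j when its policy is
                 initialized from a finished policy for task i (assumed given).
    scratch[j] : estimated final performance on task j when trained from scratch.

    Returns a list of (source, target) pairs, where source is None when the
    target task is trained from scratch.
    """
    n = len(scratch)
    remaining = set(range(n))

    # Train one task from scratch first: pick the task whose finished policy
    # looks most useful as a future transfer source (heuristic assumption).
    first = max(remaining, key=lambda j: scratch[j] + gain[j].mean())
    sequence = [(None, first)]
    trained = {first}
    remaining.remove(first)

    # Each later task is initialized from whichever already-finished policy
    # (or from scratch) is expected to give the highest final performance.
    while remaining:
        src, tgt = max(
            ((s, t) for t in remaining for s in list(trained) + [None]),
            key=lambda p: scratch[p[1]] if p[0] is None else gain[p[0], p[1]],
        )
        sequence.append((src, tgt))
        trained.add(tgt)
        remaining.remove(tgt)
    return sequence


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tasks = 10  # e.g. the ten MT10 tasks
    gain = rng.uniform(0.2, 1.0, size=(n_tasks, n_tasks))      # placeholder estimates
    scratch = rng.uniform(0.2, 0.8, size=n_tasks)               # placeholder estimates
    print(greedy_transfer_sequence(gain, scratch))
```

In practice the entries of `gain` would have to come from measured transfer experiments between task pairs; the random values above only make the example self-contained.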