1. Start with a diverse multi-task dataset
This dataset can span a wide range of different tasks. As shown in the figure, for each scene we include trajectories of different agent behaviors in the dataset, e.g. grasping the cup or placing the bottle on the cube.
2. Train a behavioral prior using the dataset
The behavioral prior learns an invertible mapping from noise to useful actions. This mapping is conditioned on the current observation, an RGB image, which we encode with a convolutional neural network.
3. Use behavioral prior to bootstrap exploration for new tasks
Instead of learning a policy that executes its actions directly in the original MDP, we learn a policy that outputs a latent z, which the behavioral prior takes as input. We then execute the action produced by the behavioral prior in the environment.
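The three steps above can be sketched in code. This is a minimal illustration, not the actual PARROT implementation: the invertible mapping is shown as a single observation-conditioned affine transform (a real flow would stack many such layers), and the random linear maps stand in for the CNN encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the CNN conditioning network: random linear maps
# from observation features to the parameters of the invertible transform.
OBS_DIM, ACT_DIM = 8, 2
W_shift = rng.normal(size=(OBS_DIM, ACT_DIM)) * 0.1
W_scale = rng.normal(size=(OBS_DIM, ACT_DIM)) * 0.1

def prior_forward(z, obs):
    """Invertible (affine) mapping: noise z -> action, conditioned on obs."""
    shift = obs @ W_shift
    log_scale = obs @ W_scale
    return shift + np.exp(log_scale) * z

def prior_inverse(action, obs):
    """Exact inverse: action -> z. Invertibility is what lets the prior be
    trained by maximum likelihood on the multi-task dataset."""
    shift = obs @ W_shift
    log_scale = obs @ W_scale
    return (action - shift) * np.exp(-log_scale)

# At RL time, the task policy outputs z; the prior maps it to the action
# that is actually executed in the original MDP.
obs = rng.normal(size=OBS_DIM)
z = rng.normal(size=ACT_DIM)        # e.g. sampled by the new task's policy
action = prior_forward(z, obs)
z_rec = prior_inverse(action, obs)  # recovers z exactly, since the map is invertible
```

Because the map is invertible for every observation, the policy retains full control over the action space while the prior reshapes which actions are likely.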
Learning with a Behavioral Prior
We visualize trajectories from executing a random policy, with and without the behavioral prior. The behavioral prior substantially increases the likelihood of actions that lead to a meaningful interaction with an object, while still exploring a diverse set of actions.
Without Behavioral Prior
With Behavioral Prior
We evaluated our method on eight tasks (shown below). For each task, the positions of all objects in the scene are randomized at the start of every episode. We plot the performance for each task in the following section.
Place Can in Pan
Place Sculpture in Basket
Place Chair on Checkerboard Table
Place Baseball Cap on Block
Pick up Bar
Pick up Sculpture
Pick up Cup
Pick up Baseball Cap
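The randomized evaluation protocol described above can be sketched as an episode reset that re-samples every object's position within the workspace. The bounds and function names here are illustrative assumptions, not values from the paper.

```python
import random

# Hypothetical workspace bounds (meters) for uniform position sampling.
WORKSPACE_X = (-0.2, 0.2)
WORKSPACE_Y = (-0.2, 0.2)

def reset_scene(object_names, rng=random):
    """Re-sample a random (x, y) position for every object in the scene,
    as done at the start of each evaluation episode."""
    return {
        name: (rng.uniform(*WORKSPACE_X), rng.uniform(*WORKSPACE_Y))
        for name in object_names
    }

# e.g. the "Place Can in Pan" task randomizes both the can and the pan.
poses = reset_scene(["can", "pan"])
```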
PARROT is able to learn much faster than prior methods on a majority of the tasks, and shows little variance across runs. Note that some methods that failed to make any progress on certain tasks (such as “Place Sculpture in Basket”) overlap each other with a success rate of zero.