Keep Doing What Worked:

Behavior Modelling Priors for Offline Reinforcement Learning

This is the companion website to the ICLR 2020 submission "Behavior Modelling Priors for Offline Reinforcement Learning". It contains videos of the agents described in the paper performing control suite and robotics tasks.

The ABM + MPO agent performs block stacking learned via offline multitask learning, in simulation (left) and on the real robot (right).

In simulation, we learn the new tasks "bring to center" (left) and "bring to corner" (right) fully offline using ABM + MPO. None of the policies in the training dataset were given these rewards.

Control suite evaluation of the BM prior (left), ABM prior (center), and ABM + MPO (right).