Batch Exploration with Examples (BEE)
We propose batch exploration with examples (BEE), an exploration technique that uses weak human supervision to guide exploration toward relevant regions of the state space.
Motivating question: How can we collect large amounts of meaningful data for robotic manipulation without depending on a human in the loop?
Our framework, which combines batch exploration with batch reinforcement learning, allows for a scalable, data-driven approach to robotic learning. The batch exploration phase has two steps: first, a human collects a modest number of example relevant states, which takes a few minutes; second, the robot explores completely unsupervised.
How does online exploration with BEE work?
BEE learns an ensemble of relevance discriminators, which measure how relevant a new state is.
BEE learns a latent dynamics model, which it uses to plan its exploratory actions.
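BEE aims to explore around states that it estimates are relevant or about which it is highly uncertain. To do so, BEE performs model predictive control with the latent dynamics model, using the maximum score over the ensemble of discriminators as the reward function. Taking the maximum is optimistic: a state scores highly as soon as any one discriminator rates it as relevant, so states on which the ensemble disagrees are still worth visiting.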
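The planning step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the planner is plain random shooting (the source does not say which optimizer is used), and `toy_dynamics`, `make_toy_discriminator`, the horizon, the sample count, and the action bounds are all hypothetical stand-ins for the learned latent dynamics model and the learned relevance discriminators.

```python
import numpy as np


def ensemble_reward(latents, discriminators):
    """Return the max relevance score over the ensemble for each latent state.

    latents: array of shape (N, D).
    discriminators: list of callables mapping (N, D) -> (N,) scores.
    """
    scores = np.stack([d(latents) for d in discriminators], axis=0)  # (K, N)
    return scores.max(axis=0)


def plan_action(z0, dynamics, discriminators, horizon=5, n_samples=256,
                action_dim=2, rng=None):
    """Random-shooting MPC in latent space (an assumed planner).

    Samples action sequences, rolls each one out with the dynamics model,
    sums the ensemble-max reward along the rollout, and returns the first
    action of the best sequence together with its predicted return.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, action_dim))
    z = np.repeat(z0[None, :], n_samples, axis=0)
    returns = np.zeros(n_samples)
    for t in range(horizon):
        z = dynamics(z, actions[:, t])
        returns += ensemble_reward(z, discriminators)
    best = int(np.argmax(returns))
    return actions[best, 0], returns[best]


# Hypothetical stand-ins for the learned models, for illustration only.
def toy_dynamics(z, a):
    """Toy latent dynamics: the action nudges the latent state."""
    return z + 0.1 * a


def make_toy_discriminator(center):
    """Toy discriminator: scores near 1 close to `center`, near 0 far away."""
    center = np.asarray(center, dtype=float)

    def score(z):
        return np.exp(-np.sum((z - center) ** 2, axis=1))

    return score


discriminators = [
    make_toy_discriminator(c) for c in ([1.0, 1.0], [1.2, 0.8], [0.9, 1.1])
]

if __name__ == "__main__":
    action, predicted_return = plan_action(
        np.zeros(2), toy_dynamics, discriminators, rng=np.random.default_rng(0)
    )
    print("first action:", action, "predicted return:", predicted_return)
```

In a receding-horizon loop, only the returned first action would be executed before replanning from the newly encoded state. In the actual method the discriminators would be trained on the human-provided example states rather than constructed by hand as they are here.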
Does BEE interact more with relevant objects?
Across 6 simulated settings, BEE interacts with the relevant object more than twice as often as the comparisons.
BEE can even be used to explore multiple objects simultaneously.
Does data from BEE enable better downstream performance?
Using data collected by BEE improves performance on 4 out of 5 downstream goal-reaching tasks.
Is BEE effective on a real robot?
BEE interacts with the drawer an order of magnitude more than Disagreement.
Using the data collected by BEE yields a 20% improvement on the downstream task of closing an open drawer.