Abstract
Behavioral Cloning (BC) is a simple yet effective technique for learning sequential decision-making from demonstrations. Recently, it has gained prominence as the core of foundation models for the physical world, where achieving generalization requires countless demonstrations of a multitude of tasks. Typically, a human expert with full information on the task demonstrates (nearly) optimal behavior. In this paper, we propose to hide some of the task's information from the demonstrator. This "blindfolded" expert is compelled to employ non-trivial exploration to solve the task. We show that cloning the blindfolded expert generalizes better to unseen tasks than cloning its fully-informed counterpart. We conduct experiments on real-world robot peg insertion tasks with (limited) human demonstrations, alongside a videogame from the Procgen benchmark. Additionally, we support our findings with theoretical analysis, which confirms that the generalization error scales with the square root of I/m, where I measures the amount of task information available to the demonstrator, and m is the number of demonstrated tasks. Both theory and practice indicate that cloning blindfolded experts generalizes better with fewer demonstrated tasks.
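To make the stated scaling concrete, the sketch below evaluates the rate sqrt(I/m) with all constants dropped. It is an illustration under assumptions, not the paper's bound: the function names `bound_rate` and `tasks_needed` are hypothetical, and the numeric values of I are placeholders, since the abstract gives neither units nor values for I.

```python
import math


def bound_rate(task_info: float, num_tasks: int) -> float:
    """Rate sqrt(I/m) of the generalization error, ignoring constants.

    task_info (I) and num_tasks (m) follow the abstract's notation; the
    values passed in below are made up for illustration.
    """
    if task_info < 0 or num_tasks <= 0:
        raise ValueError("need I >= 0 and m > 0")
    return math.sqrt(task_info / num_tasks)


def tasks_needed(task_info: float, target_rate: float) -> int:
    """Smallest m with sqrt(I/m) <= target_rate, i.e. m >= I / target_rate**2."""
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    return math.ceil(task_info / target_rate ** 2)


# Hypothetical comparison: the blindfolded demonstrator has a quarter of the
# task information of the fully-informed one, with the same number of tasks.
full = bound_rate(task_info=8.0, num_tasks=100)
blind = bound_rate(task_info=2.0, num_tasks=100)
print(blind / full)  # depends only on the ratio of I: sqrt(2/8), about 0.5
```

Under this rate, reducing I by some factor has the same effect on the error as increasing m by that factor, which is one way to read the claim that blindfolded experts need fewer demonstrated tasks.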
Cloning the blindfolded expert leads to more exploratory behavior that generalizes better to test levels.
Note that even in failure cases, the policy cloned from the blindfolded expert still explores the maze.
Panel labels (in page order): Success - direct path; Success - direct path; Failure; Failure - limited exploration; Success - exploratory path; Success - exploratory path; Success; Failure - but still explores.
Examples of the expert's views during demonstrations:
The full observations (Expert) vs. the masked observations (Blindfolded Expert). Note that the blindfolded expert exhibits more exploratory behavior in order to solve the game.
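One way such a masked view could be produced is sketched below. This is an assumption for illustration only: the page does not say how the mask is constructed, so the square window around a given position, the function name `blindfold`, and the `center` and `radius` parameters are all hypothetical.

```python
import numpy as np


def blindfold(frame: np.ndarray, center: tuple[int, int], radius: int) -> np.ndarray:
    """Return a copy of `frame` with everything outside a square window zeroed.

    Assumed masking scheme (not taken from the paper): only the pixels within
    `radius` of `center` (row, col) stay visible to the demonstrator.
    """
    height, width = frame.shape[:2]
    row, col = center
    # Clip the window to the frame so positions near the border are handled.
    top, bottom = max(row - radius, 0), min(row + radius + 1, height)
    left, right = max(col - radius, 0), min(col + radius + 1, width)
    masked = np.zeros_like(frame)
    masked[top:bottom, left:right] = frame[top:bottom, left:right]
    return masked


# Stand-in RGB frame; a real frame would come from the environment.
frame = np.ones((64, 64, 3), dtype=np.uint8)
masked = blindfold(frame, center=(10, 20), radius=3)
print(masked.shape, int(masked.sum()))
```

The function leaves the original frame untouched and returns a masked copy, so the full and masked views can be shown side by side.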
The robot learns a task-dependent behavior during training:
Aligning the shapes before inserting.
The robot learns a general behavior during training:
Exploring the domain and searching for the insertion angle and position.
Fails to align the shape when handling previously unseen pegs (test).
Explores the domain and finds the insertion pose of all test shapes (test).
Examples of the expert's views during demonstrations:
The full observations (Expert) vs. the masked observations (Blindfolded Expert).
Each observation comprises a frame from wrist camera 1 (left) and wrist camera 2 (right).
Cloning blindfolded experts achieves better generalization than cloning standard experts in all tasks.
100 Training seeds (levels)