Blindfolded Experts Generalize Better

Insights from Robotic Manipulation and Videogames

Abstract
Behavioral Cloning (BC) is a simple yet effective technique for learning sequential decision-making from demonstrations. Recently, it has gained prominence as the core of foundation models for the physical world, where achieving generalization requires countless demonstrations of a multitude of tasks. Typically, a human expert with full information on the task demonstrates (nearly) optimal behavior. In this paper, we propose to hide some of the task's information from the demonstrator. This "blindfolded" expert is compelled to employ non-trivial exploration to solve the task. We show that cloning the blindfolded expert generalizes better to unseen tasks than cloning its fully-informed counterpart. We conduct experiments on real-world robot peg insertion tasks with (limited) human demonstrations, alongside a videogame from the Procgen benchmark. Additionally, we support our findings with theoretical analysis, which confirms that the generalization error scales with the square root of I/m, where I measures the amount of task information available to the demonstrator, and m is the number of demonstrated tasks. Both theory and practice indicate that cloning blindfolded experts generalizes better with fewer demonstrated tasks.
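To make the stated scaling concrete, the following is a minimal sketch in Python. It treats the bound only up to an unspecified constant, so only ratios between two settings are meaningful. The function name `bound_scale` and all numbers are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def bound_scale(task_info: float, num_tasks: int) -> float:
    """Return sqrt(I / m), the rate at which the generalization error scales.

    Constants and lower-order terms are omitted because the abstract does
    not state them (assumption: only the sqrt(I / m) rate is used here).
    """
    if task_info < 0 or num_tasks <= 0:
        raise ValueError("task_info must be >= 0 and num_tasks must be > 0")
    return math.sqrt(task_info / num_tasks)


# Hypothetical values of I for a fully-informed and a blindfolded expert.
full = bound_scale(task_info=8.0, num_tasks=100)
blind = bound_scale(task_info=2.0, num_tasks=100)

# Cutting I by a factor of 4 halves the rate at a fixed number of tasks.
ratio = blind / full

# Equivalently, the fully-informed expert needs 4x as many demonstrated
# tasks to reach the same rate as the blindfolded expert.
full_with_more_tasks = bound_scale(task_info=8.0, num_tasks=400)
```

Under this rate, reducing the information available to the demonstrator has the same effect as collecting proportionally more demonstrated tasks.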

Videos

Procgen Maze

Cloning the blindfolded expert leads to more exploratory behavior that generalizes better to test levels.

Note that even in failure cases, the policy cloned from the blindfolded expert still explores the maze.

Expert BC (Train)

Success - direct path

ppo_seed_10_prevseed_10_35_train.mp4

Expert BC (Train)

Success - direct path

ppo_seed_76_prevseed_76_30_train.mp4

Expert BC (Test)

Failure

ppo_seed_1515946232_prevseed_1515946232_500_test.mp4

Expert BC (Test)

Failure - limited exploration

ppo_seed_665724506_prevseed_665724506_500_test.mp4

Blindfolded Expert BC (Train)

Success - exploratory path

blindfolded_seed_10_prevseed_10_62_train.mp4

Blindfolded Expert BC (Train)

Success - exploratory path

blindfolded_seed_76_prevseed_76_106_train.mp4

Blindfolded Expert BC (Test)

Success

blindfolded_seed_1056019863_prevseed_1515946232_106_test.mp4

Blindfolded Expert BC (Test)

Failure - but still explores

blindfolded_seed_1393719341_prevseed_665724506_500_test.mp4

Experts' observation

Examples of the expert's views during demonstrations:
The full observations (Expert) vs. the masked observations (Blindfolded Expert). Note that the blindfolded expert exhibits more exploratory behavior in order to solve the game.
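The exact masking applied to the expert's view is not described on this page. As one possible illustration, the sketch below assumes the maze is a grid of cells and that the blindfolded expert sees only a square window around the agent. The function `mask_grid`, the `HIDDEN` marker, the cell encoding, and the window radius are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]

HIDDEN = -1  # hypothetical marker for cells the demonstrator cannot see


def mask_grid(grid: Grid, agent: Tuple[int, int], radius: int) -> Grid:
    """Return a copy of `grid` with every cell outside a square window
    of the given radius around the agent replaced by HIDDEN."""
    row, col = agent
    return [
        [
            cell if abs(r - row) <= radius and abs(c - col) <= radius else HIDDEN
            for c, cell in enumerate(line)
        ]
        for r, line in enumerate(grid)
    ]


# Illustrative encoding (an assumption): 0 = corridor, 1 = wall, 2 = goal.
maze = [
    [0, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 2],
]

# The agent starts in the top-left corner, so the goal is outside its view.
masked = mask_grid(maze, agent=(0, 0), radius=1)
visible = sum(cell != HIDDEN for line in masked for cell in line)
```

With the goal hidden, a demonstrator working from `masked` has to search the maze rather than head straight for the goal, which is the kind of exploratory behavior shown in the videos.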

Expert

Screencast from 2025-05-25 12-06-53.webm

Expert

Screencast from 2025-05-25 12-04-44.webm

Blindfolded Expert

Screencast from 2025-05-25 11-57-28.webm

Blindfolded Expert

Screencast from 2025-05-25 11-54-49.webm

Peg Insertion

Expert BC (Train)

On the training shapes, the robot learns task-dependent behavior:
aligning the shapes before inserting.

C0053.MP4

Blindfolded Expert BC (Train)

On the training shapes, the robot learns a general behavior:
exploring the domain and searching for the insertion angle and position.

C0045.MP4

Expert BC (Test)

Fails to align the shape when handling previously unseen pegs (test).

C0043.MP4

Blindfolded Expert BC (Test)

Explores the domain and finds the insertion pose of all test shapes (test).

C0033.MP4

Experts' observation

Examples of the expert's views during demonstrations:
The full observations (Expert) vs. the masked observations (Blindfolded Expert).
Each observation comprises a frame from wrist camera 1 (left) and wrist camera 2 (right).

Expert

wrist 1 wrist 2

2025-05-25_16-13-07_failed.mp4

Blindfolded Expert

wrist 1 wrist 2

2025-05-25_16-17-24_failed-SEG.mp4

Results

Cloning blindfolded experts generalizes better than cloning standard experts in all tasks.

Procgen maze

100 Training seeds (levels)

Peg insertion
