We noticed that our Figure 2 fails to render in certain PDF viewers due to its high resolution. In our testing, macOS Preview renders it correctly, but Adobe Acrobat does not. Below is a revised, viewable version of the figure. We will update the figure in the main paper when it becomes possible.
We also ablated dataset quality on our hardest task, Complex. We found that a clean demonstration dataset greatly boosts all approaches, and our method still performs best. See Appendix F.8 for details.
Below, we showcase videos of the policies in the four real-world active perception experiments.
First, we show representative Pi0 behavior on these search tasks. It does not search efficiently, and in many cases it never fully explores the scene, resulting in very low success rates and very long search times.
AAWR
AWR
BC
We can see that although exploring the cabinet on the left is difficult in general, only AAWR consistently explored the cabinet and succeeded.
AAWR
AWR
BC
AAWR
AWR
BC
We can see that both AWR and BC tend to move away even when the pineapple is clearly visible in the wrist camera, while AAWR learns to "lock on" to the target object. AAWR also searches the top and right side of the bookshelf (far away from the initial pose) more closely.
AAWR
AWR
BC
AAWR (offline)
AWR (offline)
AAWR (online)
AWR (online)
BC
We can see that BC is very noisy and has high variance, leading to a low pick rate. Offline AWR is more stable, but is less accurate and sometimes cannot maintain a firm grasp. Online training mitigates but does not solve this. Offline AAWR is stable and accurate, but also sometimes loses its grasp. After online training, however, this problem disappears, and the policy additionally exhibits reliable retrying behavior.
In this visually challenging task, the robot must pick up a tiny, hard-to-see marble using only an 84×84 RGB image.
Even in the fully observable case, where privileged information may be redundant, AAWR outperforms non-privileged baselines such as AWR and BC. We can see that only AAWR learns to precisely pick up the block, whereas AWR and BC struggle to place the gripper directly on top of it.
Only AAWR learns to scan the workspace, and it picks up blocks with a near-100% success rate. Distillation and VIB learn the suboptimal strategy of approaching the center of the workspace directly, which works in many cases but fails when the block is initialized away from the center.