We noticed that our Figure 2 fails to render in certain PDF viewers due to its high resolution. In our testing, macOS Preview renders it correctly, but Adobe Acrobat does not. Below is a revised, viewable version of the figure. We will update the figure in the main paper when it becomes possible.
We also ablated dataset quality on our hardest task, Complex. We found that a clean demonstration dataset greatly boosts all approaches, and our method still performs best. See Appendix F.8 for details.
Below, we showcase videos of the policies in the four real-world active perception experiments.
First, we show representative Pi0 behavior on these search tasks. It does not search efficiently, and in many cases it never fully explores the scene, resulting in very low success rates and very long search times.
AAWR
AWR
BC
We can see that although exploring the cabinet on the left is difficult in general, only AAWR consistently explored the cabinet and succeeded.
AAWR
AWR
BC
AAWR
AWR
BC
We can see that both AWR and BC tend to move away even when the pineapple is clearly visible in the wrist camera, while AAWR learns to "lock on" to the target object. AAWR also searches the top and right side of the bookshelf (far away from the initial pose) more closely.
AAWR
AWR
BC
AAWR (offline)
AWR (offline)
AAWR (online)
AWR (online)
BC
We can see that BC is very noisy and has high variance, leading to a low pick rate. Offline AWR is more stable, but is less accurate and sometimes cannot maintain a firm grasp. Online training mitigates but does not solve this. Offline AAWR is stable and accurate, but also sometimes loses its grasp. After online training, however, this problem disappears, and the policy additionally exhibits reliable retrying behavior.
In this visually challenging task, the robot must pick up a tiny, hard-to-see marble using only an 84×84 RGB image.
Even in the fully observable case, where privileged information may be redundant, AAWR outperforms non-privileged baselines such as AWR and BC. We can see that only AAWR learns to precisely pick up the block, whereas AWR and BC struggle to place the gripper directly on top of it.
Only AAWR learns to scan the workspace, and it picks up blocks with a near-100% success rate. Distillation and VIB learn the suboptimal strategy of approaching the center of the workspace directly, which works in many cases but fails when the block is initialized away from the center.