Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown
@ ICLR 2023
We compare policies optimized with a reward learned from preferences (PREF) against policies optimized with the true reward (GT). The state features on which preferences are based are fully observable. Reward functions were trained on 52,326 unique pairwise preferences, and both PREF and GT are optimized with 1M RL iterations and averaged over 3 seeds. Despite high test accuracy on pairwise preference classification, the PREF policies achieve much lower performance under the true reward than the GT policies. However, the reward learned from preferences consistently ranks PREF above GT. This suggests that preference-based reward learning fails to recover a good reward for each of these tasks.
[Figure: results for Reacher, Feeding, and Itch Scratching, each comparing the GT and PREF policies.]
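The reward learning setup described above follows the standard approach of fitting a reward network to pairwise trajectory preferences with a Bradley-Terry (cross-entropy) objective. The sketch below is a minimal illustration of that objective, assuming a PyTorch reward model that sums per-step rewards into a trajectory return; the names (RewardNet, preference_loss) and the architecture are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code) of Bradley-Terry
# preference-based reward learning on pairwise trajectory comparisons.
import torch
import torch.nn as nn


class RewardNet(nn.Module):
    """Assumed per-step reward model; trajectory return is the sum of step rewards."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (T, obs_dim) -> scalar predicted return for the trajectory
        return self.net(states).sum()


def preference_loss(reward_net: RewardNet,
                    traj_a: torch.Tensor,
                    traj_b: torch.Tensor,
                    label: float) -> torch.Tensor:
    """Cross-entropy on P(a preferred over b) = exp(R_a) / (exp(R_a) + exp(R_b)).
    label = 1.0 if trajectory a is preferred, 0.0 if trajectory b is preferred."""
    returns = torch.stack([reward_net(traj_a), reward_net(traj_b)])
    log_probs = torch.log_softmax(returns, dim=0)
    return -(label * log_probs[0] + (1.0 - label) * log_probs[1])
```

Training then minimizes this loss over the set of labeled preference pairs; the "pairwise preference classification test accuracy" reported above corresponds to how often the learned returns rank held-out pairs in agreement with the labels.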
Reacher, Half Cheetah, and Lunar Lander: https://github.com/jeremy29tien/gym
Feeding and Itch Scratching: https://github.com/jeremy29tien/assistive-gym