Causal Confusion and Reward Misidentification in Preference-Based Reward Learning
Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown
@ ICLR 2023
Videos and Supplemental Results
Evidence of Causal Confusion
We compare policies optimized against a reward learned from preferences (PREF) with policies optimized against the ground-truth reward (GT). The state features on which preferences are based are fully observable. Reward functions were trained on 52,326 unique pairwise preferences, and both PREF and GT policies were trained for 1M RL iterations, with results averaged over 3 seeds. Despite high test accuracy on pairwise preference classification, the PREF policies achieve much lower performance under the true reward than the GT policies. Yet the learned reward consistently prefers PREF rollouts over GT rollouts. This suggests that preference-based reward learning fails to identify a good reward for each of these tasks.
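For context, the PREF reward is learned from pairwise preferences; the standard formulation in this line of work is a Bradley-Terry model, where a reward network is trained so that the segment labeled as preferred in each pair receives the higher predicted return, using a cross-entropy loss. The sketch below illustrates that objective only and is not the released training code; the names RewardNet and preference_loss, the network size, and the optimizer settings are illustrative choices of ours.

```python
# Minimal sketch of pairwise-preference (Bradley-Terry) reward learning.
# Not the authors' released code; shapes and names are illustrative.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Small MLP mapping a state-feature vector to a scalar reward."""
    def __init__(self, obs_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (T, obs_dim) -> (T,) per-state rewards
        return self.net(obs).squeeze(-1)

def preference_loss(reward_net, traj_a, traj_b, label):
    """Bradley-Terry loss for one labeled pair.

    traj_a, traj_b: (T, obs_dim) state features of two trajectory segments.
    label: 0 if traj_a is preferred, 1 if traj_b is preferred.
    """
    # Predicted return of each segment = sum of per-state rewards.
    returns = torch.stack([reward_net(traj_a).sum(), reward_net(traj_b).sum()])
    # P(segment preferred) = softmax over predicted returns; cross-entropy vs. label.
    return nn.functional.cross_entropy(returns.unsqueeze(0), label.unsqueeze(0))

# Example on random data (shapes only). Real training iterates over the full
# preference dataset, then runs RL against the learned reward.
obs_dim = 17
net = RewardNet(obs_dim)
opt = torch.optim.Adam(net.parameters(), lr=3e-4)
traj_a, traj_b = torch.randn(50, obs_dim), torch.randn(50, obs_dim)
label = torch.tensor(1)  # pretend traj_b is preferred
loss = preference_loss(net, traj_a, traj_b, label)
opt.zero_grad(); loss.backward(); opt.step()
```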
Videos: Reacher - GT, Reacher - PREF, Feeding - GT, Feeding - PREF, Itch Scratching - GT, Itch Scratching - PREF
Code and Data for Preference Learning Benchmarks
Reacher, Half Cheetah, and Lunar Lander: https://github.com/jeremy29tien/gym
Feeding and Itch Scratching: https://github.com/jeremy29tien/assistive-gym
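Once the forks above are installed, the benchmark environments register with Gym in the usual way, so a random-policy rollout serves as a quick sanity check. The snippet below is a rough sketch: the environment ID 'FeedingSawyer-v1' and the old-style Gym step API are assumptions that may differ depending on the fork's version.

```python
import gym
import assistive_gym  # importing registers the Feeding / Itch Scratching environments

# Environment ID is illustrative; check the fork's registration for the exact names.
env = gym.make('FeedingSawyer-v1')

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random actions, just to exercise the env
    obs, reward, done, info = env.step(action)
    total_reward += reward

print('Episode return under the ground-truth reward:', total_reward)
```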