Zaynah Javed*, Daniel S. Brown*, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna,
Marek Petrik, Anca D. Dragan, Ken Goldberg
International Conference on Machine Learning (ICML) 2021
TLDR: There are often many different reward functions that explain human demonstrations, leaving imitation learning agents with uncertainty over the demonstrator's true intent. To address this uncertainty, we derive a novel policy-gradient-style robust imitation learning algorithm, PG-BROIL, that easily scales to continuous MDPs. PG-BROIL outperforms existing state-of-the-art imitation learning methods by hedging against reward uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
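To illustrate the idea of hedging against reward uncertainty, here is a minimal sketch of a soft-robust objective that blends expected return with conditional value at risk (CVaR) over a set of sampled reward hypotheses. The function name, uniform posterior weights, and parameter values are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft_robust_objective(returns, alpha=0.95, lam=0.5):
    """Blend expected return with CVaR over posterior reward hypotheses.

    returns: the policy's expected return under each sampled reward
             hypothesis (uniform posterior weights assumed for simplicity).
    alpha:   risk level; CVaR averages the worst (1 - alpha) tail.
    lam:     trade-off between expected performance and robustness.
    """
    returns = np.asarray(returns, dtype=float)
    expected = returns.mean()
    # Value at Risk: the (1 - alpha) quantile of returns across hypotheses.
    var = np.quantile(returns, 1 - alpha)
    # CVaR: mean of the returns at or below the VaR (the worst-case tail).
    cvar = returns[returns <= var].mean()
    # Soft-robust blend: lam = 1 recovers the standard expected return,
    # lam = 0 optimizes purely for the worst-case tail.
    return lam * expected + (1 - lam) * cvar
```

A policy-gradient method can then ascend this objective instead of a single point-estimate reward, so the agent performs well across the reward functions consistent with the demonstrations rather than betting on one of them.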