SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching
Yecheng Jason Ma, Andrew Shen, Dinesh Jayaraman, Osbert Bastani
SMODICE is a simple and versatile offline imitation learning (IL) algorithm that supports three distinct types of demonstrations: (i) expert observations (imitation from observations, IfO), (ii) observations from mismatched experts, and (iii) examples of success states. It optimizes a state-occupancy matching objective and, through an application of Fenchel duality, admits a simple optimization procedure that requires no nested optimization. Across a wide range of settings and tasks, SMODICE achieves state-of-the-art performance without any hyperparameter tuning.
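To make the "no nested optimization" claim concrete, here is a minimal NumPy sketch of the quantities such a dual formulation works with. It is an illustration under stated assumptions, not the authors' implementation: it assumes the KL-divergence form of the objective, a discriminator output `c(s)` that separates expert states from offline states, and a value function evaluated on offline transitions. The function names `state_reward`, `dual_objective`, and `bc_weights` are hypothetical.

```python
import numpy as np

def state_reward(c):
    """Assumed reward: log density ratio log(d_E(s) / d_O(s)),
    recovered from a discriminator output c(s) in (0, 1)."""
    c = np.clip(c, 1e-6, 1 - 1e-6)  # avoid log(0)
    return np.log(c) - np.log1p(-c)

def dual_objective(r, v0, v, v_next, gamma=0.99):
    """Assumed KL-form dual loss over the value function:
    (1 - gamma) * E[V(s0)] + log E[exp(r + gamma * V(s') - V(s))].
    It is a single minimization over V, with no inner loop."""
    td = r + gamma * v_next - v
    m = td.max()  # shift for a numerically stable log-mean-exp
    return (1 - gamma) * v0.mean() + m + np.log(np.mean(np.exp(td - m)))

def bc_weights(r, v, v_next, gamma=0.99):
    """Softmax weights over offline transitions, usable for
    weighted behavior cloning on the offline dataset."""
    td = r + gamma * v_next - v
    w = np.exp(td - td.max())
    return w / w.sum()

# Toy usage: three offline transitions with an untrained (zero) value function.
c = np.array([0.5, 0.9, 0.1])  # hypothetical discriminator outputs
r = state_reward(c)
v = np.zeros(3)
w = bc_weights(r, v, v)
loss = dual_objective(r, np.zeros(2), v, v)
```

With a zero value function, the weights are driven entirely by the discriminator: transitions whose states look more expert-like receive more weight in the behavior cloning step.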
Across a wide range of tasks and offline dataset compositions, SMODICE learns effective policies with no task-specific hyperparameter tuning.
[Comparison panels for four tasks: Pointmass-4Direction, AntMaze, Microwave, and Kettle. Each task shows results for SMODICE, RCE, ORIL, and BC.]
SMODICE is the only method that consistently solves all four tasks using only example-based supervision!