SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching
Yecheng Jason Ma, Andrew Shen, Dinesh Jayaraman, Osbert Bastani
Overview
SMODICE is a simple and versatile offline imitation learning (IL) algorithm that supports three distinct types of demonstrations: (i) expert observations (IL from observations, IfO), (ii) observations from mismatched experts, and (iii) examples of success states. SMODICE optimizes a state-occupancy matching objective which, through an application of Fenchel duality, admits a simple optimization procedure with no nested optimization. On a wide range of settings and tasks, SMODICE achieves state-of-the-art performance without any hyperparameter tuning.
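To make the pipeline concrete, here is a rough tabular sketch (toy distributions and toy dynamics of our own construction, not the authors' code): (1) derive a reward from the ratio of expert to offline state occupancies, which in practice comes from a learned discriminator; (2) fit a value function on the offline data by descending the KL-based Fenchel-dual objective; (3) form importance weights for weighted behavior cloning.

```python
import numpy as np

# Toy tabular illustration of the SMODICE pipeline (hypothetical setup).
rng = np.random.default_rng(0)
nS, gamma = 5, 0.95

# State-occupancy estimates: d_E from expert observations, d_O from the
# offline dataset. In practice the ratio is recovered by a discriminator
# c(s); here we just fix toy distributions.
d_E = np.array([0.05, 0.05, 0.1, 0.2, 0.6])
d_O = np.array([0.3, 0.3, 0.2, 0.1, 0.1])

# Step 1: discriminator-derived reward R(s) = log d_E(s) / d_O(s).
R = np.log(d_E / d_O)

# Offline transitions (s, s') sampled from d_O under toy dynamics,
# plus initial-state samples.
s = rng.choice(nS, size=2000, p=d_O)
s_next = (s + (rng.random(2000) < 0.8)) % nS
s0 = np.zeros(2000, dtype=int)

# Step 2: fit V by gradient descent on the (KL-divergence) dual objective
#   min_V (1-gamma) E_{s0}[V] + log E_{d_O}[exp(R(s) + gamma V(s') - V(s))]
V = np.zeros(nS)
for _ in range(500):
    adv = R[s] + gamma * V[s_next] - V[s]
    w = np.exp(adv - adv.max())
    w /= w.sum()                      # softmax over offline samples
    grad = np.zeros(nS)
    np.add.at(grad, s0, (1 - gamma) / len(s0))
    np.add.at(grad, s_next, gamma * w)
    np.add.at(grad, s, -w)
    V -= 0.5 * grad

# Step 3: importance weights for weighted behavior cloning; the policy
# would then maximize E[weights * log pi(a|s)] over the offline data.
adv = R[s] + gamma * V[s_next] - V[s]
weights = np.exp(adv - adv.max())
weights /= weights.mean()
```

Because the dual is unconstrained and uses only offline samples, no interaction with the environment or nested min-max loop is needed; the neural-network version replaces the tables with function approximators.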
What Demonstrations Does SMODICE Support?
Expert Observations
Mismatched Experts
Examples of Success States
Offline IL from Observations
Across a wide range of tasks and offline dataset compositions, SMODICE learns effective policies with no task-specific hyperparameter tuning.
Offline IL from Mismatched Experts
Mismatched Experts Visualization
PointMass
HalfCheetah-Short (random policy)
Ant-Disabled (random policy)
SMODICE vs. ORIL
Offline IL from Examples
Examples Visualization
Pointmass-4Direction
AntMaze
Microwave
Kettle
SMODICE vs. RCE (TD3+BC) vs. ORIL vs. BC
[Videos: rollouts of SMODICE, RCE, ORIL, and BC on each of the four tasks above]
SMODICE is the only method that consistently solves all four tasks using only success-state examples as supervision!