SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

Yecheng Jason Ma, Andrew Shen, Dinesh Jayaraman, Osbert Bastani

Overview

SMODICE is a simple and versatile offline imitation learning (IL) algorithm that supports three distinct types of demonstrations: (i) expert observations (IfO), (ii) observations from mismatched experts, and (iii) examples of success states. By optimizing a state-occupancy matching objective, SMODICE admits a simple optimization procedure through an application of Fenchel duality, requiring no nested optimization. Across a wide range of settings and tasks, SMODICE achieves state-of-the-art performance without any hyperparameter tuning.
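To make the training structure concrete, here is a minimal sketch of SMODICE's three-step recipe (KL-divergence instance) on toy data: a discriminator-derived state reward, a value function trained on the Fenchel-dual objective, and per-transition weights for weighted behavior cloning. All names and the linear value function are illustrative assumptions, not the authors' implementation; the discriminator outputs are stubbed with random numbers in place of a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline data: states s, next states s1, and expert-vs-offline
# discriminator probabilities c(s) (random stand-in for a trained net).
n, d = 256, 4
s = rng.normal(size=(n, d))
s1 = s + 0.1 * rng.normal(size=(n, d))
c = rng.uniform(0.05, 0.95, size=n)

# Step 1: state-based reward from the discriminator, R(s) = log c(s) / (1 - c(s)).
R = np.log(c) - np.log(1.0 - c)

# Step 2: value function (linear for illustration, V(s) = w^T s) trained on the
# KL dual objective: (1 - gamma) E[V(s0)] + log E[exp(R(s) + gamma V(s') - V(s))].
gamma, lr = 0.99, 1e-2
w = np.zeros(d)
s0 = s[:32]  # toy choice: treat a slice of states as initial states
for _ in range(200):
    adv = R + gamma * (s1 @ w) - (s @ w)
    soft = np.exp(adv - adv.max())
    soft /= soft.sum()  # gradient of log-mean-exp is a softmax over adv
    grad = (1 - gamma) * s0.mean(axis=0) + soft @ (gamma * s1 - s)
    w -= lr * grad

# Step 3: importance weights for weighted behavior cloning of the policy
# (proportional to exp of the advantage in the KL case).
adv = R + gamma * (s1 @ w) - (s @ w)
bc_weights = np.exp(adv - adv.max())
bc_weights /= bc_weights.mean()  # normalize for use as per-sample BC weights
```

Because both the value update and the final weighted behavior cloning are single unnested optimizations over the offline dataset, the whole pipeline avoids the adversarial min-max training used by many IL methods.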

What Demonstrations Does SMODICE Support?

Expert Observations

Mismatched Experts

Examples of Success States

Offline IL from Observations

Across a wide range of tasks and offline dataset compositions, SMODICE learns effective policies without any task-specific hyperparameter tuning.

Offline IL from Mismatched Experts

Mismatched Experts Visualization

[Videos: PointMass, HalfCheetah-Short (random policy), and Ant-Disabled (random policy)]

SMODICE vs. ORIL

Offline IL from Examples

Examples Visualization

[Videos: Pointmass-4Direction, AntMaze, Microwave, and Kettle]

SMODICE vs. RCE (TD3+BC) vs. ORIL vs. BC

[Video grids comparing SMODICE, RCE, ORIL, and BC on each of the four tasks]

SMODICE is the only method that consistently solves all four tasks using only success-example supervision!