SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching
Yecheng Jason Ma, Andrew Shen, Dinesh Jayaraman, Osbert Bastani
Overview
SMODICE is a simple and versatile offline imitation learning (IL) algorithm that supports three distinct types of demonstrations: (i) expert observations (IL from observations, IfO), (ii) observations from mismatched experts, and (iii) examples of success states. SMODICE optimizes a state-occupancy matching objective which, through an application of Fenchel duality, admits a simple optimization procedure with no nested optimization. On a wide range of settings and tasks, SMODICE achieves state-of-the-art performance without any hyperparameter tuning.
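To make the pipeline concrete, here is a rough tabular sketch (toy distributions and toy dynamics of our own construction, not the authors' code): (1) derive a reward from the ratio of expert to offline state occupancies, which in practice comes from a learned discriminator; (2) fit a value function on the offline data by descending the KL-based Fenchel-dual objective; (3) form importance weights for weighted behavior cloning.

```python
import numpy as np

# Toy tabular illustration of the SMODICE pipeline (hypothetical setup).
rng = np.random.default_rng(0)
nS, gamma = 5, 0.95

# State-occupancy estimates: d_E from expert observations, d_O from the
# offline dataset. In practice the ratio is recovered by a discriminator
# c(s); here we just fix toy distributions.
d_E = np.array([0.05, 0.05, 0.1, 0.2, 0.6])
d_O = np.array([0.3, 0.3, 0.2, 0.1, 0.1])

# Step 1: discriminator-derived reward R(s) = log d_E(s) / d_O(s).
R = np.log(d_E / d_O)

# Offline transitions (s, s') sampled from d_O under toy dynamics,
# plus initial-state samples.
s = rng.choice(nS, size=2000, p=d_O)
s_next = (s + (rng.random(2000) < 0.8)) % nS
s0 = np.zeros(2000, dtype=int)

# Step 2: fit V by gradient descent on the (KL-divergence) dual objective
#   min_V (1-gamma) E_{s0}[V] + log E_{d_O}[exp(R(s) + gamma V(s') - V(s))]
V = np.zeros(nS)
for _ in range(500):
    adv = R[s] + gamma * V[s_next] - V[s]
    w = np.exp(adv - adv.max())
    w /= w.sum()                      # softmax over offline samples
    grad = np.zeros(nS)
    np.add.at(grad, s0, (1 - gamma) / len(s0))
    np.add.at(grad, s_next, gamma * w)
    np.add.at(grad, s, -w)
    V -= 0.5 * grad

# Step 3: importance weights for weighted behavior cloning; the policy
# would then maximize E[weights * log pi(a|s)] over the offline data.
adv = R[s] + gamma * V[s_next] - V[s]
weights = np.exp(adv - adv.max())
weights /= weights.mean()
```

Because the dual is unconstrained and uses only offline samples, no interaction with the environment or nested min-max loop is needed; the neural-network version replaces the tables with function approximators.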
What Demonstrations Does SMODICE Support?
Expert Observations
Mismatched Experts
Examples of Success States
Offline IL from Observations
Across a wide range of tasks and offline dataset compositions, SMODICE learns effective policies with no task-specific hyperparameter tuning.
Offline IL from Mismatched Experts
Mismatched Experts Visualization
PointMass
HalfCheetah-Short (random policy)
Ant-Disabled (random policy)
SMODICE vs. ORIL
Offline IL from Examples
Examples Visualization
Pointmass-4Direction
AntMaze
Microwave
Kettle
SMODICE vs. RCE (TD3+BC) vs. ORIL vs. BC
[Videos: rollouts of SMODICE, RCE, ORIL, and BC on each of the four tasks above]
SMODICE is the only method that consistently solves all four tasks using only success-state examples as supervision!