Dynamical Imitation Policies with Efficient Out-of-Sample Recovery
ICLR 2025 🌴
Codebase, summary of results, and more!
We learn contractive policies from expert behavior for a range of robotics applications.
The contraction property enables efficient out-of-sample recovery, especially in the face of perturbations. Contractive policies go beyond the typical guarantees of stable dynamical systems: in addition to global convergence, they certify the transient behavior of the induced trajectories. We achieve notable improvements in out-of-sample recovery for various robots in navigation and manipulation tasks.
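For context, a dynamical policy $\dot{x} = f(x)$ is contractive when any two of its trajectories converge to each other exponentially fast. In standard contraction-analysis notation (generic symbols, not necessarily those used in the paper),

$$\|x_1(t) - x_2(t)\| \;\le\; C\, e^{-\lambda t}\, \|x_1(0) - x_2(0)\|,$$

for an overshoot constant $C \ge 1$ and contraction rate $\lambda > 0$, which is certified by the differential condition

$$J_f(x)^\top M(x) + M(x)\, J_f(x) + \dot{M}(x) \;\preceq\; -2\lambda\, M(x),$$

where $J_f$ is the Jacobian of $f$ and $M(x) \succ 0$ is a contraction metric. Since $\lambda$ bounds the rate of the transient, this is stronger than the asymptotic convergence to a single equilibrium that plain stability provides.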
Contractive vs. stable policies reacting to out-of-sample states.
There are three main steps in learning contractive policies with SCDS:
Initial conditions are passed to the differentiable Neural ODE solver to generate state trajectories.
A tailor-made loss penalizes the discrepancy between the generated and expert trajectories and updates the policy parameters.
Within the contractive policy, the REN module ensures contraction, the linear transformation adjusts the dimension of the latent space, and the bijection block boosts the policy's expressive power while preserving contraction properties.
Overview of SCDS training pipeline.
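The three steps above can be sketched in code. The following is an illustrative approximation under our own assumptions, not the authors' implementation: a hand-built negative-definite linear latent system stands in for the REN module, the bijection block is reduced to an invertible linear map, and the Neural ODE solve is replaced by explicit Euler steps. All names are hypothetical.

```python
import torch
import torch.nn as nn

class ContractivePolicy(nn.Module):
    def __init__(self, state_dim=2, latent_dim=8, rate=1.0):
        super().__init__()
        self.rate = rate                                          # lower bound on the contraction rate
        self.A = nn.Parameter(0.1 * torch.randn(latent_dim, latent_dim))
        self.enc = nn.Linear(state_dim, latent_dim, bias=False)   # lifts initial conditions into the latent space
        self.proj = nn.Linear(latent_dim, state_dim, bias=False)  # linear transformation: adjusts the latent dimension
        self.B = nn.Parameter(torch.eye(state_dim)
                              + 0.01 * torch.randn(state_dim, state_dim))  # stand-in for the bijection block

    def latent_field(self, z):
        # W = -(A @ A.T + rate * I) is negative definite, so dz/dt = W z is contractive by construction.
        W = -(self.A @ self.A.T + self.rate * torch.eye(self.A.shape[0]))
        return z @ W.T

    def rollout(self, x0, dt=0.02, steps=100):
        # Differentiable (Euler) integration of the latent dynamics, then mapping back to the state.
        z, traj = self.enc(x0), []
        for _ in range(steps):
            z = z + dt * self.latent_field(z)
            traj.append(self.proj(z) @ self.B.T)
        return torch.stack(traj, dim=1)          # (batch, steps, state_dim)

# Hypothetical expert demonstrations of shape (batch, steps, state_dim).
expert = 0.01 * torch.randn(16, 100, 2).cumsum(dim=1)
policy = ContractivePolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(200):
    rollouts = policy.rollout(expert[:, 0, :])     # step 1: integrate from the expert initial conditions
    loss = ((rollouts - expert) ** 2).mean()       # step 2: penalize discrepancy to expert trajectories
    opt.zero_grad(); loss.backward(); opt.step()   # step 3: update the policy parameters
```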
Starting from randomly sampled initial conditions, the contractive policy generates rollouts that converge to the expert demonstrations.
Policies trained on the Lift and Can tasks from the Robomimic dataset. Note how the contractive policy converges to an average behavior of the multiple demonstrations when the demonstrations are not contractive themselves.
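Assuming the hypothetical `ContractivePolicy` sketch above, this convergent behavior can be checked numerically by rolling out the trained policy from several random initial conditions and measuring how the trajectories collapse onto each other:

```python
# Rollouts from random initial states; under contraction, the dispersion across
# trajectories shrinks (roughly exponentially) over time.
with torch.no_grad():
    x0 = torch.randn(8, 2)                      # eight random initial conditions
    rollouts = policy.rollout(x0, steps=500)    # (8, 500, 2)
    spread = rollouts.std(dim=0).norm(dim=-1)   # dispersion across rollouts at each time step
    print(spread[0].item(), "->", spread[-1].item())
```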
After training on expert demonstrations, the policy can be deployed together with a low-level controller. The policy's contractivity, and the global stability it implies, facilitate reliable execution and out-of-sample recovery.
In principle, our method can be deployed for planning in a wide range of robotic systems and scenarios. We explore this use case for manipulation and navigation with the Franka Panda and Clearpath Jackal robots, respectively.
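As one illustration of such a deployment, a closed-loop control cycle is sketched below. The `robot` interface (`get_state`, `send_velocity`) and the `velocity_command` helper are hypothetical and only assume the `ContractivePolicy` sketch above, not the authors' deployment stack. Because the learned vector field is defined everywhere, a perturbed or out-of-sample state simply becomes a new initial condition from which the contractive dynamics recover.

```python
import torch

def velocity_command(policy, x):
    # Encode the measured state, evaluate the latent vector field, and push it
    # through the (linear) output map to obtain a desired state velocity.
    z = policy.enc(x)
    return policy.proj(policy.latent_field(z)) @ policy.B.T

def deploy(policy, robot, steps=500):
    # `robot` is a hypothetical interface; its low-level controller tracks the
    # commanded velocity (e.g., joint or base velocities).
    for _ in range(steps):
        x = torch.as_tensor(robot.get_state(), dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            v = velocity_command(policy, x).squeeze(0)
        robot.send_velocity(v.numpy())
```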