OpenAI DALL-E generated image for "humanoids performing imitation learning with memory-consistent neural networks".

Memory-Consistent Neural Networks for Imitation Learning

TLDR: A semi-parametric model class that provably improves behavior cloning with any underlying neural network architecture.

TLDR (hype): the RAG + Behavior Cloning paper you were looking for... and the SOTA in imitation learning! :)

Abstract

Imitation learning considerably simplifies policy synthesis compared to alternative approaches by exploiting access to expert demonstrations. For such imitation policies, errors away from the training samples are particularly critical. Even rare slip-ups in the policy action outputs can compound quickly over time, since they lead to unfamiliar future states where the policy is still more likely to err, eventually causing task failures. We revisit simple supervised “behavior cloning” for conveniently training the policy from nothing more than pre-recorded demonstrations, but carefully design the model class to counter the compounding error phenomenon. Our “memory-consistent neural network” (MCNN) outputs are hard-constrained to stay within clearly specified permissible regions anchored to prototypical “memory” training samples. We provide a guaranteed upper bound for the sub-optimality gap induced by MCNN policies. Using MCNNs on 9 imitation learning tasks, with MLP, Transformer, and Diffusion backbones, spanning dexterous robotic manipulation and driving, proprioceptive inputs and visual inputs, and varying sizes and types of demonstration data, we find large and consistent gains in performance, validating that MCNNs are better suited than vanilla deep neural networks for imitation learning applications.
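To make the abstract's "hard constraint" concrete, here is a minimal Python sketch of one memory-consistent output rule. It assumes a single nearest memory, an exponential interpolation weight `lam`, and a tanh-squashed backbone correction with scale `L`; the function and parameter names are illustrative, not the paper's exact implementation (the paper selects prototypical memories from the demonstrations; here they are just a list of state-action pairs).

```python
import numpy as np

def mcnn_output(x, memories, backbone, lam=1.0, L=1.0):
    """Sketch of a memory-consistent prediction (illustrative, not the
    paper's exact formula).

    memories: list of (state, expert_action) prototypes from the demos.
    backbone: any function mapping a state to a raw action vector.
    """
    # Retrieve the nearest prototypical training sample ("memory").
    dists = [np.linalg.norm(x - xm) for xm, _ in memories]
    i = int(np.argmin(dists))
    xm, ym = memories[i]

    # Interpolation weight: 1 at the memory, decaying with distance.
    w = np.exp(-lam * dists[i])

    # Near a memory the output is pinned to the expert action ym;
    # far away it is confined to the bounded region ym + L*[-1, 1]^d,
    # no matter how the backbone extrapolates.
    return w * ym + (1.0 - w) * (ym + L * np.tanh(backbone(x)))
```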

MCNN is a simple plug-in approach that improves behavior cloning with any neural network backbone.

MCNN + X  >>  X 

where X can be {MLP, Behavior Transformer, Diffusion Policy}
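In code, the plug-in claim is just that any state-to-action backbone can sit behind the same wrapper. Below is a toy usage example of the `mcnn_output` sketch above, with a stand-in linear "MLP"; a Behavior Transformer or Diffusion Policy head would slot in the same way, modulo its sampling interface.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in backbone: any callable state -> raw action works here.
W = rng.normal(size=(4, 8))  # toy weights: 8-dim state, 4-dim action
mlp_backbone = lambda x: W @ np.tanh(x)

# Prototypical (state, expert action) memories from the demonstrations.
memories = [(rng.normal(size=8), rng.normal(size=4)) for _ in range(16)]

state = rng.normal(size=8)
action = mcnn_output(state, memories, mlp_backbone, lam=0.5, L=1.0)
print(action.shape)  # (4,)
```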

What does MCNN guarantee, unlike vanilla neural networks?

A bound on the sub-optimality gap between the learned policy and the expert.

Improved performance by simply adding more memories*.

MCNN works particularly well on challenging, realistic demonstration datasets!
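For context on why such a bound matters (this is the standard imitation-learning analysis, not the paper's exact theorem statement): if a cloned policy errs with probability ε per step, errors compound over a horizon T, and MCNN's guarantee replaces the network's uncontrolled out-of-distribution error with quantities the designer can tune. The precise constants are in the paper.

```latex
% Classic compounding-error result for behavior cloning
% (Ross & Bagnell, 2010): a per-step error rate eps yields
% a sub-optimality gap growing quadratically in the horizon T.
J(\pi_E) - J(\hat{\pi}) \;\le\; O\!\left(\varepsilon \, T^2\right)
% MCNN's bound caps the analogous gap using controllable quantities
% (memory coverage and the interpolation rate \lambda) instead of the
% network's unconstrained extrapolation error; see the paper's theorem.
```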

Videos of MCNN-Diffusion agents performing dexterous manipulation tasks

mcnn_diffusion_pen_10episodes_combined.mov (pen)

mcnn_diffusion_relocate_10episodes_combined.mov (relocate)

mcnn_diffusion_door_10episodes_combined.mov (door)

mcnn_diffusion_hammer_10episodes_combined.mov (hammer)

Videos of MCNN-MLP agents performing dexterous manipulation tasks

mcnn_mlp_pen.mov
mcnn_mlp_relocate.mov
mlp_mcnn_door.mov
mcnn_mlp_hammer.mov

Videos of MCNN-Diffusion agents in Franka Kitchen Environment

mcnn_diffusion_does_kettle_topburner_light_slide_(wow)hinge.mp4 (5 tasks: kettle, top-burner, light, slide, hinge)

mcnn_diffusion_does_kettle_lowerburner_topburner_slide_(wow)hinge.mp4 (5 tasks: kettle, bottom-burner, top-burner, slide, hinge)

mcnn_diffusion_does_microwave_kettle_light_hinge.mp4 (4 tasks: microwave, kettle, light, hinge)

mcnn_diffusion_does_kettle_lowerburner_light_slide_(almost)hinge.mp4 (5 tasks: kettle, bottom-burner, light, slide, and almost hinge)

MCNN methods outperform all baselines on Dexterous Manipulation (Adroit), Driving (CARLA), and Multi-Stage Manipulation (Franka Kitchen)

In Adroit with Human Demonstrations [25 demos, ~5,000 transitions]:

In Adroit with RL Expert Demonstrations [5,000 demos, ~1 million transitions]:

In CARLA with Expert Demonstrations [400 demos, ~100K transitions]:

In Franka Kitchen with Human Demonstrations [566 demos, ~130K transitions]:

Citation

@misc{sridhar2023memoryconsistent,
      title={Memory-Consistent Neural Networks for Imitation Learning},
      author={Kaustubh Sridhar and Souradeep Dutta and Dinesh Jayaraman and James Weimer and Insup Lee},
      year={2023},
      eprint={2310.06171},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Back-to-back Visualization of 25 Human Demonstrations in the Adroit Human Tasks and Discussion of Multimodality

hammer-v1_25.mp4: In hammer, some demonstrations move the hammer above the bolt before hitting it, some move it below, and some hit the bolt directly.

relocate-v1_25.mp4: In relocate, some demonstrations go through the green goal sphere and others go above it.

pen-v1_25.mp4: In pen, some demonstrations keep the little finger (and other fingers) above the pen and some keep it below during the demo.

door-v1_25.mp4: Door is the only Adroit environment where multimodality isn't clearly visible.

Related Projects

Guaranteed Conformance of Neurosymbolic Models to Natural Constraints

arXiv | Video | Code | Tweet thread