Explainable Visual Imitation Learning
Visual Imitation Learning with Patch Rewards
Minghuan Liu, Tairan He, Weinan Zhang, Shuicheng Yan, Zhongwen Xu
Shanghai Jiao Tong University, Sea AI Lab
Visualization of patch rewards produced by the proposed PatchAIL on seven DeepMind Control Suite tasks and four Atari tasks. The patch rewards are computed over several stacked frames and mapped back onto each pixel.
Problem: How can we learn to behave from visual demonstrations (videos, images) efficiently while maintaining good explainability?
Solution: Just use patch rewards within the adversarial imitation learning (AIL) framework!
Patch Discriminator
The discriminator distinguishes agent patches from expert patches.
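Below is a minimal PyTorch sketch of a PatchGAN-style discriminator that outputs a grid of per-patch logits instead of a single scalar; the layer widths, kernel sizes, and strides are illustrative assumptions, not the exact architecture from the paper.

```python
# A minimal PatchGAN-style patch discriminator (illustrative sketch).
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=9):  # e.g. 3 stacked RGB frames
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, obs):
        # obs: (B, C, H, W) stacked frames.
        # Output: (B, 1, H', W') -- one logit per image patch.
        return self.net(obs)
```

Each spatial location of the output covers one image patch (the receptive field of the convolution stack), which is what allows the patch rewards to be mapped back onto pixels for visualization.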
Overall Architecture
Agents are trained with patch rewards, which are further regularized by the patch-logit distribution.
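As a rough sketch of how patch logits become a training signal for the RL agent, one can apply a standard AIL reward transform to each patch logit and then aggregate over patches; the -log(1 - D) form and the mean aggregation below are common AIL choices used here as assumptions, not necessarily the paper's exact ones.

```python
# Per-patch AIL rewards from patch logits, aggregated into one scalar reward
# per sample for the RL learner (reward form and aggregation are assumptions).
import torch
import torch.nn.functional as F

def patch_rewards(patch_logits: torch.Tensor) -> torch.Tensor:
    # patch_logits: (B, 1, H', W'); -log(1 - sigmoid(l)) == softplus(l).
    return F.softplus(patch_logits)

def scalar_reward(patch_logits: torch.Tensor) -> torch.Tensor:
    # Average the patch reward map into one reward per sample.
    return patch_rewards(patch_logits).mean(dim=(1, 2, 3))
```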
Patch Regularization
For each observation pair (s, s'), we maximize its similarity sim(s, s') to the expert demonstrations.
This similarity can serve as a multiplicative weight on the reward or as an additive bonus term.
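A minimal sketch of this regularizer, assuming the patch-logit map is softmax-normalized into a distribution over patches and the similarity is taken as exp(-KL) to the average expert patch distribution; the exact similarity measure in the paper may differ.

```python
# Patch-distribution regularizer sketch: compare the agent's patch-logit
# distribution with the expert's and use the similarity as a reward weight
# or bonus. The softmax normalization and exp(-KL) similarity are assumptions.
import torch
import torch.nn.functional as F

def patch_distribution(logits: torch.Tensor) -> torch.Tensor:
    # logits: (B, 1, H', W') -> flatten patches and normalize to a distribution.
    return F.softmax(logits.flatten(1), dim=-1)

def patch_similarity(agent_logits, expert_logits):
    p = patch_distribution(agent_logits)            # (B, P)
    q = patch_distribution(expert_logits).mean(0)   # (P,) average expert distribution
    kl = (p * (p.clamp_min(1e-8).log() - q.clamp_min(1e-8).log())).sum(-1)
    return torch.exp(-kl)                           # similarity in (0, 1]

def regularized_reward(reward, sim, mode="weight", bonus_scale=1.0):
    if mode == "weight":
        return sim * reward             # similarity as a multiplicative weight
    return reward + bonus_scale * sim   # similarity as an additive bonus
```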
Comparison:
(a) Shared-encoder AIL: the discriminator learns from latent features produced by the critic's encoder.
(b) Independent-encoder AIL: the discriminator learns directly from images but outputs only a scalar label.
Results
The results on the DeepMind Control Suite demonstrate that PatchAIL outperforms the baseline methods.
To understand where the discriminators focus, we visualize the spatial feature maps of different methods.
PatchAIL clearly provides complete and holistic attention to the key elements of the images.
Why does PatchAIL work better?
1. The discriminator should learn its own representation instead of sharing one with the critic.
2. Scalar rewards provide less informative signals for training the discriminator.
3. Patch-level rewards provide a fine-grained measure of expertise.
More results:
The results on Atari games are consistent. (This is the first method on Atari to learn from observations only, without any action information!)
More results:
Combined with BC, PatchAIL performs even better (with only one demonstration trajectory).
See more experiments and interesting results in our paper!