Explainable Visual Imitation Learning
Visual Imitation Learning with Patch Rewards
Minghuan Liu, Tairan He, Weinan Zhang, Shuicheng Yan, Zhongwen Xu
Shanghai Jiao Tong University, Sea AI Lab
Visualization of patch rewards produced by the proposed PatchAIL on seven DeepMind Control Suite tasks and four Atari tasks. The patch rewards are computed over several stacked frames and mapped back onto each pixel.
Problem: How can we learn to behave from visual demonstrations (videos, images) efficiently while maintaining good explainability?
Solution: Just use patch rewards within the adversarial imitation learning (AIL) framework!
Patch Discriminator
The discriminator distinguishes agent patches from expert patches.
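Below is a minimal PyTorch sketch of a PatchGAN-style discriminator that outputs a grid of per-patch logits instead of a single scalar; the layer widths, kernel sizes, and strides are illustrative assumptions, not the exact architecture from the paper.

```python
# A minimal PatchGAN-style patch discriminator (illustrative sketch).
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=9):  # e.g. 3 stacked RGB frames
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, obs):
        # obs: (B, C, H, W) stacked frames.
        # Output: (B, 1, H', W') -- one logit per image patch.
        return self.net(obs)
```

Each spatial location of the output covers one image patch (the receptive field of the convolution stack), which is what allows the patch rewards to be mapped back onto pixels for visualization.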
Overall Architecture
Agents are trained with patch rewards, which are further regularized by the patch-logit distribution.
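As a rough sketch of how patch logits become a training signal for the RL agent, one can apply a standard AIL reward transform to each patch logit and then aggregate over patches; the -log(1 - D) form and the mean aggregation below are common AIL choices used here as assumptions, not necessarily the paper's exact ones.

```python
# Per-patch AIL rewards from patch logits, aggregated into one scalar reward
# per sample for the RL learner (reward form and aggregation are assumptions).
import torch
import torch.nn.functional as F

def patch_rewards(patch_logits: torch.Tensor) -> torch.Tensor:
    # patch_logits: (B, 1, H', W'); -log(1 - sigmoid(l)) == softplus(l).
    return F.softplus(patch_logits)

def scalar_reward(patch_logits: torch.Tensor) -> torch.Tensor:
    # Average the patch reward map into one reward per sample.
    return patch_rewards(patch_logits).mean(dim=(1, 2, 3))
```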
Patch Regularization
For each observation pair (s, s'), we maximize its similarity sim(s, s') to the expert demonstrations.
This similarity can serve as a multiplicative weight on the reward or as an additive bonus term.
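A minimal sketch of this regularizer, assuming the patch-logit map is softmax-normalized into a distribution over patches and the similarity is taken as exp(-KL) to the average expert patch distribution; the exact similarity measure in the paper may differ.

```python
# Patch-distribution regularizer sketch: compare the agent's patch-logit
# distribution with the expert's and use the similarity as a reward weight
# or bonus. The softmax normalization and exp(-KL) similarity are assumptions.
import torch
import torch.nn.functional as F

def patch_distribution(logits: torch.Tensor) -> torch.Tensor:
    # logits: (B, 1, H', W') -> flatten patches and normalize to a distribution.
    return F.softmax(logits.flatten(1), dim=-1)

def patch_similarity(agent_logits, expert_logits):
    p = patch_distribution(agent_logits)            # (B, P)
    q = patch_distribution(expert_logits).mean(0)   # (P,) average expert distribution
    kl = (p * (p.clamp_min(1e-8).log() - q.clamp_min(1e-8).log())).sum(-1)
    return torch.exp(-kl)                           # similarity in (0, 1]

def regularized_reward(reward, sim, mode="weight", bonus_scale=1.0):
    if mode == "weight":
        return sim * reward             # similarity as a multiplicative weight
    return reward + bonus_scale * sim   # similarity as an additive bonus
```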
Comparison:
(a) Shared-encoder AIL: the discriminator learns from latent features produced by the critic's encoder.
(b) Independent-encoder AIL: the discriminator learns directly from images but outputs only a scalar label.
Results
The results on the DeepMind Control Suite demonstrate that PatchAIL outperforms the baseline methods.
To understand where the discriminators focus, we visualize the spatial feature maps of different methods.
PatchAIL clearly provides complete and holistic attention to the key elements of the images.
Why does PatchAIL work better?
1. The discriminator should learn its own representation instead of sharing one with the critic.
2. Scalar rewards provide less informative signals for training the discriminator.
3. Patch-level rewards provide a fine-grained measure of expertise.
More results:
The results on Atari games are consistent. (This is the first method on Atari to learn from observations only, without any action information!)
More results:
Combined with BC, PatchAIL performs even better (with only one demonstration trajectory).
See more experiments and interesting results in our paper!