Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning
This is the supporting material for the paper "Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning".
Abstract:
One of the key challenges in visual Reinforcement Learning (RL) is to learn policies that can generalize to unseen environments. Recently, data augmentation techniques aimed at enhancing data diversity have demonstrably improved the generalization ability of learned policies. However, due to the sensitivity of RL training, naively applying data augmentation, which transforms each pixel in a task-agnostic manner, may cause instability and damage sample efficiency, thus further degrading generalization performance. At the heart of this phenomenon are the diverged action distribution and high-variance value estimation in the face of augmented images. To alleviate this issue, we propose Task-aware Lipschitz Data Augmentation (TLDA) for visual RL, which explicitly identifies the task-correlated pixels with large Lipschitz constants and augments only the task-irrelevant pixels for stability. We verify the effectiveness of our approach on the DeepMind Control suite, CARLA, and DeepMind Manipulation tasks. Extensive empirical results show that TLDA improves both sample efficiency and generalization; it outperforms previous state-of-the-art methods across these three visual control benchmarks.
Overview of TLDA. This figure shows two examples of TLDA and the pipeline implementing it. The agent generates the K-matrix for a frame and then preserves the areas with larger Lipschitz constants under strong augmentation; the preserved areas are highlighted in the K-matrix.
TLDA can reliably identify and augment pixels that are not strongly correlated with the learning task while keeping task-related pixels untouched.
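For intuition, here is a minimal, hypothetical sketch of this pipeline in PyTorch: it estimates a per-patch Lipschitz-like K-matrix by perturbing each patch and measuring the change in the policy output, then keeps the most sensitive patches from the original frame while taking the strongly augmented frame elsewhere. The names (k_matrix, tlda_augment), patch size, and keep ratio are illustrative assumptions, not the released implementation (see the Code link below for the official repository).

```python
# Hypothetical sketch of a TLDA-style pipeline (not the official implementation).
# Assumptions: a PyTorch policy mapping (N, C, H, W) observations to action logits,
# with H and W divisible by `patch`; `k_matrix`/`tlda_augment` are illustrative names.
import torch

def k_matrix(policy, obs, patch=8, eps=0.1):
    """Estimate a per-patch Lipschitz-like sensitivity map (the K-matrix).

    obs: (C, H, W) float tensor. Each patch is perturbed with small noise and
    the ratio |policy output change| / |input change| is recorded as its score.
    """
    C, H, W = obs.shape
    base = policy(obs.unsqueeze(0)).detach()
    scores = torch.zeros(H // patch, W // patch)
    for i in range(H // patch):
        for j in range(W // patch):
            perturbed = obs.clone()
            noise = eps * torch.randn(C, patch, patch)
            perturbed[:, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] += noise
            out = policy(perturbed.unsqueeze(0)).detach()
            scores[i, j] = (out - base).norm() / noise.norm()  # local Lipschitz estimate
    return scores

def tlda_augment(obs, augmented, scores, keep_ratio=0.3, patch=8):
    """Keep the top-`keep_ratio` most sensitive patches from the original frame
    and use the strongly augmented frame everywhere else."""
    k = max(1, int(keep_ratio * scores.numel()))
    thresh = scores.flatten().topk(k).values.min()
    mask = (scores >= thresh).float()                     # 1 = task-relevant patch
    mask = mask.repeat_interleave(patch, 0).repeat_interleave(patch, 1)
    return mask * obs + (1.0 - mask) * augmented

if __name__ == "__main__":
    # Toy usage with a random "policy" and Gaussian noise as the "strong augmentation".
    policy = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 6))
    obs = torch.rand(3, 64, 64)
    strong_aug = (obs + 0.5 * torch.randn_like(obs)).clamp(0, 1)
    scores = k_matrix(policy, obs)
    mixed = tlda_augment(obs, strong_aug, scores)
    print(scores.shape, mixed.shape)  # torch.Size([8, 8]) torch.Size([3, 64, 64])
```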
Some examples of TLDA:
Augmentation              Reward (Original)    Reward (Augmented)    Reward (Ours)
Random Cutout             930                  717                   916
Gaussian Blur             930                  317                   924
Random Overlay            930                  412                   804
Random CutoutColor        937                  496                   682
Salt-and-Pepper Noise     937                  480                   900
Random Conv               988                  0                     423
GrayScale                 988                  0                     812
As shown in the examples above, blindly augmenting the image may perturb these task-relevant pixels and cause catastrophic action/value changes. SVEA applies strong augmentation to the whole frame and retains no raw pixels, while TLDA preserves the critical parts of the original observations.
Generalization Performance:
DMC-GB
  SVEA vs. TLDA (color testing): walker-walk, walker-stand, ball_in_cup-catch
  SVEA vs. TLDA (video testing): walker-walk, walker-stand, ball_in_cup-catch
CARLA
  SVEA vs. TLDA (changing weather): ClearNoon, ClearSunset, MidRainSunset
Manipulation Tasks
  SVEA vs. TLDA (Reach): modified arm, modified platform, both modified
Code: https://github.com/gemcollector/TLDA