Normalization Enhances Generalization in Visual Reinforcement Learning
Lu Li*, Jiafei Lyu*, Guozheng Ma, Zilin Wang, Zhenjie Yang,
Xiu Li, Zhiheng Li
AAMAS 2024
Abstract: Recent advances in visual reinforcement learning (RL) have led to impressive success in handling complex tasks. However, these methods have demonstrated limited generalization capability to visual disturbances, which poses a significant challenge for their real-world application and adaptability. Though normalization techniques have demonstrated huge success in supervised and unsupervised learning, their applications in visual RL are still scarce. In this paper, we explore the potential benefits of integrating normalization into visual RL methods with respect to generalization performance. We find that, perhaps surprisingly, incorporating suitable normalization techniques is sufficient to enhance generalization capabilities, without any additional special design. We utilize the combination of two normalization techniques, CrossNorm and SelfNorm, for generalizable visual RL. Extensive experiments are conducted on the DMControl Generalization Benchmark, CARLA, and the ProcGen Benchmark to validate the effectiveness of our method. We show that our method significantly improves generalization capability while only marginally affecting sample efficiency. In particular, when integrated with DrQ-v2, our method improves the test performance of DrQ-v2 on CARLA across various scenarios from 14% of the training performance to 97%.
The pipeline of our method. CrossNorm is positioned after the convolutional layer and is followed by SelfNorm. Each CrossNorm layer is randomly activated during training and becomes inactive during testing. In contrast, SelfNorm is active during training and remains functional during testing. Notably, our method introduces no new learning objective and uses no out-of-domain data.
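As a concrete illustration, below is a minimal PyTorch sketch of a convolutional block with CrossNorm followed by SelfNorm, arranged as in the pipeline figure. The layer shapes, the activation probability p, and the per-channel gating networks in SelfNorm are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of a CrossNorm + SelfNorm conv block (illustrative, not the official code).
import torch
import torch.nn as nn


def channel_stats(x, eps=1e-5):
    """Per-instance, per-channel mean and std of a (B, C, H, W) feature map."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()
    return mean, std


class CrossNorm(nn.Module):
    """Swaps channel-wise statistics between random pairs of instances in a batch.

    Randomly activated (with probability `p`) during training; identity at test time.
    """

    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or torch.rand(1).item() > self.p:
            return x
        mean, std = channel_stats(x)
        perm = torch.randperm(x.size(0), device=x.device)
        # Re-style each instance with another instance's channel-wise mean/std.
        return std[perm] * (x - mean) / std + mean[perm]


class SelfNorm(nn.Module):
    """Recalibrates each channel's mean/std with learned gating functions.

    Active during both training and testing.
    """

    def __init__(self, channels):
        super().__init__()
        # Tiny per-channel gating functions fed with (mean, std); an assumed design.
        self.f = nn.Conv1d(channels, channels, kernel_size=2, groups=channels)
        self.g = nn.Conv1d(channels, channels, kernel_size=2, groups=channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        mean, std = channel_stats(x)
        stats = torch.cat([mean, std], dim=2).view(b, c, 2)        # (B, C, 2)
        attn_mean = torch.sigmoid(self.f(stats)).view(b, c, 1, 1)  # gate for the mean
        attn_std = torch.sigmoid(self.g(stats)).view(b, c, 1, 1)   # gate for the std
        return (std * attn_std) * (x - mean) / std + mean * attn_mean


class CNSNConvBlock(nn.Module):
    """Conv layer followed by CrossNorm and then SelfNorm, as in the pipeline figure."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.crossnorm = CrossNorm(p=0.5)
        self.selfnorm = SelfNorm(out_ch)

    def forward(self, x):
        return torch.relu(self.selfnorm(self.crossnorm(self.conv(x))))
```

In this arrangement, CrossNorm only injects style perturbations while gradients flow as usual, so the encoder of an existing visual RL agent (e.g. DrQ-v2) can adopt the block without changing its training objective.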
In the CARLA autonomous driving simulator, agents are trained under one fixed weather condition and are then expected to generalize to unseen weather conditions in a zero-shot manner.
[Figure panels] Average episode reward under the training weather (WetCloudySunset) and two unseen evaluation weathers (WetNoon, HardRainNoon):
Panel 1: Training WetCloudySunset 225; Eval WetNoon 210; Eval HardRainNoon 237
Panel 2: Training WetCloudySunset 221; Eval WetNoon 82; Eval HardRainNoon 190
Panel 3: Training WetCloudySunset 173; Eval WetNoon 1; Eval HardRainNoon 146
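For reference, a minimal sketch of this zero-shot weather-evaluation protocol is given below. The policy rollout (`run_episode`) is a hypothetical placeholder that depends on the trained agent; only the CARLA client and weather-preset calls are part of the real CARLA Python API.

```python
# Sketch of zero-shot weather evaluation in CARLA (rollout helper is a placeholder).
import carla


def run_episode(world):
    # Placeholder: roll out the trained driving policy in `world` and return
    # the episode reward. The actual rollout depends on the RL agent used.
    raise NotImplementedError


def evaluate_weather(world, preset, episodes=10):
    world.set_weather(preset)  # change only the visual/weather condition
    return sum(run_episode(world) for _ in range(episodes)) / episodes


if __name__ == "__main__":
    client = carla.Client("localhost", 2000)
    client.set_timeout(10.0)
    world = client.get_world()

    # Trained under one fixed weather; evaluated zero-shot on unseen weathers.
    presets = {
        "WetCloudySunset (train)": carla.WeatherParameters.WetCloudySunset,
        "WetNoon (eval)": carla.WeatherParameters.WetNoon,
        "HardRainNoon (eval)": carla.WeatherParameters.HardRainNoon,
    }
    for name, preset in presets.items():
        print(name, evaluate_weather(world, preset))
```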
In DMC-GB, agents are trained in standard DeepMind Control environments and subsequently evaluated in visually disturbed environments. These disturbances include changes in color and the replacement of backgrounds with moving videos.
[Figure panels] Evaluation settings shown: Training, color hard, video easy, video hard.