Normalization Enhances Generalization in Visual Reinforcement Learning
Lu Li*, Jiafei Lyu*, Guozheng Ma, Zilin Wang, Zhenjie Yang,
Xiu Li, Zhiheng Li
AAMAS 2024
Abstract: Recent advances in visual reinforcement learning (RL) have led to impressive success in handling complex tasks. However, these methods have demonstrated limited generalization capability to visual disturbances, which poses a significant challenge for their real-world application and adaptability. Though normalization techniques have demonstrated huge success in supervised and unsupervised learning, their applications in visual RL are still scarce. In this paper, we explore the potential benefits of integrating normalization into visual RL methods with respect to generalization performance. We find that, perhaps surprisingly, incorporating suitable normalization techniques is sufficient to enhance generalization capabilities, without any additional special design. We utilize the combination of two normalization techniques, CrossNorm and SelfNorm, for generalizable visual RL. Extensive experiments are conducted on the DMControl Generalization Benchmark, CARLA, and the ProcGen Benchmark to validate the effectiveness of our method. We show that our method significantly improves generalization capability while only marginally affecting sample efficiency. In particular, when integrated with DrQ-v2, our method improves the test performance of DrQ-v2 on CARLA across various scenarios from 14% of the training performance to 97%.
The pipeline of our method. CrossNorm is positioned after the convolutional layer and is followed by SelfNorm. Each CrossNorm layer is randomly activated during training and becomes inactive during testing. In contrast, SelfNorm is active during training and remains functional during testing. Notably, our method introduces no new learning objective and uses no out-of-domain data.
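As a concrete illustration, below is a minimal PyTorch sketch of a convolutional block with CrossNorm followed by SelfNorm, arranged as in the pipeline figure. The layer shapes, the activation probability p, and the per-channel gating networks in SelfNorm are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of a CrossNorm + SelfNorm conv block (illustrative, not the official code).
import torch
import torch.nn as nn


def channel_stats(x, eps=1e-5):
    """Per-instance, per-channel mean and std of a (B, C, H, W) feature map."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()
    return mean, std


class CrossNorm(nn.Module):
    """Swaps channel-wise statistics between random pairs of instances in a batch.

    Randomly activated (with probability `p`) during training; identity at test time.
    """

    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or torch.rand(1).item() > self.p:
            return x
        mean, std = channel_stats(x)
        perm = torch.randperm(x.size(0), device=x.device)
        # Re-style each instance with another instance's channel-wise mean/std.
        return std[perm] * (x - mean) / std + mean[perm]


class SelfNorm(nn.Module):
    """Recalibrates each channel's mean/std with learned gating functions.

    Active during both training and testing.
    """

    def __init__(self, channels):
        super().__init__()
        # Tiny per-channel gating functions fed with (mean, std); an assumed design.
        self.f = nn.Conv1d(channels, channels, kernel_size=2, groups=channels)
        self.g = nn.Conv1d(channels, channels, kernel_size=2, groups=channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        mean, std = channel_stats(x)
        stats = torch.cat([mean, std], dim=2).view(b, c, 2)        # (B, C, 2)
        attn_mean = torch.sigmoid(self.f(stats)).view(b, c, 1, 1)  # gate for the mean
        attn_std = torch.sigmoid(self.g(stats)).view(b, c, 1, 1)   # gate for the std
        return (std * attn_std) * (x - mean) / std + mean * attn_mean


class CNSNConvBlock(nn.Module):
    """Conv layer followed by CrossNorm and then SelfNorm, as in the pipeline figure."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.crossnorm = CrossNorm(p=0.5)
        self.selfnorm = SelfNorm(out_ch)

    def forward(self, x):
        return torch.relu(self.selfnorm(self.crossnorm(self.conv(x))))
```

In this arrangement, CrossNorm only injects style perturbations while gradients flow as usual, so the encoder of an existing visual RL agent (e.g. DrQ-v2) can adopt the block without changing its training objective.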
In the CARLA autonomous driving simulator, agents are trained under one fixed weather condition and are then expected to generalize to unseen weather conditions in a zero-shot manner.
[Figure panels] Average episode reward under the training weather (WetCloudySunset) and two unseen evaluation weathers (WetNoon, HardRainNoon):
Panel 1: Training WetCloudySunset 225; Eval WetNoon 210; Eval HardRainNoon 237
Panel 2: Training WetCloudySunset 221; Eval WetNoon 82; Eval HardRainNoon 190
Panel 3: Training WetCloudySunset 173; Eval WetNoon 1; Eval HardRainNoon 146
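For reference, a minimal sketch of this zero-shot weather-evaluation protocol is given below. The policy rollout (`run_episode`) is a hypothetical placeholder that depends on the trained agent; only the CARLA client and weather-preset calls are part of the real CARLA Python API.

```python
# Sketch of zero-shot weather evaluation in CARLA (rollout helper is a placeholder).
import carla


def run_episode(world):
    # Placeholder: roll out the trained driving policy in `world` and return
    # the episode reward. The actual rollout depends on the RL agent used.
    raise NotImplementedError


def evaluate_weather(world, preset, episodes=10):
    world.set_weather(preset)  # change only the visual/weather condition
    return sum(run_episode(world) for _ in range(episodes)) / episodes


if __name__ == "__main__":
    client = carla.Client("localhost", 2000)
    client.set_timeout(10.0)
    world = client.get_world()

    # Trained under one fixed weather; evaluated zero-shot on unseen weathers.
    presets = {
        "WetCloudySunset (train)": carla.WeatherParameters.WetCloudySunset,
        "WetNoon (eval)": carla.WeatherParameters.WetNoon,
        "HardRainNoon (eval)": carla.WeatherParameters.HardRainNoon,
    }
    for name, preset in presets.items():
        print(name, evaluate_weather(world, preset))
```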
In DMC-GB, agents are trained in standard DeepMind Control environments and subsequently evaluated in visually disturbed environments. These disturbances include changes in color and the replacement of backgrounds with moving videos.
[Figure panels] Evaluation settings shown: Training, color hard, video easy, video hard.