Feasibility Consistent Representation Learning for Safe Reinforcement Learning
Abstract
In safe reinforcement learning (RL), balancing the satisfaction of safety constraints against reward optimization is a significant challenge. A key obstacle is estimating the safety constraints, which is typically harder than estimating a reward metric because constraint signals are sparse. To address this issue, we introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL). The framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL. By leveraging self-supervised learning techniques and a more learnable safety metric, our approach improves both policy learning and constraint estimation. Empirical evaluations across a range of vector-state and image-based tasks show that our method learns a better safety-aware embedding and outperforms previous representation learning baselines.
Experiment Visualization
TD3-Lag (baseline) performance
PointGoal1
Target: to reach the green goal while avoiding the blue circles and the cyan square.
PointButton1
Target: to reach the correct ball while avoiding the blue circles, the purple squares, and the wrong ball.
PointPush1
Target: to push the yellow object to the green goal while avoiding the blue circles and the blue pillar.
PointGoal2
Target: to reach the green goal while avoiding the blue circles and the cyan squares.
CarGoal2
Target: to reach the green goal while avoiding the blue circles and the cyan square.
CarButton1
Target: to reach the correct ball while avoiding the blue circles, the purple squares, and the wrong ball.
FCSRL (ours) performance
PointGoal1
Target: to reach the green goal while avoiding the blue circles and the cyan square.
PointButton1
Target: to reach the correct ball while avoiding the blue circles, the purple squares, and the wrong ball.
PointPush1
Target: to push the yellow object to the green goal while avoiding the blue circles and the blue pillar.
PointGoal2
Target: to reach the green goal while avoiding the blue circles and the cyan squares.
CarGoal1
Target: to reach the green goal while avoiding the blue circles and the cyan square.
CarButton1
Target: to reach the correct ball while avoiding the blue circles, the purple squares, and the wrong ball.
FCSRL (ours) performance in vision tasks
Image-based tasks
The agent receives only the first-person-view image (64x64) as input. All other information (e.g., lidar readings) is unavailable to it.
Safety-related embedding features quality verification
Following previous work in representation learning [1], we use linear probing to assess how well the learned embedding captures safety-related features.
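For readers unfamiliar with linear probing, the idea can be illustrated with a minimal sketch. This is not the evaluation code behind the videos on this page: the data are synthetic, the probe target (a scalar "distance to the nearest hazard") is a hypothetical stand-in for a safety-related feature, and the function names `fit_linear_probe` and `probe_r2` are invented for this example. The encoder is treated as frozen, so only a linear map on top of its embeddings is fitted; a high held-out score suggests the feature is linearly decodable from the embedding.

```python
import numpy as np


def _with_bias(embeddings):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])


def fit_linear_probe(embeddings, targets, l2=1e-3):
    """Fit a ridge-regularized linear probe on frozen embeddings.

    Uses the closed-form solution (X^T X + l2 * I)^-1 X^T y.
    For simplicity the intercept is regularized as well.
    """
    X = _with_bias(embeddings)
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)


def probe_r2(weights, embeddings, targets):
    """Coefficient of determination of the probe on the given data."""
    pred = _with_bias(embeddings) @ weights
    ss_res = np.sum((targets - pred) ** 2)
    ss_tot = np.sum((targets - targets.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumptions, not real experiment data).
rng = np.random.default_rng(0)
n, d, n_train = 2000, 16, 1500

# "Informative" embedding: the hypothetical hazard distance is a noisy
# linear function of it.
informative = rng.normal(size=(n, d))
hazard_dist = informative @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# "Uninformative" embedding: independent of the hazard distance.
uninformative = rng.normal(size=(n, d))

scores = {}
for name, emb in [("informative", informative), ("uninformative", uninformative)]:
    # Fit on the training split, score on the held-out split.
    w = fit_linear_probe(emb[:n_train], hazard_dist[:n_train])
    scores[name] = probe_r2(w, emb[n_train:], hazard_dist[n_train:])

print(scores)
```

In this toy setup the informative embedding should score close to 1 on held-out data, while the uninformative one should score near 0. The probe is deliberately restricted to a linear map so that the score reflects the embedding rather than the capacity of the probe.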
The following videos may take several seconds to load.
PointButton1 (Vision)
PointGoal2 (Vision)
CarGoal1 (Vision)
Reference
[1] He, Kaiming, et al. "Masked autoencoders are scalable vision learners." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.