Robust Policy Learning over Multiple Uncertainty Sets

Annie Xie, Shagun Sodhani, Chelsea Finn,

Joelle Pineau, Amy Zhang

[code] [arxiv]

Abstract: Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL, which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at training time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and robust RL alone.

Multi-Set Robustness


Our work studies the uncertainty that arises when an agent is transferred to a new environment, after training on similar tasks related through a common set of underlying parameters, often referred to as the context. Robust RL is one of the primary approaches to this problem as it aims to learn a policy that performs well under worst-case perturbations to the context.

However, these solutions require specifying, at training time, a prior uncertainty set over the context of the test-time environment, and the robust policy is learned for that fixed set. Building in this prior ahead of time limits the flexibility of the resulting policy: a large uncertainty set yields an overly conservative policy that can underperform in every environment, while a small uncertainty set may fail to cover the target environment. We therefore formulate and study the multi-set robustness problem (illustrated above), whose goal is to learn a policy with strong worst-case performance on new uncertainty sets.
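To make this concrete, one way to write the multi-set objective, using notation we introduce here for illustration (the paper's exact formalization may differ): let c denote the context, J(π, c) the expected return of policy π under context c, and p(S) a distribution over the uncertainty sets that may be encountered at test time. A set-conditioned policy π(· | ·, S) is then trained for worst-case performance on every set it may be handed:

$$\max_{\pi} \;\; \mathbb{E}_{\mathcal{S} \sim p(\mathcal{S})} \Big[ \min_{c \in \mathcal{S}} \; J\big(\pi(\cdot \mid \cdot, \mathcal{S}),\, c\big) \Big]$$

Conditioning on the set S is what lets a single policy behave conservatively when handed a large set and less conservatively when handed a small one.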

System Identifiability

We could solve the multi-set robustness problem by naively contextualizing existing robust methods with the uncertainty set, as sketched below. However, this can still be sub-optimal, because these methods never reduce their uncertainty over the context, even though some of the parameters that make up the context can be quickly identified from a short history of interactions.
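Concretely, "contextualizing with the uncertainty set" can be as simple as encoding the set (for example, as per-dimension interval bounds over the context) and appending it to the policy's observation. The snippet below is our own minimal illustration; the names and the interval parameterization are assumptions, not the paper's implementation.

```python
import numpy as np

def set_conditioned_input(obs, set_low, set_high):
    """Encode the uncertainty set as per-dimension interval bounds over the
    context and append them to the observation, so that a standard robust RL
    policy can be conditioned on the set it should be robust to."""
    return np.concatenate([obs, set_low, set_high])

# Example: 2-D observation; context = (obstacle size, x-velocity).
obs = np.array([0.3, -0.1])
set_low = np.array([0.05, 0.5])    # smallest obstacle / slowest velocity in the set
set_high = np.array([0.20, 1.5])   # largest obstacle / fastest velocity in the set
policy_input = set_conditioned_input(obs, set_low, set_high)
```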

In the peg insertion task shown here, for instance, there is uncertainty over both the peg's size and the robot's step size. The step size can be identified within the first few timesteps, but the peg's size cannot be easily identified until the agent attempts to insert the peg into one of the boxes.

SIRSA: System Identification and Risk-Sensitive Adaptation

To handle systems with partially-identifiable parameters, we design an algorithm that combines system identification and robust RL: it reduces the model uncertainty where possible while behaving cautiously with respect to the irreducible uncertainty. We call our algorithm System Identification and Risk-Sensitive Adaptation (SIRSA).

First, SIRSA updates the uncertainty set over the context using the agent's recent history and a probabilistic system identification model. It then conditions a risk-sensitive policy on this inferred set and optimizes it with respect to the returns over contexts within that set.
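A rough sketch of this test-time loop is below. All names and interfaces are ours, and the details (intersecting a Gaussian confidence interval from the system identification model with the prior set, and conditioning the policy on interval bounds) are illustrative assumptions rather than the exact procedure in the paper. The risk-sensitive training of the set-conditioned policy (e.g., optimizing a lower quantile of returns over contexts sampled from the conditioning set) is not shown here.

```python
import numpy as np

def update_uncertainty_set(history, sysid_model, prior_low, prior_high):
    """Shrink the prior uncertainty set using a probabilistic system
    identification model (hypothetical interface): the model maps recent
    transitions to a mean and standard deviation over the context, and we
    intersect the resulting confidence interval with the prior set."""
    mean, std = sysid_model.predict(history)            # assumed interface
    low = np.maximum(prior_low, mean - 2.0 * std)
    high = np.minimum(prior_high, mean + 2.0 * std)
    return low, high

def sirsa_rollout(env, policy, sysid_model, prior_low, prior_high, horizon=200):
    """Illustrative test-time loop: identify what can be identified from the
    history so far, and act robustly with respect to the remaining
    uncertainty by conditioning the policy on the inferred set."""
    obs = env.reset()
    history = []
    low, high = prior_low, prior_high
    for _ in range(horizon):
        if history:
            low, high = update_uncertainty_set(history, sysid_model,
                                               prior_low, prior_high)
        # The policy sees both the observation and the current set bounds.
        action = policy.act(np.concatenate([obs, low, high]))  # assumed interface
        next_obs, reward, done, info = env.step(action)
        history.append((obs, action, next_obs))
        obs = next_obs
        if done:
            break
```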

Experiments

POINT MASS

[Videos: Set-EPOpt | System ID | SIRSA (Ours)]

In this domain, the agent needs to avoid colliding with the obstacle (lavender shaded circle), whose size is uncertain, while moving at an uncertain x-velocity. The largest obstacle in the uncertainty set is marked by the unfilled blue circle. The policy learned by Set-EPOpt avoids this largest obstacle, while the System ID agent slightly collides with the obstacle. Finally, the SIRSA agent avoids the obstacle with a tighter turn than Set-EPOpt.

MINITAUR

[Videos: Set-EPOpt | System ID | SIRSA (Ours)]

The robot's mass and the failure rate of its back-right joint are uncertain. The Set-EPOpt and System ID agents fall early in the rollout, while the SIRSA agent walks with a more stable gait.

HALF-CHEETAH

[Videos: Set-EPOpt | System ID | SIRSA (Ours)]

In this domain, there is uncertainty over the robot's mass, joint friction, and joint failure rate.

PEG INSERTION

[Videos: Set-EPOpt | System ID | SIRSA (Ours)]

In this domain, the peg size and the robot's step size are uncertain. Smaller pegs can be inserted into the box at the center, while larger pegs can only be inserted into the box on the far left. The Set-EPOpt agent fails to reach either box due to the uncertainty in the step size. The System ID agent, on the other hand, can infer the correct step size after a few steps, but it tries to insert the large peg into the box in the middle, which can only accommodate smaller pegs. Finally, the SIRSA agent acts more conservatively with respect to the uncertainty in the peg size and inserts the peg into the box on the far left.