Attention-Privileged Reinforcement Learning

Figure: APRiL observation policy rollouts and observation attention maps for the interpolated domains (NavWorld, JacoReach, Walker2D) and the extrapolated domains (NavWorld and JacoReach with 4 or 8 extra distractors).

White and black denote high and low attention values, respectively. In each domain, attention is correctly placed on the agent and/or the target, while distractors are suppressed. For JacoReach, attention is paid to every other link of the Kinova arm: because the arm is a constrained kinematic chain, the state of every link can be inferred by attending to alternating links. For Walker2D, attention is dynamic in object space and varies with the state and stability of the walker. In the extrapolated domains with 4 or 8 additional distractors, APRiL's attention generalises well, suppressing the extra distractors, and the resulting policies perform well.
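As a hedged illustration of the visualisation convention (white for high attention, black for low), the sketch below maps hypothetical per-object attention scores to grey levels, so that the agent and target render white and suppressed distractors render near black. The softmax normalisation, the max-relative scaling and the helper name `attention_to_grayscale` are assumptions made for illustration; they are not taken from APRiL's implementation.

```python
import numpy as np


def attention_to_grayscale(scores):
    """Hypothetical helper: turn per-object attention logits into grey levels.

    Assumes a softmax over objects (not confirmed by the source). Grey levels are
    scaled relative to the most-attended object, so that object renders white (255)
    and near-zero weights render close to black (0).
    """
    scores = np.asarray(scores, dtype=float)
    # Numerically stable softmax over the object dimension.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Max-relative scaling: the largest weight maps to 255 (white).
    grey = np.round(255 * weights / weights.max()).astype(np.uint8)
    return weights, grey


# Toy example: agent and target get high scores, two distractors get low scores.
objects = ["agent", "target", "distractor_1", "distractor_2"]
weights, grey = attention_to_grayscale([4.0, 4.0, -2.0, -2.0])
for name, w, g in zip(objects, weights, grey):
    print(f"{name:>12}: weight={w:.4f} grey={g}")
```

Max-relative scaling is one design choice; a min-max rescale would instead force the least-attended object to pure black regardless of how much attention it actually receives.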

Asymmetric DDPG

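Figure: Asymmetric DDPG policy rollouts for the interpolated domains (NavWorld, JacoReach, Walker2D) and the extrapolated domains (NavWorld and JacoReach with 4 or 8 extra distractors).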

Asymmetric DDPG learns worse-performing policies than APRiL, as the policy rollouts on the interpolated domains (left three columns) show. For NavWorld and JacoReach, asymmetric DDPG's policies reach the target object less consistently. For Walker2D, the walker does not learn a reasonable gait and can barely balance itself.
Without attention, asymmetric DDPG generalises poorly to the extrapolated domains (right two columns), which contain additional distractor objects. For NavWorld, the circular agent reaches the triangular target less consistently. For JacoReach, the additional distractors confuse the agent, and the arm cannot coordinate itself to point at the diamond-shaped target object.
