Equivariant Reinforcement Learning under Partial Observability
Hai Nguyen, Andrea Baisero, David Klee, Dian Wang, Robert Platt, Christopher Amato
Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
Abstract
Incorporating inductive biases is a promising approach for tackling challenging robot learning domains with sample-efficient solutions. This paper identifies partially observable domains where symmetries can be a useful inductive bias for efficient learning. Specifically, by encoding equivariance with respect to specific group symmetries into the neural networks, our actor-critic reinforcement learning agents can reuse past solutions in related scenarios. Consequently, our equivariant agents significantly outperform non-equivariant approaches in sample efficiency and final performance, as demonstrated through experiments on a range of robotic tasks in simulation and on real hardware.
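To make "encoding equivariance into the networks" concrete, here is a minimal sketch of a rotation-equivariant convolutional layer built with the e2cnn steerable-CNN library. This is an illustration of the general technique, not the authors' architecture; the group (C4), layer sizes, and input shape are assumptions for the example.

```python
import torch
from e2cnn import gspaces, nn as enn

# Symmetry group: the four planar rotations (C4) acting on 2D images,
# a natural choice for top-down manipulation observations.
r2_act = gspaces.Rot2dOnR2(N=4)

# Input: one scalar image channel; hidden features: regular representations,
# whose channels permute among themselves under rotation.
feat_in = enn.FieldType(r2_act, [r2_act.trivial_repr])
feat_hid = enn.FieldType(r2_act, 8 * [r2_act.regular_repr])

layer = enn.SequentialModule(
    enn.R2Conv(feat_in, feat_hid, kernel_size=3, padding=1),
    enn.ReLU(feat_hid),
)

x = enn.GeometricTensor(torch.randn(1, 1, 32, 32), feat_in)
y = layer(x)

# Equivariance check: rotating the input rotates (and permutes) the output.
g = 1  # one quarter-turn in C4
assert torch.allclose(layer(x.transform(g)).tensor,
                      y.transform(g).tensor, atol=1e-5)
```

Because the constraint holds by construction, the network never has to relearn the same behavior in each rotated configuration, which is the source of the sample-efficiency gains claimed above.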
Learned Policies in Simulation

Block-Picking
Of two identical blocks, the agent must pick up the only movable one

Block-Pulling
Of two identical blocks, the agent must pull the only movable block until the two blocks are in contact

Block-Pushing
Of two identical blocks, the agent must push the only movable block onto a goal pad

Drawer-Opening
Of two identical drawers, the agent must open the only unlocked one

CarFlag-1D
The car must reach the green flag, which can be at either the leftmost or the rightmost position. Only when it is at the blue flag can the car observe which side the green flag is on
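For concreteness, a minimal sketch of such an environment follows. The track length, start region, blue-flag position, and reward are assumptions for illustration, not the paper's specification.

```python
import random

class CarFlag1D:
    """Toy sketch of the CarFlag-1D task; dynamics and rewards are assumed."""

    def __init__(self, length=10):
        self.length = length              # track spans [-length, length]
        self.blue = 0                     # assumed: blue flag at the origin

    def reset(self):
        self.pos = random.randint(-2, 2)  # assumed start region
        # Green (goal) flag sits at either the leftmost or rightmost end.
        self.goal = random.choice([-self.length, self.length])
        return self._obs()

    def _obs(self):
        # Partial observability: the goal side is revealed only at the blue flag.
        hint = self.goal if self.pos == self.blue else 0
        return (self.pos, hint)

    def step(self, action):               # action in {-1, +1}
        self.pos = max(-self.length, min(self.length, self.pos + action))
        done = self.pos == self.goal
        return self._obs(), (1.0 if done else 0.0), done
```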

CarFlag-2D
The agent (red) must reach the green goal cell. Only when it is inside the blue region can the agent observe the coordinates of the goal cell
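Both CarFlag variants are symmetric under a mirror flip: negating positions and the goal hint maps one task instance onto the other, and negating the action maps the optimal behavior accordingly. The hypothetical check below illustrates what this equivariance property means for a hand-written CarFlag-1D policy; the helper names and the policy itself are illustrative, not from the paper.

```python
def reflect_obs(obs):
    pos, hint = obs
    return (-pos, -hint)               # mirror the world left <-> right

def reflect_action(action):
    return -action                     # swap 'go left' and 'go right'

def greedy_policy(obs):
    """Illustrative equivariant policy: head for the hinted goal side;
    with no hint yet, head for the blue flag at the origin."""
    pos, hint = obs
    if hint != 0:
        return 1 if hint > 0 else -1
    return 1 if pos < 0 else -1

def check_equivariance(policy, observations):
    """True iff policy(g.obs) == g.policy(obs) for the mirror flip g."""
    return all(policy(reflect_obs(o)) == reflect_action(policy(o))
               for o in observations)

# pos == 0 is excluded: at the blue flag the environment always reveals
# the hint, so that observation never occurs with hint == 0.
obs = [(p, h) for p in range(-10, 11) if p != 0 for h in (-10, 0, 10)]
assert check_equivariance(greedy_policy, obs)
```

Building this constraint into the actor and critic networks is what lets the equivariant agents reuse a solution learned for one goal side in the mirrored scenario.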
Zero-shot Transfer for Robot Domains

Block-Picking
Of two identical blocks, the agent must pick up the only movable one

Block-Pulling
Of two identical blocks, the agent must pull the only movable block until the two blocks are in contact

Block-Pushing
Of two identical blocks, the agent must push the only movable block onto a goal pad
