Learning Actionable Representations with Goal Conditioned Policies

Applications of Actionable Representations

Learning Downstream Tasks

The ARC representation allows us to learn policies for downstream tasks which cannot be specified as goal-reaching. The task pictured above requires the ant to reach the green circle while avoiding the red region. See Section 6.5 of the paper for results.
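As a rough illustration of how such a task can be expressed on top of the representation (this is a sketch, not the paper's exact implementation), the snippet below scores states by their distance to the goal in the learned representation space and penalizes proximity to the region that must be avoided. The encoder `phi`, the states, and the penalty parameters are all placeholder names.

```python
import numpy as np

def reach_avoid_reward(phi, state, goal_state, avoid_state,
                       avoid_radius=1.0, penalty=10.0):
    """Hypothetical reward for a task that is not pure goal-reaching:
    reach the goal while staying away from an avoid region, with all
    distances measured in the learned representation space."""
    z_s, z_g, z_a = phi(state), phi(goal_state), phi(avoid_state)
    reward = -np.linalg.norm(z_s - z_g)            # dense progress toward the goal
    if np.linalg.norm(z_s - z_a) < avoid_radius:   # penalize entering the avoid region
        reward -= penalty
    return reward
```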

Reward Shaping

The ARC representation generalizes better than the goal-conditioned policy from which it was learned. In the task pictured above, a representation is learned in the green region and then used to shape rewards for goal-reaching tasks in the red region. See Section 6.4 of the paper for results.
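A minimal sketch of this kind of shaping, assuming a learned encoder `phi(s)`: the negative distance to the goal in representation space is added as a dense term on top of the task's original reward. All names here are illustrative, not the paper's API.

```python
import numpy as np

def shaped_reward(phi, state, goal_state, task_reward, alpha=1.0):
    """Sketch of reward shaping with a learned representation: add the negative
    representation-space distance to the goal as a dense term on top of the
    task's original (possibly sparse) reward."""
    shaping = -np.linalg.norm(phi(state) - phi(goal_state))
    return task_reward + alpha * shaping
```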

Hierarchical RL

The ARC representation is amenable to learning high-level controllers in temporally extended tasks. We outline two ways to learn such controllers, via direct commands and via clustering (pictured above), and show fast learning on a variety of domains in Section 6.6 of the paper.
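The sketch below shows, under loose assumptions about the interfaces, how such a two-level controller could be rolled out: the high-level policy emits a command at a fixed interval, either directly as a latent goal or, in the clustering variant, as an index into a set of representative points in representation space, and the goal-conditioned low-level policy executes it. Function and variable names are hypothetical.

```python
def run_hierarchical_episode(env, high_level_policy, low_level_policy,
                             cluster_centers=None, horizon=500, command_every=20):
    """Illustrative rollout of a two-level controller (all names are placeholders)."""
    obs = env.reset()
    command = None
    for t in range(horizon):
        if t % command_every == 0:
            command = high_level_policy(obs)
            if cluster_centers is not None:            # clustering variant: discrete choice
                command = cluster_centers[int(command)]
        # the low-level goal-conditioned policy acts toward the current command
        obs, reward, done, info = env.step(low_level_policy(obs, command))
        if done:
            break
    return obs
```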

Learning Downstream Tasks

Reward Shaping

Reward shaping results on the Ant and Wheeled Navigation tasks. The green region corresponds to the area where the goal-conditioned policy and ARC representation were trained, and the red region is the region we generalize to. The ARC representation learns quickly, often as fast as the oracle hand-shaped reward.

Hierarchical Reinforcement Learning

Hierarchical RL results on the Biased Rooms, Wheeled Navigation, and Ant Locomotion domains.


Environments

All the environments in this work are simulated in MuJoCo.

The four environments: 2D Navigation, Wheeled Navigation, Ant Locomotion, and Sawyer Manipulation.

2D Navigation

This environment consists of an agent navigating to points in an environment with four rooms, as arranged on the right. The state space is 2-dimensional, consisting of the Cartesian coordinates of the agent. The agent has acceleration control, so the action space is 2-dimensional.

Downstream tasks for this environment include reaching target locations in the environment and navigating through sequences of rooms.
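A rough stand-in for this environment's interface, assuming double-integrator dynamics and omitting the four-room walls and goal sampling, might look like the following; this is not the environment used in the paper.

```python
import numpy as np

class PointNavigation2D:
    """Toy stand-in for the 2D navigation environment: the observed state is the
    agent's 2-D position and the 2-D action is an acceleration (double-integrator
    dynamics). The four-room wall layout and goal logic are omitted here."""

    def __init__(self, dt=0.1, max_accel=1.0):
        self.dt, self.max_accel = dt, max_accel
        self.reset()

    def reset(self):
        self.pos = np.zeros(2)
        self.vel = np.zeros(2)
        return self.pos.copy()          # observed state: Cartesian coordinates

    def step(self, action):
        accel = np.clip(np.asarray(action, dtype=float), -self.max_accel, self.max_accel)
        self.vel += accel * self.dt
        self.pos += self.vel * self.dt
        return self.pos.copy(), 0.0, False, {}
```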

Wheeled Navigation

This environment consists of a car navigating to locations within four rooms, as arranged on the right. The state space is 6-dimensional, consisting of the Cartesian coordinates, heading, forward velocity, and angular velocity of the car. The agent controls the velocity of both of its wheels, resulting in a 2-dimensional action space.

Downstream tasks for wheeled navigation include reaching target locations in the environment, navigating through sequences of rooms, and navigating through sequences of waypoints.

Legged Locomotion

This environment consists of a quadrupedal ant robot navigating in free space. The state space is 15-dimensional, consisting of the Cartesian coordinates of the ant, the body orientation as a quaternion, and all the joint angles of the ant. The agent uses torque control on its joints, resulting in an 8-dimensional action space.

Downstream tasks for the ant include reaching target locations in the environment, navigating through sequences of waypoints, and reaching target locations while avoiding other locations.

Robotic Manipulation

This environment involves a Sawyer manipulator and a freely moving block on a table-top. The state space is 6-dimensional, consisting of the Cartesian coordinates of the end-effector of the Sawyer, and the Cartesian coordinates of the block. The Sawyer is controlled via end-effector position control with a 3-dimensional action space.
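For quick reference, the state and action dimensionalities described above can be collected in one place; the dictionary keys below are just illustrative labels.

```python
# State/action dimensionalities of the four environments, as described above.
ENV_SPECS = {
    "2d_navigation":       {"state_dim": 2,  "action_dim": 2},  # x-y position; acceleration control
    "wheeled_navigation":  {"state_dim": 6,  "action_dim": 2},  # pose and velocities; wheel velocities
    "ant_locomotion":      {"state_dim": 15, "action_dim": 8},  # position, quaternion, joint angles; joint torques
    "sawyer_manipulation": {"state_dim": 6,  "action_dim": 3},  # end-effector and block positions; end-effector control
}
```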