Learning Actionable Representations with Goal Conditioned Policies
Applications of Actionable Representations
Learning Downstream Tasks
The ARC representation allows us to learn policies for downstream tasks that cannot be specified as goal-reaching. The task pictured above requires the ant to reach the green circle while avoiding the red region. See Section 6.5 of the paper for results.
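As a rough illustration (not the authors' implementation), such a task can be expressed as a reward defined over distances in the learned representation space. The encoder `phi` and the names `goal_state`, `avoid_state`, `avoid_radius`, and `avoid_penalty` below are hypothetical.

```python
import numpy as np

def reach_and_avoid_reward(phi, state, goal_state, avoid_state,
                           avoid_radius=1.0, avoid_penalty=10.0):
    """Hypothetical reward for a task that is not pure goal-reaching:
    move toward a goal while staying out of a region to avoid.
    Distances are measured in the learned representation space, where
    they are intended to reflect how the policy actually acts."""
    z, z_goal, z_avoid = phi(state), phi(goal_state), phi(avoid_state)
    reach_term = -np.linalg.norm(z - z_goal)              # pull toward the goal
    in_avoid_region = np.linalg.norm(z - z_avoid) < avoid_radius
    return reach_term - avoid_penalty * float(in_avoid_region)  # penalize entering it
```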
Reward Shaping
The ARC representation generalizes better than the goal-conditioned policy from which it was learned. In the task pictured above, the representation is learned in the green region and then used to shape rewards for goal-reaching tasks in the red region. See Section 6.4 of the paper for results.
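A minimal sketch of what such shaping might look like, assuming a learned encoder `phi` whose distances reflect reachability; the success bonus and threshold are illustrative rather than the paper's exact formulation.

```python
import numpy as np

def shaped_goal_reward(phi, state, goal_state, success_threshold=0.5):
    """Hypothetical shaped reward for goal-reaching: a sparse success bonus
    plus a dense term given by negative distance in the learned
    representation space. phi is the learned encoder."""
    dist = np.linalg.norm(phi(state) - phi(goal_state))
    bonus = 1.0 if dist < success_threshold else 0.0
    return -dist + bonus
```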
Hierarchical RL
The ARC representation is well suited to learning high-level controllers for temporally extended tasks. We outline two ways to learn such controllers, through direct commands and through clustering (pictured above), and show fast learning on a variety of domains in Section 6.6 of the paper.
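The sketch below illustrates one plausible structure for such a controller, assuming a learned encoder `phi`, a high-level policy that emits commands in representation space, and an optional set of cluster centroids. It is a hedged approximation, not the authors' code, and every name in it is illustrative.

```python
import numpy as np

class HighLevelController:
    """Hypothetical high-level controller over a learned representation.

    Direct command: every `horizon` steps, the high-level policy outputs a
    target point in representation space, which the low-level
    goal-conditioned policy is then run toward. Clustering: the command is
    instead snapped to the nearest of a discrete set of cluster centroids
    computed from representations of previously visited states."""

    def __init__(self, high_level_policy, low_level_policy, phi,
                 cluster_centroids=None, horizon=20):
        self.high_level_policy = high_level_policy  # maps phi(state) -> command
        self.low_level_policy = low_level_policy    # maps (state, command) -> action
        self.phi = phi
        self.cluster_centroids = cluster_centroids  # optional (K, d) array
        self.horizon = horizon
        self._command = None

    def act(self, state, step):
        if self._command is None or step % self.horizon == 0:
            command = self.high_level_policy(self.phi(state))
            if self.cluster_centroids is not None:
                # Restrict the command to the nearest cluster centroid.
                dists = np.linalg.norm(self.cluster_centroids - command, axis=1)
                command = self.cluster_centroids[np.argmin(dists)]
            self._command = command
        return self.low_level_policy(state, self._command)
```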
Learning Downstream Tasks
Reward Shaping
Ant
Wheeled Navigation
The green region corresponds to the area in which the goal-conditioned policy and ARC representation were trained, and the red region to the area we generalize to. The ARC representation learns quickly, often as fast as the oracle hand-shaped reward.
Hierarchical Reinforcement Learning
Biased Rooms
Wheeled Navigation
Ant Locomotion
Environments
All the environments in this work are simulated in MuJoCo.
2D Navigation
Wheeled Navigation
Ant Locomotion
Sawyer Manipulation
2D Navigation
This environment consists of an agent navigating to points within four rooms, as arranged on the right. The state space is 2-dimensional, consisting of the Cartesian coordinates of the agent. The agent has acceleration control, so the action space is 2-dimensional.
Downstream tasks for this environment include reaching target locations in the environment and navigating through sequences of rooms.
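For concreteness, a toy skeleton of an environment with these state and action spaces might look like the following (classic Gym API). The class name, arena size, and dynamics are assumptions, not the environment used in the paper.

```python
import numpy as np
import gym
from gym import spaces

class FourRooms2DNav(gym.Env):
    """Toy skeleton matching the description above: the observation is the
    agent's 2-D position and the action is a 2-D acceleration. An internal
    velocity is kept to realize acceleration control; wall and room
    collisions are omitted for brevity."""

    def __init__(self, dt=0.1, max_accel=1.0, half_width=5.0):
        self.observation_space = spaces.Box(-half_width, half_width,
                                            shape=(2,), dtype=np.float32)
        self.action_space = spaces.Box(-max_accel, max_accel,
                                       shape=(2,), dtype=np.float32)
        self.dt = dt
        self.reset()

    def reset(self):
        self._pos = np.zeros(2, dtype=np.float32)
        self._vel = np.zeros(2, dtype=np.float32)
        return self._pos.copy()

    def step(self, action):
        # Acceleration control: integrate the action into velocity, then position.
        self._vel += self.dt * np.asarray(action, dtype=np.float32)
        self._pos = np.clip(self._pos + self.dt * self._vel,
                            self.observation_space.low,
                            self.observation_space.high)
        return self._pos.copy(), 0.0, False, {}
```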
Wheeled Navigation
This environment consists of a car navigating to locations within four rooms, as arranged on the right. The state space is 6-dimensional, consisting of the Cartesian coordinates, heading, forward velocity, and angular velocity of the car. The agent controls the velocity of both of its wheels, resulting in a 2-dimensional action space.
Downstream tasks for wheeled navigation include reaching target locations in the environment, navigating through sequences of rooms, and navigating through sequences of waypoints.
Ant Locomotion
This environment consists of a quadrupedal ant robot navigating in free space. The state space is 15-dimensional, consisting of the Cartesian coordinates of the ant, the body orientation as a quaternion, and all of the ant's joint angles. The agent controls its joints via torque control, resulting in an 8-dimensional action space.
Downstream tasks for the ant include reaching target locations in the environment, navigating through sequences of waypoints, and reaching target locations while avoiding other locations.
Sawyer Manipulation
This environment involves a Sawyer manipulator and a freely moving block on a table-top. The state space is 6-dimensional, consisting of the Cartesian coordinates of the end-effector of the Sawyer, and the Cartesian coordinates of the block. The Sawyer is controlled via end-effector position control with a 3-dimensional action space.