Learning Actionable Representations with Goal Conditioned Policies
Applications of Actionable Representations
Learning Downstream Tasks
The ARC representation allows us to learn policies for downstream tasks which cannot be specified as goal-reaching. The task pictured above requires the ant to reach the green circle while avoiding the red region. See Section 6.5 of the paper for results.
The ARC representation generalizes better than the goal-conditioned policy from which it was learned. In the task pictured above, a representation is learned in the green region and then used to shape the reward for goal-reaching tasks in the red region. See Section 6.4 of the paper for results.
The ARC representation is amenable to learning high-level controllers for temporally extended tasks. We outline two ways to learn such controllers, through direct command and clustering (pictured above), and show fast learning on a variety of domains in Section 6.6 of the paper.
Learning Downstream Tasks
The green region corresponds to the area on which the goal-conditioned policy and ARC representation were trained; the red region is the area we generalize to. The ARC representation learns quickly, often as fast as an oracle with a hand-shaped reward.
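The reward shaping described above can be sketched as follows. Here `phi` stands for the learned ARC encoder, and the state and goal arrays are placeholders; all names and shapes are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def shaped_reward(phi, state, goal):
    """Dense shaped reward: negative distance between the learned
    representations of the current state and the goal.
    phi is a hypothetical learned encoder mapping states to R^d."""
    z_state = phi(state)
    z_goal = phi(goal)
    return -np.linalg.norm(z_state - z_goal)
```

With the identity as the encoder, this reduces to the negative Euclidean distance to the goal; with a learned ARC encoder, distances instead reflect how differently the goal-conditioned policy must act, which is what makes the shaping transfer to the red region.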
Hierarchical Reinforcement Learning
All the environments in this work are simulated in MuJoCo.
This environment consists of an agent navigating to points in an environment with four rooms, as arranged on the right. The state space is 2-dimensional, consisting of the Cartesian coordinates of the agent. The agent has acceleration control, so the action space is 2-dimensional.
Downstream tasks for this environment include reaching target locations in the environment and navigating through sequences of rooms.
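The acceleration-controlled agent above can be sketched as a simple double integrator. This is a minimal illustration only: the room layout, wall collisions, time step, and bounds are assumptions and are not taken from the paper:

```python
import numpy as np

class PointMassRooms:
    """Sketch of the 2D navigation agent: 2-dimensional position state,
    2-dimensional acceleration action, Euler-integrated dynamics.
    Walls and the four-room layout are omitted for brevity."""

    def __init__(self, dt=0.05):
        self.dt = dt
        self.pos = np.zeros(2)  # Cartesian coordinates (the state)
        self.vel = np.zeros(2)

    def step(self, accel):
        accel = np.clip(accel, -1.0, 1.0)  # 2-D acceleration action
        self.vel += self.dt * accel
        self.pos = np.clip(self.pos + self.dt * self.vel, -1.0, 1.0)
        return self.pos.copy()
```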
This environment consists of a car navigating to locations within four rooms, as arranged on the right. The state space is 6-dimensional, consisting of the Cartesian coordinates, heading, forward velocity, and angular velocity of the car. The agent controls the velocity of both of its wheels, resulting in a 2-dimensional action space.
Downstream tasks for wheeled navigation include reaching target locations in the environment, navigating through sequences of rooms, and navigating through sequences of waypoints.
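The wheel-velocity control above follows standard differential-drive kinematics, sketched below. The time step and axle width are assumptions for illustration, not values from the paper:

```python
import math

def wheeled_step(x, y, theta, v_left, v_right, dt=0.05, width=0.2):
    """Differential-drive update: the two wheel velocities (the 2-D
    action) determine the car's forward and angular velocity."""
    v = 0.5 * (v_left + v_right)        # forward velocity
    omega = (v_right - v_left) / width  # angular velocity
    x += dt * v * math.cos(theta)
    y += dt * v * math.sin(theta)
    theta += dt * omega
    return x, y, theta
```

Equal wheel velocities drive the car straight ahead; unequal velocities turn it, which is why heading and angular velocity appear in the 6-dimensional state.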
This task requires a quadrupedal ant robot to navigate in free space. The state space is 15-dimensional, consisting of the Cartesian coordinates of the ant, its body orientation as a quaternion, and all of its joint angles. The agent must use torque control for its joints, resulting in an 8-dimensional action space.
Downstream tasks for the ant include reaching target locations in the environment, navigating through sequences of waypoints, and reaching target locations while avoiding other locations.
This environment involves a Sawyer manipulator and a freely moving block on a table-top. The state space is 6-dimensional, consisting of the Cartesian coordinates of the end-effector of the Sawyer, and the Cartesian coordinates of the block. The Sawyer is controlled via end-effector position control with a 3-dimensional action space.