# Learning Actionable Representations with Goal Conditioned Policies

## Applications of Actionable Representations

## Learning Downstream Tasks

The ARC representation allows us to learn policies for downstream tasks which cannot be specified as goal-reaching. The task pictured above requires the ant to reach the green circle while avoiding the red region. See Section 6.5 of the paper for results.
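One way such a task can be posed is with a state-based reward that is not a pure goal-reaching distance. A minimal sketch, where the goal position, region center, radius, and penalty weight are illustrative placeholders rather than values from the paper:

```python
import numpy as np

def avoid_and_reach_reward(xy, goal=np.array([4.0, 0.0]),
                           red_center=np.array([2.0, 0.0]), red_radius=1.0):
    """Reward for reaching `goal` while avoiding a circular red region.

    The avoidance penalty makes this task impossible to express as
    reaching any single goal state.
    """
    reach = -np.linalg.norm(xy - goal)            # dense progress term
    in_red = np.linalg.norm(xy - red_center) < red_radius
    return reach - (10.0 if in_red else 0.0)      # large penalty inside the red region
```

A policy trained on this reward must trade off progress toward the green circle against the penalty for cutting through the red region.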

## Reward Shaping

The ARC representation generalizes better than the goal-conditioned policy from which it was learned. In the task pictured above, a representation is learned in the green region and then used to shape rewards for goal-reaching tasks in the red region. See Section 6.4 of the paper for results.
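Concretely, a learned encoder can replace Euclidean distance in a goal-reaching reward. A minimal sketch, assuming some trained encoder `phi` that maps states into the learned latent space; the fixed random linear map below is only a stand-in for that encoder, not the paper's architecture:

```python
import numpy as np

def shaped_reward(phi, state, goal):
    """Negative distance between state and goal in the learned latent space.

    If the representation encodes reachability, this distance can shape
    rewards even for goals outside the training (green) region.
    """
    return -np.linalg.norm(phi(state) - phi(goal))

# Placeholder encoder: a random linear map standing in for the trained ARC encoder.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 2))
phi = lambda s: W @ s
```

The shaped reward is zero exactly when the state reaches the goal in latent space and negative elsewhere, so it can be added to a sparse task reward without changing the optimum.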

## Hierarchical RL

The ARC representation is well suited to learning high-level controllers for temporally extended tasks. We outline two ways to learn such controllers, through direct command and through clustering (pictured above), and show fast learning on a variety of domains in Section 6.6 of the paper.
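In the direct-command variant, for instance, a high-level controller can emit a latent command every k steps, which a goal-conditioned low-level policy then pursues. A minimal control-loop sketch; the `high_level`, `low_level`, and `env` interfaces here are illustrative assumptions, not the paper's code:

```python
def hierarchical_rollout(env, high_level, low_level, horizon=100, k=10):
    """Run one episode: high_level picks a latent command every k steps,
    and low_level acts conditioned on that command at every step."""
    state, total_reward = env.reset(), 0.0
    for t in range(horizon):
        if t % k == 0:                      # re-plan at a coarser timescale
            command = high_level(state)     # latent command in the learned space
        action = low_level(state, command)  # goal-conditioned low-level policy
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

Because the high-level controller acts only every k steps and in the low-dimensional latent space, its learning problem is much shorter-horizon than the raw task.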

## Learning Downstream Tasks

## Reward Shaping

### Ant

### Wheeled Navigation

The **green** region corresponds to the area on which the goal-conditioned policy and ARC representation were trained, and the **red** region to the area we generalize to. The ARC representation learns quickly, often as fast as the oracle hand-shaped reward.

## Hierarchical Reinforcement Learning

### Biased Rooms

### Wheeled Navigation

### Ant Locomotion

## Environments

All the environments in this work are simulated in MuJoCo.


### 2D Navigation

This environment consists of an agent navigating to points in an environment with four rooms, as arranged on the right. The state space is 2-dimensional, consisting of the Cartesian coordinates of the agent. The agent has acceleration control, so the action space is 2-dimensional.

Downstream tasks for this environment include reaching target locations in the environment and navigating through sequences of rooms.
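A minimal gym-style sketch of such a point agent with acceleration control; the time step, action clipping, and omission of room walls are simplifying assumptions for illustration:

```python
import numpy as np

class PointNavEnv:
    """Point agent with acceleration control; the observation is its (x, y) position.

    Room walls are omitted for brevity -- a full version would block any
    transition that crosses a wall segment.
    """
    def __init__(self, dt=0.1):
        self.dt = dt

    def reset(self):
        self.pos = np.zeros(2)   # 2-dimensional state: Cartesian coordinates
        self.vel = np.zeros(2)   # internal velocity (not part of the observation)
        return self.pos.copy()

    def step(self, accel):
        accel = np.clip(accel, -1.0, 1.0)   # 2-dimensional action: acceleration
        self.vel += self.dt * accel          # integrate acceleration into velocity
        self.pos += self.dt * self.vel       # integrate velocity into position
        return self.pos.copy()
```

Because the agent controls acceleration rather than position directly, reaching a target requires planning through the induced velocity dynamics.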

### Wheeled Navigation

This environment consists of a car navigating to locations within four rooms, as arranged on the right. The state space is 6-dimensional, consisting of the Cartesian coordinates, heading, forward velocity, and angular velocity of the car. The agent controls the velocity of both of its wheels, resulting in a 2-dimensional action space.

Downstream tasks for wheeled navigation include reaching target locations in the environment, navigating through sequences of rooms, and navigating through sequences of waypoints.

### Ant Locomotion

This task requires a quadrupedal ant robot to navigate in free space. The state space is 15-dimensional, consisting of the Cartesian coordinates of the ant, its body orientation as a quaternion, and all of its joint angles. The agent uses torque control for its joints, resulting in an 8-dimensional action space.

Downstream tasks for the ant include reaching target locations in the environment, navigating through sequences of waypoints, and reaching target locations while avoiding other locations.

### Sawyer Manipulation

This environment involves a Sawyer manipulator and a freely moving block on a table-top. The state space is 6-dimensional, consisting of the Cartesian coordinates of the end-effector of the Sawyer, and the Cartesian coordinates of the block. The Sawyer is controlled via end-effector position control with a 3-dimensional action space.