Simple Emergent Action Representations from Multi-Task Policy Training

Abstract

The low-level sensory and motor signals in deep reinforcement learning, which live in high-dimensional spaces such as image observations or motor torques, are inherently challenging to understand or to use directly for downstream tasks. While sensory representations have been studied extensively, representations of motor actions remain an area of active exploration. Our work reveals that a space of meaningful action representations emerges when a multi-task policy network takes both states and task embeddings as inputs, and we add moderate constraints during training to improve the quality of this space. As a result, interpolated or composed embeddings can serve as a high-level interface within this space, providing the agent with instructions for executing meaningful action sequences. Empirical results demonstrate that the proposed action representations are effective for intra-action interpolation and inter-action composition with limited or no additional learning. Furthermore, our approach exhibits superior task adaptation ability compared to strong baselines on MuJoCo locomotion tasks. Our work sheds light on the promising direction of learning action representations for efficient, adaptable, and composable RL, forming a basis for abstract action planning and for understanding the motor signal space.
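To make the setup concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: a single policy network conditioned on both the state and a learned per-task embedding, so that the embedding table spans a shared action-representation space. All names (TaskConditionedPolicy, emb_dim, layer sizes, the deterministic tanh head) are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class TaskConditionedPolicy(nn.Module):
    """Shared policy pi(a | s, z) with one learned embedding z per pre-training task.

    Illustrative sketch only; the architecture and training constraints used in
    the paper may differ.
    """
    def __init__(self, state_dim, action_dim, num_tasks, emb_dim=16):
        super().__init__()
        # One learnable embedding per task; this table is the space in which
        # action representations are claimed to emerge after multi-task training.
        self.task_emb = nn.Embedding(num_tasks, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(state_dim + emb_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, state, task_id=None, z=None):
        # During multi-task training, z is looked up by task index and optimized
        # jointly with the policy weights; at test time an arbitrary z (e.g. an
        # interpolated or composed embedding) can be supplied instead.
        if z is None:
            z = self.task_emb(task_id)
        return self.net(torch.cat([state, z], dim=-1))
```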

Animated Results

1. Multi-task Training

HalfCheetah-Vel: Vel-1, Vel-3, Vel-5, Vel-10

Hopper-Vel: Vel-0.2, Vel-0.6, Vel-1.0, Vel-2.0

Walker-Vel: Vel-0.2, Vel-0.6, Vel-1.0, Vel-2.0

Ant-Dir: Dir-45, Dir-150, Dir-240, Dir-315

2. Task Interpolation

HalfCheetah-Vel: Vel-1.0 (pre-train), Vel-1.5 (interpolated; evaluated at 1.51 m/s), Vel-2.0 (pre-train)

Hopper-Vel: Vel-1.6 (pre-train), Vel-1.7 (interpolated; evaluated at 1.70 m/s), Vel-1.8 (pre-train)

Walker-Vel: Vel-1.6 (pre-train), Vel-1.7 (interpolated; evaluated at 1.70 m/s), Vel-1.8 (pre-train)

Ant-Dir: Dir-30 (pre-train), Dir-37.5 (interpolated; evaluated at 37.51 degrees), Dir-45 (pre-train)
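Under the assumed interface of the sketch above, the interpolation results in this section amount to feeding the policy a convex combination of two pre-trained task embeddings. The task indices, dimensions, and the 0.5 mixing weight below are hypothetical placeholders.

```python
import torch

# Hypothetical task indices into the embedding table of the sketch above.
VEL_1_0_ID, VEL_2_0_ID = 2, 3

policy = TaskConditionedPolicy(state_dim=17, action_dim=6, num_tasks=4)
state = torch.zeros(1, 17)  # placeholder HalfCheetah observation

with torch.no_grad():
    z_slow = policy.task_emb(torch.tensor([VEL_1_0_ID]))  # embedding of the Vel-1.0 task
    z_fast = policy.task_emb(torch.tensor([VEL_2_0_ID]))  # embedding of the Vel-2.0 task
    z_mid = 0.5 * (z_slow + z_fast)                       # interpolated embedding, targeting ~1.5 m/s
    action = policy(state, z=z_mid)                        # no additional learning required
```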

3. Task Composition

Basic motions in HalfCheetah-Run-Jump: Walk, Run, Stand, Jump

Two composed tasks: Walk-Stand, Run-Jump
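This page does not detail how the composed behaviors are produced; read together with the abstract, one simple possibility, shown below purely as an assumption, is to combine the embeddings of basic motions (here by averaging) and pass the result to the same shared policy. The sketch reuses the hypothetical policy and state from the interpolation example; the actual composition rule in the paper may differ.

```python
# Hypothetical composition of basic-motion embeddings (indices and the simple
# average are assumptions, not the authors' exact method).
RUN_ID, JUMP_ID = 1, 3

with torch.no_grad():
    z_run = policy.task_emb(torch.tensor([RUN_ID]))
    z_jump = policy.task_emb(torch.tensor([JUMP_ID]))
    z_run_jump = 0.5 * (z_run + z_jump)     # composed "run-jump" instruction
    action = policy(state, z=z_run_jump)
```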

4. Adaptation

HalfCheetah-Vel: Vel-3.6 (reward: -73.39), Vel-7.2 (reward: -181.16), Vel-9.5 (reward: -291.45)

Hopper-Vel: Vel-0.3 (reward: 193.98), Vel-0.7 (reward: 192.25), Vel-1.5 (reward: 165.32)

Walker-Vel: Vel-0.3 (reward: 194.03), Vel-0.7 (reward: 186.54), Vel-1.5 (reward: 169.23)

Ant-Dir: Dir-77 (reward: 1269.30), Dir-133 (reward: 1323.05), Dir-201 (reward: 1338.82)
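Only the adaptation rewards are listed above; the abstract's claim is that new tasks can be reached with limited additional learning by working in the embedding space. The sketch below shows one generic way such adaptation could look (a gradient-free hill-climb over z with the multi-task policy frozen); it is an illustrative assumption, not the authors' actual adaptation procedure, and rollout_return is a hypothetical evaluation callback.

```python
import torch

def adapt_embedding(rollout_return, z_init, iters=50, sigma=0.1):
    """Hypothetical embedding-space adaptation with the policy frozen.

    rollout_return(z) is assumed to roll out the shared policy conditioned on z
    in the target task and return the episode return.
    """
    best_z, best_ret = z_init.clone(), rollout_return(z_init)
    for _ in range(iters):
        candidate = best_z + sigma * torch.randn_like(best_z)  # perturb the embedding
        ret = rollout_return(candidate)
        if ret > best_ret:                                     # keep the better candidate
            best_z, best_ret = candidate, ret
    return best_z, best_ret
```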