Know Your Action Set: Learning Action Relations for Reinforcement Learning

1University of Southern California (USC), 2NAVER CLOVA, 3KAIST, 4NAVER AI Lab

International Conference on Learning Representations (ICLR), 2022

Problem: Tasks with varying action sets require reasoning about action relations.

Intelligent agents can solve tasks in various ways depending on their available set of actions. However, conventional reinforcement learning (RL) assumes a fixed action set. This work asserts that tasks with varying action sets require reasoning about the relations between the available actions. For instance, taking a nail-action in a repair task is meaningful only if a hammer-action is also available. To learn and utilize such action relations, we propose a novel policy architecture consisting of a graph attention network over the available actions. We show that our model makes informed action decisions by correctly attending to other related actions in both value-based and policy-based RL. Consequently, it outperforms non-relational architectures on applications where the action space often varies, such as recommender systems and physical reasoning with tools and skills.
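The core idea can be sketched as follows: fuse the state with each available action's embedding, let the actions attend to one another, and score each relationally updated action to form the policy. The PyTorch sketch below is illustrative only, not the authors' released code; a single self-attention layer stands in for the paper's graph attention network, and all names (ActionRelationPolicy, encode, score) are assumptions.

```python
# Minimal sketch (not the official AGILE implementation) of a policy that
# applies attention over the embeddings of the currently available actions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ActionRelationPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Fuse the state with each available action's embedding.
        self.encode = nn.Linear(state_dim + action_dim, hidden_dim)
        # Single-head self-attention standing in for a graph attention layer
        # over the fully connected graph of available actions.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        # Score each relationally updated action representation.
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, state: torch.Tensor, action_embs: torch.Tensor):
        # state: (batch, state_dim); action_embs: (batch, num_actions, action_dim)
        num_actions = action_embs.shape[1]
        state_tiled = state.unsqueeze(1).expand(-1, num_actions, -1)
        nodes = torch.relu(self.encode(torch.cat([state_tiled, action_embs], dim=-1)))
        # Each available action attends to every other available action.
        relational, attn_weights = self.attn(nodes, nodes, nodes)
        logits = self.score(relational).squeeze(-1)      # (batch, num_actions)
        return F.softmax(logits, dim=-1), attn_weights   # policy + attention map


# Example: a varying action set of 6 actions for a 10-dim state.
policy = ActionRelationPolicy(state_dim=10, action_dim=8)
probs, attn = policy(torch.randn(1, 10), torch.randn(1, 6, 8))
print(probs.shape, attn.shape)  # torch.Size([1, 6]) torch.Size([1, 6, 6])
```

Because the policy is defined per available action, the same network handles action sets of any size; the returned attention weights are also what the qualitative analysis below visualizes.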

Approach: Action Graph for Interdependence Learning

Qualitative Results

CREATE: Varying Tool Sets

Task: Select and place tools to push the red ball towards the green goal.

Each episode: Subsamples an action set of size 25 from 1098 general tools & 5 activator tools.




AGILE learns activator associations to solve the task.

Success Examples on CREATE - AGILE

Failure Examples on CREATE - AGILE

Dig Lava Grid Navigation: Varying Skill Sets

Task: Select navigation or digging skills to quickly reach the green goal while avoiding orange and pink lava.


AGILE (Action Relations): Finds Different Optimal Solutions



Baseline (Ignores other actions): Learns a Suboptimal Solution


Visualizing Attention Maps in AGILE

Attention in AGILE: (1) learns tool associations in CREATE, (2) ensures the dig-skill is available before entering lava in Grid World, and (3) extracts item statistics in RecSim.
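For readers who want to inspect such attention maps themselves, the snippet below shows one hypothetical way to render an action-to-action attention matrix as a heatmap with matplotlib. The action names and weights are random placeholders for illustration, not results from the paper.

```python
# Hypothetical example of rendering an action-to-action attention map as a
# heatmap (rows = selected action, columns = attended action). The weights
# below are random placeholders, not the paper's results.
import numpy as np
import matplotlib.pyplot as plt

action_names = ["hammer", "nail", "ladder", "dig", "move-left", "move-right"]  # illustrative
weights = np.random.dirichlet(np.ones(len(action_names)), size=len(action_names))

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(weights, cmap="viridis", vmin=0.0, vmax=1.0)
ax.set_xticks(range(len(action_names)))
ax.set_xticklabels(action_names, rotation=45, ha="right")
ax.set_yticks(range(len(action_names)))
ax.set_yticklabels(action_names)
ax.set_xlabel("attended action")
ax.set_ylabel("selected action")
fig.colorbar(im, ax=ax, label="attention weight")
fig.tight_layout()
plt.show()
```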

Quantitative Results

Citation

@inproceedings{jain2022know,
  title={Know Your Action Set: Learning Action Relations for Reinforcement Learning},
  author={Ayush Jain and Norio Kosaka and Kyung-Min Kim and Joseph J Lim},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=MljXVdp4A3N}
}