Generalization to New Actions in Reinforcement Learning
Ayush Jain* Andrew Szot* Joseph J. Lim
University of Southern California
International Conference on Machine Learning (ICML), 2020
Problem
How can agents solve decision-making tasks when the available actions (tools or skills) have not been seen before?
A fundamental trait of intelligence is the ability to achieve goals in the face of novel circumstances, such as making decisions from new action choices. However, standard reinforcement learning assumes a fixed set of actions and requires expensive retraining when given a new action set. To make learning agents more adaptable, we introduce the problem of zero-shot generalization to new actions. We propose a two-stage framework where the agent first infers action representations from action information acquired separately from the task. A policy flexible to varying action sets is then trained with generalization objectives. We benchmark generalization on sequential tasks, such as selecting from an unseen tool-set to solve physical reasoning puzzles and stacking towers with novel 3D shapes.
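The two-stage framework can be illustrated with a minimal sketch. All names are hypothetical and both components are deliberately simple stand-ins. `encode_action` mean-pools a few feature vectors observed for an action in place of the learned action encoder. `action_probabilities` scores each available action by a dot product with the state and then applies a softmax. Neither function reproduces the paper's networks. The sketch only shows the structural idea: actions enter the policy through their representations, so one policy can be applied to action sets of any size, including actions unseen during training.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def encode_action(observations: Sequence[Vector]) -> Vector:
    """Stage 1 stand-in (hypothetical): average an action's observation
    feature vectors into a single action representation. The paper learns
    this encoder; mean-pooling is only a placeholder."""
    dim = len(observations[0])
    count = len(observations)
    return [sum(obs[i] for obs in observations) / count for i in range(dim)]


def action_probabilities(state: Vector, action_reprs: Sequence[Vector]) -> List[float]:
    """Stage 2 stand-in (hypothetical): score every available action by a dot
    product with the state, then softmax over the available set only."""
    scores = [sum(s * a for s, a in zip(state, rep)) for rep in action_reprs]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def act(state: Vector, action_reprs: Sequence[Vector], rng: random.Random) -> int:
    """Sample the index of one action from the current (variable) action set."""
    probs = action_probabilities(state, action_reprs)
    return rng.choices(range(len(action_reprs)), weights=probs, k=1)[0]


# Usage with two made-up "unseen" tools described by toy feature vectors.
fan = encode_action([[1.0, 0.0], [0.8, 0.2]])
funnel = encode_action([[0.0, 1.0], [0.2, 0.9]])
choice = act([0.5, 2.0], [fan, funnel], random.Random(0))
```

In this sketch, adding a new action only requires computing its representation from its observations. The scoring function itself does not change, which is why it can be evaluated directly on an action set it has never seen.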
Approach
Generalization Results
We propose four benchmarking environments to evaluate the problem of generalization to new actions.
Chain Reaction Tool Environment (CREATE): Select which tool to place and where to place it to get the red ball to the green goal location. Evaluates the ability to select new tools.
Shape Stacking: Select which shape to place and where to place it above the table to build the tallest possible tower. Evaluates the ability to stack new shapes.
Grid World: Select 5-step skills to avoid lava and reach the goal. Evaluates the ability to use a new skill set.
Recommender: Recommend items to users. Evaluates recommending new items. No videos are shown for this environment.
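In CREATE and Shape Stacking, each step combines a discrete choice among the currently available tools or shapes with a continuous placement. The following minimal sketch shows that action structure. The `PlaceAction` type, the square placement bounds, and the uniform random choice are hypothetical stand-ins, not the environments' actual interfaces.

```python
import random
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PlaceAction:
    """Hypothetical hybrid action: a discrete choice of which tool/shape to
    place plus a continuous 2D placement position."""
    item: str
    position: Tuple[float, float]


def sample_place_action(
    available: Sequence[str],
    rng: random.Random,
    bounds: Tuple[float, float] = (-1.0, 1.0),
) -> PlaceAction:
    """Random placeholder policy. The discrete part can only pick from the
    items available in this episode, so its size changes with the action set;
    the continuous part is drawn uniformly inside assumed square bounds."""
    if not available:
        raise ValueError("the action set is empty")
    low, high = bounds
    item = rng.choice(list(available))
    position = (rng.uniform(low, high), rng.uniform(low, high))
    return PlaceAction(item, position)


# Usage: an episode whose available tools were never seen in training.
action = sample_place_action(["Fan", "Funnel", "Lever"], random.Random(1))
```

A learned policy would replace both random draws. The discrete choice, for example, could be made by scoring action representations, while the interface stays the same.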
The following videos show a learned policy evaluated on randomly sampled action sets. They were not hand-picked.
CREATE Obstacle (videos): training examples, testing successes, testing failures.
CREATE Seesaw (videos): training examples, testing successes, testing failures.
CREATE Push (videos): training examples, testing successes, testing failures.
Shape Stacking (videos): training, testing.
Grid World (videos): training, testing.
Testing on Out-of-distribution Actions
We test the performance of a learned policy on unseen tool classes in the CREATE environment and on unseen shape classes in the Shape Stacking environment.
CREATE Training Tools: Variations of Trampoline, Ramp, Ball, See-saw, Cannon, Bucket.
CREATE Testing Tools: Variations of Fan, Funnel, Conveyor Belt, Triangle, Lever.
Stacking Training Shapes: Variations of Domes, Rectangles, Capsules, Triangles, Arches, Spheres.
Stacking Testing Shapes: Variations of Cylinders, Tetrahedrons, Cubes, Cones, Angled-Rectangles, Angled-Triangles.
For more details about these tools and shapes, please refer to CREATE Environment Details and Shape Stacking Environment Details.
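The out-of-distribution protocol can be expressed as a class-level split followed by random sampling of evaluation action sets. The following minimal sketch uses the CREATE class names listed above. The action ids (such as "Fan_0"), the three variations per class, and the action-set size of 4 are hypothetical choices made only for illustration. The Shape Stacking split would follow the same pattern with its own class lists.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Class lists taken from the CREATE split described above.
CREATE_TRAIN_CLASSES = {"Trampoline", "Ramp", "Ball", "See-saw", "Cannon", "Bucket"}
CREATE_TEST_CLASSES = {"Fan", "Funnel", "Conveyor Belt", "Triangle", "Lever"}


def split_by_class(actions: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Partition action ids by class so that no test class is seen during
    training. `actions` maps a hypothetical action id to its class name."""
    train = sorted(a for a, c in actions.items() if c in CREATE_TRAIN_CLASSES)
    test = sorted(a for a, c in actions.items() if c in CREATE_TEST_CLASSES)
    return train, test


def sample_action_set(pool: Sequence[str], size: int, rng: random.Random) -> List[str]:
    """Draw a random action set without replacement, so evaluation sets are
    sampled rather than hand-picked."""
    return rng.sample(list(pool), size)


# Usage: three hypothetical variations per class, then one evaluation set.
variations = {
    f"{cls}_{i}": cls
    for cls in CREATE_TRAIN_CLASSES | CREATE_TEST_CLASSES
    for i in range(3)
}
train_pool, test_pool = split_by_class(variations)
eval_set = sample_action_set(test_pool, 4, random.Random(0))
```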
Videos: Obstacle, Seesaw, Push, and Shape Stacking, each shown on training actions and on out-of-distribution testing actions.
More CREATE Tasks
The CREATE benchmark consists of 12 tasks in total. The videos below show evaluations on new actions, using the same train-test split as the three tasks above.
Videos: Navigate, Buckets, Belt, Cannon, Collide, Moving, Ladder, Basket, Funnel.
t-SNE Visualization of Learned Action Representations
We test whether the action encoder extracts semantic information from high-dimensional action observations. In the following visualizations, the action representations inferred for unseen actions are plotted and labeled with semantic information, such as the tool, shape, or skill class they belong to.
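A t-SNE plot provides a visual check of whether same-class actions cluster together. One simple numeric companion, which is not part of the original evaluation, measures how often an unseen action's nearest neighbor in representation space belongs to the same semantic class. The following sketch computes this 1-nearest-neighbor class agreement using Euclidean distance. The function name is hypothetical, and any embeddings passed to it would come from the action encoder.

```python
import math
from typing import Sequence


def nn_class_agreement(
    embeddings: Sequence[Sequence[float]], labels: Sequence[str]
) -> float:
    """Fraction of points whose nearest other point (Euclidean distance)
    has the same class label. Values near 1.0 indicate that same-class
    actions sit close together in representation space."""
    n = len(embeddings)
    if n < 2 or n != len(labels):
        raise ValueError("need at least two embeddings, one label each")
    hits = 0
    for i in range(n):
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(embeddings[i], embeddings[j]),
        )
        hits += labels[nearest] == labels[i]
    return hits / n
```

Unlike a t-SNE layout, which depends on perplexity and random initialization, this score is computed directly in the original representation space. It therefore complements the plots rather than replacing them.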
Environment Details
Citation
@InProceedings{pmlr-v119-jain20b,
title={Generalization to New Actions in Reinforcement Learning},
author={Jain, Ayush and Szot, Andrew and Lim, Joseph},
booktitle={Proceedings of the 37th International Conference on Machine Learning},
pages={4661--4672},
year={2020},
editor={III, Hal Daumé and Singh, Aarti},
volume={119},
series={Proceedings of Machine Learning Research},
month={13--18 Jul},
publisher={PMLR},
url={https://proceedings.mlr.press/v119/jain20b.html}
}