How can agents solve decision-making tasks when the available actions (tools or skills) have not been seen before?
A fundamental trait of intelligence is the ability to achieve goals in the face of novel circumstances, such as making decisions from new action choices. However, standard reinforcement learning assumes a fixed set of actions and requires expensive retraining when given a new action set. To make learning agents more adaptable, we introduce the problem of zero-shot generalization to new actions. We propose a two-stage framework where the agent first infers action representations from action information acquired separately from the task. A policy flexible to varying action sets is then trained with generalization objectives. We benchmark generalization on sequential tasks, such as selecting from an unseen tool-set to solve physical reasoning puzzles and stacking towers with novel 3D shapes.
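As a rough illustration of the second stage, a policy can stay flexible to varying action sets by scoring each available action from its representation and normalizing over whatever set is present. The sketch below is a minimal example under assumptions, not the architecture used in this work: the bilinear scoring function, the `weights` matrix, and the toy representations are all hypothetical.

```python
import math
import random

def score(state, action_repr, weights):
    """Hypothetical bilinear score: state^T W action_repr."""
    return sum(
        s * weights[i][j] * a
        for i, s in enumerate(state)
        for j, a in enumerate(action_repr)
    )

def action_distribution(state, action_reprs, weights):
    """Softmax over however many actions are currently available."""
    scores = [score(state, a, weights) for a in action_reprs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(state, action_reprs, weights, rng):
    """Sample an index into the current action set."""
    probs = action_distribution(state, action_reprs, weights)
    return rng.choices(range(len(action_reprs)), weights=probs, k=1)[0]

# Toy values (assumptions): a 2-d state and 3-d action representations.
state = [0.5, -1.0]
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
train_set = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
unseen_set = [[0.2, 0.1, 0.9], [0.7, 0.3, 0.0], [0.0, 0.0, 1.0]]

probs_train = action_distribution(state, train_set, weights)
probs_unseen = action_distribution(state, unseen_set, weights)
chosen = select_action(state, unseen_set, weights, random.Random(0))
```

Because the parameters act on representations rather than on action indices, the same `weights` apply unchanged to an action set of a different size or content.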
We propose four benchmark environments for evaluating generalization to new actions.
Chain Reaction Tool Environment (CREATE): Select which tool to place and where to place it to get the red ball to the goal location (green). Evaluates ability to select new tools.
Shape Stacking: Select which shape to place and where to place it above the table to stack the highest possible tower. Evaluates stacking with new shapes.
Grid World: Select 5-step skills to avoid lava and reach the goal. Evaluates utilizing a new skillset.
Recommender: Recommend items to users. Evaluates recommending new items. (No videos.)
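To make the Grid World action space concrete, a skill can be thought of as a fixed sequence of five primitive moves that is executed as a single action. The sketch below assumes a hypothetical move encoding (`U`, `D`, `L`, `R`), a hypothetical grid layout with `X` for lava and `G` for the goal, and simple wall handling; none of these details are taken from the actual environment.

```python
# Hypothetical primitive moves: (row delta, column delta).
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def run_skill(grid, pos, skill):
    """Execute one 5-step skill and return (final_position, status)."""
    if len(skill) != 5:
        raise ValueError("a skill must contain exactly 5 steps")
    r, c = pos
    for step in skill:
        dr, dc = MOVES[step]
        nr, nc = r + dr, c + dc
        # Assumption: moves that would leave the grid are ignored.
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]):
            r, c = nr, nc
        if grid[r][c] == "X":
            return (r, c), "lava"
        if grid[r][c] == "G":
            return (r, c), "goal"
    return (r, c), "running"

# Toy layout (assumption): "." is free, "X" is lava, "G" is the goal.
GRID = [
    "...G.",
    ".XX..",
    ".....",
]

safe = run_skill(GRID, (2, 0), "RRRUU")   # walks around the lava
risky = run_skill(GRID, (2, 0), "URRRU")  # steps into the lava
```

Under this view, a new skillset is simply a different collection of such 5-step sequences offered to the agent.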
The following videos are results of evaluating a learned policy on randomly sampled action sets. These were not hand-picked.
Testing on Out-of-distribution Actions
We test the performance of a learned policy on unseen tool classes in the CREATE environment and unseen shape classes in the Shape Stacking environment.
CREATE Training Tools: Variations of Trampoline, Ramp, Ball, See-saw, Cannon, Bucket.
CREATE Testing Tools: Variations of Fan, Funnel, Conveyer Belt, Triangle, Lever.
Stacking Training Shapes: Variations of Domes, Rectangles, Capsules, Triangles, Arches, Spheres.
Stacking Testing Shapes: Variations of Cylinders, Tetrahedrons, Cubes, Cones, Angled-Rectangles, Angled-Triangles.
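The class-level split above can be sketched as follows for CREATE: training and testing actions are drawn from disjoint tool classes, and an evaluation action set is sampled at random from the testing pool only. The class names come from the lists above; the number of variations per class, the action-set size, and the helper names are assumptions made for illustration.

```python
import random

TRAIN_CLASSES = ["Trampoline", "Ramp", "Ball", "See-saw", "Cannon", "Bucket"]
TEST_CLASSES = ["Fan", "Funnel", "Conveyer Belt", "Triangle", "Lever"]

def make_variations(classes, per_class):
    """Build a pool of (class_name, variation_id) actions."""
    return [(cls, i) for cls in classes for i in range(per_class)]

def sample_action_set(pool, size, rng):
    """Draw a random action set without replacement."""
    return rng.sample(pool, size)

rng = random.Random(0)
# Assumption: 4 variations per class and 6 actions per episode.
train_pool = make_variations(TRAIN_CLASSES, per_class=4)
test_pool = make_variations(TEST_CLASSES, per_class=4)
eval_set = sample_action_set(test_pool, size=6, rng=rng)
```

Splitting by class rather than by individual variation means every tool in `eval_set` belongs to a class the policy never encountered during training.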
Shape Stack Training
Shape Stack Testing
More CREATE Tasks
The CREATE benchmark consists of 12 tasks in total. The videos below show evaluations on new actions with the same train-test split as the original 3 tasks above.
t-SNE Visualization of Learned Action Representations
We test whether the action encoder extracts semantic information from high-dimensional action observations. In the following visualizations, the action representations inferred for unseen actions are plotted and labeled with semantic information, such as the tool, shape, or skill class they belong to.