Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo, Shixiang Shane Gu
International Conference on Learning Representations (ICLR 2023), notable-top-25%
Overview
The impressive success of large language models has encouraged other domains, such as computer vision and robotics, to leverage large-scale pre-trained models trained on massive data with a unified input-output (IO) interface. Learning a "generalist" model seems to be an essential goal of the recent machine learning paradigm, built on the same key ingredients: curating a massive diverse dataset, defining a unified IO representation, and performing efficient representation and architecture selection, all for the best generalization.
MxT-Bench
This paper first proposes MxT-Bench, the first multi-morphology, multi-task benchmarking environment, as a step toward building massive diverse datasets for continuous control. MxT-Bench provides various combinations of different morphologies (ant, centipede, claw, worm, and unimal) and different tasks (reach, touch, and twisters).
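As a rough illustration of this combinatorial design, the snippet below enumerates environment identifiers by crossing morphologies with tasks and agent sizes. The naming scheme matches identifiers such as ant_reach_5 seen in the example videos below, but the generator itself and its size range are assumptions for illustration, not the benchmark's actual API.

```python
from itertools import product

# Hypothetical enumeration of MxT-Bench-style environment names; the
# released benchmark's actual loader and size ranges may differ.
MORPHOLOGIES = ["ant", "centipede", "claw", "worm", "unimal"]
TASKS = ["reach", "touch", "twisters"]

def env_names(sizes=(3, 4, 5, 6)):
    """Yield identifiers such as 'ant_reach_5' (size = number of legs/bodies)."""
    for morph, task, size in product(MORPHOLOGIES, TASKS, sizes):
        yield f"{morph}_{task}_{size}"

print(list(env_names())[:4])  # ['ant_reach_3', 'ant_reach_4', 'ant_reach_5', 'ant_reach_6']
```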
Task Examples
We prepare several base tasks with parameterized goal distributions (reach, touch, and twisters) for procedural morphology-task generation; a sketch of such goal sampling follows the list below.
The reach task requires the agent to place one of its legs at a given goal position (XY).
The touch task requires the agent to bring its body or torso into contact with a movable ball.
The twisters task is a multi-goal problem; the agent must satisfy all given goals (e.g. XY, XY, Z) simultaneously.
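To make "parameterized goal distributions" concrete, here is a minimal sketch of how goals for each base task could be sampled; the ranges and goal formats are assumptions for illustration, not the benchmark's actual parameterization.

```python
import random

def sample_goal(task: str):
    """Illustrative goal sampling for each base task."""
    if task == "reach":
        # One XY target position for a designated leg.
        return {"leg_xy": (random.uniform(-5, 5), random.uniform(-5, 5))}
    if task == "touch":
        # Initial XY position of the movable ball the body must contact.
        return {"ball_xy": (random.uniform(-5, 5), random.uniform(-5, 5))}
    if task == "twisters":
        # Several per-limb constraints that must hold simultaneously,
        # e.g. two XY targets and one Z (height) target.
        return {
            "limb_0_xy": (random.uniform(-5, 5), random.uniform(-5, 5)),
            "limb_1_xy": (random.uniform(-5, 5), random.uniform(-5, 5)),
            "limb_2_z": random.uniform(0.5, 1.5),
        }
    raise ValueError(f"unknown task: {task}")
```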
Morphology-Task Graph
Next, we define a unified IO representation so that a single architecture can ingest all the multi-morphology, multi-task data. Inspired by scene graphs in computer vision, which represent the 3D relational information of a scene, and by morphology graphs, which express an agent's geometry and actions, we introduce the notion of the morphology-task graph (MTG), a unified interface that encodes observations, actions, and goals (i.e. tasks) as nodes in a shared graph representation.
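A minimal sketch of the underlying data structure, with illustrative names (Node, MorphologyTaskGraph, and ant_graph are assumptions, not the paper's code): each body part becomes a node carrying its local observation, edges follow the kinematic tree, and the goal-encoding variants described later write goals into this same graph.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Node:
    name: str
    features: List[float]  # local observation, e.g. joint angle and velocity

@dataclass
class MorphologyTaskGraph:
    nodes: List[Node]
    edges: List[Tuple[int, int]]  # (parent, child) indices in the kinematic tree

def ant_graph(obs: dict) -> MorphologyTaskGraph:
    """A 4-legged ant: one torso node connected to one node per leg."""
    names = ["torso"] + [f"leg_{i}" for i in range(4)]
    nodes = [Node(n, obs[n]) for n in names]
    edges = [(0, i) for i in range(1, 5)]
    return MorphologyTaskGraph(nodes, edges)
```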
Behavior Distillation
Lastly, while conventional multi-task and meta RL studies pursue generalization through on-policy joint training, we perform efficient representation and architecture selection over 11 combinations of unified IO representations and network architectures, and 8 local node observations, for optimal generalization through behavior distillation: RL is treated as a (single-task, low-dimensional) behavior generator, and multi-task supervised learning is used to imitate all the generated behaviors.
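The two-stage recipe can be summarized as a short sketch. It assumes gym-style environments, pre-trained single-task RL experts, and a PyTorch-style student policy and optimizer; all object names are placeholders rather than the paper's implementation.

```python
def distill(envs, experts, student, optimizer, episodes_per_env=100):
    # Stage 1 (behavior generation): each single-task RL expert rolls out
    # demonstrations in its own morphology-task environment.
    dataset = []
    for env, expert in zip(envs, experts):
        for _ in range(episodes_per_env):
            obs, done = env.reset(), False
            while not done:
                action = expert(obs)
                dataset.append((obs, action))
                obs, _, done, _ = env.step(action)

    # Stage 2 (multi-task supervised learning): one student policy imitates
    # all experts with a simple behavior-cloning regression loss.
    for obs, action in dataset:
        loss = ((student(obs) - action) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```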
Experiments & Example Videos
Through offline distillation, we controllably and tractably evaluate two variants of the MTG representation along with multiple network architectures (MLP, GNN, Transformer), and show that the MTG variant with a Transformer improves multi-task goal-reaching performance over the other possible choices by 23%, provides better prior knowledge for zero-shot generalization (by 14-18%), and improves fine-tuning for downstream multi-task imitation learning (by 50-55%).
The average normalized final distance across various types of morphology-task generalization on MxT-Bench.
Multi-task goal-reaching performance after fine-tuning (multi-task imitation) in compositional and out-of-distribution evaluation.
Morphology-Task Graph v1: accepts the morphological graph, encoded from the agent's geometric information, as the input-output interface, and merges positional goal information into the features of the corresponding nodes.
Example videos (Morphology-Task Graph v1): claw_touch_6, unimal_reach, ant_touch_6, ant_twisters_5, centipede_reach_4, claw_reach_5, ant_reach_5
Morphology-Task Graph v2: treats given goals as additional disjoint nodes appended to the morphological graph representation; a sketch contrasting the two variants follows below.
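Reusing the illustrative MorphologyTaskGraph structure sketched in the Morphology-Task Graph section above (names remain assumptions, not the paper's code), the difference between the two variants is where the goal is written:

```python
def encode_goal_v1(graph, goal_xy, target=1):
    """v1: merge the positional goal into the features of the node it refers to."""
    graph.nodes[target].features = list(graph.nodes[target].features) + list(goal_xy)
    return graph

def encode_goal_v2(graph, goal_xy):
    """v2: append the goal as an extra, disjoint node (no kinematic edge);
    an attention-based architecture can still relate it to the limb nodes."""
    graph.nodes.append(Node("goal_xy", list(goal_xy)))
    return graph
```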
Example videos (Morphology-Task Graph v2): claw_reach_4, centipede_touch_6, ant_touch_5, ant_touch_4, claw_twisters_5, ant_twisters_5, centipede_reach_3, unimal_reach, ant_reach_5