Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo, Shixiang Shane Gu
International Conference on Learning Representations (ICLR 2023), notable-top-25%
Overview
The impressive success of large language models has encouraged other domains, such as computer vision and robotics, to leverage large-scale pre-trained models trained on massive data with a unified input-output (IO) interface. Learning a "generalist" model seems to be an essential goal of the recent machine learning paradigm, built on the same key ingredients: curating a massive diverse dataset, defining a unified IO representation, and performing efficient representation and architecture selection, all for the best generalization.
MxT-Bench
This paper first proposes MxT-Bench, the first multi-morphology, multi-task benchmarking environment, as a step toward building massive diverse datasets for continuous control. MxT-Bench provides various combinations of different morphologies (ant, centipede, claw, worm, and unimal) and different tasks (reach, touch, and twisters).
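As a rough illustration of this combinatorial design, the snippet below enumerates environment identifiers by crossing morphologies with tasks and agent sizes. The naming scheme matches identifiers such as ant_reach_5 seen in the example videos below, but the generator itself and its size range are assumptions for illustration, not the benchmark's actual API.

```python
from itertools import product

# Hypothetical enumeration of MxT-Bench-style environment names; the
# released benchmark's actual loader and size ranges may differ.
MORPHOLOGIES = ["ant", "centipede", "claw", "worm", "unimal"]
TASKS = ["reach", "touch", "twisters"]

def env_names(sizes=(3, 4, 5, 6)):
    """Yield identifiers such as 'ant_reach_5' (size = number of legs/bodies)."""
    for morph, task, size in product(MORPHOLOGIES, TASKS, sizes):
        yield f"{morph}_{task}_{size}"

print(list(env_names())[:4])  # ['ant_reach_3', 'ant_reach_4', 'ant_reach_5', 'ant_reach_6']
```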
Task Examples
We prepare several base tasks with parameterized goal distributions (reach, touch, and twisters) for procedural morphology-task generation; a sketch of such goal sampling follows the list below.
The reach task requires the agent to place one of its legs at a given goal position (XY).
The touch task requires the agent to bring its body or torso into contact with a movable ball.
The twisters task is a multi-goal problem; the agent must satisfy all given goals (e.g. XY, XY, Z) simultaneously.
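To make "parameterized goal distributions" concrete, here is a minimal sketch of how goals for each base task could be sampled; the ranges and goal formats are assumptions for illustration, not the benchmark's actual parameterization.

```python
import random

def sample_goal(task: str):
    """Illustrative goal sampling for each base task."""
    if task == "reach":
        # One XY target position for a designated leg.
        return {"leg_xy": (random.uniform(-5, 5), random.uniform(-5, 5))}
    if task == "touch":
        # Initial XY position of the movable ball the body must contact.
        return {"ball_xy": (random.uniform(-5, 5), random.uniform(-5, 5))}
    if task == "twisters":
        # Several per-limb constraints that must hold simultaneously,
        # e.g. two XY targets and one Z (height) target.
        return {
            "limb_0_xy": (random.uniform(-5, 5), random.uniform(-5, 5)),
            "limb_1_xy": (random.uniform(-5, 5), random.uniform(-5, 5)),
            "limb_2_z": random.uniform(0.5, 1.5),
        }
    raise ValueError(f"unknown task: {task}")
```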
Morphology-Task Graph
Next, we define a unified IO representation so that a single architecture can ingest all the multi-morphology, multi-task data. Inspired by scene graphs in computer vision, which represent the 3D relational information of a scene, and by morphology graphs, which express an agent's geometry and actions, we introduce the notion of the morphology-task graph (MTG), a unified interface that encodes observations, actions, and goals (i.e. tasks) as nodes in a shared graph representation.
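A minimal sketch of the underlying data structure, with illustrative names (Node, MorphologyTaskGraph, and ant_graph are assumptions, not the paper's code): each body part becomes a node carrying its local observation, edges follow the kinematic tree, and the goal-encoding variants described later write goals into this same graph.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Node:
    name: str
    features: List[float]  # local observation, e.g. joint angle and velocity

@dataclass
class MorphologyTaskGraph:
    nodes: List[Node]
    edges: List[Tuple[int, int]]  # (parent, child) indices in the kinematic tree

def ant_graph(obs: dict) -> MorphologyTaskGraph:
    """A 4-legged ant: one torso node connected to one node per leg."""
    names = ["torso"] + [f"leg_{i}" for i in range(4)]
    nodes = [Node(n, obs[n]) for n in names]
    edges = [(0, i) for i in range(1, 5)]
    return MorphologyTaskGraph(nodes, edges)
```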
Behavior Distillation
Lastly, while conventional multi-task and meta RL studies pursue generalization through on-policy joint training, we perform efficient representation and architecture selection over 11 combinations of unified IO representations and network architectures, and 8 local node observations, for optimal generalization through behavior distillation: RL is treated as a (single-task, low-dimensional) behavior generator, and multi-task supervised learning is used to imitate all the generated behaviors.
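The two-stage recipe can be summarized as a short sketch. It assumes gym-style environments, pre-trained single-task RL experts, and a PyTorch-style student policy and optimizer; all object names are placeholders rather than the paper's implementation.

```python
def distill(envs, experts, student, optimizer, episodes_per_env=100):
    # Stage 1 (behavior generation): each single-task RL expert rolls out
    # demonstrations in its own morphology-task environment.
    dataset = []
    for env, expert in zip(envs, experts):
        for _ in range(episodes_per_env):
            obs, done = env.reset(), False
            while not done:
                action = expert(obs)
                dataset.append((obs, action))
                obs, _, done, _ = env.step(action)

    # Stage 2 (multi-task supervised learning): one student policy imitates
    # all experts with a simple behavior-cloning regression loss.
    for obs, action in dataset:
        loss = ((student(obs) - action) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```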
Experiments & Example Videos
Through offline distillation, we controllably and tractably evaluate two variants of the MTG representation along with multiple network architectures (MLP, GNN, Transformer), and show that the MTG variant with a Transformer improves multi-task goal-reaching performance over the other possible choices by 23%, provides better prior knowledge for zero-shot generalization (by 14-18%), and improves fine-tuning for downstream multi-task imitation learning (by 50-55%).
The average normalized final distance across various types of morphology-task generalization on MxT-Bench.
Multi-task goal-reaching performance after fine-tuning (multi-task imitation) in compositional and out-of-distribution evaluation.
Morphology-Task Graph v1: accepts the morphological graph, encoded from the agent's geometric information, as the input-output interface, and merges positional goal information into the features of the corresponding nodes.
Example videos (Morphology-Task Graph v1): claw_touch_6, unimal_reach, ant_touch_6, ant_twisters_5, centipede_reach_4, claw_reach_5, ant_reach_5
Morphology-Task Graph v2: treats given goals as additional disjoint nodes appended to the morphological graph representation; a sketch contrasting the two variants follows below.
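Reusing the illustrative MorphologyTaskGraph structure sketched in the Morphology-Task Graph section above (names remain assumptions, not the paper's code), the difference between the two variants is where the goal is written:

```python
def encode_goal_v1(graph, goal_xy, target=1):
    """v1: merge the positional goal into the features of the node it refers to."""
    graph.nodes[target].features = list(graph.nodes[target].features) + list(goal_xy)
    return graph

def encode_goal_v2(graph, goal_xy):
    """v2: append the goal as an extra, disjoint node (no kinematic edge);
    an attention-based architecture can still relate it to the limb nodes."""
    graph.nodes.append(Node("goal_xy", list(goal_xy)))
    return graph
```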
Example videos (Morphology-Task Graph v2): claw_reach_4, centipede_touch_6, ant_touch_5, ant_touch_4, claw_twisters_5, ant_twisters_5, centipede_reach_3, unimal_reach, ant_reach_5