COMPOSER: Scalable and Robust Modular Policies for Snake Robots
Yuyou Zhang, Yaru Niu, Xingyu Liu, Ding Zhao
Carnegie Mellon University
Introduction
The inherent modularity of snake robots can be viewed from three perspectives: high dimensionality, scalability, and redundancy. In this work, we treat the snake robot as a modular robot and formulate its control as cooperative Multi-Agent Reinforcement Learning (MARL). Each segment of the snake robot operates as an independent agent that relies on local observations to determine its actions. Our proposed method, COMPOSER, incorporates a self-attention mechanism to enhance cooperation between agents and employs a high-level imagination policy to enable more efficient learning. COMPOSER demonstrates superior efficiency, robustness against agent corruption, and zero-shot generalizability.
Framework overview of COMPOSER. A snake robot with n joints is formulated as n agents. The modular control policy outputs an individual torque command for each agent, while the imagination policy forecasts an ideal displacement per step. The control policy is trained both to complete the task and to follow the direction prescribed by the imagination policy.
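To make the two-level structure concrete, here is a minimal PyTorch sketch of a shared per-joint policy with self-attention and a high-level imagination policy. The class names, network sizes, and the cosine-alignment shaping term are illustrative assumptions rather than the exact COMPOSER implementation.

```python
import torch
import torch.nn as nn

class ModularAttentionPolicy(nn.Module):
    """One policy shared by every joint; self-attention lets agents
    exchange information before each outputs its own torque."""
    def __init__(self, obs_dim, embed_dim=64, n_heads=4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.torque_head = nn.Linear(embed_dim, 1)           # one torque per agent

    def forward(self, local_obs):                            # (batch, n_agents, obs_dim)
        h = torch.relu(self.encoder(local_obs))
        h, _ = self.attn(h, h, h)                            # agents attend to each other
        return torch.tanh(self.torque_head(h)).squeeze(-1)   # (batch, n_agents)

class ImaginationPolicy(nn.Module):
    """High-level policy that forecasts an ideal planar displacement per step;
    the low-level control policy is rewarded for following it."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))

    def forward(self, global_state):
        return self.net(global_state)

def imagination_reward(actual_disp, imagined_disp):
    """Shaping term (an assumption here): cosine alignment between the achieved
    displacement and the displacement prescribed by the imagination policy."""
    return torch.cosine_similarity(actual_disp, imagined_disp, dim=-1)
```

In this sketch the task reward and the imagination-following term would simply be summed during training, which is one plausible way to combine task completion with adherence to the prescribed direction.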
Experiments
Goal-reaching with random goals
Block-pushing
The snake robot learns manipulation skills including push, fling, and contact adjustment.
Success: push
Success: fling
Failure: multiple attempts with contact adjustment
Shape-formation
The shape-formation task involves both locomotion and deformation control.
Success: target shape with small curvature
Success: target shape with sharp curvature
Failure: target shape with sharp curvature
Tube-crossing
The tube-crossing task involves navigating in a confined space with curved terrain, requiring effective interactions with the environment.
Success
Wall-climbing
In the wall-climbing task, the vertical wall climb illustrates the quick and efficient task completion enabled by the imagination policy.
Success: vertical wall climb (w/ imagination)
Success: horizontal wall traverse (w/o imagination)
Emergent locomotion patterns
Lateral Undulation: the most common serpentine locomotion
Slide-Pushing: used when the snake is startled and tries to escape quickly
In the goal-reaching task, the modular policy combined with imagination exhibited the "Slide-Pushing" pattern, whereas without imagination it exhibited "Lateral Undulation."
Lateral Undulation w/o imagination
The "Slide-pushing" pattern highlights the contribution of the imagination policy in achieving efficient task success.
Slide-pushing w/ imagination
Zero-shot Generalizability
A modular policy trained on an 8-agent snake generalizes to longer snakes in a zero-shot manner; a sketch of this evaluation follows the examples below.
9-agent
10-agent
11-agent
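Because the policy is shared across agents and consumes only per-joint local observations, the same weights can be applied to any number of joints. The sketch below, assuming a gym-style environment constructor make_env(n) for an n-joint snake (a hypothetical helper) and the policy interface from the sketch above, shows how such a zero-shot evaluation could be run.

```python
import torch

def evaluate_zero_shot(policy, make_env, agent_counts=(9, 10, 11), episodes=100):
    """Evaluate a policy trained on an 8-agent snake on longer snakes,
    without any fine-tuning. `make_env(n)` is a hypothetical constructor
    returning a gym-style environment for an n-joint snake."""
    results = {}
    for n in agent_counts:
        env, successes = make_env(n), 0
        for _ in range(episodes):
            obs, done = env.reset(), False
            while not done:
                with torch.no_grad():
                    # same weights, simply applied to n local observations
                    obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
                    actions = policy(obs_t)
                obs, _, done, info = env.step(actions.squeeze(0).numpy())
            successes += int(info.get("success", False))
        results[n] = successes / episodes
    return results
```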
Robustness
Robustness is evaluated on snake robots with one randomly selected actuator disabled; a sketch of this perturbation follows the results below.
COMPOSER success rate: 65/100
PPO success rate: 22/100
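As a sketch of how this corruption can be simulated (an assumption: a dead actuator is modeled by zeroing its commanded torque for the whole episode):

```python
import numpy as np

def disable_random_actuator(actions, rng, disabled_idx=None):
    """Zero the torque of one randomly selected actuator, simulating a broken
    joint motor. `disabled_idx` is sampled once per episode and then passed
    back in at every step so the same joint stays disabled for the rollout."""
    if disabled_idx is None:
        disabled_idx = rng.integers(len(actions))
    corrupted = np.asarray(actions, dtype=np.float32).copy()
    corrupted[disabled_idx] = 0.0
    return corrupted, disabled_idx
```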