COMPOSER: Scalable and Robust Modular Policies for Snake Robots
Yuyou Zhang, Yaru Niu, Xingyu Liu, Ding Zhao
Carnegie Mellon University
Introduction
The inherent modularity of snake robots can be viewed from three perspectives: high dimensionality, scalability, and redundancy. In this work, we treat the snake robot as a modular robot and formulate its control as cooperative Multi-Agent Reinforcement Learning (MARL). Each segment of the snake robot operates as an independent agent that relies on local observations to determine its actions. Our proposed method, COMPOSER, incorporates a self-attention mechanism to enhance cooperation between agents and employs a high-level imagination policy to enable more efficient learning. COMPOSER demonstrates superior efficiency, robustness against agent corruption, and zero-shot generalizability.
Framework overview of COMPOSER. A snake robot with n joints is formulated as n agents. The modular control policy outputs an individual torque command for each agent, while the imagination policy forecasts an ideal displacement per step. The control policy is trained both to complete the task and to follow the direction prescribed by the imagination policy.
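To make the two-level structure concrete, here is a minimal PyTorch sketch of a shared per-joint policy with self-attention and a high-level imagination policy. The class names, network sizes, and the cosine-alignment shaping term are illustrative assumptions rather than the exact COMPOSER implementation.

```python
import torch
import torch.nn as nn

class ModularAttentionPolicy(nn.Module):
    """One policy shared by every joint; self-attention lets agents
    exchange information before each outputs its own torque."""
    def __init__(self, obs_dim, embed_dim=64, n_heads=4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.torque_head = nn.Linear(embed_dim, 1)           # one torque per agent

    def forward(self, local_obs):                            # (batch, n_agents, obs_dim)
        h = torch.relu(self.encoder(local_obs))
        h, _ = self.attn(h, h, h)                            # agents attend to each other
        return torch.tanh(self.torque_head(h)).squeeze(-1)   # (batch, n_agents)

class ImaginationPolicy(nn.Module):
    """High-level policy that forecasts an ideal planar displacement per step;
    the low-level control policy is rewarded for following it."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))

    def forward(self, global_state):
        return self.net(global_state)

def imagination_reward(actual_disp, imagined_disp):
    """Shaping term (an assumption here): cosine alignment between the achieved
    displacement and the displacement prescribed by the imagination policy."""
    return torch.cosine_similarity(actual_disp, imagined_disp, dim=-1)
```

In this sketch the task reward and the imagination-following term would simply be summed during training, which is one plausible way to combine task completion with adherence to the prescribed direction.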
Experiments
Goal-reaching with random goals
Block-pushing
The snake robot learns manipulation skills including push, fling, and contact adjustment.
Success: push
Success: fling
Failure: multiple attempts with contact adjustment
Shape-formation
The shape-formation task involves both locomotion and deformation control.
Success: target shape with small curvature
Success: target shape with sharp curvature
Failure: target shape with sharp curvature
Tube-crossing
The tube-crossing task involves navigating in a confined space with curved terrain, requiring effective interactions with the environment.
Success
Wall-climbing
In the wall-climbing task, the vertical wall climb illustrates the quick and efficient task completion enabled by the imagination policy.
Success: vertical wall climb (w/ imagination)
Success: horizontal wall traverse (w/o imagination)
Emergent locomotion patterns
Lateral Undulation: the most common serpentine locomotion
Slide-Pushing: used when the snake is startled and tries to escape quickly
In the goal-reaching task, the modular policy combined with imagination exhibited the "Slide-Pushing" pattern, whereas without imagination it exhibited "Lateral Undulation."
Lateral Undulation w/o imagination
The "Slide-pushing" pattern highlights the contribution of the imagination policy in achieving efficient task success.
Slide-pushing w/ imagination
Zero-shot Generalizability
A modular policy trained on an 8-agent snake generalizes to longer snakes in a zero-shot manner; a sketch of this evaluation follows the examples below.
9-agent
10-agent
11-agent
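Because the policy is shared across agents and consumes only per-joint local observations, the same weights can be applied to any number of joints. The sketch below, assuming a gym-style environment constructor make_env(n) for an n-joint snake (a hypothetical helper) and the policy interface from the sketch above, shows how such a zero-shot evaluation could be run.

```python
import torch

def evaluate_zero_shot(policy, make_env, agent_counts=(9, 10, 11), episodes=100):
    """Evaluate a policy trained on an 8-agent snake on longer snakes,
    without any fine-tuning. `make_env(n)` is a hypothetical constructor
    returning a gym-style environment for an n-joint snake."""
    results = {}
    for n in agent_counts:
        env, successes = make_env(n), 0
        for _ in range(episodes):
            obs, done = env.reset(), False
            while not done:
                with torch.no_grad():
                    # same weights, simply applied to n local observations
                    obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
                    actions = policy(obs_t)
                obs, _, done, info = env.step(actions.squeeze(0).numpy())
            successes += int(info.get("success", False))
        results[n] = successes / episodes
    return results
```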
Robustness
Robustness is evaluated on snake robots with one randomly selected actuator disabled; a sketch of this perturbation follows the results below.
COMPOSER success rate: 65/100
PPO success rate: 22/100
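As a sketch of how this corruption can be simulated (an assumption: a dead actuator is modeled by zeroing its commanded torque for the whole episode):

```python
import numpy as np

def disable_random_actuator(actions, rng, disabled_idx=None):
    """Zero the torque of one randomly selected actuator, simulating a broken
    joint motor. `disabled_idx` is sampled once per episode and then passed
    back in at every step so the same joint stays disabled for the rollout."""
    if disabled_idx is None:
        disabled_idx = rng.integers(len(actions))
    corrupted = np.asarray(actions, dtype=np.float32).copy()
    corrupted[disabled_idx] = 0.0
    return corrupted, disabled_idx
```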