DynSyn

Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems


Kaibo He, Chenhui Zuo, Chengtian Ma, Yanan Sui

Tsinghua University

International Conference on Machine Learning (ICML) 2024

Abstract

Learning an effective policy to control high-dimensional, overactuated systems is a significant challenge for deep reinforcement learning algorithms. Such control scenarios are often observed in the neural control of vertebrate musculoskeletal systems. The study of these control mechanisms will provide insights into the control of high-dimensional, overactuated systems. The coordination of actuators, known as muscle synergies in neuromechanics, is considered a presumptive mechanism that simplifies the generation of motor commands. The dynamical structure of a system is the basis of its function, allowing us to derive a synergistic representation of actuators. Motivated by this theory, we propose the Dynamical Synergistic Representation (DynSyn) algorithm. DynSyn aims to generate synergistic representations of dynamical structures and perform task-specific, state-dependent adaptation to the representations to improve motor control. We demonstrate DynSyn’s efficiency across various tasks involving different musculoskeletal models, achieving state-of-the-art sample efficiency and robustness compared to baseline algorithms. DynSyn generates interpretable synergistic representations that capture the essential features of dynamical structures and demonstrates generalizability across diverse motor tasks.

Problem Setting

The control of the musculoskeletal model is achieved through the following steps:

Motivation

The brown link represents a robot arm (or bone), while the blue and green lines represent the cable actuators (or muscles). The plot on the right shows the lengths of the four actuators while the joint velocity is controlled randomly. Actuators whose length changes are strongly correlated have similar structure and therefore similar function, so they are placed in the same group.
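The grouping idea can be illustrated with a minimal sketch. It assumes a toy single-joint system in which each actuator length is linear in the joint angle, and it uses a simple greedy threshold on the Pearson correlation of length changes. The function name `group_by_length_correlation`, the `threshold` value, and the moment arms are hypothetical and chosen for illustration only; the clustering procedure used in the paper may differ.

```python
import numpy as np

def group_by_length_correlation(lengths, threshold=0.9):
    """Group actuators whose length changes are strongly correlated.

    lengths: array of shape (T, n_actuators) with actuator lengths over T steps.
    threshold: hypothetical cutoff on the Pearson correlation (an assumption).
    Returns a list of groups, each a list of actuator indices.
    """
    deltas = np.diff(lengths, axis=0)          # per-step length changes
    corr = np.corrcoef(deltas, rowvar=False)   # shape (n_actuators, n_actuators)
    unassigned = list(range(corr.shape[0]))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)               # first ungrouped actuator starts a group
        group = [seed]
        for j in unassigned[:]:                # iterate over a copy while removing
            if corr[seed, j] >= threshold:
                group.append(j)
                unassigned.remove(j)
        groups.append(group)
    return groups

# Toy data: random joint velocity drives one joint; four cable actuators
# with made-up moment arms (two shorten and two lengthen as the joint flexes).
rng = np.random.default_rng(0)
joint_velocity = rng.normal(size=500)
joint_angle = np.cumsum(joint_velocity) * 0.01
moment_arms = np.array([1.0, 0.8, -1.0, -0.7])
lengths = 1.0 + np.outer(joint_angle, moment_arms)

groups = group_by_length_correlation(lengths)
print(groups)
```

In this toy setting the two actuators on each side of the joint end up in the same group, because their length changes are perfectly correlated with each other and anti-correlated with the opposing pair.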

Overview

The algorithm generates a unified action for each group of actuators, along with state-dependent correction weights for each actuator on top of the unified action.
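One way to picture this action structure is the sketch below. It assumes that the policy outputs one action per group plus one correction weight per actuator, and that the correction is applied multiplicatively and clipped to an activation range of [0, 1]. The function `expand_group_actions`, the `weight_scale` parameter, the multiplicative form, and the clipping range are all assumptions made for illustration, not the formulation confirmed by the paper.

```python
import numpy as np

def expand_group_actions(group_actions, weights, groups, weight_scale=0.1):
    """Map per-group actions and per-actuator weights to actuator commands.

    group_actions: shape (n_groups,), one unified action per actuator group.
    weights: shape (n_actuators,), state-dependent correction weights.
    groups: list of lists of actuator indices (a partition of all actuators).
    weight_scale: hypothetical bound on how far a correction can move
        an actuator away from its group's unified action.
    """
    group_actions = np.asarray(group_actions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_actuators = sum(len(members) for members in groups)
    unified = np.empty(n_actuators)
    for g, members in enumerate(groups):
        unified[members] = group_actions[g]    # broadcast the group action
    corrected = unified * (1.0 + weight_scale * weights)
    return np.clip(corrected, 0.0, 1.0)        # assumed activation range

groups = [[0, 1], [2, 3]]
group_actions = [0.5, 0.2]
weights = [0.0, 1.0, -1.0, 0.0]
actions = expand_group_actions(group_actions, weights, groups)
print(actions)
```

With this structure the policy explores in a space whose size is set by the number of groups, while the per-actuator weights leave room for small task-specific deviations from the shared command.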

Results

Environments: FullBody-Gait, Ostrich-Run, Arm-Locate, MyoLegs-Walk, MyoHand-Reorient100

Learning curves in the experimental environments, shown as mean ± SD across 5 random seeds for every environment. The return of the baselines decreases as the number of action dimensions increases, while DynSyn is the only algorithm that performs well even in the 700-dimensional action space of the FullBody-Gait environment.

When the same representations are applied to tasks with additional environmental conditions or changed targets, such as rugged terrain or a different walking direction, DynSyn maintains good performance.