SAR
Generalization of Physiological Agility and Dexterity via Synergistic Action Representation
Presented at RSS 2023
Learning control policies in high-dimensional continuous action spaces—like those of humans, animals, and robots—remains a significant challenge.
In this work, we take inspiration from evolved biological solutions to musculoskeletal control by computing and leveraging muscle synergies—i.e., coordinated muscle co-contraction modules implemented in the spinal cord. We show that extracting a Synergistic Action Representation (SAR) from simpler tasks enables robust transfer learning to far more complex tasks.
Below, we demonstrate the effectiveness of SAR for training control policies for legs and hands (both musculoskeletal and robotic).
Overview of SAR method
Computing and Extracting Synergies
Synergistic action representations are acquired in this work by aggregating muscle activation data extracted from a policy trained on a simpler version of the target task. We then use PCA and ICA with normalization to project the high-dimensional muscle activations onto a smaller N-dimensional submanifold (see the tutorial codebase for more details).
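For concreteness, below is a minimal sketch of this computation using scikit-learn; the function names, the choice of MinMaxScaler, and the hyperparameters are illustrative assumptions, and the exact preprocessing follows the tutorial codebase.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.preprocessing import MinMaxScaler

def compute_sar(muscle_activations: np.ndarray, n_synergies: int):
    """muscle_activations: (timesteps, n_muscles) array aggregated from
    rollouts of a policy trained on the simpler task."""
    pca = PCA(n_components=n_synergies)
    pca_scores = pca.fit_transform(muscle_activations)   # linear dimensionality reduction
    ica = FastICA(n_components=n_synergies)
    ica_scores = ica.fit_transform(pca_scores)            # unmix into independent synergies
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler.fit(ica_scores)                                 # normalize synergy ranges
    return pca, ica, scaler

def synergy_to_muscle(z: np.ndarray, pca, ica, scaler) -> np.ndarray:
    """Map an N-dimensional synergy action z back to full muscle activations."""
    z = scaler.inverse_transform(z.reshape(1, -1))
    return pca.inverse_transform(ica.inverse_transform(z)).squeeze()
```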
Learning Control with Synergies
After acquiring SAR, we train a policy that produces muscle activations at each timestep through two pathways: a task-general pathway (which acts through SAR) and a task-specific pathway (which learns muscle activations in the original manifold). The outputs of these two pathways are then blended to yield the final action. This policy architecture encodes the idea that muscle synergies provide a neuromotor foundation for a basic behavior (e.g., general object manipulation) that can be adapted to the demands of a specific task (e.g., manipulating a toy plane).
This policy architecture is depicted in the figure below.
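For intuition, here is a minimal sketch of the blending step, reusing the (pca, ica, scaler) objects from the sketch above; the mixing weight phi, the dimensions, and the clipping range are illustrative assumptions rather than the exact values used in our experiments.

```python
import numpy as np

def blended_action(policy_output: np.ndarray, sar, phi: float = 0.66, n_synergies: int = 8):
    """Blend the task-general (SAR) and task-specific pathways into a single
    muscle activation vector. phi and the dimensions are placeholders."""
    pca, ica, scaler = sar
    syn_action = policy_output[:n_synergies]   # task-general pathway (synergy space)
    raw_action = policy_output[n_synergies:]   # task-specific pathway (original manifold)

    # Project the synergy action back onto the original muscle manifold.
    z = scaler.inverse_transform(syn_action.reshape(1, -1))
    syn_muscles = pca.inverse_transform(ica.inverse_transform(z)).squeeze()

    action = phi * syn_muscles + (1.0 - phi) * raw_action
    return np.clip(action, 0.0, 1.0)            # muscle activations lie in [0, 1]
```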
Visualizing SAR-based locomotion
We begin by training a policy on a straight flat walking task for 1.5M steps, which yields the following (largely uncoordinated) behavior:
Interestingly, computing SAR from this rudimentary policy and retraining on the same task yields robust locomotion with only 1.5M additional samples:
We find that this same SAR can also be used to train policies on a wide array of terrains and task modifications in a highly sample-efficient manner (1.5M pretraining + 2.5M additional steps with SAR). By contrast, policies trained end-to-end on the terrains below for an equivalent training budget of 4M steps fail to learn any meaningful locomotion behavior.
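As a rough sketch of this two-stage schedule (the environment ID, the use of SAC from stable-baselines3, the rollout length, the number of synergies, and the older gym API are all illustrative stand-ins; compute_sar refers to the sketch above):

```python
import gym
import numpy as np
from stable_baselines3 import SAC

# Stage 1: pretrain on the simple flat, straight walking task (~1.5M steps).
env = gym.make("myoLegWalk-v0")                      # hypothetical env ID
model = SAC("MlpPolicy", env)
model.learn(total_timesteps=1_500_000)

# Record muscle activations (here, the policy's muscle excitations) from rollouts.
activations, obs = [], env.reset()
for _ in range(50_000):
    action, _ = model.predict(obs)
    activations.append(action)
    obs, _, done, _ = env.step(action)
    if done:
        obs = env.reset()

# Stage 2: fit SAR on the recorded activations, then train SAR-augmented
# policies on new terrains for ~2.5M additional steps each.
sar = compute_sar(np.array(activations), n_synergies=20)   # N is a placeholder
```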
Visualizing SAR-based manipulation
Reorient: 100-object randomized manipulation task
After computing SAR using muscle activations from a simpler task, we investigate whether this representation enables learning in Reorient, a 100-object manipulation task with random object and desired reorientation initializations.
We find that baseline RL approaches are unable to achieve strong performance (<20% success) in this task, while training with SAR enables significantly more robust dexterity acquisition (>70% success).
Reorient-OOD: zero-shot generalization to objects with out-of-distribution dimensions
We compare the zero-shot generalization power of SAR-RL with baselines (described above) using a series of test sets with novel objects that were not present in the Reorient training set.
Here, we highlight Reorient-OOD, a test set with 1000 new objects with X, Y, and Z dimensions sampled from distributions above and below those used in generating objects for Reorient.
While policies from baseline approaches fail to zero-shot generalize to this new set (<10% success), SAR-RL exhibits comparatively strong performance (60% success).
RealWorldObjs: few-shot dexterity acquisition
We demonstrate that SAR enables rapid few-shot dexterity transfer to real-world objects and approximately doubles learning speed as compared to learning manipulations from scratch.
Ablating SAR: interpreting synergistic representations
What do these synergies encode behaviorally? We ablate SAR, preserving only the first N synergies, in order to elucidate the cumulative contributions of individual synergies to the acquired dexterity. We provide our qualitative interpretations of these ablations below.
The first two muscle synergies appear to encode generic full-hand grip and wrist flexion ('first 2 synergies').
The subsequent two muscle synergies appear to encode generic finger grip and thumb-index pinching ('first 4 synergies').
The final four muscle synergies appear to encode more precise coordinated control of individual digits ('all 8 synergies').
To the degree that SAR encodes something approximating these dexterity behaviors, it becomes clear how this action representation enables generalization and transfer learning: motor skills like full-hand grip, thumb-index pinching, and individual digit control are required for generalized dexterous manipulation, regardless of the precise geometry of a given object.
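A minimal sketch of the ablation itself, assuming the synergy_to_muscle helper from the sketch above; the function name is illustrative.

```python
import numpy as np

def ablate_synergies(syn_action: np.ndarray, n_keep: int) -> np.ndarray:
    """Zero out all but the first n_keep synergy components."""
    ablated = np.zeros_like(syn_action)
    ablated[:n_keep] = syn_action[:n_keep]
    return ablated

# e.g., the 'first 2 synergies' condition for an 8-dimensional SAR:
# muscle_activation = synergy_to_muscle(ablate_synergies(z, 2), pca, ica, scaler)
```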
Extending SAR beyond physiological control
Efficiently acquiring robust locomotion with Humanoid
Despite our focus on utilizing SAR to acquire hand dexterity, we emphasize that SAR can be used out-of-the-box for other high-dimensional control problems.
First, we show that leveraging SAR on the Humanoid-v2 gym environment not only matches SOTA performance with significantly improved sample efficiency, but also yields a qualitatively natural gait (compare the RL-E2E and RL-Zoo benchmark policies with the SAR-RL policy). Note that this is the same procedure used to obtain the MyoLegs locomotion policies displayed at the top of this page.
In order to train with SAR in this use case, we extract 7 synergies from the original 17-dimensional action space after 250k pretraining steps. Note that total timesteps are the same between SAR-RL and RL-E2E.
SAR-RL: trained on Humanoid-v2 250K steps → compute SAR → trained on Humanoid-v2 1M steps (total steps = 1.25M)
RL-E2E:
RL-Zoo benchmark:
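One way to set this up is to expose the 7-dimensional SAR directly as the policy's action space via a gym ActionWrapper; the sketch below assumes the (pca, ica, scaler) objects from above and is illustrative rather than the exact wrapper used in the tutorial codebase.

```python
import gym
import numpy as np

class SARActionWrapper(gym.ActionWrapper):
    """Expose an N-dimensional synergy action space on top of Humanoid-v2."""

    def __init__(self, env, sar, n_synergies: int = 7):
        super().__init__(env)
        self.pca, self.ica, self.scaler = sar
        self.action_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=(n_synergies,), dtype=np.float32)

    def action(self, syn_action):
        # Map the low-dimensional synergy action back to the original
        # 17-dimensional Humanoid action space before stepping the env.
        z = self.scaler.inverse_transform(syn_action.reshape(1, -1))
        full = self.pca.inverse_transform(self.ica.inverse_transform(z)).squeeze()
        return np.clip(full, self.env.action_space.low, self.env.action_space.high)

# env = SARActionWrapper(gym.make("Humanoid-v2"), sar, n_synergies=7)
```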
Replicating Reorient experiment on Shadow Hand
This finding replicates the core experiment from our paper: namely, we
(1) develop an eight-object reorientation task (2 x 4 MuJoCo geometries),
(2) train a policy on this simpler task,
(3) compute SAR from rollouts of this trained policy, and
(4) use this representation to facilitate learning in a far more challenging 100-object (25 x 4 MuJoCo geometries) reorientation task.
In line with our reported findings on the musculoskeletal hand, we find that SAR-RL achieves 2x better performance in the same number of timesteps compared to end-to-end training on the 100-object task.
RL-E2E: 100-object reorientation task policy rollouts
SAR-RL: 100-object reorientation task policy rollouts
Simple dexterity acquisition and transfer on Shadow Hand
We compute SAR from rollouts of a policy trained to reorient ellipsoids and leverage this learned representation to train on randomized capsule and cylinder reorientation tasks.
We find that, compared to end-to-end RL on cylinder and capsule reorientation, SAR-RL achieves approximately 2x better performance (in terms of success rate) in the same number of timesteps. This result provides a proof of concept that SAR acquired from reorientation of one geometry can be used to efficiently and effectively learn reorientation of other geometries.
RL-E2E: capsule reorientation
RL-E2E: cylinder reorientation
SAR-RL: capsule reorientation
SAR-RL: cylinder reorientation