FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning

Biomimetic Robotics Lab, Massachusetts Institute of Technology, United States

Spotlight, ICLR 2024

Abstract

Motion trajectories offer reliable references for physics-based motion learning but suffer from sparsity, particularly in regions that lack sufficient data coverage. To address this challenge, we introduce a self-supervised, structured representation and generation method that extracts spatial-temporal relationships in periodic or quasi-periodic motions. The motion dynamics in a continuously parameterized latent space enable our method to enhance the interpolation and generalization capabilities of motion learning algorithms. The motion learning controller, informed by the motion parameterization, performs online tracking of a wide range of motions, including targets unseen during training. With a fallback mechanism, the controller dynamically adapts its tracking strategy and automatically resorts to safe action execution when a potentially risky target is proposed. By leveraging the identified spatial-temporal structure, our work opens new possibilities for future advancements in general motion representation and learning algorithms.

FLD

In this work, we introduce Fourier Latent Dynamics (FLD), a self-supervised, structured representation and generation method that extracts spatial-temporal relationships in periodic or quasi-periodic motions.

Overview

FLD extends the Periodic Autoencoder (PAE) structure to obtain enhanced temporal consistency by propagating the latent parameters through time. During training, the latent dynamics are enforced to predict subsequent latent states and parameterizations. The prediction loss is computed in the original motion space against the ground-truth future states.
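To make this concrete, below is a minimal, self-contained sketch of one such training step in PyTorch. The encoder, decoder, window size, and other hyperparameters are illustrative assumptions rather than the released architecture; the point is the latent-dynamics loss, which advances the phase by the predicted frequency and requires the decoded future windows to match the ground-truth future windows.

```python
import torch
import torch.nn as nn

state_dim, window, n_channels, horizon, dt = 30, 51, 8, 5, 0.02

# Encoder maps a motion window to per-channel (phase, frequency, amplitude, offset);
# the decoder maps the reconstructed sinusoidal latent signal back to a motion window.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(state_dim * window, 4 * n_channels))
decoder = nn.Conv1d(n_channels, state_dim, kernel_size=1)

def encode(x):
    # x: (batch, state_dim, window)
    phase, freq, amp, off = encoder(x).chunk(4, dim=-1)
    return phase, freq.abs(), amp, off

def decode(phase, freq, amp, off):
    # Rebuild the per-channel sinusoidal latent signal and decode a motion window.
    t = torch.arange(window, dtype=torch.float32) * dt
    z = amp[..., None] * torch.sin(2 * torch.pi * (freq[..., None] * t + phase[..., None])) + off[..., None]
    return decoder(z)                                  # (batch, state_dim, window)

def fld_loss(windows):
    # windows: (batch, horizon + 1, state_dim, window); index 0 is the current window.
    phase, freq, amp, off = encode(windows[:, 0])
    loss = 0.0
    for k in range(horizon + 1):
        # Latent dynamics: the phase advances linearly with the frequency while
        # frequency, amplitude, and offset stay constant over the horizon.
        pred = decode(phase + k * freq * dt, freq, amp, off)
        loss = loss + (pred - windows[:, k]).square().mean()
    return loss / (horizon + 1)

# Dummy usage with random data standing in for reference motion windows.
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
batch = torch.randn(16, horizon + 1, state_dim, window)
loss = fld_loss(batch)
opt.zero_grad()
loss.backward()
opt.step()
```

Because frequency, amplitude, and offset are held constant over the propagation horizon, the phase acts as a local time index while the remaining parameters describe the motion globally.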

Motion Representation

In the first stage, an efficient representation model is trained on the reference dataset, yielding a continuously parameterized latent space in which novel motions can be synthesized by sampling latent encodings.

Latent Manifold

FLD presents the strongest spatial-temporal relationships thanks to explicit latent dynamics enforcement. PAE exhibits a similar but weaker pattern with its local sinusoidal reconstruction. In comparison, a VAE captures only spatial closeness, and the trajectories of the original states are the least structured. Powered by latent dynamics, FLD offers a more compact representation of high-dimensional motions by employing time indexing alongside global features.

Motion Synthesis

The superiority of FLD is especially pronounced in long-horizon prediction, where the other models accumulate significantly larger compounding errors. The effectiveness of FLD in accurately predicting motion over an extended horizon is attributed to the latent dynamics enforced with an appropriate latent dynamics propagation horizon.
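A hedged sketch of long-horizon synthesis, reusing the toy `encode`/`decode` and constants from the sketch above: the window is encoded once and the phase is rolled forward under the latent dynamics, instead of repeatedly re-encoding the model's own reconstructions, which is where compounding errors arise.

```python
def rollout(first_window, n_steps):
    # first_window: (batch, state_dim, window)
    phase, freq, amp, off = encode(first_window)
    frames = [decode(phase + k * freq * dt, freq, amp, off) for k in range(n_steps)]
    return torch.stack(frames, dim=1)              # (batch, n_steps, state_dim, window)

future = rollout(torch.randn(1, state_dim, window), n_steps=200)
```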


Latent Interpolation

The enhanced generality achieved by FLD is attributed to the well-shaped latent representation space, in which sensible distances between motion patterns are established. Among stepping in place, forward running, and forward striding, the parameterization of the intermediate motion (forward running) lies between those of the other two.
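An illustrative interpolation between the latent parameterizations of two motions, again reusing the toy `encode`/`decode` above; with `alpha` around 0.5 this would correspond to an intermediate motion such as the forward run between stepping in place and the forward stride.

```python
def interpolate(window_a, window_b, alpha):
    pa, fa, aa, oa = encode(window_a)
    pb, fb, ab, ob = encode(window_b)
    lerp = lambda u, v: (1.0 - alpha) * u + alpha * v
    # Note: phase is circular; a careful implementation would interpolate
    # sin/cos of the phase rather than the raw value.
    return decode(lerp(pa, pb), lerp(fa, fb), lerp(aa, ab), lerp(oa, ob))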

Motion Learning

The second stage develops an effective learning algorithm that tracks the diverse generated target trajectories.

Training

During training, the latent states propagate under the latent dynamics and are reconstructed into policy tracking targets at each step. The tracking reward decreases with the distance between the target and the measured states.
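One plausible form of such a tracking reward (an assumption, not necessarily the paper's exact shaping) is an exponential kernel on the squared distance between the decoded target state and the measured state, so that closer tracking yields a reward closer to 1.

```python
import torch

def tracking_reward(target_state, measured_state, sigma=0.5):
    # target_state, measured_state: (batch, state_dim)
    err = (target_state - measured_state).square().sum(dim=-1)
    return torch.exp(-err / sigma**2)
```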

Inference

During the inference phase, the policy structure incorporates real-time motion input as tracking targets, irrespective of their periodic or quasi-periodic nature. The latent parameterizations of the intended motion are obtained online using the FLD encoder.
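A sketch of the online encoding loop at inference time, using the same toy encoder as above: a sliding window of the incoming motion is re-encoded at every control step and the resulting latent parameterization is appended to the policy observation. `policy` and `robot_obs` are hypothetical placeholders.

```python
from collections import deque

buffer = deque(maxlen=window)

def control_step(new_frame, robot_obs, policy):
    buffer.append(new_frame)                       # new_frame: (state_dim,)
    if len(buffer) < window:
        return None                                # wait until the window is filled
    x = torch.stack(list(buffer), dim=-1)[None]    # (1, state_dim, window)
    phase, freq, amp, off = encode(x)
    obs = torch.cat([robot_obs, phase, freq, amp, off], dim=-1)
    return policy(obs)
```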

Fallback Mechanism

With a novel fallback mechanism enabled by FLD, the learning agent dynamically adapts its tracking strategy, automatically rejecting a potentially risky target when it is proposed and resorting to safe action execution instead.

Evaluation

We evaluate FLD on the MIT Humanoid robot in simulation and demonstrate its applicability to state-of-the-art real-world robotic systems. 

Real-time Motion Tracking

The fallback mechanism identifies risky tracking targets and falls back to safe counterparts obtained by propagating the latent dynamics.
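A hedged sketch of the fallback logic, continuing the earlier toy example. The risk test here, the distance of the proposed (frequency, amplitude, offset) to the reference parameterizations, is a stand-in assumption for the paper's criterion; the key idea is that a rejected target is replaced by propagating the last accepted one under the latent dynamics.

```python
def fallback(proposed, last_accepted, reference_params, dt, threshold=2.0):
    # proposed, last_accepted: tuples (phase, freq, amp, off), each (1, n_channels)
    # reference_params: (N, 3 * n_channels) stacked (freq, amp, off) of the dataset
    phase, freq, amp, off = proposed
    query = torch.cat([freq, amp, off], dim=-1)    # phase varies with time, so exclude it
    dist = torch.cdist(query, reference_params).min(dim=-1).values
    if bool((dist < threshold).all()):
        return proposed                            # target looks familiar: accept it
    p0, f0, a0, o0 = last_accepted
    return (p0 + f0 * dt, f0, a0, o0)              # fallback: advance the phase under the dynamics
```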

Motion Transitions

Because the latent parameterization space captures the spatial-temporal relationships between motions, motion transitions can be described accurately by simple parameter interpolation.

Fallback Ablation

Without the fallback mechanism, the system attempts to execute motions that are far from its training distribution. This leads to unpredictable or unsafe robot behavior, as the system lacks prior knowledge or experience with these types of inputs.

Ablation Studies

Our experiment provides strong evidence that the motion learning policy informed by the latent parameterization space effectively achieves motion inbetweening and coherent transitions that encapsulate high-level behavior migrations. This naturally motivates the design of skill samplers that continually propose novel training targets in addition to the reference motions, so as to extend tracking generality to a wider range of motions. We compare four such samplers, sketched in code after their definitions below.

OFFLINE

Sampling within the latent parameterization of the reference dataset.

GMM

Sampling from a parameterized distribution of the latent parameterization of the reference dataset.

RANDOM

Sampling randomly in the latent parameterization space.

ALPGMM

Sampling from a parameterized distribution that adaptively captures high learning-progress targets online, following the ALP-GMM algorithm.
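The four samplers can be caricatured as follows, assuming the latent parameterizations are collected as plain arrays. `dataset_params`, `history`, and the Gaussian-mixture settings are illustrative, and the ALP-GMM variant is only a rough simplification of the published algorithm, which refits its mixture periodically and selects the number of components online.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def sample_offline(dataset_params, rng):
    # dataset_params: (N, d) reference latent parameterizations
    return dataset_params[rng.integers(len(dataset_params))]

def sample_gmm(dataset_params, rng, k=8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(dataset_params)
    return gmm.sample(1)[0][0]

def sample_random(low, high, rng):
    return rng.uniform(low, high)                  # uniform over latent parameter bounds

def sample_alpgmm(history, rng, k=8):
    # history: list of (parameterization, absolute learning progress) pairs
    params = np.stack([p for p, _ in history])
    alp = np.array([a for _, a in history])[:, None]
    gmm = GaussianMixture(n_components=k, random_state=0).fit(np.hstack([params, alp]))
    # Bias the cluster choice toward high mean ALP, then drop the ALP dimension.
    weights = gmm.means_[:, -1] - gmm.means_[:, -1].min() + 1e-6
    c = rng.choice(k, p=weights / weights.sum())
    return rng.multivariate_normal(gmm.means_[c, :-1], gmm.covariances_[c][:-1, :-1])
```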

Tracking Performance

Both OFFLINE and GMM demonstrate strong tracking capabilities on the prescribed dataset. RANDOM achieves comparable tracking performance despite having broader sampling coverage in the latent parameterization space and lacking access to the offline data or knowledge of the data structure. In contrast, ALPGMM exhibits enlarged motion coverage, as evidenced by its continually increasing exploration factor. This expanded coverage leads to a gradual drop in tracking performance, which can be attributed to exploring under-defined motion regions that lack reference coverage.

Unlearnable Subspaces

When the latent parameterization space is not corrupted by unlearnable motions, RANDOM achieves tracking performance comparable to OFFLINE and GMM. However, when the reference dataset contains unlearnable motions, these samplers quickly degrade or fail due to the distortion of the latent parameterization space toward these challenging areas. ALPGMM, on the other hand, adapts its sampling distribution to focus on high-ALP regions, resulting in the highest tracking performance among all skill samplers, particularly in scenarios heavily affected by unlearnable components.

Latent Migration

At the beginning of learning (200 iterations), motion targets are initialized randomly. The policy is still undertrained at this early stage and thus tracks motions with generally low performance. As learning proceeds (1000 iterations), the policy gradually improves on the learnable motions, as indicated by the increased performance and ALP measure on the colored samples. In the meantime, the policy also recognizes the unlearnable region (grey), where it fails to achieve better tracking performance and thus maintains low learning progress. As ALPGMM biases its sampling toward Gaussian clusters with high ALP measure, the cluster over the unlearnable motions becomes inactive, indicated by a fully transparent ellipsoid.

Further training (10000 iterations) enlarges the coverage of tracking targets, as indicated by the expanded sampling range centered on the mastered region. A gradient pattern in tracking performance and ALP measure is observed, with higher values achieved at points closer to the confidence region. Finally, extending training further (20000 iterations) pushes ALPGMM toward areas more distant from the initial target distribution, motivating the policy to focus on specific motions whose performance can still be improved.

This demonstrates the efficacy of ALPGMM in navigating the latent parameterization space to focus on learnable regions.