Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models

Overview

Humans excel at robust bipedal walking in complex natural environments. In each step, they tune the interaction of biomechanical muscle dynamics and neuronal signals to remain robust against uncertainties in ground conditions. We aim to reproduce these capabilities with reinforcement learning agents, but demonstration-driven techniques can be brittle and generalize poorly to new situations. For this reason, we leverage evolutionary priors and biologically plausible objectives to learn natural and robust walking without demonstrations. We achieve energy-efficient walking with minimal hyperparameter tuning and show that our policies are extremely robust. Because humans must perform control in an incredibly complex world, we performed this study with high-dimensional and biomechanically accurate models. Combining novel RL methods with computationally efficient simulation engines allowed us to create robust feedback policies that control each muscle separately without any simplifications.

Simulation Engines

The Hyfydy and MuJoCo simulation engines differ in these key areas (see paper for references):

Musculotendon dynamics: The muscle model in Hyfydy is based on Millard et al. [59] and includes elastic tendons, muscle pennation, and muscle fiber damping. The MuJoCo muscle model is a simplified Hill-type model, parameterized to match existing OpenSim models [42]; it supports only rigid tendons and does not include variable pennation angles.
Contact forces: Hyfydy uses the Hunt-Crossley [60] contact model with non-linear damping to generate contact forces, with a friction cone based on dynamic, static, and viscous friction coefficients [61]. MuJoCo contacts are rigid, with a friction pyramid instead of a cone, and without separate coefficients for dynamic and viscous friction.
Contact geometry: The MuJoCo model uses a convex mesh for foot geometry, while in the Hyfydy models the foot geometry is approximated using three contact spheres.
Integration: Hyfydy uses an error-controlled integrator with variable step size, while MuJoCo uses a fixed step size and no error control. The average simulation step size in Hyfydy is around 0.00014 s (about 7000 Hz) for the H2190 model, compared to the fixed MyoSuite step size of 0.001 s (1000 Hz) for the MyoLeg model.
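
To make the "non-linear damping" in the contact-force item concrete, here is a minimal Python sketch of a Hunt-Crossley-style normal force. It uses the commonly cited form F = k x^n (1 + 1.5 c v), where x is the penetration depth and v the penetration rate. The function name, parameter names, and default values are hypothetical and chosen only for illustration. This is not Hyfydy's implementation, whose exact formulation and parameters may differ.

```python
def hunt_crossley_normal_force(depth, depth_rate,
                               stiffness=1.0e6, damping=1.0, exponent=1.5):
    """Illustrative Hunt-Crossley-style normal force (N).

    depth      : penetration depth in m (positive when bodies overlap)
    depth_rate : rate of change of penetration in m/s (positive when approaching)
    stiffness, damping, exponent : hypothetical example parameters
    """
    if depth <= 0.0:
        return 0.0  # bodies are not in contact

    # Elastic term grows non-linearly with penetration depth.
    elastic = stiffness * depth ** exponent

    # The damping term is scaled by the elastic term, so it vanishes
    # smoothly as the penetration depth goes to zero.
    force = elastic * (1.0 + 1.5 * damping * depth_rate)

    # A contact can only push, never pull.
    return max(force, 0.0)


if __name__ == "__main__":
    for v in (-1.0, 0.0, 0.5):
        print(v, hunt_crossley_normal_force(0.01, v))
```

Because the damping contribution is multiplied by the elastic term, the force is continuous at the moment of touchdown, unlike a linear spring-damper, which produces a force jump whenever contact begins with non-zero velocity.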

[42] V. Caggiano, H. Wang, G. Durandau, M. Sartori, and V. Kumar, “Myosuite – a contact-rich simulation suite for musculoskeletal motor control,” https://github.com/facebookresearch/myosuite, 2022. [Online].

[59] M. Millard, T. Uchida, A. Seth, and S. L. Delp, “Flexing computational muscle: modeling and simulation of musculotendon dynamics.” Journal of biomechanical engineering, vol. 135, no. 2, p. 021005, feb 2013.

[60] K. H. Hunt and F. R. E. Crossley, “Coefficient of Restitution Interpreted as Damping in Vibroimpact,” Journal of Applied Mechanics, vol. 42, no. 2, p. 440, jun 1975.

[61] M. A. Sherman, A. Seth, and S. L. Delp, “Simbody: multibody dynamics for biomedical research,” Procedia IUTAM, vol. 2, pp. 241–261, jan 2011.