Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic tasks
Isabell Wochner*, Pierre Schumacher*, Georg Martius, Dieter Büchler, Syn Schmitt, Daniel F.B. Haeufle
Humans outperform robots in robustness, versatility, and the learning of new tasks across a wide variety of movements. We hypothesize that highly nonlinear muscle dynamics play a large role in providing inherent stability, which is favorable to learning. While recent advances have been made in applying modern learning techniques to muscle-actuated systems, both in simulation and in robotics, no detailed analysis has so far been performed to show the benefits of muscles when learning from scratch. Our study closes this gap and showcases the potential of muscle actuators for core robotics challenges in terms of data-efficiency, hyperparameter sensitivity, and robustness.
Smooth Point-Reaching (OC)
A smooth point-reaching motion with the arm26 model is compared between muscle actuators and torque actuators. The green lines in the muscle-actuated motion represent the forces exerted by the muscles: the larger the line, the larger the muscle force.
Squatting (OC)
A squatting motion using a reduced FullBody model in 3D is shown. The muscle-actuated motion appears more stable towards the endpoint, which is most visible in the smaller swinging motion of the freely movable arms.
Hitting a ball with high velocity (OC)
The goal was to hit the falling ball with a high velocity in the z-direction (against gravity). In both cases, the controller learns to wait briefly at the beginning until the arm can hit the ball in the right position at the optimal time.
High-Jumping (OC)
A high-jumping motion using a reduced FullBody model in 3D is shown. The muscle-actuated motion seems more physiological because the joint limits are less likely to be reached (see the pelvis motion in the torque-actuated case at the beginning and end of the movement).
Point-reaching with perturbation (MPC)
An unknown weight of 1 kg was added to the lower arm. The perturbed motion is shown in comparison to the unperturbed motion (overlaid transparently). In both cases the controller is still able to perform the motion, although more slowly than the original. The deviation from the desired endpoint at the end of the movement is larger in the torque-actuated case than in the muscle-actuated case.
Squatting with perturbation (MPC)
An unknown perturbation force of -50 N was added to the hip angle in the middle of the motion. In the muscle-actuated case, almost no effect is seen, whereas the controller struggles to keep the model upright in the torque-actuated case.
Precise point-reaching (RL)
Chaotic load point-reaching (RL)
Hopping (RL)
OC: Optimal control
MPC: Model predictive control
RL: Reinforcement learning