Fikih Muhamad, Anak Agung Krisna Ananda Kusuma, Jae-Han Park, Jung-Su Kim*
Seoul National University of Science and Technology
Korea Institute of Industrial Technology
This letter proposes a control framework that enhances the robustness of a locomotion policy against uncertainties. It integrates a locomotion policy network with an inverse-model deep disturbance observer (DoB) network and a deep state estimator network. The locomotion policy is trained to produce optimal actions, while the deep DoB estimates disturbances and the deep state estimator estimates the body's linear velocities. All networks are trained under nominal conditions in IsaacGym. The trained networks are then transferred to Gazebo and to a real robot running ROS2 to validate their robustness under uncertain conditions without additional tuning. In these uncertain conditions, the robot experiences uncertainties of higher magnitude than during training under nominal conditions. Validation results show that the proposed control framework achieves lower velocity tracking and estimation errors than the baseline method, which underscores its effectiveness in improving locomotion policy robustness.
Proposed framework: enhancing the robustness of the locomotion policy with a deep state estimator and a deep disturbance observer
A deep state estimator is employed to accurately estimate privileged states, such as the body linear velocity, even under uncertain conditions. In addition, the deep DoB is designed to enhance the robustness of the locomotion policy against external disturbances under uncertain conditions. Both the deep state estimator and the deep DoB combine Long Short-Term Memory (LSTM) networks with Multi-Layer Perceptrons (MLPs).
Illustration of the proposed framework to enhance the robustness of the locomotion policy.
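As a concrete illustration, the LSTM-plus-MLP estimator described above can be sketched in plain NumPy as follows. The dimensions (a 45-D observation, 32 hidden units, a 3-D velocity output), the random weights, and the single-layer recurrence are illustrative assumptions, not the actual trained networks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,)."""
    z = W @ x + U @ h + b
    H = h.size
    i, f = sigmoid(z[:H]), sigmoid(z[H:2*H])          # input / forget gates
    g, o = np.tanh(z[2*H:3*H]), sigmoid(z[3*H:])      # cell candidate / output gate
    c = f * c + i * g
    return o * np.tanh(c), c

def mlp_head(h, W1, b1, W2, b2):
    """Two-layer ReLU MLP mapping the LSTM feature to the estimate."""
    return W2 @ np.maximum(W1 @ h + b1, 0.0) + b2

rng = np.random.default_rng(0)
D, H, OUT = 45, 32, 3        # obs dim, hidden size, body linear velocity (assumed sizes)
W  = 0.1 * rng.standard_normal((4 * H, D))
U  = 0.1 * rng.standard_normal((4 * H, H))
b  = np.zeros(4 * H)
W1, b1 = 0.1 * rng.standard_normal((64, H)), np.zeros(64)
W2, b2 = 0.1 * rng.standard_normal((OUT, 64)), np.zeros(OUT)

h, c = np.zeros(H), np.zeros(H)
for t in range(10):                      # roll the LSTM over an observation history
    obs = rng.standard_normal(D)
    h, c = lstm_step(obs, h, c, W, U, b)
v_hat = mlp_head(h, W1, b1, W2, b2)      # estimated body linear velocity (3-D)
print(v_hat.shape)
```

The recurrence lets the estimator accumulate information over the observation history, which is why an LSTM is paired with the MLP head rather than using an MLP alone.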
Contribution: Replacing the inverse model of the four-legged dynamic system in the disturbance observer with a deep neural network to estimate disturbances.
We construct the deep DoB from a combination of LSTM and MLP networks and train it under nominal conditions so that the inverse model of the four-legged dynamic system reproduces the action generated by the locomotion policy.
Replacing the inverse model of the disturbance observer with a deep neural network
The training diagram of the Deep Disturbance Observer
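The inverse-model idea behind the DoB can be illustrated with a toy linear plant: reconstruct the input that explains the observed state transition, then subtract the commanded action; what remains is the disturbance estimate. The matrices `A` and `B` and the exact linear inverse below are hypothetical stand-ins for the learned LSTM+MLP inverse model, used only to show the residual structure.

```python
import numpy as np

# Hypothetical linear stand-in for the four-legged dynamics:
#   s' = A s + B (a + d), where d is an external disturbance.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.eye(2)

def inverse_model(s, s_next):
    # Exact inverse of the linear plant: reconstructs the total input a + d.
    return np.linalg.solve(B, s_next - A @ s)

def estimate_disturbance(s, s_next, applied_action):
    # DoB residual: reconstructed input minus the commanded action.
    return inverse_model(s, s_next) - applied_action

s = np.array([0.5, -0.2])
a = np.array([0.3, 0.1])          # action commanded by the policy
d = np.array([0.2, -0.4])         # ground-truth disturbance (unknown to the DoB)
s_next = A @ s + B @ (a + d)

d_hat = estimate_disturbance(s, s_next, a)
print(np.allclose(d_hat, d))      # → True
```

With a learned inverse model the reconstruction is approximate rather than exact, but the same residual yields the disturbance estimate used to robustify the policy.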
Comparison: the locomotion controller equipped with the deep DoB yields a smaller velocity tracking error
We provide random commands within [-1 m/s, 1 m/s] for the x and y linear velocities and within [-1.0 rad/s, 1.0 rad/s] for the angular velocity. The graph shows that the locomotion controller equipped with the LSTM-based deep DoB yields the smallest velocity tracking error.
The robot experiences a 200 N lateral body force at steps 4K and 8K.
Linear and angular velocity tracking errors under random commands for the two velocities
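A per-axis root-mean-square tracking error of the kind plotted above could be computed as in the sketch below; the command and measurement values are made-up examples, not data from the experiments.

```python
import numpy as np

def tracking_rmse(cmd, meas):
    """Per-axis RMS velocity tracking error over a trajectory.

    cmd, meas: sequences of [vx, vy, wz] samples (commanded vs. measured).
    """
    cmd, meas = np.asarray(cmd, float), np.asarray(meas, float)
    return np.sqrt(np.mean((cmd - meas) ** 2, axis=0))

cmd  = [[0.5, 0.0, 0.2], [1.0, -0.3, 0.0]]   # commanded [vx, vy, wz]
meas = [[0.4, 0.1, 0.2], [0.9, -0.2, 0.1]]   # measured  [vx, vy, wz]
print(tracking_rmse(cmd, meas))
```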
Performance: combining the deep DoB with the locomotion controller shows better robustness against external disturbances in both simulation and the real world
Training setup
The optimal deep DoB and deep state estimator are obtained by training a combination of LSTM and MLP networks to minimize the observation loss and the estimation loss. A supervised learning approach is applied until the optimal weights for each network are obtained.
PPO + Deep DoB training process in Isaac Gym
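The supervised stage described above can be sketched as a toy gradient-descent loop that minimizes a mean-squared estimation loss. A linear regressor stands in for the LSTM+MLP networks, and all data here is synthetic; only the loss-minimization structure mirrors the training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 6))            # toy observation histories
v_true = X @ rng.standard_normal((6, 3))     # privileged body velocities (labels)

W = np.zeros((6, 3))                         # estimator weights, trained below
lr = 0.05
losses = []
for _ in range(200):
    v_hat = X @ W                            # forward pass
    err = v_hat - v_true
    losses.append(float(np.mean(err ** 2)))  # supervised estimation (MSE) loss
    W -= lr * (2.0 / len(X)) * X.T @ err     # gradient-descent step

print(losses[0], "->", losses[-1])           # loss should shrink toward zero
```

In the actual framework the labels are the privileged states (for the estimator) and the policy's actions (for the DoB), available in IsaacGym during nominal-condition training.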
Simulation
For the simulation test, we evaluate the proposed framework in the Gazebo simulation environment with ROS2 under external disturbances including slippery surfaces, an additional payload, and lateral body forces. The results show that our framework makes the locomotion controller more robust to these disturbances.
PPO + Deep DoB performance in the Gazebo simulation under various external disturbances
Real world
Experiment setup
A Unitree Go1 quadruped robot is used and is attached to a wheeled platform for safety in case the robot cannot maintain its balance under a severe disturbance. For the weight disturbance, a foam box containing eight 0.5 kg ankle weights (4 kg in total) is attached on top of the robot. For the side-push disturbance, a rod made of aluminium profile with foam at one end is used. For the slippery disturbance, cotton gloves are attached to the end of each leg to reduce traction.
Tools used for disturbances
Experiment setup
Real world test
For the real-world test, we conducted three experiments:
Additional load of 4 kg
Additional load of 4 kg with side push disturbance
Additional load of 4 kg, side push disturbance, and slippery condition
Experiment 1
Experiment 2
Experiment 3
The three real-world tests conducted on the robot validate that the proposed control framework, which integrates a locomotion policy with a deep disturbance observer and a deep state estimator, enhances the robustness of the locomotion policy. It effectively manages external disturbances and sensor noise under uncertain conditions that were not encountered during training under nominal conditions.
Contact jungsu@seoultech.ac.kr to get more information on the project