As mentioned in Section IV of our paper, we explain our reward function in detail here. Readers can also refer to our GitHub repo. Our reward function takes the following general form.
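As a sketch of this structure (the symbols below are our own shorthand, inferred from the description that follows), the total reward is the product of the energy-adaptive reward and an auxiliary reward:

$$
r_{\text{total}} = r_{\text{energy}} \cdot r_{\text{aux}}.
$$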
The training environment and reward function for Go1 are inherited from Walk These Ways. We found that the original energy-adaptive reward alone is insufficient to regularize the quadruped robot's behavior. Therefore, we multiply an auxiliary reward with the original energy-adaptive reward. The components of the auxiliary reward are described below.
In comparison to Walk These Ways, we keep most terms from the fixed auxiliary rewards, which mainly address safety and motor-capability constraints. However, we add nothing from the augmented auxiliary rewards, because those terms are gait-specific, and we want the quadruped robot to choose its gait automatically via energy regularization.
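As an illustration of the kind of terms in the fixed auxiliary set, here is a minimal Python sketch assuming penalties similar to those in the walk-these-ways / legged_gym codebases (torque, joint-acceleration, action-rate, and collision penalties); the exact terms and weights we use are listed in our repo and may differ.

```python
import torch

def fixed_auxiliary_reward(torques, dof_acc, actions, last_actions, collisions,
                           w_torque=-1e-4, w_dof_acc=-2.5e-7,
                           w_action_rate=-0.01, w_collision=-1.0):
    """Weighted sum of non-positive safety / motor-capability penalties.

    All tensors are batched over environments; the weights are negative, so the
    returned auxiliary reward is <= 0. The terms and default weights here are
    illustrative, not the exact ones used in our experiments.
    """
    r_torque = torch.sum(torch.square(torques), dim=1)                        # motor effort
    r_dof_acc = torch.sum(torch.square(dof_acc), dim=1)                       # joint acceleration
    r_action_rate = torch.sum(torch.square(actions - last_actions), dim=1)    # action smoothness
    r_collision = torch.sum(collisions.float(), dim=1)                        # undesired body contacts
    return (w_torque * r_torque + w_dof_acc * r_dof_acc
            + w_action_rate * r_action_rate + w_collision * r_collision)
```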
These reward terms are all negative, and we define our auxiliary reward in the following form.
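A hedged sketch of this form, assuming the walk-these-ways convention of exponentiating a weighted sum of negative terms (the scale $\sigma$ and weights $w_i$ are configuration parameters whose exact values are listed in our repo):

$$
r_{\text{aux}} = \exp\!\Big(\sigma \sum_i w_i\, r_i\Big), \qquad r_i \le 0,
$$

so that $0 < r_{\text{aux}} \le 1$ and larger penalties shrink the total reward multiplicatively.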
We multiply this auxiliary reward with the original energy-adaptive reward to obtain the total reward.
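Under the same assumed notation, the combined reward is

$$
r_{\text{total}} = r_{\text{energy}} \cdot \exp\!\Big(\sigma \sum_i w_i\, r_i\Big).
$$

Because the auxiliary factor lies in $(0, 1]$, it can only scale the energy-adaptive reward down, so it acts as a regularizer rather than a competing objective.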
The training environment and reward function for ANYmal-C are inherited from Legged Gym. We trained the policy only on flat ground, not on the more complicated terrains.
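For reference, a minimal sketch of how flat-ground training is typically selected in a Legged Gym configuration (field names follow the public legged_gym config convention; our actual config lives in the repo):

```python
# Terrain settings for flat-ground training, following the legged_gym config
# convention (a sketch, not our exact configuration).
class terrain:
    mesh_type = 'plane'      # flat plane instead of procedurally generated rough terrain
    curriculum = False       # no terrain curriculum is needed on flat ground
    measure_heights = False  # height-map observations are unnecessary on a plane
```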
Our reward function is much simpler than the default one in Legged Gym, which adds many customized reward terms, such as feet air time, to regularize the quadruped robot's behavior.
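For context, here is a simplified standalone sketch of the feet-air-time idea (legged_gym's actual `_reward_feet_air_time` is batched over environments and keeps its state on the environment object; this version only illustrates the logic and uses illustrative values):

```python
import torch

def feet_air_time_reward(contact, last_contact, air_time, dt, target=0.5, cmd_speed=1.0):
    """Reward long swing phases: when a foot touches down, add (air_time - target).

    `contact` / `last_contact`: bool tensors of shape (num_feet,) for the current
    and previous step; `air_time`: float tensor tracking per-foot swing duration.
    Simplified from legged_gym's feet-air-time reward; not our exact implementation.
    """
    contact_filt = contact | last_contact             # filter unreliable single-step contact readings
    first_contact = (air_time > 0.0) & contact_filt   # feet that just touched down
    air_time = air_time + dt
    reward = torch.sum((air_time - target) * first_contact.float())
    reward = reward * float(cmd_speed > 0.1)          # no air-time reward when standing still
    air_time = air_time * (~contact_filt).float()     # reset timers for feet in contact
    return reward, air_time
```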