Appendix

Reward Function Design

As mentioned in Section IV of the paper, we explain our reward function in detail here; readers can also refer to our GitHub repository. Our reward function takes the following general form, which we denote as the original energy-adaptive reward.
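The exact expression is given in Section IV and in the repository; since it is not reproduced here, the following is only a hypothetical PyTorch sketch of the kinds of quantities an energy-adaptive reward combines. The tracking term, the mechanical-power estimate, and the weight `alpha` below are illustrative assumptions, not our actual formula.

```python
import torch

def energy_adaptive_reward(lin_vel, cmd_vel, torques, dof_vel, alpha, sigma=0.25):
    """Hypothetical sketch, NOT the exact reward from the paper.

    lin_vel:  (num_envs, 2) measured base xy velocity
    cmd_vel:  (num_envs, 2) commanded xy velocity
    torques:  (num_envs, num_dofs) applied joint torques
    dof_vel:  (num_envs, num_dofs) joint velocities
    alpha:    energy weight; "adaptive" refers to adjusting it during training
    """
    # task term: exponential tracking of the commanded velocity
    tracking_error = torch.sum((cmd_vel - lin_vel) ** 2, dim=1)
    r_task = torch.exp(-tracking_error / sigma)
    # energy term: total mechanical power over all joints
    power = torch.sum(torch.abs(torques * dof_vel), dim=1)
    return r_task - alpha * power
```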

Reward Function for ANYmal-C

The training environment and reward function for ANYmal-C are inherited from Legged Gym. We trained the policy only on flat ground, not on the more complicated terrains. Our project used the original energy-adaptive reward.
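For reference, flat ground is selected in Legged Gym through the terrain configuration; a sketch in Legged Gym's config-class style (the import path follows the public repository) looks like this:

```python
# Sketch of a flat-ground config override in Legged Gym's config-class style.
from legged_gym.envs.base.legged_robot_config import LeggedRobotCfg

class AnymalCFlatCfg(LeggedRobotCfg):
    class terrain(LeggedRobotCfg.terrain):
        mesh_type = 'plane'      # flat ground instead of procedurally generated terrains
        measure_heights = False  # no terrain height scan is needed on a plane
```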

This reward function is much simpler than the one in Legged Gym, where many customized reward terms, such as feet air time (sketched below), are added to regularize the quadruped robot's behavior.
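As an illustration of the kind of term we omit, here is a self-contained sketch of a Legged Gym-style feet-air-time reward; the tensor names and the standalone signature are illustrative, not the library's exact code.

```python
import torch

def feet_air_time_reward(feet_air_time, contact, last_contact, commands, dt,
                         threshold=0.5):
    """Sketch of a Legged Gym-style feet-air-time term (names are illustrative).

    feet_air_time: (num_envs, num_feet) time each foot has been in the air
    contact:       (num_envs, num_feet) bool, foot currently in contact
    last_contact:  (num_envs, num_feet) bool, contact at the previous step
    commands:      (num_envs, >=2) commanded base xy velocity
    """
    contact_filt = torch.logical_or(contact, last_contact)   # debounce noisy contacts
    first_contact = (feet_air_time > 0.0) & contact_filt     # feet that just touched down
    feet_air_time = feet_air_time + dt
    # reward swing phases longer than the threshold, credited at touchdown
    rew = torch.sum((feet_air_time - threshold) * first_contact, dim=1)
    rew = rew * (torch.norm(commands[:, :2], dim=1) > 0.1)   # only when commanded to move
    feet_air_time = feet_air_time * (~contact_filt)          # reset timers for feet in contact
    return rew, feet_air_time, contact
```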

Reward Function for Unitree Go1

The training environment and reward function for Go1 are inherited from Walk These Ways. We found that the original energy-adaptive reward alone is insufficient to regularize the quadruped robot's behavior, so we multiply the original energy-adaptive reward by an auxiliary reward. The following lists the components of the auxiliary rewards.

Compared with Walk These Ways, we used most of the terms from its fixed auxiliary rewards, which mainly address safety and motor-capability issues. However, nothing from the augmented auxiliary rewards was added, because those terms are gait-specific, and we want the quadruped robot to choose its gait automatically via energy regularization.
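For illustration only, safety- and motor-capability-oriented auxiliary penalties of this kind typically look like the sketch below; the term names, shapes, and thresholds here are representative assumptions, and the exact set we use is given in the repository.

```python
import torch

def fixed_auxiliary_terms(torques, dof_vel, last_dof_vel, actions, last_actions,
                          contact_forces, dt):
    """Representative (hypothetical) fixed auxiliary penalties; every term is <= 0."""
    aux = {}
    aux['torques'] = -torch.sum(torques ** 2, dim=1)                          # motor effort
    aux['dof_acc'] = -torch.sum(((dof_vel - last_dof_vel) / dt) ** 2, dim=1)  # joint accelerations
    aux['action_rate'] = -torch.sum((actions - last_actions) ** 2, dim=1)     # jerky actions
    aux['collision'] = -torch.sum((contact_forces > 0.1).float(), dim=1)      # undesired body contacts
    return aux
```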

These rewards are all negative, and we define our auxiliary reward in the following form.

We multiply this auxiliary reward into the original energy-adaptive reward to obtain the final reward used for training.
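As a minimal sketch of this combination, assuming the exponential convention used in Walk These Ways for folding negative auxiliary terms into a multiplicative factor (the scales and the constant `c_aux` below are illustrative, not our exact values):

```python
import torch

def total_reward(r_energy_adaptive, aux_terms, aux_scales, c_aux=0.02):
    """Hypothetical combination: scale the (negative) auxiliary terms, exponentiate,
    and use the result as a multiplicative factor in (0, 1]."""
    r_aux = sum(aux_scales[name] * term for name, term in aux_terms.items())
    return r_energy_adaptive * torch.exp(c_aux * r_aux)
```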