No Randomisation, Kp=15 and Kd=1
No randomisation: the robot was trained and tested with Kp=15 and Kd=1.
When trained in simulation without any form of randomisation to account for the sim-to-real gap, the robot fails three times in a minute and a half.
Roll-Drop, Kp=15 and Kd=1
Roll-Drop was adopted to account for observation noise; the robot was trained and tested with Kp=15 and Kd=1.
When Roll-Drop is included, we can run the controller indefinitely without failures; the video on the left shows more than four minutes of continuous operation.
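As a rough illustration of the idea, the sketch below randomly drops entries of an observation vector, which is one simple way to expose a policy to corrupted observations during simulation rollouts. It assumes that Roll-Drop can be approximated by dropout applied directly to the observation; where the dropout is actually applied, the drop probability of 0.25, the rescaling of kept entries, and the function name `dropout_observation` are all illustrative assumptions rather than details taken from this page.

```python
import numpy as np


def dropout_observation(obs, drop_prob, rng):
    """Zero each observation entry with probability drop_prob.

    Hypothetical helper: kept entries are rescaled by 1 / (1 - drop_prob)
    (standard "inverted dropout"), which is an assumption made here.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # True where the entry is kept, False where it is dropped.
    keep_mask = rng.random(obs.shape) >= drop_prob
    return obs * keep_mask / (1.0 - drop_prob)


rng = np.random.default_rng(0)
obs = np.ones(8)  # placeholder observation, not real robot data
noisy_obs = dropout_observation(obs, drop_prob=0.25, rng=rng)
print(noisy_obs)
```

In a training loop, the corrupted observation would be fed to the policy in place of the clean one during rollouts, while deployment would use the raw sensor readings.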
Training and deployment mismatch
No Randomisation, Kp=20 and Kd=1
No randomisation: the robot was trained in simulation with Kp=20 and Kd=1 and tested with Kp=15 and Kd=1.
The policy trained in simulation without any sim-to-real transfer technique is not able to account for the sim-to-real gap.
Roll-Drop, Kp=20 and Kd=1
Roll-Drop was adopted to account for observation noise; the robot was trained in simulation with Kp=20 and Kd=1 and tested with Kp=15 and Kd=1.
The change in gains highlights a case of system uncertainty and demonstrates that Roll-Drop is capable of addressing sim-to-real gaps.
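To make the effect of the gain mismatch concrete, the sketch below compares the torque produced by the training gains and the deployment gains for the same joint state. It assumes that Kp and Kd are the proportional and derivative gains of a joint-level PD controller tracking a position target with zero target velocity; the joint state values and the function name `pd_torque` are hypothetical and chosen only for illustration.

```python
def pd_torque(kp, kd, q_des, q, qd_des=0.0, qd=0.0):
    """Torque from a standard PD law: kp * position error + kd * velocity error."""
    return kp * (q_des - q) + kd * (qd_des - qd)


# Hypothetical joint state: 0.25 rad position error, 0.5 rad/s joint velocity.
q_des, q, qd = 0.5, 0.25, 0.5

tau_train = pd_torque(kp=20.0, kd=1.0, q_des=q_des, q=q, qd=qd)   # gains used in simulation
tau_deploy = pd_torque(kp=15.0, kd=1.0, q_des=q_des, q=q, qd=qd)  # gains used on the robot

print(tau_train, tau_deploy)  # 4.5 3.25
```

Under these assumptions, the same policy output yields a noticeably weaker torque on the robot than in simulation, so the policy faces dynamics it was not trained on.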