Linear Policies are Sufficient to Enable Low-cost Quadrupedal Robots to Traverse Rough Terrain

Maurice Rahme, Ian Abraham, Matthew L. Elwin, and Todd D. Murphey

Abstract

The availability of inexpensive 3D-printed quadrupedal robots motivates the development of learning-based methods compatible with low-cost embedded processors and position-controlled hobby servos. In this work, we show that a linear policy is sufficient to modulate an open-loop trajectory generator to enable a quadruped to walk over rough, unknown terrain. The policy is trained in simulation using randomized terrain and dynamics and directly implemented on the robot. The resulting controller can be implemented on resource-constrained systems. We demonstrate the results by deploying the policy on the OpenQuadruped, an open-source 3D-printed robot equipped with hobby servos and an embedded microprocessor.


DR-GMBC: At a Glance

DR-GMBC builds on an existing gait scheme based on 12-point Bezier curves, which we modify to allow any combination of forward, lateral, and yaw commands at user-defined step heights, lengths, and speeds. The method wraps a learning agent around this scheme to modulate gait parameters such as step and body height, and to add significant residuals to the resulting foot coordinates. The only sensor used is an IMU.
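As a concrete illustration, here is a minimal sketch in Python of this wrapping. All names and the observation/action layout are our assumptions for illustration, not the paper's exact interface: a single linear map takes IMU readings to gait-parameter deltas and per-foot residuals, applied on top of the open-loop Bezier output.

```python
import numpy as np

class LinearGaitModulator:
    """Purely linear policy: one weight matrix maps observations to actions."""
    def __init__(self, obs_dim, act_dim=14):
        # act_dim = 2 gait-parameter deltas + 12 foot residuals (4 feet x xyz);
        # this split is an assumption for illustration.
        self.W = np.zeros((act_dim, obs_dim))

    def act(self, obs):
        return self.W @ obs

def control_step(policy, gait_generator, imu_obs, command):
    """One control step: modulate the open-loop gait with the policy output."""
    action = policy.act(imu_obs)
    step_height_delta, body_height_delta = action[0], action[1]
    residuals = action[2:].reshape(4, 3)          # xyz residual per foot

    # `gait_generator.positions` is a hypothetical interface returning the
    # open-loop (4, 3) foot positions for the commanded fwd/lateral/yaw motion.
    feet = gait_generator.positions(command, step_height_delta, body_height_delta)
    return feet + residuals                       # passed to inverse kinematics
```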

Output from Bezier Gait Generator

At the core of this method is the way we wrap the agent around user input. This generic approach allows the agent to be used with previously unseen commands. The basic Bezier curve generator produces 2D foot coordinates over time: horizontal and vertical. In Section V of the paper, we describe our method for extending these trajectories into 3D. For lateral motion, we simply rotate the 2D trajectory about the robot's z-axis.
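For reference, the sketch below evaluates an n-point Bezier curve with the standard Bernstein basis and applies the z-axis rotation used for lateral motion. The frame conventions and function names are illustrative, not taken from the paper's code:

```python
import numpy as np
from math import comb

def bezier(control_points, t):
    """Evaluate an n-point Bezier curve at phase t in [0, 1] (Bernstein basis)."""
    n = len(control_points) - 1
    weights = np.array([comb(n, k) * (1 - t) ** (n - k) * t ** k
                        for k in range(n + 1)])
    return weights @ np.asarray(control_points)   # weighted sum of control points

def lateral_foot_position(control_points_2d, t, direction_yaw):
    """Rotate the planar (horizontal, vertical) swing trajectory about the
    robot's z-axis so the same 2D curve drives motion in any direction."""
    x, z = bezier(control_points_2d, t)           # 2D foot coordinates over time
    return np.array([np.cos(direction_yaw) * x,   # x component in hip frame
                     np.sin(direction_yaw) * x,   # y component in hip frame
                     z])
```

With `direction_yaw = 0` this reduces to the forward gait, so a single set of 12 control points serves every commanded direction.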

Bezier Controller Algorithm

GMBC Control Scheme

For yaw motion, we use the intuition that, for clockwise yaw, all four feet must trace a counter-clockwise circle: during the stance phase, both front feet move toward the rear-left of the robot and both back feet move toward the rear-right. To keep the path circular, the directional vector of each foot path is modulated at each iteration by the change in the foot's xy magnitude relative to the stance phase. This approach lets us sum the coordinates generated by the forward, strafe, and yaw gait generators to seamlessly create mixed motions from intuitive controller inputs.
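One way to read this construction in code, as an illustrative simplification only (the paper's per-iteration magnitude modulation is reduced here to a fixed tangent direction per foot):

```python
import numpy as np

def yaw_tangent(hip_xy):
    """Unit tangent of the circle a foot traces during body yaw:
    perpendicular to the vector from the body center to the hip, so all
    four feet sweep along arcs of one common circle."""
    tangent = np.array([-hip_xy[1], hip_xy[0]])   # 90-degree rotation of radial vector
    return tangent / np.linalg.norm(tangent)

def mixed_foot_xy(fwd_xy, strafe_xy, yaw_magnitude, hip_xy):
    """Planar contributions of the forward, strafe, and yaw generators
    simply add, yielding mixed motions from one controller input."""
    return fwd_xy + strafe_xy + yaw_magnitude * yaw_tangent(hip_xy)
```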

Environment & Training

We built an open-source simulation environment with a URDF based on our custom quadruped for training and validation. The simulation features full state randomization, with the terrain mesh, body mass, and foot friction resampled at each epoch. The environment contains a simulated IMU and an optional set of simulated contact sensors. For debugging, users can trace the path of each foot in real time, which aids in evaluating agents and in crafting gaits for agents to modulate. The terrain mesh is also customizable, with default terrain heights of up to 40% of body height for our URDF. The code is available in the paper's GitHub repository.
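A minimal sketch of the per-epoch resampling in PyBullet follows; the randomization ranges, heightfield resolution, and link indices are our assumptions, not values from the paper:

```python
import numpy as np
import pybullet as p

def randomize_epoch(robot_id, foot_links, rng, base_mass, body_height):
    """Resample terrain, body mass, and foot friction for a new training epoch."""
    # Random heightfield with peaks up to 40% of body height (the default above).
    rows = cols = 64
    heights = rng.uniform(0.0, 0.4 * body_height, size=rows * cols)
    terrain = p.createCollisionShape(
        shapeType=p.GEOM_HEIGHTFIELD,
        meshScale=[0.05, 0.05, 1.0],              # assumed cell size (meters)
        heightfieldData=heights.tolist(),
        numHeightfieldRows=rows,
        numHeightfieldColumns=cols)
    p.createMultiBody(0, terrain)                 # static terrain body

    # Randomize base mass and per-foot lateral friction (assumed ranges).
    p.changeDynamics(robot_id, -1, mass=base_mass * rng.uniform(0.8, 1.2))
    for link in foot_links:
        p.changeDynamics(robot_id, link, lateralFriction=rng.uniform(0.5, 1.25))
```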

We tried training with and without full state randomization and, as expected, found that the non-randomized version earned higher rewards over time, since it was implicitly learning the simulated dynamics and the specific terrain. In a series of 1000 randomized tests, the agent trained with full state randomization survived longer, and both agents performed much better than the open-loop gait: an agent trained with randomization walked 27x farther than the untrained baseline.

50-Epoch Moving Average GMBC Agent Training.

1000-trial forward motion test results.

Validation

To examine the efficacy of our method, we first plotted the policy output on flat and rough terrain. On flat ground, there are clear repeated patterns that, as expected, disappear once we deploy the trained agent on rough terrain. Finally, we ran a series of sim-to-real experiments, with no modification to the agent whatsoever, to evaluate performance.

Policy Output on Flat Ground.

Policy Output on Rough Terrain.

Experiment 1: 2.2m Track Traversal with Open Loop Gait on the left and DR-GMBC on the right.

This test involved traversing the 2.2m track covered with loose stones whose heights ranged from 10mm to 60mm (up to 30% of the robot's standing height), with the peak height of the track occurring between the 1.6m and 2m marks. The video below shows a small sample of the results. In total, we did 14 takes with the open-loop gait and 10 takes with our algorithm. The table on the right summarizes the mean distance traveled and the success rate (the fraction of runs in which the robot did not fall). The 40% distance improvement should be weighed alongside the 4.28x improvement in fall rate.

Experiment 2: Descent Test with Open Loop Gait on the left and DR-GMBC on the right.

The second test has the robot descend from a 60mm peak of loose stones (30% of robot height) onto flat ground. Teleoperated with our algorithm, the robot fell 3 out of 11 times, while the open-loop gait fell 9 out of 13 times: a 2.5x improvement in fall rate.

Experiment 3: Flat Ground Performance Test with Open Loop Gait on the left and DR-GMBC on the right.

To show the system’s practicality, the final test involved walking forward and strafing around a 1m×1m block with no obstacles, to identify any performance lost on easy terrain. Note that the time taken for the adjustment movements needed to align the robot with its direction of motion (forward, backward, or strafe) is not included in the velocity calculation. The aggressive walking stance demanded too much torque from the motors, which caused the robot’s rear to dip during backwards motion and triggered damping behavior from the agent in anticipation of a fall, ultimately slowing the robot by 57.6% for this type of motion. Left and right strafing speeds increased by 11.5% and 19.1% respectively, while forward speed remained the same.