Learning Flexible and Reusable Locomotion Primitives for a Microrobot

Brian Yang, Grant Wang, Roberto Calandra, Daniel Contreras, Sergey Levine, Kristofer Pister

Welcome to the project page for our paper, Learning Flexible and Reusable Locomotion Primitives for a Microrobot. Below you will find: (1) our abstract and a link to the paper, published in IEEE Robotics and Automation Letters, (2) a short summary video of the project, (3) the curriculum of simulated experiments we ran on our microrobot, and (4) a link to our publicly available code on GitHub, so you can try the curriculum on your own machine.


The design of gaits for robot locomotion can be a daunting process which requires significant expert knowledge and engineering. This process is even more challenging for robots that do not have an accurate physical model, such as compliant or micro-scale robots. Data-driven gait optimization provides an automated alternative to analytical gait design. In this paper, we propose a novel approach to efficiently learn a wide range of locomotion tasks with walking robots. This approach formalizes locomotion as a contextual policy search task to collect data, and subsequently uses that data to learn multi-objective locomotion primitives that can be used for planning. As a proof-of-concept we consider a simulated hexapod modeled after a recently developed microrobot, and we thoroughly evaluate the performance of this microrobot on different tasks and gaits. Our results validate the proposed controller and learning scheme on single and multi-objective locomotion tasks. Moreover, the experimental simulations show that without any prior knowledge about the robot used (e.g., dynamics model), our approach is capable of learning locomotion primitives within 250 trials and subsequently using them to successfully navigate through a maze.

Video Summary

Videos of Curriculum

(1) Learning to Walk Straight (Bayesian Optimization)

We implemented several well-studied hexapod gaits found in nature (Dual Tripod, Ripple, Wave, and Fourtwo). Below is a clip of our microrobot running a Dual Tripod gait. To generate these gaits, we used a central pattern generator (CPG) controller modeled as a network of coupled nonlinear oscillators. We tuned the CPG parameters with Bayesian optimization over a limited number of iterations, and performed both single-objective and multi-objective optimization on these gaits.
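To make the idea of a CPG built from coupled oscillators more concrete, here is a minimal sketch. It is not our controller: it swaps in Kuramoto-style phase oscillators as a simple stand-in, and the function names, phase offsets, coupling gain, and step size are all illustrative assumptions.

```python
import math

# Hypothetical phase offsets (fractions of a cycle) for a tripod-like pattern:
# legs 0, 2, 4 move together, half a cycle apart from legs 1, 3, 5.
TRIPOD_OFFSETS = [0.0, 0.5, 0.0, 0.5, 0.0, 0.5]

def simulate_cpg(offsets, frequency=1.0, coupling=5.0, dt=0.01, steps=2000):
    """Integrate Kuramoto-style coupled phase oscillators with forward Euler.

    Oscillator i follows
        dphi_i/dt = 2*pi*f + coupling * sum_j sin(phi_j - phi_i - (psi_j - psi_i))
    where psi holds the desired phase offsets in radians.
    Returns the final phases in radians.
    """
    n = len(offsets)
    psi = [2 * math.pi * o for o in offsets]
    # Start near the desired pattern, with a different perturbation per leg.
    phases = [psi[i] + 0.2 * i for i in range(n)]
    for _ in range(steps):
        rates = []
        for i in range(n):
            pull = sum(
                math.sin(phases[j] - phases[i] - (psi[j] - psi[i]))
                for j in range(n) if j != i
            )
            rates.append(2 * math.pi * frequency + coupling * pull)
        phases = [p + dt * r for p, r in zip(phases, rates)]
    return phases

def phase_difference(a, b):
    """Wrapped difference a - b, in the range [-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

phases = simulate_cpg(TRIPOD_OFFSETS)
```

In this sketch the coupling term pulls the perturbed phases back to the requested offsets, so legs in the same tripod end up in phase and the two tripods end up half a cycle apart. In a setup like ours, quantities such as the frequency and coupling would be the kind of parameters handed to the optimizer.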

(2) Discovering New Gaits

We also used multi-objective optimization to explore the performance of the CPG without predefined gaits. In this task, we treated the coupling between the oscillators of our CPG network as additional parameters in the optimization. The clip below shows one of many gaits we found that outperform the predefined gaits in terms of speed.
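Multi-objective optimization returns a set of trade-off solutions rather than a single best gait. The sketch below shows only the final filtering step, extracting the non-dominated (Pareto-optimal) candidates from a list of scores. The scores and the second objective (here called efficiency) are made up for illustration and are not results from our experiments.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives maximized)."""
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and strictly better somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (speed, efficiency) scores for five candidate gaits.
candidates = [(0.9, 0.2), (0.7, 0.6), (0.5, 0.5), (0.3, 0.9), (0.2, 0.1)]
front = pareto_front(candidates)
# front == [(0.9, 0.2), (0.7, 0.6), (0.3, 0.9)]
```

Each gait left on the front is best for some weighting of the objectives, which is why a fast gait can be kept even if it scores poorly on the other objective.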

(3) Learning to Walk Inclines (Contextual Optimization)

Next, we used contextual Bayesian optimization to train the robot to walk up inclines efficiently. We trained the robot on 5, 10, and 15 degree inclines over 50 iterations. The contextual optimizer can leverage prior simulations to obtain high-performing gaits in fewer simulations. Below is a video of the training process.
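One common way to set up contextual Bayesian optimization is to fit a single surrogate model over the joint space of context (here, the incline) and gait parameters, so that evaluations from one incline inform the others. The sketch below assumes a Gaussian process with a squared-exponential kernel and an upper-confidence-bound acquisition. These choices, the toy reward, and all names are assumptions for illustration, not a description of our implementation.

```python
import math

def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two equal-length vectors."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(X, y, x_new, noise=1e-4):
    """Zero-mean GP posterior mean and variance at x_new, given data (X, y)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x_new) for a in X]
    alpha = solve(K, y)
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var

def suggest(X, y, context, candidates, beta=2.0):
    """Pick the candidate parameters with the highest UCB for the given context."""
    def ucb(theta):
        mean, var = gp_posterior(X, y, [context] + list(theta))
        return mean + beta * math.sqrt(var)
    return max(candidates, key=ucb)

# Toy usage: the reward peaks where the gait parameter matches the normalized incline.
grid = [0.0, 0.5, 1.0]
X = [[c, t] for c in grid for t in grid]   # rows are [context, parameter]
y = [-(t - c) ** 2 for c, t in X]          # made-up reward, not simulator output
best = suggest(X, y, context=0.5, candidates=[(t,) for t in grid])
```

Because the model is shared across contexts, every past evaluation contributes to the prediction for a new incline, which is the mechanism that lets a contextual optimizer reuse prior simulations.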

(4) Learning to Curve (Contextual Optimization)

We also used contextual optimization to learn locomotion primitives. We trained the robot on five target trajectories, equally spaced and each ending the same distance from the starting point, over 250 iterations. As with the inclines, the contextual optimizer can leverage prior simulations to obtain high-performing gaits in fewer simulations. We found that the robot learned the optimal parameters for these trajectories fairly consistently and could generalize to intermediate trajectories.

(5) Maze Navigation (Combining Contextual and MOO)

Finally, we tackle the problem of path planning. We reuse the evaluations from the previous contextual trajectory experiments and reformulate the task as a multi-objective optimization. By constructing a model that maps parameters to predicted trajectories, we can sample candidate solutions that produce desired trajectories. By chaining these trajectories together, we are able to navigate through a maze. The clip below shows one such successful maze navigation.
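The chaining step can be sketched as a simple greedy planner. In this illustration a small hand-written table stands in for the learned model's predicted trajectories, the waypoints through the maze are assumed to be given, and walls are not modeled. The primitive names, displacements, and tolerance are hypothetical.

```python
import math

# Hypothetical primitive library: each entry is a predicted displacement in the
# robot's own frame, as (forward, lateral, heading change in radians).
PRIMITIVES = {
    "straight": (1.0, 0.0, 0.0),
    "left": (0.7, 0.3, math.pi / 4),
    "right": (0.7, -0.3, -math.pi / 4),
}

def apply_primitive(pose, delta):
    """Move a world-frame pose (x, y, heading) by a robot-frame displacement."""
    x, y, heading = pose
    forward, lateral, turn = delta
    return (
        x + forward * math.cos(heading) - lateral * math.sin(heading),
        y + forward * math.sin(heading) + lateral * math.cos(heading),
        heading + turn,
    )

def distance(pose, goal):
    """Euclidean distance from the pose's position to a goal point (x, y)."""
    return math.hypot(pose[0] - goal[0], pose[1] - goal[1])

def plan(start, waypoints, tolerance=0.5, max_steps=50):
    """Greedily chain primitives so the robot visits each waypoint in order."""
    pose, path = start, []
    for goal in waypoints:
        while distance(pose, goal) > tolerance and len(path) < max_steps:
            # Choose the primitive whose predicted endpoint lands closest to the goal.
            name = min(
                PRIMITIVES,
                key=lambda n: distance(apply_primitive(pose, PRIMITIVES[n]), goal),
            )
            pose = apply_primitive(pose, PRIMITIVES[name])
            path.append(name)
    return path, pose

# Drive along one corridor, then turn left into the next.
path, final_pose = plan((0.0, 0.0, 0.0), [(3.0, 0.0), (3.0, 3.0)])
```

A greedy one-step lookahead is the simplest possible choice here and can get stuck in tighter layouts; it is shown only to illustrate how predicted trajectories compose into a longer path.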