Kamigami: the small, mobile, bio-inspired hexapedal millirobot used in this work, shown with a Raspberry Pi Zero for communication and control.
Research in swarm robotics can be prohibitively expensive due to the cost of both the robots and the environments in which they are tested. In this project, we designed and implemented an inexpensive terrestrial swarm robotics platform that uses Model Predictive Control and Model-Based Reinforcement Learning for cooperation among Kamigami robots. Our experiments demonstrate that our setup can effectively learn complex system dynamics and can be deployed as a control method for low-cost robots. This project can serve as a blueprint that other researchers can modify and customize for specialized purposes.
Our MPC cost function was initially a simple MSE loss between the robot's state and the target state. To enable swarm behavior, however, we introduced a term averaging the MSEs between the current robot's state and those of the surrounding robots, encouraging the robots to stay close to one another while approaching the goal state.
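To make this concrete, here is a minimal Python sketch of the cost under those definitions; the states are XY positions, and the `swarm_weight` trade-off coefficient is an illustrative knob of ours, not a tuned value from our experiments.

```python
import numpy as np

def swarm_mpc_cost(state, goal, neighbor_states, swarm_weight=1.0):
    """Cost of a candidate predicted state: MSE to the goal plus the
    average MSE to the surrounding robots, which rewards cohesion."""
    goal_cost = np.mean((state - goal) ** 2)
    cohesion_cost = np.mean([np.mean((state - n) ** 2)
                             for n in neighbor_states])
    return goal_cost + swarm_weight * cohesion_cost
```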
We developed a simulator to roughly approximate our real-world control and learning problem. The simulation is a 100x100 continuous space that supports any number of agents. We model the system as quadratic in two control inputs to generate XY state changes; zero-mean Gaussian noise is then added to each state change before it is applied to the current state.
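A minimal sketch of one simulator step under these assumptions follows; the coefficient matrix `A` below is an illustrative placeholder, not the coefficients our simulator actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder coefficients mapping quadratic control features to an
# XY state change (illustrative, not our simulator's actual values).
A = np.array([[1.0, 0.0, 0.2, 0.1, 0.0],
              [0.0, 1.0, 0.2, 0.0, 0.1]])

def step(state, action, noise_std=0.5):
    """One simulator step: the XY state change is quadratic in the two
    control inputs, corrupted with zero-mean Gaussian noise, and added
    to the current state, clipped to the 100x100 arena."""
    u1, u2 = action
    features = np.array([u1, u2, u1 * u2, u1 ** 2, u2 ** 2])
    delta = A @ features + rng.normal(0.0, noise_std, size=2)
    return np.clip(state + delta, 0.0, 100.0)
```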
A simulation environment with multiple MPC agents exhibiting swarming/clustering behavior. The stars are the agents (n=4).
We chose the Kamigami as the physical system for this work. The Kamigami is a small, mobile, bio-inspired hexapedal millirobot kit that mimics the way real creatures move in nature. We added a layer of communication and control by attaching a Raspberry Pi Zero, which runs ROS and communicates over WiFi, sending actuation signals to a 2-channel motor driver. A LiPo battery powers the system. The Kamigami prototype we built is lightweight, with no extra actuators or sensors.
Our training and testing environment uses ROS as a coordination layer that sends commands to the robot, collects data from the camera, and saves that data to a file so the model can be trained offline. First, a local machine reads the broadcast camera images with ar-track-alvar, an AR-tracking ROS package that produces pose estimates from AR tags attached to each Kamigami. Then, the local machine computes an MPC-generated action from the pose and sends it to the Kamigami via a ROS service, and the Kamigami responds with a timestamp of when the command was executed. To improve accuracy and overcome latency, we handle delays explicitly by averaging a predetermined number of timestamps, and we record the captured data as waypoints that capture the full motion so the model can be trained accurately offline. A sketch of this control loop appears below.
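The sketch below shows the shape of that loop, assuming ar-track-alvar's default `ar_pose_marker` topic; `mpc_plan()` and `send_action()` are hypothetical stubs standing in for our planner and for the actuation service on the robot.

```python
#!/usr/bin/env python
import rospy
from ar_track_alvar_msgs.msg import AlvarMarkers

latest_pose = None

def marker_cb(msg):
    # Cache the most recent AR-tag pose estimate of the robot.
    global latest_pose
    if msg.markers:
        latest_pose = msg.markers[0].pose.pose

def mpc_plan(pose):
    # Placeholder for MPC over the learned dynamics model.
    return (0.5, 0.5)

def send_action(action):
    # Placeholder for the ROS service call to the Kamigami; the real
    # robot replies with a timestamp of when the command executed.
    rospy.loginfo("sending action %s", action)

if __name__ == "__main__":
    rospy.init_node("kamigami_controller")
    rospy.Subscriber("ar_pose_marker", AlvarMarkers, marker_cb)
    rate = rospy.Rate(2)  # plan at roughly 2 Hz
    while not rospy.is_shutdown():
        if latest_pose is not None:
            send_action(mpc_plan(latest_pose))
        rate.sleep()
```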
In Simulation:
We were able to generate near-optimal behavior via our learned dynamics and MPC for both a single robot and a swarm of robots.
In the Physical World:
We trained the model successfully and used MPC to reliably drive a single robot to the goal position in a timely manner.
Video of a successful run of a trained model reaching a predetermined goal.
A graph showing how the learned dynamics model brings the robot to the goal over time in real life!
A graph showing how the learned dynamics model brings the robot to the goal over time in simulation, compared with an optimal planner.
While developing the hardware and software for our experimental setup, we faced a few challenges that slowed down the process.
First, hardware acquisition: the past few years have seen a global chip shortage that also impacted this project. The hardware we needed, especially the Raspberry Pi, was hard to find, and shipping times were significantly delayed.
Second, the Raspberry Pi turned out to be quite a difficult computer to work with. There are many subtle differences between a Raspberry Pi and a traditional computer that made it challenging to arrive at a configuration allowing smooth software development and communication.
Last but not least is ROS. Although we chose ROS to simplify our software development process, we fell into a few rabbit holes using it. First, ar-track-alvar is a relatively old package designed for older versions of ROS, maintained on a volunteer basis, with lackluster documentation for newer versions, so issues with it are especially difficult to debug. We also ran into versioning issues: some of the packages that ar-track-alvar depends on rely on aspects of older versions of the C language that our personal computers were not compatible with. Second, ROS itself has relatively poor documentation, which made it difficult to determine whether the root of an issue was the package we were using, our computer configuration, or ROS itself. These problems gave us such difficulty that we resorted to developing the software stack on an older computer instead of our own. The same versioning issues also made it difficult to create a uniform software stack compatible with both our laptop and the Raspberry Pi on the Kamigami.
We explored the use of reinforcement learning to learn the dynamics model of a nonlinear, underactuated, and under-observed legged millirobot system in order to develop a cheap platform for swarm robotics research. With less than 20 minutes' worth of training data on a hardwood surface, the robot learned a model that allowed us to use MPC to reliably drive it to any arbitrary location we chose in the arena. While the result is not perfect, it shows the potential of an inexpensive, easy-to-build swarm robotics research platform that will allow more roboticists to conduct initial experiments to verify their swarming algorithms and iterate on improvements to them.
Although our approach works in tame conditions (i.e., a hardwood floor in a room), we were not able to test it in rougher environments with slopes, obstacles, and non-rigid ground. One potential research direction is to run experiments with our approach to characterize how the Kamigami behaves on a variety of surfaces and conditions, and perhaps to modify our method to be more robust to them. The experimental setup could also be changed to allow reliable data collection under those circumstances. Our current approach assumes the Kamigami traverses a 2D plane; a rougher environment would involve 3D space and make it more difficult to track the Kamigami's pose with one static camera.
Our Team
Emma Stephan
Junior EECS & BioE
Network communication, system design
Aaron Rovinsky
Junior EECS & ME
ML/RL theory, system implementation
Jewook Ryu
Senior EECS
System design and implementation
Abanob Bostouros
Senior EECS
System design and implementation
You can find our full paper here
Our presentation:
https://docs.google.com/presentation/d/16Hn5Spgv8cu_R4hFFi5F-_25O3BCUyRf1mGpAtEOjMI/edit?usp=sharing
Find our code here: