Reproducing 'Sounds Good'


Karimian et al. describe how to use audio signals for communication between multiple robots in their paper Sounds Good: Simulation and Evaluation of Audio Communication for Multi-Robot Exploration. Audio is typically used for robot-to-environment or robot-to-human interaction (e.g., buzzers for auditory status codes), but is seldom used for robot-to-robot interaction. They claim that audio communication can improve the performance of autonomous multi-robot exploration tasks, and they provide empirical evidence from simulations. For this project, our goal was to re-create their simulation to the best of our ability.

The original paper sets up an exploration task for 5 autonomous robots. The team's goal is to complete 20 jobs in total. A job is defined as finding a source beacon, performing work for 30 seconds, then finding a sink beacon and performing work for another 30 seconds. The robots do not know the environment in advance; each builds a relative/local occupancy grid map and uses frontier-based search to decide where in the world to explore next. The authors also note that the source and sink beacons may periodically change locations.
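The frontier-based search mentioned above selects known-free cells that border unknown space as candidate exploration targets. Below is a minimal sketch of frontier detection on an occupancy grid; the cell encoding (-1 = unknown, 0 = free, 1 = occupied) is our assumption, not the paper's exact representation.

```python
# Minimal frontier-detection sketch on an occupancy grid.
# Cell encoding (an assumption, not the paper's exact format):
#   -1 = unknown, 0 = known free, 1 = known occupied.
# A frontier cell is a known-free cell adjacent to an unknown cell.

def find_frontiers(grid):
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0:
                continue  # only free cells can be frontiers
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == -1:
                    frontiers.append((r, c))
                    break
    return frontiers
```

A frontier-based explorer would then pick one of these cells (e.g., the nearest) as its next navigation goal and re-run the detection as the map grows.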

The authors' simulator, Player/Stage, did not include any form of audio signal processing, so the authors wrote their own audio propagation model. They assume that audio signals always traverse the shortest visibility path from source to destination. This is modeled with a visibility graph, where vertices represent map locations and edges represent unobstructed paths weighted by Euclidean distance. A robot requests to send an "audio signal" (realistically, by playing a sound through a speaker), and the audio propagation model runs Dijkstra's algorithm on the visibility graph to calculate the shortest path between the sender and each receiving robot (realistically, each robot would detect the audio signal through a microphone).
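The shortest-visibility-path computation above can be sketched as a standard Dijkstra run over an adjacency-list visibility graph. The graph format here (`{node: [(neighbor, distance), ...]}`) is our assumption for illustration, not the paper's data structure.

```python
import heapq

# Sketch of the audio propagation model described above: Dijkstra's
# algorithm over a visibility graph whose edge weights are Euclidean
# distances. Graph format (an assumption): {node: [(neighbor, dist), ...]}.

def shortest_audio_path(graph, source, target):
    """Return (path, total_distance) from source to target,
    or (None, inf) if the target is unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]
```

In the paper's model, the returned distance would determine the attenuation of the received signal, and the first path segment its apparent direction of arrival.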

The authors test whether audio communication improves performance on this task by timing their simulation 20 times in each of 10 different configurations. They repeat this setup under three conditions -- no audio communication, simple left-right (bi-directional) audio detection, and omni-directional audio detection. They find statistically significant differences between the audio and no-audio conditions across configurations -- the robots completed the task faster when audio communication was allowed. They found no significant difference between the bi-directional and omni-directional conditions.


For the purposes of this project, we make a number of substantial simplifications.

  • Our robots do not keep a local occupancy map, nor do they perform algorithmic exploration

  • Our robots have four microphones to "detect" sound from each major cardinal direction

  • Our robots have only a forward-facing sonar field of view

  • We utilize breadth-first search on the global occupancy map to determine the shortest path, and receiver direction, between the sender and receiver

  • We utilize the "cave" map instead of the "hospital" map

  • Our "task" is to simply find the beacon

For our simulations, we utilize Player and Stage, like the paper, to manage robot controls and environment sensing, respectively. As in the paper, we utilize 5 robots. Our robots are equipped with a range-finder (sonar) and a blob-finder (color camera). The range-finder senses only the area in front of the robot, and the blob-finder detects only blue blobs. The beacon is the only blue object in the map.

For our audio propagation model, we implement a custom opaque-cmd Player driver to handle message passing between robots. Each robot subscribes to the opaque proxy. When a robot senses the beacon with its blob-finder, it moves toward the beacon and sends a request to the opaque proxy; this mimics the "speaker". The request is a packet containing the robot's ID and global map position. Note that although the packet carries a global position, the robots never use it for navigation or exploration -- from the robot's perspective, its own global position is unknown. Once the opaque driver receives an "audio signal", it broadcasts the same packet to all 5 robots; this mimics the "microphone". Each receiving robot then runs breadth-first search from the received global position to its own global position, and uses the result solely to determine which direction -- north, east, south, or west -- the "audio signal" came from. A simple straight-line direction would not suffice, because the environment might obstruct the path.
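The direction-finding step above can be sketched as follows: BFS over the global occupancy grid from sender to receiver, then report the cardinal direction of the path's final step into the receiver's cell. The grid encoding (0 = free, 1 = obstacle) and function names are our own; this is an illustrative sketch, not our driver code verbatim.

```python
from collections import deque

# Sketch of the direction-finding step: breadth-first search over the
# global occupancy grid from the sender's cell to the receiver's cell,
# then report the cardinal direction the "audio" arrived from.
# Grid encoding is an assumption: 0 = free, 1 = obstacle.

DIRS = {(-1, 0): "north", (1, 0): "south", (0, -1): "west", (0, 1): "east"}

def audio_direction(grid, sender, receiver):
    """Return which cardinal direction the receiver hears the sender's
    signal from, or None if no path exists."""
    rows, cols = len(grid), len(grid[0])
    prev = {sender: None}
    queue = deque([sender])
    while queue:
        cell = queue.popleft()
        if cell == receiver:
            break
        r, c = cell
        for dr, dc in DIRS:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in prev):
                prev[nxt] = cell
                queue.append(nxt)
    if receiver not in prev:
        return None
    # The last BFS step into the receiver's cell tells us where the
    # sound "arrived from": the direction pointing from the receiver
    # back toward the previous cell on the path.
    last = prev[receiver]
    return DIRS[(last[0] - receiver[0], last[1] - receiver[1])]
```

Because the path routes around obstacles, the reported direction can differ from the straight-line bearing to the sender, which is exactly why a straight-line computation would not suffice.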

We test two conditions -- no audio communication and audio communication. To test this, we use one simple configuration with the robots scattered across the cave map and the beacon fixed near the map center. The robots wander around until they see the beacon, then they drive to the beacon. Or, they crash. The simulation runs until every robot has either reached the beacon or crashed. We record the number of bots that found the beacon and the total simulation time. We run 5 simulations for both the no-audio and the audio conditions.
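The wander-then-drive behavior described above can be sketched as a tiny reactive controller. The sensor abstraction (`front_range`, `blob_bearing`), threshold, and speed values here are our assumptions for illustration, not the exact parameters of our control code.

```python
import random

# Minimal sketch of the wander-and-seek control loop described above.
# Sensor inputs are abstracted: `front_range` is the closest forward
# sonar reading in meters; `blob_bearing` is the beacon's bearing in
# radians (None when the blob-finder sees no beacon).

SAFE_RANGE = 0.5  # assumed obstacle-avoidance threshold (m)

def control_step(front_range, blob_bearing, rng=random.random):
    """Return (forward_speed, turn_rate) for one control step."""
    if front_range < SAFE_RANGE:
        # Obstacle ahead: stop and turn in a random direction.
        return 0.0, 1.0 if rng() < 0.5 else -1.0
    if blob_bearing is not None:
        # Beacon visible: drive toward it, steering by its bearing.
        return 0.4, 0.8 * blob_bearing
    # Otherwise, wander forward.
    return 0.4, 0.0
```

In the audio condition, a robot hearing a signal would additionally bias its wandering toward the reported cardinal direction until the beacon enters its blob-finder's view.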


Our results suggest that a faithful replica of the paper's setup would yield similar results. In our experiments, robots that used audio communication performed significantly better -- more robots found the beacon, in less time.

We include two recordings from our experiments, as well as our raw data, below. In the recordings, there is a clear distinction when a robot senses the beacon and starts "transmitting audio", and audio signaling visibly improves navigation. Our data could be used to produce a simplified version of Figure 6 in the original paper.


Example simulation with no audio communication.


Example simulation with audio communication.


Here is the raw data from our experiment recordings. "# bots" is the number of bots that found the beacon, and "time (sec)" is the total time for all robots to either crash or find the beacon.


We faced many challenges while reproducing this paper. First, we decided to use the Player/Stage software stack. Player has been largely unused and unmaintained for about 10 years; most researchers now use ROS/ROS2 and Gazebo for simulation. Luckily, we succeeded in building both Player and Stage. Second, Player and Stage both have sparse documentation and a limited support base. This is common for research-oriented code-bases, so we were not surprised; Jenny Owen's tutorial was very helpful. Furthermore, the original authors stated that they would further develop their audio propagation model and release it as open source, but no such software has appeared in the 14 years since the paper was published. As such, we had to learn how to write a custom Player driver to handle "audio signals", which was difficult. Third, developing good navigation and exploration code is hard, as is incorporating Player drivers that handle navigation/exploration. We therefore settled on simple control code that wanders the environment and avoids obstacles as best it can. We suspect our results would have matched the paper's more closely if our navigation/exploration code had been frontier-based. Lastly, our robots tended to get stuck in the environment; better navigation and exploration code might have mitigated this. The authors of the original paper noted that having 5 robots was helpful, since if one got stuck the others could still perform their jobs.


Our code can be found in the following repository:


  • Clifford Bakalian

  • Justin Goodman