October 16-17th 2024 | IROS 2024

The Earth Rover Challenge

Urban Robotic Navigation Competition In The Real World

Overview

5 AI Teams | 5 Human Gamers | 5 Cities

With the rise of generalist foundation models for robot navigation, new frontiers involving challenging open-world and open-vocabulary mobility scenarios are now of immense interest to the robotics community.

Meanwhile, testing such models in real world "outside the lab" settings have remained outside the operational capabilities of most research labs.

Our competition aims to explore the possibility of globally distributed in-the-wild navigation evaluations at an unprecedented scale, along with the release of a substantial real-world navigation dataset collected from 10+ cities (>2k hours).

Leveraging a large fleet of outdoor navigation robots deployed across multiple cities, we aim to study whether state of the art open-world autonomous navigation models are able to effectively operate in truly open-world settings and how they fare against human tele-operated performance under the same environments.

Possible test sites from FrodoBots' current global robot fleet locations

Auckland

Blekinge

Madrid

Taipei

Vienna

Wuhan

Real world footage (robot's front/back cameras) of human gamers completing navigation missions in various cities

Participating Teams

Competition Rules

The Earth Rover Challenge is a distributed competition across remote environment scenarios spanning multiple cities (e.g., Abu Dhabi, Singapore, Taipei, Stockholm, etc.), where competition participants need to deploy their policies into realistic GPS goal-oriented navigation scenarios. This competition will test the robustness, generalization, and safety of navigation capabilities of robot foundation models.

Navigation Missions

The goal of the competition is to remotely control small sidewalk robots in order to complete various navigation missions in outdoor urban environments across various cities in real time.

Each mission consists of a series of checkpoints that the robot needs to navigate through. A checkpoint is specified using GPS coordinates.

Every mission is given certain award points: 1-10 based on the difficulty, e.g., whether crossing roads, crowded space, and is considered “mission complete” when the robot returns to the end point of the given mission after registering at various checkpoints.

Example of a navigation mission in Singpaore (consisting of a sequence of GPS-defined checkpoints)

Competition Format

The competition will proceed in a round robin fashion where AI teams and human gamers will take turns to complete known and unknown navigation missions in various test locations/cities.

Concretely, there will be 2 robots at each location, with either robots controlled by AI team's model or a human gamer (it's also possible for both robots to be controlled by 2 AI team's models or 2 human gamers). Both robots will start at the same starting point, albeit with a few minutes difference in staggered starting time, while working towards completing the same navigation mission (with the same series of checkpoints).

Each mission will have a difficulty score ranging from 1 to 10. Example of a simple mission may include a short distance drive (eg. 100 meter) in a typically quiet park with wide sidewalk while a more difficult drive could involve a long drive (eg. 1 km) along crowded sidewalk while requiring crossing roads or even traveling directly on roads at times. It is also important to highlight that because this competition takes place in the wild, real world variability will naturally mean that the conditions of every drive may differ even for the same mission (eg. a typically quiet park getting crowded unexpectedly).

Successfully completing a given mission will earn the AI team or human gamer competition points that correspond to the difficulty score (ie. completing a level 1 mission will earn 1 point). Failure to complete a mission or needing any intervention by the "robot walker assistant" will mean the AI team or human gamer will not receive any point for that particular round/mission.

At the end of the competition, the AI team or human gamer with the highest aggregate points will win the competition. If there is a tie (in points), we will refer to the particular round where the 2 opponents faced each other in the same mission and whoever completed that mission sooner will be considered the ultimate winner.

Robot Platform

Each Earth Rover unit weighs less than 5 kg (11 lbs) and moves at a max speed of ~3 km/hr (~0.85 m/s). It is able to move forward/backwards, turn in-place and comes equipped with front/back cameras, 4G connection, GPS & IMU. It has limited edge computing and is meant to be 100% remotely controlled by human drivers or AI navigation models hosted on a remote server (e.g., your in-house compute, or the cloud).

Every participating team will be given 2 Earth Rover units for testing locally as well as up to 20 hours test time per week (in the coming months leading to the actual competition at IROS) with robots deployed remotely around the world (along with human operators who will follow closely behind the robots to provide real-time operational support to the participating teams).

FrodoBot unit & human drivers’ POV

Navigation Models Deployment

Competing teams shall host their own models in their own compute facilities while remotely accessing the assigned robots via a standard Remote Access SDK (GitHub Repo here).

Effectively, AI team's model will receive video stream from the front camera of the robot while it can also send through a control data stream to the robot. In addition, GPS location of the robot, as well as other specific info related to a given navigation mission (eg. GPS of the next checkpoint, neighborhood map) will also be provided.

Dataset

FrodoBots has also open-sourced a significant dataset of human tele-operated drives collected from 10+ cities (>2k hours), which the teams can opt to adopt as part of their models training pipeline.

There are 7 types of data that are associated with a typical Earth Rovers drive, as follows:

Control data: Gamer's control inputs captured at a frequency of 10Hz (Ideal) as well as the RPM (revolutions per minute) readings for each of the 4 wheels on the robot.
GPS data: Latitude, longitude, and timestamp info collected during the robot drives at a frequency of 1Hz.
IMU (Inertial Measurement Unit) data: 9-DOF sensor data, including acceleration (captured at 100Hz), gyroscope (captured at 1Hz), and magnetometer info (captured at 1Hz), along with timestamp data.
Rear camera video: Video footage captured by the robot's rear-facing camera at a typical frame rate of 20 FPS with a resolution of 540x360.
Front camera video: Video footage captured by the robot's front-facing camera at a typical frame rate of 20 FPS with a resolution of 1024x576.
Microphone: Audio recordings captured by the robot's microphone, with a sample rate of 16000Hz, channel 1.
Speaker: Audio recordings of the robot's speaker output (ie. gamer's microphone), also with a sample rate of 16000Hz, channel

More information about the currently released dataset can be found here: https://huggingface.co/datasets/frodobots/FrodoBots-2K

Models Testing Phase

In the months leading up to the actual competition, participants can test out their models on open-world sites with pre-defined navigation missions (“seen environments”). During the actual competition, participants will attempt to complete navigation missions in both seen and unseen environments across at least 4 open-world sites.

Observation space: The robot will have access to a front-facing camera view that will be updated at roughly 20 Hz, depending on the network connection. Depending on network conditions, the latency of the streaming data will be around 500 milliseconds.

Action space: The robot will be able to move forwards and backwards, or turn left and right. More details on these actions can be found in the documentation of the Remote Access SDK.

Success criteria: The robot is deemed to have successfully reached the next checkpoint if it comes within 15 meters of that point, allowing for the tolerance of noisy GPS data.

Operation Support: In the months leading up to the competition date, we will provide locations across multiple citis (e.g., parks, campuses, public sidewalks) for teams to test their model in the real world, remotely. Teams can expect up to 20 hours per week of testing time, and a human "bot walker" will be following their robots to provide real-time ops support. The actual challenge at the conference will be similar, except new locations that the teams have not seen will be added.

Human Performance Benchmark

During the competition, 5 human drivers (winners selected from humans-only tournaments held before The Earth Rover Challenge) will also attempt to complete the same missions alongside robots controlled by other participants’ models. The human drivers are subject to the same conditions of “seen” vs “unseen” environments. This will help to form the human performance benchmark.

Competition Schedule

We will have multiple missions, most of which are done remotely in other cities and other sites in Abu Dhabi as well. The competition consisting of all missions will be conducted in sequence over the course of 2 days, and live-streamed on various social media channels for a global audience to spectate in real time.

8:30 AM - 9:00 AM

9:00 AM - 10:00 AM

10:00 AM - 11:00 AM

11:00 AM - 12:00 PM

12:00 PM - 1:30 PM

1:30 PM - 2:00 PM

2:00 PM - 3:00 PM

3:00 PM - 4:00 PM

4:00 PM - 5:00 PM

8:30 AM - 9:00 AM

9:00 AM - 10:00 AM

10:00 AM - 11:00 AM