IROS 2024 Best Paper Finalist (Top 0.1%)
Cloud robotics enables robots to offload complex computational tasks to cloud servers for improved performance and easier management. However, cloud computing can be costly, cloud services may experience occasional downtime, and the connection between the robot and the cloud can be affected by fluctuations in network Quality-of-Service (QoS). To address these challenges, we present FogROS2-FT (Fault Tolerant), a multi-cloud extension that automatically replicates independent stateless robotic services, routes requests to these replicas, and returns the first response. This replication allows robots to benefit from cloud computations even when a cloud service provider experiences downtime or there is low QoS. Additionally, many cloud providers offer low-cost "spot" computing instances that may shut down unpredictably. Normally, these instances would not be suitable for cloud robotics, but FogROS2-FT's fault-tolerant design enables reliable use of such resources. We demonstrate FogROS2-FT’s fault tolerance capabilities in three simulated cloud-robotics scenarios—visual object detection, semantic segmentation, and motion planning—and one physical robot experiment involving scan-pick-and-place. Using the same hardware specifications, FogROS2-FT achieves up to a 2.2x cost reduction for motion planning and up to a 5.53x reduction in 99th percentile (P99) long-tail latency. For object detection and semantic segmentation, FogROS2-FT reduces P99 long-tail latency by 2.0x and 2.1x, respectively, under conditions of network slowdown and resource contention.
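The replicate-and-return-first-response idea described above can be illustrated with a minimal sketch. This is not FogROS2-FT's implementation: it assumes local Python threads stand in for cloud replicas, and the names `call_replicas`, `slow_replica`, `fast_replica`, and `failed_replica` are hypothetical. Every replica processes every request, so the pattern only suits stateless services, as the abstract notes.

```python
import concurrent.futures as cf
import time


def call_replicas(replicas, request, timeout=None):
    """Send `request` to every replica; return the first successful response."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, request) for replica in replicas]
    try:
        # as_completed yields futures in the order they finish.
        for future in cf.as_completed(futures, timeout=timeout):
            if future.exception() is None:
                return future.result()  # first replica to succeed wins
        raise RuntimeError("all replicas failed")
    finally:
        # Do not block on the slower replicas; their results are discarded.
        pool.shutdown(wait=False)


# Hypothetical stand-ins for services running on different cloud instances.
def slow_replica(request):
    time.sleep(0.3)  # simulates network slowdown or resource contention
    return ("slow", request.upper())


def fast_replica(request):
    time.sleep(0.01)
    return ("fast", request.upper())


def failed_replica(request):
    raise ConnectionError("spot instance was reclaimed")  # simulated outage


if __name__ == "__main__":
    print(call_replicas([slow_replica, failed_replica, fast_replica], "plan"))
```

In this sketch the failed replica is skipped and the slow replica is ignored once the fast one answers, which is why a single outage or slowdown does not delay the caller.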
Paper Draft: Link
FogROS2-FT Code: https://github.com/KeplerC/FogROS2-FT
All Cloud Robotics Application Code: https://github.com/BerkeleyAutomation/fogros-realtime-examples
All raw logs to generate paper figures: https://github.com/BerkeleyAutomation/fogros-rt-results
The original FogROS2-FT focuses on replicated compute resources, while FogROS2-PLR focuses on replicated network interfaces.
Combining FogROS2-FT and FogROS2-PLR may achieve Probabilistic Latency Reliability (real time, by some definitions) for cloud robotics.
@inproceedings{chen2023fogrosft,
title={FogROS2-FT: Fault Tolerant Cloud Robotics},
author={Chen, Kaiyuan and Hari, Kush and Chung, Trinity and Wang, Michael and Tian, Nan and Juette, Christian and Ichnowski, Jeffrey and Kubiatowicz, John and Stoica, Ion and Goldberg, Ken},
booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2024},
organization={IEEE}
}
@misc{chen2024fogros2plr,
title={FogROS2-PLR: Probabilistic Latency-Reliability For Cloud Robotics},
author={Kaiyuan Chen and Nan Tian and Christian Juette and Tianshuang Qiu and Liu Ren and John Kubiatowicz and Ken Goldberg},
year={2024},
eprint={2410.05562},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2410.05562},
}