RILaaS: Robot Inference and Learning as a Service

Ajay Tanwani, Raghav Anand, Joseph Gonzalez, Ken Goldberg

Programming robots is complicated by the lack of "plug-and-play" modules for skill acquisition. Virtualizing the deployment of deep learning models can facilitate large-scale use and reuse of off-the-shelf functional behaviors. Deploying deep learning models on robots requires a real-time, accurate, and reliable inference service under varying query load. This paper introduces a novel Robot-Inference-and-Learning-as-a-Service (RILaaS) platform for low-latency and secure inference serving of deep models that can be deployed on robots. Unique features of RILaaS include: 1) low-latency and reliable serving with gRPC under dynamic loads by distributing queries over multiple servers on the Edge and Cloud, 2) SSH-based authentication coupled with SSL/TLS-based encryption for security and privacy of the data, and 3) a front-end REST API for sharing, monitoring, and visualizing performance metrics of the available models. We report experiments that evaluate the RILaaS platform under varying batch sizes, numbers of robots, and model placement hosts on Cloud, Edge, and Fog, with object recognition and grasp planning provided as a service as benchmark applications. We address the complexity of load balancing with a reinforcement learning algorithm that optimizes simulated profiles of networked robots, outperforming several baselines, including round robin, least connections, and least model time, with 68.30% and 14.04% decreases in round-trip latency across models compared to the worst and the next-best baseline, respectively.

Keywords: Cloud and Fog Robotics, Networked Robots, Transfer Learning, Federated Learning, Distributed Systems

RILaaS Architecture

RILaaS uses a hierarchy of resources in the Cloud-Edge continuum to distribute inference/prediction serving of deep learning models such as grasp planning and object recognition on a fleet of robots. Users can manage robots and models with a front-end API that interacts with the inference loop through a metrics server, authorization cache, and a Docker model repository.

Inference optimization with adaptive load-balancing: A Q-Learning algorithm adapts the distribution of the incoming requests from the robots between the Cloud and the Edge resources to optimize the round-trip latency time.
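
The idea in the caption above can be illustrated with a minimal tabular Q-learning sketch. This is not the RILaaS implementation: the two load states, the latency numbers in `simulated_latency`, and the hyperparameters are all hypothetical, chosen only to show how a reward of negative round-trip latency can steer requests between Edge and Cloud.

```python
import random

ACTIONS = ("edge", "cloud")            # where to route the next request
STATES = ("low_load", "high_load")     # hypothetical, coarse view of query load


def simulated_latency(state, action, rng):
    """Hypothetical round-trip latency in ms (illustrative numbers only)."""
    if action == "edge":
        # Assumption: the Edge is close but has limited compute under load.
        base = 20.0 if state == "low_load" else 120.0
    else:
        # Assumption: the Cloud pays a fixed network cost but scales well.
        base = 60.0
    return base + rng.uniform(0.0, 10.0)


def train(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    """Run epsilon-greedy Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        # Lower latency means higher reward.
        reward = -simulated_latency(state, action, rng)
        # Assumption: load evolves independently of the routing decision.
        next_state = rng.choice(STATES)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning temporal-difference update.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def policy(q):
    """Greedy routing decision for each load state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    print(policy(train()))
```

With these illustrative latency profiles, the greedy policy should route to the Edge under low load and to the Cloud under high load. A real deployment would replace `simulated_latency` with measured round-trip times and would likely need a richer state description.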

Vision-based decluttering application in which the robots send an RGBD image of the environment to the inference service and retrieve the object categories and bounding boxes, along with their grasp locations, to place the objects in their corresponding bins.
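
As a rough sketch of the robot-side step in this application, the snippet below maps each returned detection to a target bin. The `Detection` fields, the grasp parametrization, and the bin names are assumptions made for illustration; the actual response format of the inference service is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """Hypothetical per-object result returned by the inference service."""
    category: str
    bbox: Tuple[int, int, int, int]      # (x_min, y_min, x_max, y_max) in pixels
    grasp: Tuple[float, float, float]    # assumed (x, y, angle) grasp location


def plan_decluttering(
    detections: List[Detection],
    bin_for_category: Dict[str, str],
    default_bin: str = "misc",
) -> List[Tuple[Tuple[float, float, float], str]]:
    """Pair each grasp location with the bin assigned to its object category."""
    return [
        (d.grasp, bin_for_category.get(d.category, default_bin))
        for d in detections
    ]


if __name__ == "__main__":
    detections = [
        Detection("mug", (10, 20, 50, 80), (30.0, 50.0, 0.0)),
        Detection("screwdriver", (60, 15, 90, 40), (75.0, 28.0, 1.57)),
    ]
    bins = {"mug": "kitchen_bin", "screwdriver": "tool_bin"}
    for grasp, target in plan_decluttering(detections, bins):
        print(grasp, "->", target)
```

Categories without an assigned bin fall back to `default_bin`, which keeps the plan total even when the recognition model returns a label the robot has no bin for.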

Deploy deep models on robots with two lines of code!

from client import RobotClient

rc = RobotClient(...)  # constructor arguments not shown
outputs = rc.predict(inputs)

Contact Us

For further questions about deploying deep models on your robots with the RILaaS platform, please reach out at: ajay.tanwani at