The project's design must address the fundamental user need of experiencing robot execution in a simulation space that is as faithful to the physical world as possible. It must run on the kinds of devices that most users would have at home, since they do not currently have access to their labs. Multiple users must be able to work in the same environment, with as low latency as possible.
In other words, our criteria can be enumerated as follows:
Fidelity to in-person experience
Accessibility / ease of use
Ease of collaboration
To translate these criteria into concrete, achievable goals, we used the following tier system to judge our eventual results:
“Hello”: A server is set up that can receive basic commands from multiple users through the command line. Users can control at least one robot and develop collaboratively. (A minimal command-line control sketch follows this list.)
“Hello World”: There is a friendly, uncluttered UI that enables users to control at least two robots. The AR environment works and allows multiple robots to interact with one another, but this tier mainly tests the UI, confirms that our sensors work, and shows that the robots can do simple things, like move. Success for “Hello World” means controlling robot motion through a UI and receiving real-time information from our sensors.
“Hello Universe”: A polished piece of software allows users to control three or more robots from around the world with minimal lag and to observe the robots' responses in real time through AR. At this tier we integrate tasks into the environment over the network and implement the more complex tasks. Success for “Hello Universe” means that the robots adequately complete the demonstration tasks with minimal latency.
"Hello multiverse”: Our reach goals include porting our design to hardware. This would be one or more robots in the same or multiple locations that can be viewed by users anywhere in an AR, and controlled using the interface developed in “Hello universe.” A success for “Hello Multiverse” means that we extend the functionality of “Hello Universe” to hardware.
Our team decided to adopt augmented reality (AR) as the medium for the simulated environment. While virtual reality might be more immersive, it often requires specialized, expensive headgear that users might not own. What's more, a robust and rapidly growing development community exists around AR, and that community has produced AR engines for most widely used smartphones, allowing the system to run efficiently on common household devices.
We chose a central server running a ROS master node as the backbone for our network. AWS, with its reliable uptime, accessibility, low latency, and scalable compute instances, was an obvious choice for server hosting, though we initially considered Microsoft's Azure service as an alternative.
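In this architecture, every participant's ROS node simply registers with the shared master on the AWS instance. The sketch below illustrates the idea using the standard ROS_MASTER_URI and ROS_IP environment variables; the hostname and IP address are placeholders, and in practice these variables would be exported in the shell rather than set in code.

```python
#!/usr/bin/env python
# Sketch of pointing a user's machine at the central ROS master on AWS.
# The addresses below are placeholders, not our actual deployment values.
import os
import rospy

# Placeholder address of the AWS-hosted ROS master (11311 is the default port).
os.environ["ROS_MASTER_URI"] = "http://ec2-xx-xx-xx-xx.compute.amazonaws.com:11311"
# Advertise an address that the master and other nodes can reach us back on.
os.environ["ROS_IP"] = "203.0.113.17"  # placeholder reachable IP of this machine

rospy.init_node("remote_client", anonymous=True)

# Quick sanity check: list the topics the remote master already knows about.
for topic, msg_type in rospy.get_published_topics():
    print(topic, msg_type)
```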
We chose Unity for our simulated environment because it provides a well-documented, powerful AR library (ARFoundation) and the ability to simulate robots by importing URDFs. Gazebo is typically the program of choice for robotics simulation (at least, it appears so from the course material and some brief research) but doesn't appear to have much AR support.
Since three of our four team members have Android devices, we decided to use ARCore (Google's AR platform for Android, exposed in Unity through AR Foundation) as the primary visuals development platform. An iOS analog, ARKit, exists, and given that newer iPhones ship with LiDAR sensors on board, this could be an avenue for further work. Such expansion was out of scope for this proof of concept, as none of us owns one of these iPhones.
We also opted to start our development by testing with the TurtleBot because it is a robot we were already familiar with, it is widely used (or at least well known in industry), and it is mobile, which makes fuller use of an AR space than a stationary robot such as the Baxter.
We made accessibility a priority by choosing to run our system in AR on smartphones. Rather than imposing hardware or complexity expectations on users, we decided to make the system as accommodating as possible of the resource constraints of the average developer. The main disadvantage was introducing many moving parts and disconnected systems: the visual display engines (AR on smartphones) were isolated from the simulation engine (Unity), which in turn was disconnected from the core robotics engine (ROS). Much of the project therefore involved finding ways to link these disparate building blocks, which took far longer than expected. Because we chose this relatively new AR development space, we had much less documentation to work with online, and therefore had to understand each piece critically and build mostly from scratch.
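On the ROS side, much of that glue reduces to small relay nodes sitting between the Unity endpoint and the robot. The sketch below shows the shape of such a node, assuming Unity (via ROS-TCP-Connector and its ROS-side endpoint) exchanges messages on topics like /unity/cmd_vel and /unity/robot_odom; those topic names are hypothetical and chosen only for illustration.

```python
#!/usr/bin/env python
# Sketch of a ROS-side bridge node between the Unity scene and the robot.
# Topic names prefixed with /unity/ are hypothetical placeholders.
import rospy
from geometry_msgs.msg import Twist
from nav_msgs.msg import Odometry


class UnityBridge:
    def __init__(self):
        # UI commands coming from Unity are forwarded to the robot.
        self.cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/unity/cmd_vel", Twist, self.on_unity_cmd)
        # Robot state is mirrored onto a topic the Unity scene subscribes to.
        self.pose_pub = rospy.Publisher("/unity/robot_odom", Odometry, queue_size=1)
        rospy.Subscriber("/odom", Odometry, self.on_odom)

    def on_unity_cmd(self, msg):
        self.cmd_pub.publish(msg)   # pass the UI command straight through

    def on_odom(self, msg):
        self.pose_pub.publish(msg)  # mirror odometry for the AR display


if __name__ == "__main__":
    rospy.init_node("unity_bridge")
    UnityBridge()
    rospy.spin()
```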
In addition, the sensing power of our design was limited because the same camera used to map the AR environment was also responsible for feature extraction and for refreshing information in the AR space, which places a heavy burden on a single sensor. We learned that newer iPhone models include LiDAR cameras and that Unity supports them; however, none of us had access to such a phone, so we could not test this.
In summary, we made the following design decisions:
Using Unity over other simulation platforms, such as Gazebo or V-REP
Using ROS-TCP-Connector to communicate between ROS and Unity
Sacrificing sensing complexity for accessibility
In a real-world engineering application, this emphasis on accessibility makes the project easier for researchers and developers to adopt. It also presents challenges: each link from one engine to another is a potential bottleneck and a potential weak point where scalability might suffer. However, as an initial prototype of a solution to a pertinent problem, our project is likely sufficient for basic use cases.