A Generalized Co-Location Framework for Mixed Reality Robotic Agents

Nick Rewkowski, Advisor: Prof. Ming Lin

The sources for images that I borrow will be embedded in the image (so clicking the image will bring you to the source). I use YouTube videos uploaded by others so it is trivial to find the source.

Current Paper Preprint

mixed_agents (14).pdf


  • navigation

  • human-robot interaction (HRI)

  • augmented reality (AR)/virtual reality (VR)

  • game engines

  • navigation meshes

  • social robotics

  • tracking

  • decision trees (but may not be described here)

  • marginally related to model-predictive control (MPC) and other decisional constraint methods

  • somewhat related to simultaneous localization and mapping (SLAM), although my method specifically tries to avoid SLAM by using external trackers

  • collision avoidance (e.g. reciprocal velocity obstacles (RVO))


While the field of human-robot interaction (HRI) has traditionally focused on how humans and robots interact in the real world, fields like virtual reality (VR) and augmented reality (AR) have also integrated robots with virtual environments (VEs) for applications involving surgery, haptics, locomotion, etc. However, since robots are still physical beings, they must adhere to both their physical constraints and the constraints of the VE itself, such as virtual obstacles, the behavior of the user avatar, and distortions in the virtual environment. This problem can be exacerbated when there are slight differences in behavior between the physical robot and its own virtual "avatar," e.g. if the user sees a version of it that behaves more smoothly than the real robot or follows a trajectory that is distorted below some perceptual threshold.

This page provides an overview of some of the concepts needed to force a robot to adhere to both physical and virtual constraints. It focuses on simpler movable robots, such as 4-wheeled differential drive-based robots, and their constraints, but most, if not all, of the concepts have some analogy for other types of robots. For example, a 4-wheeled robot that is tracked in the physical and virtual world must adhere to both physical bounds (the limits of the tracked space) and virtual obstacles, as described by a model such as a navigation mesh. This mesh is typically a graph represented as a 3D mesh with edges and polygons, and we can use an algorithm like A* to quickly find an optimal path from the current position to a waypoint. While navigation meshes are traditionally considered a constraint specific to translational motion, similar models are required for other types of robots, e.g. an articulated arm robot would need a voxelized equivalent of a navmesh to provide constraints in 3-space, and drones would need something similar.

Relation to Decision-Making

Understanding the spatial constraints that the robot faces is as important as understanding mechanical constraints in decision-making, since these are required for the robot to understand where it can or cannot go with respect to its virtual AND physical environments. In the case of VR, where we deal with virtual AND physical constraints which may require some alignment (e.g. in the case of a passively haptic wall or another agent in the physical space), we must better understand the relationship between these constraints to avoid breaking the user's immersion and running into impossible configurations such as the robot getting stuck inside of a virtual wall due to the user teleporting. Additionally, with multi-agent systems in which agents may be virtual and/or physical, each agent must understand how to respond to the other agents, such as avoiding them. In general, the human user themselves can be one such agent.

In some cases that will come up later in the page, we may want robot behaviors that are specifically meant to align its physical and virtual embodiment/avatar, such as when continuous distortion causes it to slide unnaturally in the VE. We can also take advantage of the virtual and physical embodiments not necessarily being aligned to smooth out the visual appearance of the robot, possibly convincing the user that it is moving more naturally than it really is. There are also other perceptual phenomena that can be exploited, such as the imprecision with which humans judge their own trajectory or feel certain types of haptics, allowing the robot to make decisions that increase the user's presence. We must also be careful to ensure that the robot's motion in response to physical stimuli that the user cannot see, such as physical objects, is not immersion-breaking.

Formalizing the Problem

Defining Basic Terms

Virtual Environments (VEs)

VEs, aka virtual worlds, are 3D spaces composed of 3D models/meshes that can be navigated by some user--in most cases, a human user operating a virtual avatar that they control and move around the space. However, other physical actors may also persist in a VE, such as avatars for physical robots or other physical items that are tracked and recreated in the VE somehow (e.g. a static mesh representing a physical wall).

Virtual Reality (VR)

VR is a field dedicated to replacing the real sensory stimuli that humans experience with synthetic stimuli. For example, instead of looking at a 3D physical cup, the user looks at a 3D virtual model of a cup. Each sense that is replaced then requires a display to portray the replaced sense. In the above example of a cup, this would require a visual display to show the virtual cup, such as a computer monitor or screen. If we are replacing real sound with virtual sound, we then need an auditory display such as headphones.

Nowadays, we mostly care about the specific case of stereoscopic VR, in which a user wears a head-mounted display (HMD) that contains 2 visual displays, 1 per eye, onto which the view of the VE that the eye would see is shown. By having 1 view per eye, we can recreate the stereoscopic cues allowing us to perceive the depth of a 3D object. These HMDs can also include headphones and other accessories.

Popular consumer stereoscopic VR devices nowadays include: HTC Vive, Oculus Rift, Oculus Quest, Windows Mixed Reality (WMR) headsets, PlayStation VR, and Valve Index.

In this project, we focus specifically on VR, since VR provides full VEs that replace the real world, so it makes more sense to distinguish virtual and physical constraints than it does in AR (next item).

Augmented Reality (AR)

AR is similar to VR except the synthetic stimuli do not completely replace the real stimuli; they instead supplement real stimuli and allow both types of stimuli to be displayed simultaneously.

Pure AR does so with no knowledge of the real world, e.g. a user interface displaying a translucent web browser. Mixed reality (MR) uses information from the real/physical environment (PE) to drive the interaction with synthetic stimuli, e.g. placing a virtual 3D object onto a real table.

As in VR, the more popular devices are stereoscopic, but stereo MR is still not quite at the consumer stage. Popular stereo AR devices include the Microsoft HoloLens and Magic Leap One. An example of a non-stereo MR device would be a smartphone, in particular when running something like the AR filters found in apps like Snapchat.

Physical Constraints

Physical constraints include any object in the PE that is capable of affecting the navigation of a movable physical agent (PA) such as a robot or human user. In the case of VR HRI, these will usually be the size of the tracked VR space and physical agents themselves. However, some applications can also add obstacles like furniture, walls, or other passive haptic stimuli.

Additionally, physical constraints include constraints on the actual motion of the PA. For example, a wheeled robot has limitations on how fast it can move and rotate, and articulated arm robots have a limited number of degrees of freedom. A human user has similar limitations due to biomechanics.

Virtual Constraints

Virtual constraints are similar to physical constraints, except they apply to objects in the VE. These can include virtual obstacles, virtual agents (VAs) roaming the environment, and so on. Like the physical constraints, there are also limitations on how fast the avatars of the robot and human can move, but typically, the virtual and physical constraints of these simultaneously virtual and physical actors will be 1:1.

Path Search and Traversal Algorithms

In order to decide the path that a robot is allowed to travel when it is faced with virtual or physical obstacles, a path search algorithm is required to generate a valid path. Typically, we use a fast shortest-path algorithm such as A* (Hart 1968), which uses a graph of nodes to find the set of edges, or path segments, that the robot should traverse to reach its goal. If the goal is unreachable, we can still receive the set of edges required to move the agent as close as possible.
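As a rough illustration of the path search described above, here is a minimal sketch of A* on a 4-connected occupancy grid rather than on a true navmesh graph. The grid layout, the `astar` name, and the Manhattan-distance heuristic are assumptions made for this example and do not come from any particular engine. When the goal cannot be reached, the sketch returns the path to the explored cell closest to the goal, mirroring the "as close as possible" behavior mentioned above.

```python
import heapq


def astar(grid, start, goal):
    """Return (path, reached) on a 4-connected grid where 1 marks an obstacle.

    If the goal is unreachable, the path leads to the explored cell that is
    closest to the goal (by Manhattan distance) and `reached` is False.
    """
    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]   # entries are (f = g + h, g, cell)
    came_from = {start: None}
    cost = {start: 0}
    closest = start

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > cost[cell]:
            continue                      # stale heap entry
        if h(cell) < h(closest):
            closest = cell
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc]:
                continue                  # outside the grid or blocked
            new_cost = g + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))

    end = goal if goal in came_from else closest
    path = []
    node = end
    while node is not None:               # walk the parent links back to start
        path.append(node)
        node = came_from[node]
    path.reverse()
    return path, end == goal
```

In a navmesh setting the nodes would be polygons (or their edge midpoints) rather than grid cells, but the search itself would look much the same.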

Navigation Meshes (Navmeshes)

In 3D graphics, navmeshes are the popular solution to generating the graphs needed for path search algorithms to generate valid paths. They are 3D meshes describing the valid traversable space in the VE and are generated using the 3D geometry of the VE as well as a set of parameters describing factors such as: the size of the agent (height and radius of a capsule approximating an agent), step height, jump height, obstacles, etc. With modern algorithms such as those embedded in game engines, they can be generated very quickly even if they need to be generated every frame due to dynamic obstacles.

Navmeshes are typically created as 2D structures, meaning that they assume that the agent will be standing on the ground and cannot fly around. Thus, any kind of height offset from the navmesh results in the position of the agent being projected onto the navmesh.

Voxelized Navmeshes and Heightfields

Voxels are cuboids that a 3D scene can be broken up into. They are often seen as the 3D equivalent of a pixel.

Voxelized navmeshes are the 3D equivalent of a navmesh and are required to handle non-2D navigation tasks, such as a flying object (plane, bird, etc.) or a robot that does not rest on the ground, such as a drone or articulated arm. The generation process is not significantly different from the 2D version, except a structure called a heightfield (also known as a heightmap) is typically used to describe the obstacles in a given column of voxels.

Obstacle Avoidance & Proxemics

While navmeshes are a good solution for static obstacles such as walls that never move during a simulation, handling movable virtual or physical agents requires different methods. Naively, one could simply recompute the navmesh every timestep of the simulation and treat all agents as obstacles for the navmesh to factor into the calculation. However, the more popular solution is an algorithm called reciprocal velocity obstacles (RVO) (Van den Berg 2008), which defines a submissiveness factor for each agent describing how likely it is to move out of the way of another agent, and uses this to calculate a per-frame vector describing how an agent should shift its trajectory to avoid another agent. This method is built into popular game engines like Unreal Engine 4 (UE4) and Unity.

The method is based on concepts from proxemics, which is a field dedicated to understanding the comfortable personal space of agents such as humans and robots (e.g. Mumm 2011 below).
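The following is a simplified, sampling-based sketch of the RVO idea for two disc-shaped agents; it is not the implementation shipped in UE4 or Unity. A candidate velocity is rejected if the velocity the agent would need when taking on the whole avoidance effort leads to a collision within a time horizon. The `responsibility` parameter stands in for the submissiveness factor (0.5 means both agents share the effort equally, 1.0 reduces to a plain velocity obstacle). All function names, the candidate sampling, and the default values are assumptions made for this example.

```python
import math


def will_collide(rel_pos, rel_vel, combined_radius, horizon):
    """True if a point leaving the origin with rel_vel enters the disc of
    combined_radius centred at rel_pos within `horizon` seconds."""
    px, py = rel_pos
    vx, vy = rel_vel
    c = px * px + py * py - combined_radius ** 2
    if c <= 0:
        return True                       # the agents already overlap
    a = vx * vx + vy * vy
    b = px * vx + py * vy                 # > 0 only when the gap is closing
    if a == 0 or b <= 0:
        return False
    disc = b * b - a * c                  # discriminant of a*t^2 - 2*b*t + c = 0
    if disc < 0:
        return False                      # the ray misses the disc entirely
    return (b - math.sqrt(disc)) / a <= horizon


def rvo_velocity(pos_a, vel_a, pref_vel, pos_b, vel_b, combined_radius,
                 responsibility=0.5, horizon=3.0, max_speed=1.0):
    """Pick the sampled velocity closest to pref_vel that lies outside the
    reciprocal velocity obstacle that agent B induces for agent A.

    `responsibility` must be in (0, 1].
    """
    rel_pos = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    candidates = [pref_vel, (0.0, 0.0)]
    for step in range(1, 5):              # speeds 0.25, 0.5, 0.75, 1.0 x max
        speed = max_speed * step / 4
        for k in range(16):               # 16 evenly spaced directions
            ang = 2 * math.pi * k / 16
            candidates.append((speed * math.cos(ang), speed * math.sin(ang)))

    best, best_dist = None, float("inf")
    for cand in candidates:
        # Velocity A would need if it alone made the whole adjustment.
        full = (vel_a[0] + (cand[0] - vel_a[0]) / responsibility,
                vel_a[1] + (cand[1] - vel_a[1]) / responsibility)
        rel_vel = (full[0] - vel_b[0], full[1] - vel_b[1])
        if will_collide(rel_pos, rel_vel, combined_radius, horizon):
            continue
        dist = math.hypot(cand[0] - pref_vel[0], cand[1] - pref_vel[1])
        if dist < best_dist:
            best, best_dist = cand, dist
    # Fallback (an assumption): stop if every sampled velocity is rejected.
    return best if best is not None else (0.0, 0.0)
```

For two agents approaching head-on, the function returns a velocity that keeps the agent's speed but veers to one side; when the other agent is far away, the preferred velocity is returned unchanged.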

Differential Drive

Differential drive is a set of robotics methods describing the movement of a wheeled vehicle. In the case where the wheels on a given side both move with the same speed, we use the unicycle model to describe a wheeled vehicle, which converts from parameters describing the desired rotation and translation of a robot to parameters describing the wheel inputs required to make the robot move towards those transformations.

For the purpose of this report, it is unnecessary to describe the actual equations. This is here because wheeled robots are one of the most basic use cases for complex navigation through VEs and PEs.

Haptic Proxy

In the field of haptics, a recent trend is to use physical objects called haptic proxies to allow a user to "feel" a virtual object by placing a passively haptic proxy in the location of the virtual object. Recent robotics methods allow for these proxies to be generated dynamically by having robots move towards where the virtual object would be located in the physical world based on tracking. This will be relevant to many of the use cases described later.

Trajectory Distortion

Trajectory distortion refers to the mismatch between an agent's physical and virtual trajectory. This is often seen in locomotion methods such as walking-in-place (WIP) (Usoh 1999), motion compression (MC) (Nitzsche 2002), redirected walking (RDW) (Razzaque 2001), and translational gain (Betsy 2006). These are typically meant to allow the user to naturally walk throughout the entire VE instead of resorting to simpler methods like joystick motion or teleporting. As will be discussed, these methods pose a challenge for physical robots in VEs.

However, these methods do keep both the physical AND virtual trajectories continuous, which means that the virtual and physical constraints also change continuously relative to the robot. This may allow for other optimizations. This concept may make more sense after the next item (or after seeing examples).

Trajectory Discontinuity

Trajectory discontinuity refers to a mismatch between the VE and PE that results in a discontinuous virtual trajectory. The biggest offender is the popular method of teleporting (or the special case of dashing), which is simple, easy to use, fast, and not particularly sickening (see Langbehn 2018 and Coomer 2018). However, teleporting can suddenly change the robot's constraints to a completely different configuration. For example, if the user teleports very close to a virtual wall, the robot's virtual embodiment may be teleported into the wall, which breaks the navigation. Another offender is snap rotation (hitting a button that causes the virtual avatar to rotate some specific number of degrees with their head as the pivot) (Farmani 2020).

Defining Different Navigational Challenges and Variations of the Problem

In all definitions below, we assume that the robot is tracked in the same physical space as the user and that the robot and user's definitions of the physical space are calibrated (e.g. with VR equipment such as the ViveTracker). We also assume that the robot has both a physical and virtual embodiment, meaning that its tracked transformation is used in both the PE and VE, perhaps with a virtual avatar.

We also use "dynamic" interchangeably with "movable," although in general, "dynamic" does not necessarily mean "movable" (e.g. soft bodies). Thus, we are assuming that all agents in the scene are computed as nondeformable rigid bodies.

Physical Agent Distortion Problem

A major issue to be handled in the case of PE/VE distortion is described as follows:

  • Distortion methods such as RDW function by rotating the VE AROUND the user's head under some threshold, usually as a function of the user's head velocity

  • Because the user's head is the pivot, and both the robot and user are tracked in the same physical space, this rotation will result in the user not translating at all, while the robot WILL translate. The result will be that the robot agent will appear to slide in the VE, especially if it is not moving in the PE. This is based on basic transformations in graphics.

In the example to the left, the triangle is rotating about the point at which the colored lines converge. Due to this transformation, the triangle has not only rotated by 45 degrees but has also translated in the space.
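The same effect can be checked numerically with a small 2D sketch. The positions and the 45-degree angle are made-up values for illustration, and `rotate_about` is a hypothetical helper. Rotating the VE about the user's head is treated here as rotating each tracked point about that pivot, which is equivalent up to the sign of the angle.

```python
import math


def rotate_about(point, pivot, angle_rad):
    """Rotate a 2D point counter-clockwise about a pivot."""
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (pivot[0] + c * dx - s * dy, pivot[1] + s * dx + c * dy)


user_head = (1.0, 1.0)      # pivot of the injected rotation (assumed value)
robot = (3.0, 1.0)          # robot standing still 2 m from the user
angle = math.radians(45)

new_user = rotate_about(user_head, user_head, angle)   # the pivot stays put
new_robot = rotate_about(robot, user_head, angle)      # the robot is displaced

# Apparent slide of the stationary robot in the VE.
slide = math.hypot(new_robot[0] - robot[0], new_robot[1] - robot[1])
```

The user does not move, while the robot's virtual position shifts by the chord length 2 * d * sin(angle / 2), where d is the distance between robot and user. The farther the robot is from the user, the more it appears to slide for the same injected rotation.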

This seems to be a fundamental problem with combining distortion methods and other physically tracked objects (e.g. robots). It does not come up in the literature; our own work in GAMMA on using RDW with a robot is seemingly the first to encounter this problem in the context of VR, and it will be described later. Because there does not yet seem to be a trivial solution, our current approach is to simply update the robot's target navmesh-constrained position every timestep, even if it is supposed to be staying still.

Physical-Only Obstacles

The case of physical-only obstacles does not seem to have a particularly strong use case in VR; it would not make much sense to have the robot navigate around a physical entity that the user cannot see, as this would break the user's sense of presence. In some fields such as virtual locomotion, this is more relevant, as you can optimize the physical walkable area for the user to navigate by guiding them around physical obstacles, but even this use case is very niche. In the robotics case, this would cause the robot to suddenly change trajectory in response to some physical object that the user is not aware exists.

However, for the purpose of this section, let us entertain the concept.

This section will not go into detail on how the robot and user are tracked, as there are many common ways of doing so and we can assume that we do have some way of tracking them if they are being used in the simulation (otherwise, none of the relevant methods would make much sense).


A static physical-only obstacle may be some piece of furniture in the physical space that the user may or may not realize is there, but cannot move out of the way before the simulation for whatever reason. The use case in the non-robotics case may be guiding users around furniture that they do not track or place in the VE. One method that addresses this is motion compression (Nitzsche 2002), which precomputes some mapping between the VE and the smaller PE, factoring in obstacles and other constraints, and causes the user or agent's trajectory to be distorted as they traverse the remapped virtual environment--almost as if the VE were crammed inside the PE. This same method would likely work for robotics, but as mentioned above, it is not clear why one would want to. The example from Dong et al. below is a more recent version of the method.

Another method addressing this is redirected walking (RDW), which distorts the agent's trajectory in realtime using information about their previous motions and their current waypoint, with the goal of keeping the waypoint INSIDE or on the OPPOSITE side of the PE from the user as much as possible, since this would provide them the maximum amount of undisturbed walking space. The Kohli and Matsumoto examples below are not exactly relevant to this section as the physical obstacle is represented in the VE, but there is a lack of more relevant physical-only obstacle examples.


A dynamic physical-only obstacle may be another user or robot that is navigating the same physical space. While these methods have existed for many years to some extent, they are usually meant purely as research questions rather than real applications, as it is not clear why anyone would use a VR system knowing that there is some other VR user in the same space that they cannot see who cannot see or avoid them either. The Matsumoto example above is relevant as users can travel the same physical corridor, and the below examples show similar results.

Virtual-Only Obstacles: No PE/VE Distortion or Discontinuity

Virtual-only obstacles in which there is no trajectory distortion are arguably the easiest navigation cases as they involve no tracking and it is very simple to compute navmeshes in virtual-only contexts such as the 3D VE in a game engine. There is also no alignment with a physical embodiment that needs to be handled.


A static virtual obstacle is simply any 3D non-moving object in the VE that the agents can bump into. Such obstacles are typically avoided by precomputing a navmesh and marking them as obstacles that the navigation system should plan paths around. Since we are assuming the robot agent has both a physical and virtual representation, we can easily handle static virtual obstacles by simply grabbing the best path from the navmesh and navigation system using the robot's VE position, and computing how the robot should move based on the difference between its current VE position and waypoint position. Since there is no distortion now (all movements are 1:1), we are simply at the mercy of the robot movement model, such as differential drive.


A dynamic virtual obstacle would be something like a VA that is roaming the scene, such as an AI character (aka bot) moving between virtual waypoints. Suppose the robot is moving from point A to point B along a path given by the navigation system and encounters such an obstacle. If the obstacle is autonomous, like a VA, we can either have it move out of the robot's way, or use RVO to find the direction in which the robot should offset itself from the ideal navigational path and apply that offset to the robot's movement calculations until it successfully navigates around the VA. The robot's friendliness (i.e. its submissiveness factor in RVO) is simply provided by the programmer.

If the dynamic obstacle is NOT autonomous, e.g. a moving wall in the VE, then we can recompute the ideal path from A to B on the navmesh every timestep and make sure that the robot's movement algorithm is constantly recomputing its movements. For example, in the case of differential drive, we would want to update the ideal navigable path every frame of the simulation, and use the difference from the robot's current transform to the transform of the next point on the path to compute the wheel velocities it should use to reach that point.
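A minimal sketch of this per-frame update, assuming the unicycle model and a simple proportional controller. The function name, gains, speed limit, and robot dimensions are hypothetical; a real robot would need tuning and acceleration limits. The conversion from forward speed v and turn rate omega to wheel angular velocities uses the standard differential drive relations.

```python
import math


def wheel_speeds_to_waypoint(pose, waypoint, wheel_radius, axle_length,
                             k_lin=0.5, k_ang=2.0, max_lin=0.5):
    """Return (left, right) wheel angular velocities in rad/s that steer a
    differential drive robot at pose = (x, y, heading) toward the waypoint."""
    x, y, heading = pose
    dx, dy = waypoint[0] - x, waypoint[1] - y
    distance = math.hypot(dx, dy)

    # Heading error wrapped to [-pi, pi].
    error = math.atan2(dy, dx) - heading
    error = math.atan2(math.sin(error), math.cos(error))

    # Unicycle commands: slow down when far off heading, turn toward the target.
    v = min(k_lin * distance, max_lin) * max(0.0, math.cos(error))
    omega = k_ang * error

    # Differential drive: v = R*(wr + wl)/2 and omega = R*(wr - wl)/L.
    left = (2 * v - omega * axle_length) / (2 * wheel_radius)
    right = (2 * v + omega * axle_length) / (2 * wheel_radius)
    return left, right
```

Calling this every frame with the next point of the freshly recomputed path means that the wheel commands automatically follow any change in the path, such as the one caused by a moving wall.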

Virtual-Only Obstacles: PE/VE Distortion

Fortunately, due to the continuous nature of these distortion methods, the solutions to static and dynamic obstacles are the same as above, except we must absolutely recompute the robot's ideal path and cause the robot to recompute its movement parameters every timestep, even if it is meant to stay still. This will cause the robot to make many small adjustments if it is not meant to be moving in the VE, which may break the user's immersion if the robot is meant to resemble some natural object (like an animal). This can still fail if the distortion occurs faster than the robot can move.
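The failure case in the last sentence can be estimated with a back-of-the-envelope check. The sketch below assumes that the distortion is a pure rotation of the VE about the user's head at a known rate, so translational gain and curvature are ignored. The function names and example numbers are hypothetical.

```python
import math


def required_compensation_speed(robot_pos, user_head_pos, ve_rotation_rate):
    """Speed (m/s) at which a physically stationary robot slides in the VE
    when the VE rotates about the user's head at ve_rotation_rate (rad/s).
    The robot must move at least this fast to hold its virtual position."""
    distance = math.hypot(robot_pos[0] - user_head_pos[0],
                          robot_pos[1] - user_head_pos[1])
    return abs(ve_rotation_rate) * distance


def can_keep_up(robot_pos, user_head_pos, ve_rotation_rate, robot_max_speed):
    """True if the robot is fast enough to cancel the induced sliding."""
    needed = required_compensation_speed(robot_pos, user_head_pos,
                                         ve_rotation_rate)
    return needed <= robot_max_speed
```

For example, a robot 2 m from the user under an injected rotation of 10 degrees per second would need to move at roughly 0.35 m/s just to appear stationary, so keeping the robot close to the user reduces the required speed.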

Virtual-Only Obstacles: PE/VE Discontinuity

In the case of discontinuities, the most suitable solution would seemingly be to prevent the user from teleporting if it would result in the robot getting stuck in a virtual object. One could also test for other factors, such as: if the robot was visible to the user in the VE before the teleport, then it should also be visible afterwards. Most teleport methods already prevent the user from teleporting if it would cause the user to get stuck (or if it would cause a virtual wall to intersect their PE), so this would probably not be difficult to implement. It could, however, make it much more difficult to teleport in general due to the additional constraints, especially in a cramped VE with many obstacles.

This would require that the space into which the user wants to teleport is as wide as the distance between the robot and user at the time they want to teleport. Perhaps one could program the robot to come to the user when the user wants to teleport to try to remove this inconvenience.
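A sketch of such a teleport test is given below. It assumes that the teleport is a pure translation (no snap rotation), that virtual obstacles are axis-aligned 2D boxes given as ((min_x, min_y), (max_x, max_y)), and that the robot is approximated by a disc. None of these names come from an existing engine API. The visibility test mentioned above is not included.

```python
def robot_position_after_teleport(user_pos, robot_pos, teleport_target):
    """The robot keeps its offset from the user, since both share one PE."""
    offset = (robot_pos[0] - user_pos[0], robot_pos[1] - user_pos[1])
    return (teleport_target[0] + offset[0], teleport_target[1] + offset[1])


def inside_box(point, box, margin):
    """True if the point lies within the box grown by `margin` on each side."""
    (min_x, min_y), (max_x, max_y) = box
    return (min_x - margin <= point[0] <= max_x + margin and
            min_y - margin <= point[1] <= max_y + margin)


def teleport_allowed(user_pos, robot_pos, teleport_target, obstacles,
                     robot_radius):
    """Reject a teleport that would put the user or the robot's virtual
    embodiment inside (or within robot_radius of) a virtual obstacle."""
    new_robot = robot_position_after_teleport(user_pos, robot_pos,
                                              teleport_target)
    for box in obstacles:
        if inside_box(teleport_target, box, robot_radius):
            return False
        if inside_box(new_robot, box, robot_radius):
            return False
    return True
```

With a wall spanning x from 4.5 to 5.5 and the robot 2 m ahead of the user, a teleport to x = 3 would be rejected because the robot would land at x = 5, inside the wall, even though the user's own destination is free.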

Virtual/Physical Obstacles: No PE/VE Distortion or Discontinuity

This is the same case as most passive haptics work, where a physical object is used to provide haptics to a virtual one. There are many implementations of this, such as the Kohli example above and the well-known Meehan 2002 paper on using haptics to increase immersion in VR. In the case of no distortion/discontinuity, there is no difference from virtual-only obstacles, but we would probably want to keep more distance between the robot and obstacles to avoid damage.

Virtual/Physical Obstacles: PE/VE Distortion

As in the virtual-only case, we would simply need to recompute the movement parameters of any robotic physical/virtual agent every timestep lest they be distorted into an obstacle. Again, the limitation is that they must almost always be moving due to continuous distortion.


If the other physical/virtual agents are static, then we must recompute the navmesh every frame as the object slowly shifts in the VE, and then recompute the robot movement parameters.


If the other physical/virtual agents are autonomous and movable (such as the user themselves), then we should compute the RVO trajectory shift vectors every frame.

Virtual/Physical Obstacles: PE/VE Discontinuity

It is not quite clear why this case would happen or why someone would want this feature; there does not seem to be relevant prior work. Any kind of teleporting would cause all agents with physical embodiments to teleport within the VE since they are all part of the same PE... so it is unclear what kinds of applications would appear for this. In any case, the solution would likely be the same as the virtual-only case in which we simply prevent the user from teleporting or snap rotating if it causes other physical/virtual agents to be stuck in a virtual obstacle or to not have the same visibility as before the teleport.

The only seemingly relevant example, which does not involve robotics, might be Beck 2013, in which local users in the same PE fly around a virtual city.

Current Applications and Results

Nick Rewkowski and Prof. Ming Lin

This project uses a robot dog, which is a natural component of a dog-walking simulation, to guide the user away from the PE boundaries with haptics and an optimized RDW method that takes advantage of the expected robot trajectory. The robot and human are physical/virtual agents, there is distortion due to RDW and translational gain, and there are many static and dynamic virtual obstacles.

Since the dog is almost always in motion, the physical agent distortion problem (called the "physical distractor distortion problem (PDDP)" in this project since the robot is a distractor) is not as much of a problem, although it can be a bit unnatural in edge cases where the user is not moving at all but keeps rotating their head.

The robot also has a behavior graph describing how it should avoid getting too close to the user, where it would risk getting kicked.

Due to the COVID quarantine, the project is not yet at the stage where we can evaluate the effect of the haptic tethering between robot and user.

This project uses a mobile wheeled robot containing a grid of movable blocks to guide haptic touch of virtual objects. The entire robot translates along the ground plane to where the virtual object is located, and then moves the block array to resemble the shape of the virtual object being touched. This project, like the robot dog project, has both the robot and human be virtual/physical agents (in particular, the user's hands), but there is no distortion or discontinuity in this case, allowing the robot to not worry about the VE constraints suddenly changing.

The da Vinci robot is an admittance haptic device providing the surgeon a VR view of the operation area and allowing them to control the robot's operation of the procedure instead of doing it directly. This robot must be aware of the dynamic softbody constraints of the operation area, as well as respond to the user's movement of the device. Since the mechanics of admittance haptics are quite complicated, it is unnecessary to go into detail here. The main limitation of these robots is that they cost almost $2 million apiece.

This project has various robots, such as wheeled robots and quadcopter drones, respond to different virtual stimuli. For example, the drones avoid VAs roaming the environment, and the wheeled robots push a virtual physics-based object, with the robots responding accurately to the haptic forces as if they were pushing a physical object. These robots must be aware of each other and the virtual constraints, although this project does not appear to apply any distortion to the robots' trajectories.

This project uses stepping stone robots to handle the height difference between different parts of a virtual floor. The robots queue up so that the user can walk continuously, so this involves swarm robots, physical constraints (the other robots in the swarm and the user), and virtual constraints (the virtual floor and its height at various points). This could probably work with various types of distortion, such as translational gain, and maybe even with discontinuities assuming the user does not walk WHILE teleporting (because the robot in front of them would not be able to respond quickly enough to the height change).

A likely limitation is that it may not be easy for the user to turn back around or make a sharp right turn.

This is a literature review that does a good job of describing previous comparisons of virtual agents and physical representations of social robots in different contexts. It finds that people generally prefer to respond to a real robot, with or without a virtual avatar, over responding to a purely virtual agent in the various contexts that were tested.

However, these examples are not particularly relevant to VR or MR since the users respond to a screen and there is not much of a VE or complex simulation mechanics--the user does not do much besides respond to the robot. Thus, the review does not really sell the merits of an AR/VR-based robotics system well.

This paper is still useful for terminology and related work, and it derives many of its terms from the Hoffmann paper below which describes how virtual avatars for robots should behave.

Haptic Proxies

There is much interesting recent work in the area of haptic proxies and redirected touching, which often use robots to reposition themselves to act as the haptic sensation of a virtual object.

One of these projects has tiny robots build themselves into a shape resembling the 3D virtual object, such as a gun or toy.

Another has robots move into place so they can be grabbed and repositioned for something like a chess game between VR users. A similar project is shown in which robots move furniture around so that the user is always feeling a piece of virtual furniture, even if they teleport.

Some more serious applications of this are in industry, where we may want to train the humans and robots to interact with each other.

There are also others on redirected touching where a haptic interface/robot dynamically responds to what the user is trying to touch with the appropriate haptic response, even if the haptic interface's shape does not allow 1:1 feeling. Prior work in this area shows that this does not matter much perceptually (Kohli 2010).

Since there are many of these projects, but they are more tangentially related to the problem of virtual-physical constraints (these are mostly concerned with virtual constraints, with the physical constraints usually being other robots in the swarm), I provide the videos for the reader to explore more.

Open Questions

  • At the moment, there is not a clear solution to the physical agent distortion problem besides forcing the robot to constantly move. One could provide parameters that give it a certain amount of distortion before it needs to readjust itself, but this may result in some unnatural behavior. In the case of wheeled robots, if they move slowly enough, a user may be tricked into not realizing that the sliding is happening.

  • On this page, the assumption is that the robot can be easily tracked with VR equipment that is calibrated in the same space as the HMD itself. However, this does not hold in some settings such as AR/MR, in which the world origin is not consistent, there are no alternatives to VR devices like ViveTrackers that track arbitrary objects like robots, and the HMD may not always have a clear view of the robot. This would require more complex tracking methods that ensure that the robot can be calibrated well with the HMD for the most accurate interaction, such as SLAM+Kalman filtering, as well as something allowing the HMD to recognize the robot (e.g. retroreflective markers).

  • Discontinuities such as teleporting and snap turning can completely break the navigational ability of the robot, and distortion methods may cause the VE to distort more quickly than the robot can move.

  • There is not much work in the above applications on evaluating the naturalness of the HRI. We see many technical methods, but it is not clear how the presence of the user is affected by the robot motion specifically. For example, some papers note that the robot movement sounds can be distracting or that the lack of smoothness in the robot motion is noticeable, but we do not see enough statistics proving that this matters.
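The tolerance idea from the first open question above (letting the robot accumulate a certain amount of distortion before readjusting) could be prototyped along the following lines. This is only a sketch. The two-threshold hysteresis, the class name, and the example values are assumptions rather than part of the described system, and suitable thresholds would have to be found through user studies.

```python
import math


class DriftCorrector:
    """Decide when a robot that is meant to stay still in the VE should
    start and stop correcting the drift caused by continuous distortion."""

    def __init__(self, start_threshold, stop_threshold):
        if stop_threshold > start_threshold:
            raise ValueError("stop_threshold must not exceed start_threshold")
        self.start_threshold = start_threshold   # drift (m) that triggers motion
        self.stop_threshold = stop_threshold     # drift (m) considered settled
        self.correcting = False

    def update(self, virtual_pos, anchor_pos):
        """Return True while the robot should be moving back to its anchor."""
        drift = math.hypot(virtual_pos[0] - anchor_pos[0],
                           virtual_pos[1] - anchor_pos[1])
        if self.correcting:
            if drift <= self.stop_threshold:
                self.correcting = False          # close enough, stop moving
        elif drift > self.start_threshold:
            self.correcting = True               # too much drift, start moving
        return self.correcting
```

Using two thresholds instead of one avoids the robot rapidly toggling between moving and stopping when the drift hovers around a single tolerance value.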