Technical Details


We present the user study results comparing the baseline method with the MR-based approaches, MR-SC and MR-PT. To assess user performance, we employ a pick-and-place task as our benchmark. The task measures participants' performance, experience, and workload when interacting with the proposed teleoperation system. Specifically, participants were asked to teleoperate a robotic arm to pick up three colored cubes (blue, red, and green) and place each one into the bin of the corresponding color.


The proposed Mixed Reality (MR)-based approach aims to offer a deeply immersive teleoperation experience, employing a leader-follower paradigm realized through digital twin technology and an immersive scene. For the comparative study, the baseline method is remote control via a motion-capture device (Geomagic Touch), with visual feedback presented on a 2D screen.

MR-SC provides immersive teleoperation through a virtual screen. Users view 2D visual feedback captured from the robot's operational environment, with the images displayed on a head-mounted display. A digital twin of the robotic arm enables a leader-follower control mode: operators manipulate the virtual robotic arm (the leader) to generate control commands, and its states are continuously synchronized with the physical counterpart (the follower), as sketched below. Both the baseline and MR-SC methods have a restricted field of view (FOV) because they rely on an RGB camera. Moreover, both the flat 2D screen and the virtual screen present 2D content, which compromises depth perception.
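To make the leader-follower synchronization concrete, the following is a minimal sketch of a ROS relay node that forwards the digital twin's joint states (published from Unity through the ROS TCP Endpoint) to the physical follower. The topic names /unity/joint_states and /follower/joint_command are illustrative assumptions, not the actual names used in our system.

    #!/usr/bin/env python
    # Minimal sketch of leader-follower synchronization. Assumes the digital
    # twin (leader) publishes sensor_msgs/JointState from Unity via the ROS
    # TCP Endpoint; topic names are hypothetical placeholders.
    import rospy
    from sensor_msgs.msg import JointState

    def relay(msg, pub):
        # Forward the virtual arm's joint states to the physical follower.
        pub.publish(msg)

    if __name__ == "__main__":
        rospy.init_node("leader_follower_relay")
        pub = rospy.Publisher("/follower/joint_command", JointState, queue_size=1)
        rospy.Subscriber("/unity/joint_states", JointState, relay, callback_args=pub)
        rospy.spin()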

The MR-PT method, enabled by the colored passthrough function of the Meta Quest Pro, offers a holistic immersive experience: users are no longer confined to a screen. MR-PT removes the constraints above, as operators interact with fully immersive 3D content. This strengthens their grasp of geometric relationships in the real world and fosters a more intuitive sense of presence.



Unity, which fully supports Meta Quest development via the Oculus Integration package, serves as our platform for immersive software development [1]. The package equips developers with an SDK for controller interaction, hand tracking, and the passthrough function, as well as a dedicated Interaction SDK for efficient deployment.

Communication between Unity and the ROS network is facilitated by the ROS TCP Connector plugin (within Unity) and the ROS TCP Endpoint node (in Python). More specifically, the ROS TCP Connector manages the publishing and subscription of ROS messages on the Unity side [2], while the ROS TCP Endpoint operates as a ROS Python node that handles ROS messaging outside of Unity [3]. A joint position controller and a gripper controller are written as Python ROS nodes; each parses the incoming ROS message and executes the desired joint setpoint on the hardware through the pymycobot API over serial communication [4]. The RealSense camera is connected to the Raspberry Pi, which publishes its images to the ROS network.
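As a minimal sketch of the joint position controller described above (the serial port, baud rate, topic name, and speed value are assumptions for illustration; the pymycobot calls follow the library's public API):

    #!/usr/bin/env python
    # Sketch of a joint position controller node for the MyCobot 320.
    # Port, baud rate, topic name, and speed are illustrative assumptions.
    import rospy
    from sensor_msgs.msg import JointState
    from pymycobot.mycobot import MyCobot

    mc = MyCobot("/dev/ttyUSB0", 115200)  # serial connection to the arm

    def on_setpoint(msg):
        # Execute the desired joint setpoint (radians) on the hardware.
        mc.send_radians(list(msg.position), 50)  # speed in [0, 100]

    if __name__ == "__main__":
        rospy.init_node("joint_position_controller")
        rospy.Subscriber("/mycobot/joint_command", JointState, on_setpoint,
                         queue_size=1)
        rospy.spin()

A gripper controller follows the same pattern, e.g. using pymycobot's set_gripper_value.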

The Unified Robot Description Format (URDF) serves as an XML-based descriptor of a robot's rigid-body profile. It encapsulates details concerning the robot's kinematics, dynamics, visual elements, and collision models. To integrate the robotic arm's URDF profile into Unity, we utilize the URDF Importer plugin, which represents the arm as PhysX 4.0 articulation bodies.
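For illustration, a minimal URDF fragment is sketched below; the link names, geometry, mesh path, and joint limits are placeholders, not the actual MyCobot 320 parameters.

    <!-- Illustrative URDF fragment; all values are placeholders. -->
    <robot name="mycobot_320">
      <link name="base_link">
        <visual>
          <geometry><mesh filename="package://mycobot/meshes/base.dae"/></geometry>
        </visual>
        <collision>
          <geometry><cylinder radius="0.05" length="0.1"/></geometry>
        </collision>
        <inertial>
          <mass value="0.5"/>
          <inertia ixx="1e-3" iyy="1e-3" izz="1e-3" ixy="0" ixz="0" iyz="0"/>
        </inertial>
      </link>
      <link name="link1"/>
      <joint name="joint1" type="revolute">
        <parent link="base_link"/>
        <child link="link1"/>
        <axis xyz="0 0 1"/>
        <limit lower="-2.88" upper="2.88" effort="5.0" velocity="1.5"/>
      </joint>
    </robot>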

Once the robotic arm is integrated into Unity, the Final IK plugin (a Unity asset) is employed as the inverse kinematics (IK) solver [5]. It computes joint positions from the state of the end-effector. The Interaction SDK, integrated with Oculus devices, enables operators to grasp the end-effector within the immersive scene.

CCDIK, one of the plugin's IK solvers, computes joint states from the provided end-effector state. To specify the desired end-effector state, a grabber is positioned at the end-effector's center; Final IK refers to this as the target. The MyCobot 320 is a six-axis robot comprising six serial links. Links assigned the `RotationHinge' attribute are registered as the bones parameter in CCDIK's settings. To keep the joints within safe and feasible limits, the Use Rotation Limits option is activated. However, a notable limitation of the Final IK plugin is that its output takes the form of `Transform' objects [6] rather than `ArticulationBody' objects; the latter provide a detailed description of joint positions that mirrors the definitions found in ROS.
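To clarify the idea behind CCDIK, the sketch below implements cyclic coordinate descent for a planar serial chain: sweeping from the last joint toward the base, each joint is rotated so the end-effector swings toward the target. This is an illustrative simplification under the assumption of planar revolute joints; Final IK's solver operates in 3D and additionally enforces the rotation limits mentioned above (which could be mimicked here by clamping each updated angle).

    # Planar cyclic coordinate descent (CCD) IK; illustrative only,
    # not Final IK's actual implementation.
    import math

    def fk(angles, lengths):
        # Forward kinematics: positions of each joint plus the end-effector.
        pts = [(0.0, 0.0)]
        a = 0.0
        for th, l in zip(angles, lengths):
            a += th
            x, y = pts[-1]
            pts.append((x + l * math.cos(a), y + l * math.sin(a)))
        return pts

    def ccd_ik(angles, lengths, target, iterations=50, tol=1e-4):
        angles = list(angles)
        for _ in range(iterations):
            for i in reversed(range(len(angles))):
                pts = fk(angles, lengths)
                jx, jy = pts[i]
                ex, ey = pts[-1]
                # Rotate joint i so the end-effector swings toward the target.
                a_e = math.atan2(ey - jy, ex - jx)
                a_t = math.atan2(target[1] - jy, target[0] - jx)
                angles[i] += a_t - a_e
            ex, ey = fk(angles, lengths)[-1]
            if math.hypot(target[0] - ex, target[1] - ey) < tol:
                break
        return angles

    # Example: a six-link chain of unit-length links reaching for (2.0, 2.0).
    solution = ccd_ik([0.1] * 6, [1.0] * 6, (2.0, 2.0))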


[1] https://developer.oculus.com/documentation/unity/unity-gs-overview/

[2] https://github.com/Unity-Technologies/ROS-TCP-Connector

[3] https://github.com/Unity-Technologies/ROS-TCP-Endpoint

[4] https://github.com/elephantrobotics/pymycobot

[5] https://assetstore.unity.com/packages/tools/animation/final-ik-14290

[6] https://docs.unity3d.com/ScriptReference/Transform.html