We wanted to minimize the number of additional parts, keeping the system as simple as possible while still making it applicable to something useful in the real world, so we settled on the UR7e arm and a camera as the hardware needed to accomplish this task.
We strapped a stylus to a cube using masking tape to minimize rotation due to friction. We placed the stylus on a pen holder so that the end-effector could grip it vertically, which keeps the pickup motion simple. We also chose to use only two 2.5 cm ArUco tags (in addition to the UR7e's base link ArUco tag) to keep our setup as minimal as possible. We attached these tags to the top face of the cube and the top left corner of the iPad, respectively. Since our ArUco tags were small and lay flat on the table, we mounted our RealSense camera so that it had a bird's-eye view of the table and the UR7e.
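To illustrate how the base link tag can tie the camera's observations to the robot, here is a minimal sketch of the frame change involved. It assumes that a tag detector has already produced each tag's pose in the camera frame as a 4x4 homogeneous transform, and that the base link tag's frame coincides with the robot base frame (in practice a fixed offset would likely be needed). The function names and example poses are hypothetical and are not taken from our code.

```python
# Minimal sketch: express a tag pose seen by the camera in the robot base frame.
# Poses are 4x4 homogeneous transforms stored as nested lists (row-major).

def mat_mul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def tag_in_base(cam_T_base_tag, cam_T_tag):
    """Return base_T_tag = inverse(cam_T_base_tag) * cam_T_tag."""
    return mat_mul(invert_rigid(cam_T_base_tag), cam_T_tag)

def translation(x, y, z):
    """Pure translation with identity rotation (hypothetical example pose)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

if __name__ == "__main__":
    # Hypothetical overhead view: both tags lie about 1 m below the camera.
    cam_T_base_tag = translation(0.1, 0.2, 1.0)
    cam_T_cube_tag = translation(0.4, 0.1, 1.0)
    base_T_cube_tag = tag_in_base(cam_T_base_tag, cam_T_cube_tag)
    print([row[3] for row in base_T_cube_tag[:3]])  # cube tag position in base frame
```

Because both tag poses are measured by the same camera, the camera's own pose cancels out of the product, so in this sketch only the relative pose between the tags matters.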
The main variables we experimented with were ArUco tag size, stylus/iPad positions, and camera orientation. Initially, the RealSense camera was mounted off to the side (parallel to the desktop) and turned 45 degrees inwards. However, we achieved far more consistent results with a bird's-eye setup, presumably because keeping the camera parallel to the ArUco tags minimized their distortion in the image. With the initial setup, there was much more variance in the UR7e's position at each step, resulting in very low success rates. This was especially true when we used smaller ArUco tags, since distortion has a much more significant effect on smaller tags. We also observed that stylus pickup was less consistent when the pen and iPad were near the left or right edges of the camera's field of view; keeping them close to the center of the image minimized this inconsistency as well.
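A rough way to see why small, tilted tags are harder to localize is to estimate how many pixels a tag side covers under an idealized pinhole camera model. The sketch below is only an illustration: the 600-pixel focal length and 1 m distance are assumed values, not measurements from our setup, and the one-pixel corner noise is likewise hypothetical.

```python
import math

def tag_side_pixels(tag_size_m, distance_m, focal_px, tilt_deg=0.0):
    """Approximate image-plane length of a tag side under a pinhole model.

    tilt_deg is the angle between the camera's optical axis and the tag's
    normal; the side lying along the tilt direction is foreshortened by
    cos(tilt).
    """
    return focal_px * tag_size_m * math.cos(math.radians(tilt_deg)) / distance_m

def relative_corner_error(side_px, corner_noise_px=1.0):
    """Fraction of the tag side that a given corner-detection error represents."""
    return corner_noise_px / side_px

if __name__ == "__main__":
    focal_px = 600.0    # assumed focal length in pixels
    distance_m = 1.0    # assumed camera-to-tag distance
    for size_m in (0.025, 0.05):
        for tilt in (0.0, 45.0):
            side = tag_side_pixels(size_m, distance_m, focal_px, tilt)
            err = relative_corner_error(side)
            print(f"size={size_m} m, tilt={tilt} deg: "
                  f"side={side:.1f} px, 1 px noise = {100 * err:.1f}% of side")
```

Under these assumptions a 2.5 cm tag viewed head-on spans only about 15 pixels, and tilting it by 45 degrees shrinks the foreshortened side further, so the same pixel-level noise becomes a larger share of the tag.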
Our setup aimed to minimize variance in position by configuring the camera parallel to the ArUco tags, which improved robustness and consistency. We also use MoveIt for path planning, so that the UR7e follows an efficient planned path to the desired joint states. However, the setup is not ideal: it still has a human-intervention component, in which we manually adjust the height of the UR7e to ensure that the robot does not accidentally break the iPad, so it may not be the most practical way to execute repetitive, precise real-world tasks. These problems could be addressed by using the depth measurements that a RealSense camera can provide, which could improve the precision of the motion and make the human-intervention component obsolete.
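As a sketch of how the depth-based improvement suggested above might work, the code below converts a pixel and its depth reading into a 3D point with the standard pinhole deprojection, and derives a stylus target height from two depth readings. It assumes an ideal, distortion-free camera looking straight down at the table, so that depth differences equal height differences; the intrinsics, depth values, and clearance are hypothetical, and we did not implement this.

```python
def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with a depth reading to a 3D point in the camera frame.

    Uses the pinhole model without lens distortion; fx, fy, cx, cy are the
    camera intrinsics in pixels.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

def surface_height_above_table(table_depth_m, surface_depth_m):
    """Height of a surface above the table for a camera looking straight down."""
    return table_depth_m - surface_depth_m

def tip_target_height(table_depth_m, surface_depth_m, clearance_m=0.002):
    """Height above the table at which the stylus tip should stop."""
    if clearance_m < 0:
        raise ValueError("clearance must be non-negative")
    return surface_height_above_table(table_depth_m, surface_depth_m) + clearance_m

if __name__ == "__main__":
    # Hypothetical intrinsics and depth readings, for illustration only.
    fx = fy = 600.0
    cx, cy = 320.0, 240.0
    print(deproject_pixel(380.0, 240.0, 1.0, fx, fy, cx, cy))
    print(tip_target_height(table_depth_m=1.000, surface_depth_m=0.993))
```

A computed target height like this could stand in for the manual height adjustment, although real depth readings are noisy and would likely need averaging and a safety margin before being trusted near the iPad.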