Overall, our team was very pleased with our results: we successfully scrolled on an iPad without breaking the screen. Admittedly, we had assumed the robot would be much more precise on its own, so we would have preferred to replicate the same success rate without relying on the human-adjustment feature. Nonetheless, we accomplished our initial task of picking up a stylus without knocking anything over, and were able to scroll through multiple posts on the iPad screen.
As mentioned in other sections, the main difficulty we faced was mitigating variance; in our case, the variance lay in the end-effector positions at each step of the process. We chose camera orientations that minimized distortion of the ArUco tags, and kept the stylus and iPad near the center of the camera frame to reduce this further.
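Though not part of our pipeline, one straightforward way to quantify this kind of variance is to log the end-effector position at a given step over repeated trials and compute its spread. A minimal sketch, with made-up numbers standing in for real measurements:

```python
# Illustrative only: the trial data below is hypothetical, not from our runs.
import statistics

# Recorded end-effector z-positions (metres) at the "hover above iPad"
# step, one reading per trial.
hover_z_m = [0.312, 0.305, 0.318, 0.309, 0.315]

mean_z = statistics.mean(hover_z_m)      # average hover height
spread_z = statistics.stdev(hover_z_m)   # sample standard deviation

print(f"mean z = {mean_z:.3f} m, stdev = {spread_z * 100:.2f} cm")
```

A centimetre-scale standard deviation at this step would already be enough to risk pressing too hard on the screen, which is what motivated the manual adjustment described below.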
In our sequence of steps, the end-effector first hovers above the iPad and then descends before starting the scrolling motion. This descent remained highly variable even after we adjusted our setup, so to mitigate the risk of the UR7e breaking our iPad, we incorporated human intervention. It works much like the Turtlebot teleop from lab: we set up keybinds that raise or lower the end-effector by 2.5 cm. This is admittedly a hack, since it is a manual fix. With more time, we would experiment with camera positions and ArUco tag orientations to make the end-effector positions naturally more consistent at each step of the process, and we would opt for a RealSense camera to sense depth and increase precision.
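The height-adjustment logic can be sketched as follows. This is a simplified illustration, not our actual teleop code: the key bindings, class name, and step size handling are assumptions, with only the 2.5 cm increment taken from our setup.

```python
STEP_M = 0.025  # 2.5 cm per keypress, as in our manual-adjustment feature


class HeightTeleop:
    """Tracks a manual z-offset applied on top of the planned
    end-effector pose (hypothetical helper, for illustration)."""

    def __init__(self, step_m: float = STEP_M):
        self.step_m = step_m
        self.z_offset_m = 0.0

    def handle_key(self, key: str) -> float:
        # 'u' raises the end-effector, 'd' lowers it; other keys no-op.
        if key == "u":
            self.z_offset_m += self.step_m
        elif key == "d":
            self.z_offset_m -= self.step_m
        return self.z_offset_m


teleop = HeightTeleop()
for k in "uud":  # operator raises twice, then lowers once
    teleop.handle_key(k)
print(round(teleop.z_offset_m, 3))  # net offset in metres
```

In the real system, this offset would be added to the commanded z-coordinate before sending the pose to the arm, exactly as the Turtlebot-style teleop applies velocity nudges.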