Initially, our project was unreliable due to high variability. The robot's motion decomposes into several substeps, each defined by an end-effector pose or joint state, and those states varied enough between runs that even a single successful scroll was unlikely. As we calibrated the offsets for each of those positions and refined our physical setup, the results became increasingly consistent. In the end, the robot could scroll five times in a row fairly reliably, and we successfully demonstrated it live.
Our project performed the following tasks in order:
1. Detect the ArUco tags attached to the UR7e robot, the stylus, and the iPad
2. Pick up the stylus from a pen holder based on the position of the stylus's ArUco tag
3. Navigate to the iPad's ArUco tag and scroll social media platforms using the stylus
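The core of steps 2 and 3 is turning a detected tag pose into a target end-effector pose by composing it with a hand-calibrated offset. Below is a minimal sketch of that pose math using homogeneous transforms; the tag poses, offset values, and names (`T_base_stylus`, `T_ipad_touch`, etc.) are illustrative assumptions, not our actual calibration.

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical tag poses, expressed in the robot base frame.
# In the real system these come from ArUco detections.
T_base_stylus = make_pose(np.eye(3), [0.40, -0.10, 0.05])  # stylus tag
T_base_ipad   = make_pose(np.eye(3), [0.55,  0.20, 0.02])  # iPad tag

# Hand-calibrated offsets (assumed values) that bridge the gap between
# where the tag sits and where the gripper must actually go.
T_stylus_grasp = make_pose(np.eye(3), [0.00, 0.00, -0.03])  # reach below the tag
T_ipad_touch   = make_pose(np.eye(3), [0.05, 0.00,  0.01])  # touch point on screen

# Target end-effector pose = tag pose composed with its calibrated offset.
T_grasp = T_base_stylus @ T_stylus_grasp
T_touch = T_base_ipad @ T_ipad_touch

print(np.round(T_grasp[:3, 3], 3))  # grasp position in the base frame
print(np.round(T_touch[:3, 3], 3))  # touch position in the base frame
```

Because each offset is a separate transform, recalibrating one substep (as we did repeatedly) only means updating one matrix rather than retuning the whole motion.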
Here is a video showcasing our setup and our project in action.