Here is the hardware that we used for this project:
UR7e Robot
Intel RealSense Camera
iPad
Along with the aforementioned hardware, we used a weighing scale (to test stylus pressure before moving to the iPad), a rubber-tip stylus, three ArUco tags, and a cube.
Here is the transform tree depicting our system. All poses were computed with respect to the base_link frame.
Our software relies on ArUco tags to locate the robot arm's base_link, the pen, and the iPad. Planning is done with respect to these detected locations, so these objects can be placed arbitrarily. The scrolling motion and the pen pick-up motion both use hardcoded offsets relative to the detected locations.
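As an illustration of how a tag detection might be expressed in base_link, the sketch below composes 4x4 homogeneous transforms in plain Python. It assumes the camera reports each tag's pose in the camera frame and that the base tag's frame coincides with base_link. The function names (`mat_mul`, `invert_rigid`, `tag_in_base`) and the sample poses are hypothetical and are not taken from the project code.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform [R p; 0 1] as [R^T  -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def tag_in_base(cam_T_base, cam_T_tag):
    """Pose of a tag in base_link: base_T_tag = inv(cam_T_base) * cam_T_tag."""
    return mat_mul(invert_rigid(cam_T_base), cam_T_tag)


# Hypothetical detections (camera frame). The base tag is rotated
# 90 degrees about the camera z-axis; the iPad tag is axis-aligned.
cam_T_base = [[0.0, -1.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]]
cam_T_ipad = [[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]]

base_T_ipad = tag_in_base(cam_T_base, cam_T_ipad)
ipad_xyz = [row[3] for row in base_T_ipad[:3]]  # iPad tag position in base_link
```

Because every object is re-expressed relative to the base tag, moving the camera changes both detections together and leaves the result in base_link unchanged, which is what allows arbitrary placement.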
Initially, we did not use a cube, but we found that when the UR7e scrolled, friction would tilt the stylus and cause subsequent scrolling motions to fail. To work around this issue, we taped the stylus to a cube and modified our approach so that the robot would grip the cube rather than the stylus. We placed a small ArUco tag on top of the cube so the camera could easily identify it, and another ArUco tag on the iPad itself.
At a high level, our scrolling motion boils down to the UR7e moving to precomputed offsets from the detected ArUco tag positions. We used the MoveIt library to compute and move to a joint state for each step of the scrolling process. As mentioned before, all coordinates are in the base_link frame.
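A minimal sketch of how such offsets might turn a detected iPad tag position into Cartesian targets for one scroll stroke is shown below. The `Offsets` values, the `scroll_waypoints` helper, and the choice of the base_link y-axis as the scroll direction are all assumptions made for illustration. The sketch also ignores the offset between the gripper and the stylus tip, and it leaves out the MoveIt planning call that would turn each target into a joint state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offsets:
    """Hypothetical hardcoded offsets (meters) relative to the iPad tag."""
    start_dx: float = 0.10   # tag -> stroke start, along base_link x
    start_dy: float = 0.00   # tag -> stroke start, along base_link y
    hover_dz: float = 0.05   # clearance above the screen while hovering
    scroll_dy: float = 0.08  # stroke length along base_link y (assumed axis)


def scroll_waypoints(ipad_tag_xyz, offsets=Offsets()):
    """Return (x, y, z) targets for one stroke: hover, touch, drag, lift."""
    tx, ty, tz = ipad_tag_xyz
    x = tx + offsets.start_dx
    y0 = ty + offsets.start_dy
    y1 = y0 + offsets.scroll_dy
    z_hover = tz + offsets.hover_dz
    return [
        (x, y0, z_hover),  # hover above the stroke start
        (x, y0, tz),       # touch the screen
        (x, y1, tz),       # drag to scroll
        (x, y1, z_hover),  # lift off before the next stroke
    ]


# Hypothetical iPad tag position in base_link (meters).
waypoints = scroll_waypoints((0.40, -0.10, 0.02))
```

Keeping the offsets in one place means only the tag detection changes between runs, while the stroke geometry stays fixed.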
The UR7e hovers right above the cube (position computed relative to the ArUco tag mounted on top of the cube), moves down slightly, and grips the cube using its end-effector. It then navigates to a position where it hovers above the iPad (offset computed based on the ArUco tag mounted on the iPad). In early iterations, this position varied significantly: sometimes the end-effector was too high, and other times it pressed very hard on the iPad. To combat this, we added a manual adjustment step in which we used keyboard controls to raise or lower the end-effector by 2.5 cm. Once we had adjusted the end-effector to an ideal starting position, the robot would execute the scrolling motions.
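The manual height adjustment can be pictured with the sketch below. It assumes that each key press shifts the target height by one 2.5 cm step; the key bindings (`w` to raise, `s` to lower) and the helper names are hypothetical. Only the target update is shown, not the keyboard reading or the motion command.

```python
STEP_M = 0.025  # 2.5 cm, assumed here to be applied once per key press

# Hypothetical key bindings; the actual keys are not specified in this writeup.
KEY_TO_DIRECTION = {"w": +1, "s": -1}


def adjust_height(xyz, key, step=STEP_M):
    """Return the target after one key press; unknown keys leave it unchanged."""
    x, y, z = xyz
    return (x, y, z + KEY_TO_DIRECTION.get(key, 0) * step)


def apply_keys(xyz, keys):
    """Apply a sequence of key presses and return the final target."""
    for key in keys:
        xyz = adjust_height(xyz, key)
    return xyz


# Hypothetical hover position above the iPad in base_link (meters).
start = (0.50, -0.10, 0.10)
final = apply_keys(start, "ssw")  # two steps down, one step up
```

Only z changes, so the operator can correct the contact height without disturbing the x and y position derived from the iPad tag.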