Overall, the robot played the piano accurately. On most runs it played every note in the song correctly. On occasional runs, however, the mechanical imprecision of the robot and the way its finger strikes the keys caused a small fraction of notes to be played incorrectly; for example, the finger sometimes slid from the intended key onto an adjacent one.
However, the limitations of the Baxter robot, together with MoveIt's computational delay, made the robot play the song very slowly. Adding a waypoint for the arm to travel through before striking each note increased accuracy, but it also lengthened the delay between notes. Even so, the pauses were not long enough for an observer to lose track of the melody during a performance.
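Below is a minimal sketch of that waypoint approach, assuming MoveIt's Python `MoveGroupCommander` for Baxter's right arm; the group name, hover height, and execution threshold are illustrative assumptions rather than our exact values.

```python
import sys
import rospy
import moveit_commander
from geometry_msgs.msg import Pose

# Sketch only: hover above a key before descending to strike it.
moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("piano_waypoint_demo")
group = moveit_commander.MoveGroupCommander("right_arm")  # assumed group name

def strike_key(key_pose, hover_height=0.05):
    """Move through a hover waypoint above the key, then press down onto it."""
    hover = Pose()
    hover.position.x = key_pose.position.x
    hover.position.y = key_pose.position.y
    hover.position.z = key_pose.position.z + hover_height  # hypothetical hover offset
    hover.orientation = key_pose.orientation

    # Plan a Cartesian path through the hover point and down onto the key.
    waypoints = [hover, key_pose]
    plan, fraction = group.compute_cartesian_path(waypoints, 0.01, 0.0)
    if fraction > 0.9:  # only execute if most of the requested path was planned
        group.execute(plan, wait=True)
```

Each extra waypoint adds another planning and execution step, which is why this strategy traded speed for accuracy.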
We encountered the most difficulty with our initial design; in fact, we ended up scrapping it and redesigning our solution. Our initial plan, as previously discussed, was to use computer vision to find the pixel coordinates of the center of each key and then map those pixel coordinates to real-world coordinates using a homography.
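For reference, the planned pixel-to-world mapping could look roughly like the following OpenCV sketch; the tag pixel centers and table coordinates here are placeholders, not measurements from our setup.

```python
import numpy as np
import cv2

# Placeholder pixel centers of four AR tags in the camera image,
# and their corresponding real-world (x, y) positions on the table plane (meters).
tag_pixels = np.array([[102, 88], [517, 92], [510, 340], [98, 335]], dtype=np.float32)
tag_world = np.array([[0.00, 0.00], [0.60, 0.00], [0.60, 0.30], [0.00, 0.30]], dtype=np.float32)

# Homography from the image plane to the table plane.
H, _ = cv2.findHomography(tag_pixels, tag_world)

# Map the detected pixel centers of the piano keys into table coordinates.
key_pixels = np.array([[[150, 200]], [[180, 200]], [[210, 200]]], dtype=np.float32)
key_world = cv2.perspectiveTransform(key_pixels, H)
```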
To compute the homography, we needed the pixel coordinates of the AR tag centers, for which we used the aruco package, and the real-world poses of those centers, for which we used ar_track_alvar. However, aruco was very inconsistent at associating each detected center with the correct AR tag ID, and its ability to locate the tag centers varied greatly with the quality of the camera image.
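For illustration, here is roughly how tag centers can be recovered with OpenCV's `cv2.aruco` module, used here as a stand-in for the ROS aruco package we actually ran; the dictionary choice is an assumption and the exact function names vary across OpenCV versions.

```python
import cv2

# Sketch: find AR tag centers in a single camera frame.
image = cv2.imread("camera_frame.png")  # hypothetical saved frame
dictionary = cv2.aruco.Dictionary_get(cv2.aruco.DICT_ARUCO_ORIGINAL)  # assumed dictionary
corners, ids, _ = cv2.aruco.detectMarkers(image, dictionary)

# Each marker's center is the mean of its four detected corners.
centers = {}
if ids is not None:
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        centers[int(marker_id)] = marker_corners[0].mean(axis=0)
```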
Lastly, computing the homography ourselves turned out to be extraneous work. The ar_track_alvar package already computes a homography internally, using the camera's intrinsic parameters and the physical size of the tags, in order to publish each tag's pose over TF and project the pose axes onto the camera feed in RViz.
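Since ar_track_alvar already publishes each tag's pose as a TF frame, those poses can simply be looked up instead of being recomputed. A minimal sketch, assuming the frame names "base" and "ar_marker_0" and a hypothetical offset from the tag to the first key:

```python
import rospy
import tf

# Sketch: read a tag pose that ar_track_alvar is already broadcasting on TF.
rospy.init_node("key_position_lookup")
listener = tf.TransformListener()
listener.waitForTransform("base", "ar_marker_0", rospy.Time(0), rospy.Duration(4.0))
(trans, rot) = listener.lookupTransform("base", "ar_marker_0", rospy.Time(0))

# Hypothetical offset from the tag to the first key, along the keyboard (meters).
KEY_OFFSET_X = 0.05
first_key_xyz = (trans[0] + KEY_OFFSET_X, trans[1], trans[2])
```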