Overall, the robot played the piano accurately. On most runs it played every note in the song correctly. On occasional runs, however, the mechanical imprecision of the robot and the way its finger strikes the keys caused a small fraction of notes to be played incorrectly; for example, the finger sometimes slid from the intended key onto an adjacent one.
However, the limitations of the Baxter robot, together with MoveIt's computational delay, made the robot play the song very slowly. Adding a waypoint for the arm to travel through before striking each note increased accuracy, but it also lengthened the delay between notes. Even so, the pauses were not long enough for an observer to lose track of the melody during a performance.
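Below is a minimal sketch of that waypoint approach, assuming MoveIt's Python `MoveGroupCommander` for Baxter's right arm; the group name, hover height, and execution threshold are illustrative assumptions rather than our exact values.

```python
import sys
import rospy
import moveit_commander
from geometry_msgs.msg import Pose

# Sketch only: hover above a key before descending to strike it.
moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("piano_waypoint_demo")
group = moveit_commander.MoveGroupCommander("right_arm")  # assumed group name

def strike_key(key_pose, hover_height=0.05):
    """Move through a hover waypoint above the key, then press down onto it."""
    hover = Pose()
    hover.position.x = key_pose.position.x
    hover.position.y = key_pose.position.y
    hover.position.z = key_pose.position.z + hover_height  # hypothetical hover offset
    hover.orientation = key_pose.orientation

    # Plan a Cartesian path through the hover point and down onto the key.
    waypoints = [hover, key_pose]
    plan, fraction = group.compute_cartesian_path(waypoints, 0.01, 0.0)
    if fraction > 0.9:  # only execute if most of the requested path was planned
        group.execute(plan, wait=True)
```

Each extra waypoint adds another planning and execution step, which is why this strategy traded speed for accuracy.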
We encountered the most difficulty with our initial design; in fact, we ended up scrapping it and redesigning our solution. Our initial plan, as previously discussed, was to use computer vision to find the pixel coordinates of the center of each key and then map those pixel coordinates to real-world coordinates using a homography.
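For reference, the planned pixel-to-world mapping could look roughly like the following OpenCV sketch; the tag pixel centers and table coordinates here are placeholders, not measurements from our setup.

```python
import numpy as np
import cv2

# Placeholder pixel centers of four AR tags in the camera image,
# and their corresponding real-world (x, y) positions on the table plane (meters).
tag_pixels = np.array([[102, 88], [517, 92], [510, 340], [98, 335]], dtype=np.float32)
tag_world = np.array([[0.00, 0.00], [0.60, 0.00], [0.60, 0.30], [0.00, 0.30]], dtype=np.float32)

# Homography from the image plane to the table plane.
H, _ = cv2.findHomography(tag_pixels, tag_world)

# Map the detected pixel centers of the piano keys into table coordinates.
key_pixels = np.array([[[150, 200]], [[180, 200]], [[210, 200]]], dtype=np.float32)
key_world = cv2.perspectiveTransform(key_pixels, H)
```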
To compute the homography, we needed the pixel coordinates of the AR tag centers, for which we used the aruco package, and the real-world poses of those centers, for which we used ar_track_alvar. However, aruco was very inconsistent at associating each detected center with the correct AR tag ID, and its ability to locate the tag centers varied greatly with the quality of the camera image.
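For illustration, here is roughly how tag centers can be recovered with OpenCV's `cv2.aruco` module, used here as a stand-in for the ROS aruco package we actually ran; the dictionary choice is an assumption and the exact function names vary across OpenCV versions.

```python
import cv2

# Sketch: find AR tag centers in a single camera frame.
image = cv2.imread("camera_frame.png")  # hypothetical saved frame
dictionary = cv2.aruco.Dictionary_get(cv2.aruco.DICT_ARUCO_ORIGINAL)  # assumed dictionary
corners, ids, _ = cv2.aruco.detectMarkers(image, dictionary)

# Each marker's center is the mean of its four detected corners.
centers = {}
if ids is not None:
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        centers[int(marker_id)] = marker_corners[0].mean(axis=0)
```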
Lastly, computing the homography ourselves turned out to be extraneous work. The ar_track_alvar package already computes a homography internally, using the camera's intrinsic parameters and the physical size of the tags, in order to publish each tag's pose over TF and project the pose axes onto the camera feed in RViz.
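Since ar_track_alvar already publishes each tag's pose as a TF frame, those poses can simply be looked up instead of being recomputed. A minimal sketch, assuming the frame names "base" and "ar_marker_0" and a hypothetical offset from the tag to the first key:

```python
import rospy
import tf

# Sketch: read a tag pose that ar_track_alvar is already broadcasting on TF.
rospy.init_node("key_position_lookup")
listener = tf.TransformListener()
listener.waitForTransform("base", "ar_marker_0", rospy.Time(0), rospy.Duration(4.0))
(trans, rot) = listener.lookupTransform("base", "ar_marker_0", rospy.Time(0))

# Hypothetical offset from the tag to the first key, along the keyboard (meters).
KEY_OFFSET_X = 0.05
first_key_xyz = (trans[0] + KEY_OFFSET_X, trans[1], trans[2])
```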