Our first approach to finding the poses of the piano keys was to:
To compute the homography matrix:
The goal of the computer vision algorithm is to process the image below and compute the pixel positions for each key on the piano.
2. Pre-processing: We convert the RGB image to grayscale and then threshold it to black and white. We then invert the black-and-white image so that the normally black keys become white, which makes them the foreground "features" of the image.
3. Labeling features: We use the SciPy label function (scipy.ndimage.label) to segment each feature (black key) in the image. One issue was that the black pixels along the top edge of the piano sometimes connected neighboring black keys in the image. The labeling step then detected fewer than 15 black keys, and the algorithm failed. We fixed this by iteratively shrinking the crop of the image until the algorithm detected exactly 15 black keys.
4. Segmenting white keys: We compute the principal axis of each black key by running PCA on the coordinates of its constituent pixels. These axes cleanly divide the piano's white keys, which lets us determine the pixels belonging to each white key.
5. Computing pixel centers of keys: Since we know which pixels belong to each white and black key, we compute the pixel center of each key as the average of its constituent pixel coordinates.
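Steps 2 and 3 can be sketched in Python with NumPy and scipy.ndimage.label. This is a minimal sketch, not the original code: the function names, the 0.5 threshold, the BT.601 grayscale weights, and the choice to shrink the crop only from the top edge (where the connecting pixels appeared) are all assumptions.

```python
import numpy as np
from scipy import ndimage


def preprocess(rgb: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """RGB image (H, W, 3) with values in [0, 1] -> inverted binary mask.

    Dark pixels (the black keys) come out True so they are the "features".
    """
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights, assumed).
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    # Threshold to black and white: True where the pixel is bright.
    bw = gray > threshold
    # Invert so the normally black keys become the True foreground.
    return ~bw


def label_black_keys(mask: np.ndarray, expected: int = 15, max_trim: int = 50):
    """Label connected features, shrinking the crop until `expected` are found.

    Assumption: the crop is shrunk by dropping rows from the top, where the
    piano's dark top edge can join neighboring black keys into one feature.
    Returns the label image and the number of rows trimmed.
    """
    # Never trim away the whole image; keep at least one row.
    for trim in range(min(max_trim, mask.shape[0] - 1) + 1):
        cropped = mask[trim:]
        labels, count = ndimage.label(cropped)
        if count == expected:
            return labels, trim
    raise RuntimeError(f"never found exactly {expected} black keys")
```

Trimming only the top rows is a simplification; the same loop could shrink other edges of the crop if stray dark pixels join keys elsewhere.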
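Steps 4 and 5 reduce to per-key statistics on labeled pixels. The sketch below uses a hypothetical helper, key_center_and_axis, that takes an integer label image (such as the one scipy.ndimage.label returns) and computes a key's pixel center as its mean pixel coordinate and its principal axis as the leading PCA eigenvector. The same center computation applies to white keys once their pixels are labeled; the white-key segmentation itself is not shown.

```python
import numpy as np


def key_center_and_axis(labels: np.ndarray, key_id: int):
    """Return the pixel center (row, col) and unit principal axis of one key."""
    rows, cols = np.nonzero(labels == key_id)
    coords = np.column_stack([rows, cols]).astype(float)
    # Pixel center: the mean of the key's constituent pixel coordinates.
    center = coords.mean(axis=0)
    # PCA: the principal axis is the eigenvector of the coordinate covariance
    # with the largest eigenvalue (np.linalg.eigh sorts eigenvalues ascending).
    cov = np.cov(coords - center, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    axis = eigvecs[:, -1]
    return center, axis
```

For a long, thin black key the principal axis runs along the key, so the line through center in direction axis is the dividing line between the two white keys on either side.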
The computed homography matrix H maps physical coordinates to pixel coordinates. To get the physical coordinate of key k, we take its homogeneous pixel coordinate p = (u, v, 1) and multiply it by the inverse homography matrix to get c = H^-1 p. Because c is homogeneous, its third component c_w is a scale factor rather than a height, so the position on the plane is (c_x / c_w, c_y / c_w). The four-point homography also assumes the physical scene is a 2D plane, so it carries no height information. Thus the actual pose has coordinates (c_x / c_w, c_y / c_w, z_constant), where z_constant is the pre-measured height of the table in the same physical frame.
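The mapping from a key's pixel center back to a pose can be sketched in NumPy. As an assumption, H is estimated here with the direct linear transform (DLT) from four physical-to-pixel point correspondences; the original project may have used a library routine instead. four_point_homography and pixel_to_pose are hypothetical names.

```python
import numpy as np


def four_point_homography(physical, pixel):
    """Estimate H (3x3) with pixel ~ H @ physical from four (x, y) -> (u, v) pairs."""
    rows = []
    for (x, y), (u, v) in zip(physical, pixel):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale (assumes H[2, 2] != 0)


def pixel_to_pose(H, pixel_uv, z_constant):
    """Map a key's pixel center back to the plane and append the measured height."""
    p = np.array([pixel_uv[0], pixel_uv[1], 1.0])
    c = np.linalg.solve(H, p)  # c = H^-1 p, without forming the inverse explicitly
    # c is homogeneous: divide by its third component to get (x, y) on the plane.
    x, y = c[0] / c[2], c[1] / c[2]
    return np.array([x, y, z_constant])
```

Solving H c = p is numerically equivalent to multiplying by H^-1 but avoids computing the inverse explicitly.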