Additional Materials

Previous design iterations: Homography V1

Our first approach to finding the poses of the piano keys was to:

Compute the homography matrix using 4 AR tags (placed anywhere)
Find the pixel positions of the keys using computer vision
Convert pixel positions to poses with the homography

1. Computing the homography matrix

To compute the homography matrix:

Used ar_track_alvar to find the AR tags' poses and IDs (built-in library functionality)
Used aruco to find the AR tags' pixel coordinates and IDs: https://docs.opencv.org/3.4/d5/dae/tutorial_aruco_detection.html
- Defined a custom AR tag set: aruco and ar_track_alvar use different sets of AR tags, so we used ar_track_alvar tags and defined them as a custom set in aruco
- Used aruco library functionality to detect the 4 corner points of each AR tag
- Used linear algebra to compute the center of each AR tag as the intersection of the two diagonals of its quadrilateral. See the image below for an example of correctly detected corners and centers.
- Used aruco to find the ID of each AR tag (this functionality was unreliable, which prompted us to use another method)
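The diagonal-intersection step can be sketched as below. This is a minimal illustration, not our exact code; it assumes the 4 corners are ordered around the tag (aruco's detectMarkers returns them clockwise starting from the top-left), so corners 0 and 2 are opposite, as are corners 1 and 3. The function name tag_center is hypothetical.

```python
import numpy as np

def tag_center(corners):
    """Center of a quadrilateral AR tag, found as the intersection of its diagonals.

    corners: 4 (u, v) pixel points ordered around the tag; accepts the (1, 4, 2)
    array aruco returns per marker as well as a plain (4, 2) list.
    """
    p0, p1, p2, p3 = np.asarray(corners, dtype=float).reshape(4, 2)
    d1 = p2 - p0  # diagonal from corner 0 to corner 2
    d2 = p3 - p1  # diagonal from corner 1 to corner 3
    # Solve p0 + s*d1 = p1 + t*d2 for (s, t), i.e. [d1, -d2] @ [s, t] = p1 - p0.
    M = np.column_stack((d1, -d2))
    s, _t = np.linalg.solve(M, p1 - p0)
    return p0 + s * d1
```

For a perspective-distorted tag this is better than averaging the 4 corners, because the intersection of the diagonals is the projection of the physical tag center, while the corner average generally is not.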

Computed the homography matrix as in lab 4 (diagrams taken from lab 4):
- The homography matrix H maps from physical coordinates (assumed to be in a 2D plane) to pixel coordinates
- First, solve for the entries of H using the system of equations Ax = b as in the diagram below. A known point is a pair consisting of a physical position (x, y) and its pixel coordinate (u, v); here, the known points are the centers of the 4 AR tags. Each known point provides 2 equations, and since the homography matrix has 8 unknown entries (the ninth is fixed to 1, because H is only defined up to scale), this step requires 4 known points.

Once x is computed, rearrange its values into the homography matrix H as in the diagram below.
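The solve-and-rearrange step can be sketched with NumPy as below. This is the standard 8-unknown formulation with h33 fixed to 1, written under that assumption; the row ordering of A may differ from the lab 4 diagram, and compute_homography is a hypothetical name.

```python
import numpy as np

def compute_homography(physical_pts, pixel_pts):
    """Solve Ax = b for the 8 unknown entries of H from 4 (x, y) -> (u, v) pairs.

    Uses u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1) and the analogous
    expression for v, multiplied out so each pair gives 2 linear equations.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(physical_pts, pixel_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    x_vec = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    # Rearrange the 8 solved values into the 3x3 matrix, with h33 = 1.
    return np.append(x_vec, 1.0).reshape(3, 3)
```

The 8x8 system is only solvable when no three of the four known points are collinear, which is one reason to spread the AR tags out on the table.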

2. Computer Vision

The goal of the computer vision algorithm is to process the image below and compute the pixel positions for each key on the piano.

1. Cropping: We created a GUI in which the user selects the 4 corner points of the piano in the image, producing a cropped image.
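A minimal sketch of the cropping step once the 4 points have been clicked, leaving out the GUI itself. It assumes an axis-aligned crop to the bounding box of the clicked points (the GUI could instead have warped the quadrilateral); crop_to_corners is a hypothetical name.

```python
import numpy as np

def crop_to_corners(image, corners):
    """Crop to the axis-aligned bounding box of 4 clicked (u, v) points.

    Returns the cropped image and the (u, v) offset of its top-left pixel in
    the full image.
    """
    pts = np.asarray(corners, dtype=float)
    u_min, v_min = np.floor(pts.min(axis=0)).astype(int)
    u_max, v_max = np.ceil(pts.max(axis=0)).astype(int)
    h, w = image.shape[:2]
    # Clamp to the image bounds in case a click lands slightly outside.
    u_min, v_min = max(int(u_min), 0), max(int(v_min), 0)
    u_max, v_max = min(int(u_max), w), min(int(v_max), h)
    return image[v_min:v_max, u_min:u_max], (u_min, v_min)
```

Returning the offset matters because the homography is computed in full-image pixel coordinates, so key centers found in the crop must be shifted back by this offset before the homography is applied.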

2. Pre-processing: We convert the RGB image to grayscale and then to black and white. We then invert the black-and-white image so that the normally black keys become white, which makes them the "features" in the image.
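The pre-processing step can be sketched in NumPy as below. The BT.601 luma weights are the common RGB-to-gray convention; the fixed threshold of 128 is an assumption (an adaptive method such as Otsu's may work better under uneven lighting), and preprocess is a hypothetical name.

```python
import numpy as np

def preprocess(rgb, threshold=128):
    """RGB image (H x W x 3, 0-255) -> inverted binary mask with black keys True."""
    rgb = np.asarray(rgb, dtype=float)
    # Grayscale using the BT.601 luma weights.
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Black and white: True where the pixel is bright (white keys).
    bw = gray >= threshold
    # Invert so the dark black keys become the True ("white") features.
    return ~bw
```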

3. Labeling features: We use the scipy label function to segment each feature (black key) in the image. One issue that occurred was that the black pixels along the top edge of the piano sometimes connected adjacent black keys in the image. This caused the algorithm to detect fewer than 15 black keys and fail. We fixed this by iteratively shrinking the crop of the image until the algorithm detected 15 black keys.
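The labeling loop can be sketched with scipy.ndimage.label, which returns the labeled array and the number of features found. This sketch assumes the crop is shrunk by trimming one row at a time from the top, where the connecting edge pixels are; our actual shrinking schedule may have differed, and label_black_keys is a hypothetical name.

```python
import numpy as np
from scipy import ndimage

EXPECTED_BLACK_KEYS = 15

def label_black_keys(mask, expected=EXPECTED_BLACK_KEYS, step=1):
    """Label black-key blobs, trimming rows off the top until `expected` are found.

    Returns the labeled array (for the trimmed mask) and the number of rows
    trimmed, which is needed to map pixel coordinates back.
    """
    top = 0
    while top < mask.shape[0]:
        # Default structuring element = 4-connectivity, so diagonal touches don't merge keys.
        labels, n = ndimage.label(mask[top:])
        if n == expected:
            return labels, top
        top += step  # a connected top edge merges keys, so trim it away
    raise RuntimeError(f"could not isolate {expected} black keys")
```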

4. Segmenting white keys: We compute the axis along each black key using PCA on its constituent pixels. These axes cleanly divide the piano's white keys, allowing us to compute the pixels belonging to each white key.
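The PCA step can be sketched as below: the principal axis of a black key is the eigenvector of its pixel covariance matrix with the largest eigenvalue. The side_of_axis helper is a hypothetical illustration of how an axis can split pixels into the two sides of a line; how those sides are combined into individual white keys is not shown here.

```python
import numpy as np

def principal_axis(labels, k):
    """Centroid and unit principal direction (u, v) of the pixels labeled k."""
    rows, cols = np.nonzero(labels == k)
    pts = np.column_stack((cols, rows)).astype(float)  # (u, v) pixel coordinates
    centroid = pts.mean(axis=0)
    cov = np.cov(pts - centroid, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric 2x2, eigenvectors in columns
    axis = eigvecs[:, np.argmax(eigvals)]   # direction of largest variance
    return centroid, axis

def side_of_axis(points, centroid, axis):
    """Sign (+1 / -1 / 0) of each (u, v) point relative to the line through centroid along axis."""
    d = np.asarray(points, dtype=float) - centroid
    # 2D cross product axis x d: its sign says which side of the line d lies on.
    return np.sign(axis[0] * d[:, 1] - axis[1] * d[:, 0])
```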

5. Computing pixel centers of keys: Since the pixels belonging to each white key and black key are known, we compute the pixel center of each key as the average of its constituent pixels.
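A minimal sketch of the averaging step, assuming each key's pixels are stored as a label in an integer array (labels 1..n_keys, 0 for background); key_centers is a hypothetical name.

```python
import numpy as np

def key_centers(labels, n_keys):
    """Pixel center (u, v) of each labeled key as the mean of its pixel coordinates."""
    centers = []
    for k in range(1, n_keys + 1):
        rows, cols = np.nonzero(labels == k)
        centers.append((cols.mean(), rows.mean()))
    return np.array(centers)
```

If the image was cropped, the crop offset has to be added to these centers before they are passed through the homography.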

3. Applying the homography transform

The computed homography matrix H maps from physical coordinates to pixel coordinates. To get the physical coordinate of key k, write its pixel coordinate in homogeneous form p = (u, v, 1) and multiply by the inverse homography matrix to get c = H^-1 p, then divide c by its third (scale) component to obtain the planar coordinates (c_x, c_y). Since the four-point homography assumes the physical scene is a 2D plane, it carries no height information. Thus, the actual pose has coordinates (c_x, c_y, z_constant), where z_constant is the pre-measured height of the table in the same reference frame as (c_x, c_y).
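A minimal sketch of this conversion, assuming H maps physical (x, y, 1) to homogeneous pixel coordinates as above; pixel_to_pose is a hypothetical name.

```python
import numpy as np

def pixel_to_pose(H, pixel, z_constant):
    """Map a key's pixel center (u, v) to a 3D position using the inverse homography."""
    u, v = pixel
    # Solving H c = p is equivalent to c = H^-1 p without forming the inverse explicitly.
    c = np.linalg.solve(H, np.array([u, v, 1.0]))
    # Divide out the homogeneous scale to get the planar physical coordinates.
    c_x, c_y = c[0] / c[2], c[1] / c[2]
    # The homography gives no height, so use the pre-measured table height.
    return np.array([c_x, c_y, z_constant])
```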

Source code

https://github.com/smadan17/EE106A-Piano-Virtuoso
