Progress Report:

April 5, 2023

The team has continued developing its note and chord detection software over the past several weeks, leading to improvements in temporal and frequency resolution, greater clarity of notes of different volumes, and more accurate detection of notes. The following images and narration cover techniques the team has developed to improve note detection, and the report concludes with current challenges and next steps.

Project Progress:

The above figure demonstrates volume detection for "Twinkle Twinkle Little Star", used to locate notes. Every spike in the plot corresponds to a note being played in the piece. Parseval's Theorem states that the sum of a signal's squared coefficient magnitudes is the same in any orthonormal basis, so the energy computed from the DFT coefficients matches the energy of the audio samples (up to a constant scaling factor that depends on the DFT convention). Volume is therefore calculated by summing the squared magnitudes of all DFT coefficients at each timestep of the spectrogram.
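The volume calculation can be sketched as follows. This is a minimal illustration rather than the team's actual code: it assumes NumPy, a spectrogram stored with one column per timestep, and a hypothetical helper name `frame_volume`. The random test signal and the frame length of 256 are arbitrary choices made only to check Parseval's Theorem numerically.

```python
import numpy as np

def frame_volume(spectrogram):
    """Sum of squared DFT magnitudes in each column (one column per timestep)."""
    return np.sum(np.abs(spectrogram) ** 2, axis=0)

# Hypothetical test data: 8 non-overlapping frames of 256 random samples.
frame_len = 256
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, frame_len))

# Full (two-sided) DFT of every frame, arranged as frequency x time.
spectrogram = np.fft.fft(frames, axis=1).T

volume = frame_volume(spectrogram)

# Parseval check: with NumPy's unnormalized FFT, the frequency-domain sum
# equals frame_len times the time-domain sum of squares.
time_energy = np.sum(frames ** 2, axis=1)
parseval_holds = np.allclose(volume / frame_len, time_energy)
```

Because the scaling factor is the same for every frame, it does not change where the spikes in the volume plot fall.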

Below is a spectrogram of "Twinkle Twinkle Little Star". A lower-resolution version of this spectrogram was presented in our previous report. Since that report, we have improved our temporal and frequency resolution by appending zeros to the end of each window (zero-padding). The team also experimented with padding zeros at the beginning of the window, as well as inserting zeros between audio samples, but these methods yielded the same results as end-padding.
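End-padding can be sketched with NumPy's `np.fft.rfft`, which appends zeros when the requested transform length `n` exceeds the window length. The sample rate, window length, Hann taper, and 440 Hz test tone below are assumptions chosen for illustration, and `padded_spectrum` is a hypothetical helper name. Note that zero-padding produces a finer grid of frequency bins, which helps locate a peak more precisely.

```python
import numpy as np

def padded_spectrum(window, n_fft):
    """Magnitude spectrum of a window, zero-padded at the end to n_fft samples."""
    return np.abs(np.fft.rfft(window, n=n_fft))

# Assumed values for illustration only.
sample_rate = 8000          # Hz
window_len = 256            # samples per window
tone_hz = 440.0             # test tone (A4)

t = np.arange(window_len) / sample_rate
window = np.sin(2 * np.pi * tone_hz * t) * np.hanning(window_len)

plain = padded_spectrum(window, window_len)   # no padding
padded = padded_spectrum(window, 2048)        # 1792 zeros appended

# Spacing between adjacent frequency bins in each case.
bin_spacing_plain = sample_rate / window_len
bin_spacing_padded = sample_rate / 2048

# Frequency of the strongest bin in each spectrum.
peak_plain = np.argmax(plain) * bin_spacing_plain
peak_padded = np.argmax(padded) * bin_spacing_padded
```

With these assumed values the bin spacing drops from 31.25 Hz to about 3.9 Hz, so the peak of the padded spectrum can land closer to the true tone frequency.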

An issue with these spectrograms was that some notes are played louder than others, leading to some harmonic, 'fake' notes appearing more prominently than quiet 'intentional' notes. To address this issue, the team normalized each column individually, so that the loudest note at a given time is always assigned a magnitude of '1'. To avoid division by zero (in the case of silence), a column is only normalized and displayed if it meets a volume threshold. The figure below shows "Twinkle Twinkle Little Star" with column-normalization and volume-thresholding applied.
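A minimal sketch of column-normalization with volume-thresholding is shown here. It assumes NumPy, a magnitude spectrogram with one column per timestep, and a volume array like the one described earlier; the function name `normalize_columns` and the tiny example matrix are hypothetical.

```python
import numpy as np

def normalize_columns(magnitudes, volume, volume_threshold=0.01):
    """Scale each sufficiently loud column so its largest entry is 1.

    Columns whose normalized volume falls below the threshold (silence or
    near-silence) are left as zeros, which also avoids dividing by zero.
    """
    magnitudes = np.asarray(magnitudes, dtype=float)
    volume = np.asarray(volume, dtype=float)

    # Normalize the volume array to the range 0..1.
    peak_volume = volume.max()
    if peak_volume > 0:
        rel_volume = volume / peak_volume
    else:
        rel_volume = np.zeros_like(volume)

    col_max = magnitudes.max(axis=0)
    keep = (rel_volume >= volume_threshold) & (col_max > 0)

    out = np.zeros_like(magnitudes)
    out[:, keep] = magnitudes[:, keep] / col_max[keep]
    return out

# Hypothetical 3-bin, 3-timestep example: loud, silent, and very quiet columns.
mags = np.array([[1.0, 0.0, 0.0010],
                 [4.0, 0.0, 0.0020],
                 [2.0, 0.0, 0.0005]])
vol = np.sum(mags ** 2, axis=0)
normalized = normalize_columns(mags, vol)
```

In this example only the first column passes the volume threshold, and its loudest bin is scaled to exactly 1.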

Since the most prominent notes are now always assigned a value of exactly '1', it is possible to apply a very tight threshold to the spectrogram to filter out all but the most prominent notes. This allows for robust identification of the melody, as demonstrated in the figure below:
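The tight threshold can be sketched as follows, assuming NumPy and a column-normalized spectrogram in which silent columns are all zeros. The name `melody_track`, the threshold value of 0.99, and the example matrix are illustrative assumptions, not the values used by the team.

```python
import numpy as np

def melody_track(normalized, tight_threshold=0.99):
    """Return the index of the most prominent bin per timestep, or -1 if none.

    A column contributes a melody note only if some bin reaches the tight
    threshold; in a column-normalized spectrogram that is the bin equal to 1.
    """
    normalized = np.asarray(normalized, dtype=float)
    mask = normalized >= tight_threshold
    has_note = mask.any(axis=0)
    return np.where(has_note, normalized.argmax(axis=0), -1)

# Hypothetical column-normalized spectrogram (3 bins x 3 timesteps).
example = np.array([[0.25, 0.0, 1.0],
                    [1.00, 0.0, 0.6],
                    [0.50, 0.0, 0.2]])
track = melody_track(example)
```

Mapping each bin index to a note name would require the frequency of each bin, which is omitted here.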

The melody for "Twinkle Twinkle Little Star" is rather trivial to identify, so the following figure demonstrates the improved algorithm evaluating the Bach Minuet:

The Bach Minuet is nearly perfectly detected, with only one stray note appearing at around 9 seconds. This false-positive note is a harmonic of the bass notes being played, which this plot is not intended to detect. Rapid ornamentation around 5 seconds is resolved correctly due to the improved temporal resolution.

Histograms of pieces' normalized volume arrays are useful tools for setting the volume threshold. Below is such a histogram for the Bach Minuet. Most volumes are quite low, representing silence or near-silence before, during, and after the piece. The volume threshold has been set at 0.01 for melody detection, and this threshold is consistently effective across multiple pieces since the volume array is normalized between 0 and 1.
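A histogram like this can be computed with `np.histogram`. The sketch below assumes NumPy and a raw volume array; the helper names, the bin count of 50, and the small example array are hypothetical and stand in for a real piece.

```python
import numpy as np

def volume_histogram(volume, bins=50):
    """Normalize a volume array to 0..1 and histogram it over that range."""
    volume = np.asarray(volume, dtype=float)
    peak = volume.max()
    # Guard against an entirely silent recording.
    normalized = volume / peak if peak > 0 else np.zeros_like(volume)
    counts, edges = np.histogram(normalized, bins=bins, range=(0.0, 1.0))
    return normalized, counts, edges

def fraction_above(normalized, threshold=0.01):
    """Fraction of timesteps whose normalized volume meets the threshold."""
    return float(np.mean(np.asarray(normalized) >= threshold))

# Hypothetical volume array: mostly silence, with two audible timesteps.
example_volume = [0.0, 0.0, 0.0, 0.001, 0.5, 2.0]
normalized, counts, edges = volume_histogram(example_volume)
kept = fraction_above(normalized, threshold=0.01)
```

Checking what fraction of timesteps survives a candidate threshold is one quick way to confirm that it separates silence from played notes.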

Current Issues:

The team is encountering issues distinguishing between 'real' notes and harmonics produced by lower notes. For example, when G3 is played, G4 will appear in the spectrum as well, often with significant magnitude. Since some harmonics end up being louder than some 'real' notes, simply thresholding the spectrogram will result in false positives or false negatives.

The team is considering several approaches to the issue of harmonic rejection. One idea is to write a function that calculates the probability that a note is 'real', based on the presence of other notes. For example, the note G4 could be a harmonic of G3 if G3 is present at sufficient volume. Another idea is to ignore the issue, on the reasoning that harmonics at octave intervals would not change the detected chord. A third idea is to create notch filters at octaves above each bass note. For example, if G3 is present, a notch filter would be instantiated with the 'notch' at G4.
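The first idea could be prototyped along the following lines. This is only a sketch of one possible heuristic, not a method the team has settled on: the score is an uncalibrated value between 0 and 1 rather than a true probability, notes are identified by MIDI number (an assumption; G3 is 55 and G4 is 67), and the function name and example magnitudes are hypothetical.

```python
def realness_score(note_magnitudes, note, harmonic_offsets=(12,)):
    """Heuristic score in [0, 1] that a detected note is not just a harmonic.

    note_magnitudes maps MIDI note numbers to magnitudes at one timestep.
    harmonic_offsets lists semitone distances to possible parent notes below;
    12 is one octave. Other offsets (for example 19, roughly the third
    harmonic) could be added as a further assumption.
    """
    own = note_magnitudes.get(note, 0.0)
    if own <= 0:
        return 0.0
    # Strongest candidate parent note lying a harmonic interval below.
    strongest_parent = max(
        (note_magnitudes.get(note - offset, 0.0) for offset in harmonic_offsets),
        default=0.0,
    )
    # The louder the parent relative to the note, the lower the score.
    return own / (own + strongest_parent)

# Hypothetical timestep: G3 (55) loud, D4 (62) present, G4 (67) weaker.
detected = {55: 1.0, 62: 0.8, 67: 0.5}
score_g3 = realness_score(detected, 55)
score_d4 = realness_score(detected, 62)
score_g4 = realness_score(detected, 67)
```

Here G4 receives a lower score than G3 or D4 because a louder note sits exactly one octave below it; a threshold on this score would then decide which notes to keep.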

Next Steps:

What We've Learned So Far:

For most of our team members, this project has been eye-opening as to the power of Python. Certain concepts, like executing programs from the command line, have 'clicked' with team members as well. This is relevant to our project because we are using our newfound Python skills to write code better and more quickly as the project develops. We are also coming to understand the power of open-source Python libraries, which enable us to perform high-level tasks more easily.