SOUNDS OUT OF PLÄCE? SCORE-INDEPENDENT DETECTION OF CONSPICUOUS MISTAKES IN PIANO PERFORMANCES
Additional Content
The purpose of this companion page is to:
1) Show a few labelled examples of PF, SR and Synthetic data.
2) Show examples of our model predictions.
3) Provide the BM data under a CC BY-NC-SA license.
PF and SR Data Examples
To illustrate what is meant by a conspicuous error, we share a few audio snippets of labelled mistake regions from our gathered data.
PF Example 1
section without error label
section with error label
PF Example 2
section without error label
section with error label
PF Example 3
section without error label
section with error label
SR Example 1
section without error label
section with error label
SR Example 2
section without error label
section with error label
Synthetic Mistake Examples:
Model Predictions
We show a few piano-roll examples of the predicted labels for our five selected models (Baseline, SYNTH, SYNTH-FT, AE, AE-SYNTH). The topmost row is the ground-truth label. On the right, we describe the error mode encountered.
BM subset (Evaluation set)
b-07-annot.mid
False positive for inconspicuous Pitch Insertion
Before frame 1450, an inconspicuous pitch insertion is detected by the SYNTH, SYNTH-FT, and AE-SYNTH models. This is plausible because this error mode is heavily represented in the synthetic mistakes. This kind of pitch insertion was detected by the score follower system.
Correct Prediction of Missed Notes
A missing note in a locally consistent rhythmic pattern is estimated as an error. This occurs twice between frames 1480 and 1570. However, only the second occurrence is detected by most models.
b-10-annot.mid
Strange rhythm is estimated as an error (frame 250-280)
In this example, there are two labelled portions in close succession. This is a sensible annotation because they sound like two separate mistakes. However, this is not always the case, and sometimes it is unclear where an error starts and ends.
All models (except AE) predicted two separately labelled regions.
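To make the notion of "separately labelled regions" concrete, the following is a minimal sketch of how frame-wise binary labels could be grouped into contiguous regions and counted. It assumes that predictions are available as one 0/1 value per frame. The function name `frames_to_regions` and the `min_gap` parameter are hypothetical and introduced only for illustration; this is not the procedure used to produce the figures on this page.

```python
from typing import List, Optional, Tuple


def frames_to_regions(labels: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Group frame-wise 0/1 labels into (start, end) regions, end exclusive.

    Regions separated by fewer than `min_gap` unlabelled frames are merged.
    Both the name and the `min_gap` parameter are illustrative assumptions.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(labels):
        if value and start is None:
            start = i                      # a labelled region begins
        elif not value and start is not None:
            regions.append((start, i))     # the region ends before frame i
            start = None
    if start is not None:
        regions.append((start, len(labels)))  # region runs to the last frame

    # Merge regions whose separating gap is shorter than min_gap frames.
    merged: List[Tuple[int, int]] = []
    for region in regions:
        if merged and region[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], region[1])
        else:
            merged.append(region)
    return merged


# Toy example: two labelled portions separated by two unlabelled frames.
toy_labels = [0, 1, 1, 0, 0, 1, 1, 1, 0]
separate = frames_to_regions(toy_labels)            # kept as two regions
fused = frames_to_regions(toy_labels, min_gap=3)    # merged into one region
```

With a small `min_gap`, closely spaced labels stay separate, as in the annotation above. A larger value fuses them into one region, which is one way to picture a model that predicts a single span where the annotation has two.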
b-02-annot.mid
Hitting adjacent keys is estimated as an error (frame 150)
However, it is not clear why the baseline model consistently predicts short mistakes between frames 50 and 100.
An abrupt silence (potentially a hesitation) during a run is estimated as an error (frame 400)
More Failure Modes (False Positives)
Around frame 150 - as the flipside of detecting erroneous pauses in a performance, the system sometimes mistakes notated musical pauses for errors.
Around frame 1700 - the right hand contains three repetitions of a motif (ascending thirds). Some trained models consider this an error, presumably because a common error mode among beginner pianists is to pause and repeat the part where a mistake was made. Humans presumably distinguish repetition that is part of the composition from erroneous performance by looking at a longer musical context and the metric structure.
Around frames 120-200 - a climactic held chord is detected as an error, presumably because novice pianists tend to hold down the pressed keys when sight-reading.
Around frame 2300 - this piece contains graceful ornaments (appoggiaturas), which are often mistaken for errors. This presumably happens because hitting adjacent keys is a common error mode, so it is difficult to distinguish intentional playing of adjacent keys (ornaments) from mistakes.
Observed Patterns in Model Predictions
Open Questions
Can there ever be a consensus on the 'correct' span of a label? When does a mistake start or end, especially when the mistake is the absence of a note?
BM Data
The BM data is available under a CC BY-NC-SA license.
Download Link: https://drive.google.com/drive/folders/1J0eHOArbVu8utv_nfaZw5PcWS3atTScy?usp=sharing