New Methodology of Building Polyphonic Datasets for AMT


Annotated polyphonic datasets are essential for automatic music transcription (AMT) research. Compared with the rapid progress of new AMT algorithms, the development of new, sizable datasets for AMT evaluation has been slow and insufficient. This is because manually annotating polyphonic music is a painful and time-consuming task, and current annotation methods cannot balance the trade-offs among generality, efficiency, and cost. In some cases the quality cannot even be controlled, since we must rely on the same inefficient manual process to check it. Whenever we wish to verify whether a dataset is good, we fall back into "the abyss of manual annotation".

Note-level annotation of polyphonic music means finding the onset time and the note name of every note event in a music excerpt. This process closely resembles what a musician does in an orchestra: the musician follows the mixture of music played by all the other members and plays his or her own part together with them. In doing so, the musician is effectively annotating (producing onsets and note names for) his or her own part over time. An interface such as an electric piano with MIDI output can then translate the musician's perception of the note events and the act of playing into a note-level annotation. A musician with both orchestra and piano playing experience can therefore produce such note-level annotations simply by "playing along with the music excerpt". Provided with the score sheets, the musician can further follow every part of the excerpt fluently and without significant errors.
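As an illustrative sketch (not the actual tool from [1]), the idea of such an interface can be expressed as turning captured MIDI note-on events into (onset, note name) pairs; the event layout and function names below are assumptions for illustration only:

```python
# Illustrative sketch: converting MIDI note-on events captured from an
# electric piano into note-level annotations (onset time + note name).

def midi_to_note_name(midi_number):
    """Map a MIDI note number to a note name, e.g. 69 -> 'A4'."""
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    octave = midi_number // 12 - 1  # MIDI 60 is middle C (C4)
    return f"{names[midi_number % 12]}{octave}"

def annotate(note_on_events):
    """Turn (onset_seconds, midi_number) pairs into (onset, note name) annotations."""
    return [(onset, midi_to_note_name(num)) for onset, num in note_on_events]

# A musician playing A4 then C5 along with the recording:
print(annotate([(0.50, 69), (1.25, 72)]))
# [(0.5, 'A4'), (1.25, 'C5')]
```

The offset of each note can be recovered in the same way from the corresponding MIDI note-off events.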

Detailed procedures of the proposed musician-aided annotation method can be found in [1].

New dataset in MIREX 2015 (Su dataset)

In MIREX 2015 we introduced a newly annotated polyphonic dataset. This dataset covers a wider range of real-world music than the old dataset used since 2009. Specifically, the new dataset contains 3 clips of piano solo, 3 clips of string quartet, 2 clips of piano quintet, and 2 clips of violin sonata (violin with piano accompaniment), all selected from real-world recordings. The length of each clip is between 20 and 30 seconds. The dataset is annotated with the method described in [1].

As also mentioned in [1], we did our best to manually correct the errors (mostly mismatches between onset and offset time stamps) in the preliminary annotation. Since there may still be annotation errors we did not find, we decided to make the data and the annotation publicly available after this year's MIREX results are announced. Specifically, we encourage every participant to help us check the annotation, and the result of each competing algorithm will be updated based on the revised annotation. We hope this gives participants more detailed information about how their algorithms behave on the dataset. Moreover, in this way we can join our efforts to create a better dataset for research on multiple-F0 estimation and tracking.

Information on the 10 clips used in MIREX 2015 is listed below:

Number Category Composer Name mm.
PQ02 Piano quintet Elgar Piano Quintet in A minor, Op. 84, Mov. II 13 - 25
PQ03 Piano quintet Farrenc Piano Quintet No.1, Op.30, Mov.1 1 - 24
PS01 Piano solo Beethoven Piano Sonata No. 14, Op. 27, No. 2, Mov. 1 (Moonlight) 1 - 9
PS02 Piano solo Chopin Nocturne No. 9, Op. 32-1 1 - 8
PS03 Piano solo Mozart Piano Sonata No. 16, KV545, Mov.1 1 - 12
SQ01 String quartet Beethoven String Quartet No.14, Op.131, Mov.1 45 - 53
SQ02 String quartet Janacek String Quartet No. 1, Mov. 1 46 - 56
SQ03 String quartet Schubert String Quartet No. 14 in D Minor (Death and the Maiden) 15 - 28
VS01 Violin sonata Schumann Violin Sonata No.2, Op.121, Mov. 2 1 - 16
VS04 Violin sonata Franck Violin Sonata in A major, Mov. 4 1 - 20

Extension of the Su dataset (symphony and choir)

In our ISMIR 2016 paper [2] on the transcription of very challenging music content such as symphony and choir, we built and released 10 more clips using the same annotation method, 5 of which are symphony and the other 5 choral music:

Number Category Composer Name mm.
SY4 Symphony Tchaikovsky Symphony No. 6, Op. 74, Mov. 2 (Pathetique) A.9 - B.5 (35 - 42)
SY5 Symphony Berlioz Symphonie fantastique, Op. 14, Mov. 4 (March to the Scaffold) 54.1 - 54.12
SY6 Symphony Mahler Symphony No. 5, Mov. 4 1 - 18
SY7 Symphony Sibelius Symphony No. 5, Op. 82, Mov. 2
SY8 Symphony Schubert Symphony No. 8, D.759, Mov. 2 (Unfinished) 33 - 44
CH1 SATB choir Bach "Jesu, meine Freude", BWV 227 (SSA) 1 - 9
CH2 SATB choir Brahms 7 Lieder, Op. 62, No. 1, Rosmarin 1 - 9
CH3 SATB choir Mendelssohn 6 Motets for Eight-part Choir, Op. 79, No. 1 (SSAATTBB) 1 - 10
CH4 SATB choir Mendelssohn The 100th Psalm 1 - 8
CH5 SATB choir Byrd Ave Verum Corpus 5 - 15

We provide the music clips and the ground-truth annotation of this dataset, including the .mid files and the frame-level and note-level data in the MIREX format. Please send an e-mail to li.sowaterking[at] for the download link. If you have corrected the annotation and wish to update it, please also contact me at this e-mail.
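For reference, a minimal reader for the note-level ground-truth files can be sketched as follows. This assumes the common MIREX note-tracking layout of three whitespace-separated columns per line (onset in seconds, offset in seconds, F0 in Hz); please verify the exact column layout against the distributed files themselves:

```python
# Minimal parser for a MIREX-style note-level annotation file.
# Assumed layout (verify against the actual files): one note per line,
# whitespace-separated columns: onset_sec  offset_sec  f0_hz

def parse_note_level(text):
    """Parse annotation text into a list of (onset, offset, f0) tuples."""
    notes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        onset, offset, f0 = (float(x) for x in line.split()[:3])
        notes.append((onset, offset, f0))
    return notes

example = "0.50\t1.20\t440.00\n1.25\t2.00\t523.25\n"
print(parse_note_level(example))
# [(0.5, 1.2, 440.0), (1.25, 2.0, 523.25)]
```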

How to cite

If you use this dataset in your work, please cite the following paper:

[1] Li Su and Yi-Hsuan Yang, "Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription," in Int. Symp. Computer Music Multidisciplinary Research (CMMR), June 2015.
[2] Li Su, Tsung-Ying Chuang and Yi-Hsuan Yang, "Exploiting frequency, periodicity and harmonicity using advanced time-frequency concentration techniques for multipitch estimation of choir and symphony," in Int. Society for Music Information Retrieval Conference (ISMIR), Aug 2016.