MIR-1K Dataset

Multimedia Information Retrieval lab, 1000 song clips, dataset for singing voice separation

Work by Chao-Ling Hsu and Prof. Jyh-Shing Roger Jang

The MIR-1K dataset is designed for research on singing voice separation. MIR-1K contains:

  1. 1000 song clips in which the music accompaniment and the singing voice are recorded on the left and right channels, respectively.
  2. Manual annotations, including pitch contours in semitones, indices and types of unvoiced frames, lyrics, and vocal/non-vocal segments.
  3. Speech recordings of the lyrics, read by the same person who sang the songs.
  4. The undivided songs of MIR-1K, now available for download.
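
Because the accompaniment and the singing voice sit on separate channels, a clip can be split into its two sources and remixed at any voice-to-accompaniment ratio. The following is a minimal sketch using only the Python standard library. It assumes the clips are 16-bit PCM stereo WAV files, which this page does not state, and the function names are illustrative rather than part of any official tooling.

```python
import array
import sys
import wave


def split_stereo(interleaved):
    """Deinterleave [L0, R0, L1, R1, ...] into (left, right)."""
    return interleaved[0::2], interleaved[1::2]


def read_clip(path):
    """Return (sample_rate, accompaniment, voice) for one stereo clip.

    Assumption: 16-bit PCM stereo WAV; anything else raises ValueError.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = wf.getframerate()
        samples = array.array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Left channel = music accompaniment, right channel = singing voice.
    accompaniment, voice = split_stereo(samples)
    return rate, accompaniment, voice


def mix_mono(accompaniment, voice, voice_gain=1.0):
    """Sum the two sources into a mono mixture (floats, not clipped)."""
    return [a + voice_gain * v for a, v in zip(accompaniment, voice)]
```

Changing `voice_gain` gives mixtures at different voice-to-accompaniment levels from the same clip; the result is left as unclipped floats so that scaling and quantization remain the caller's choice.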

    Each song clip is named in the form "SingerId_SongId_ClipId". The duration of each clip ranges from 4 to 13 seconds, and the total length of the dataset is 133 minutes. The clips are extracted from 110 karaoke songs, each of which contains a mixture track and a music accompaniment track. The songs were freely selected from 5000 Chinese pop songs and sung by our labmates (8 females and 11 males). Most of the singers are amateurs without professional music training.
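
The naming scheme makes it straightforward to group clips by singer or by song. Below is a small sketch of a parser for it. The example file name and the optional ".wav" extension are hypothetical, and the code assumes that SongId and ClipId contain no underscores (SingerId may).

```python
from typing import NamedTuple


class ClipName(NamedTuple):
    singer_id: str
    song_id: str
    clip_id: str


def parse_clip_name(name: str) -> ClipName:
    """Split 'SingerId_SongId_ClipId' (optionally ending in .wav)."""
    stem = name[:-4] if name.lower().endswith(".wav") else name
    # Split from the right so that underscores inside SingerId are kept.
    parts = stem.rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not in SingerId_SongId_ClipId form: {name!r}")
    return ClipName(*parts)


# Hypothetical file name, for illustration only.
clip = parse_clip_name("singerA_3_07.wav")
print(clip.singer_id, clip.song_id, clip.clip_id)
```

Grouping on `singer_id` is one way to keep all clips of a singer on the same side of a train/test split.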

Labels for unvoiced sounds

    In MIR-1K, every frame of each clip is manually labeled as one of five sound classes:

  1. unvoiced stop
  2. unvoiced fricative and affricate
  3. /h/
  4. inhaling sound
  5. others (including voiced sounds and music accompaniment)
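
As an illustration, the five classes can be represented as an enumeration and tallied per clip. The integer codes below simply mirror the list numbering above; how the labels are actually encoded in the annotation files is not described here, so treat the codes and the names as assumptions.

```python
from collections import Counter
from enum import IntEnum


class SoundClass(IntEnum):
    # Codes follow the list order above (an assumption, not a file spec).
    UNVOICED_STOP = 1
    UNVOICED_FRICATIVE_AFFRICATE = 2
    H = 3
    INHALING = 4
    OTHERS = 5


def class_histogram(frame_labels):
    """Count how many frames of a clip fall into each sound class."""
    counts = Counter(SoundClass(code) for code in frame_labels)
    return {cls: counts.get(cls, 0) for cls in SoundClass}


# Made-up label sequence for one short clip.
hist = class_histogram([5, 5, 1, 3, 5])
for cls, n in hist.items():
    print(f"{cls.name}: {n}")
```

A code outside 1 to 5 raises `ValueError` from the `SoundClass` constructor, which surfaces malformed annotations early.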

The frame length is 40 ms and the frame shift is 20 ms, so consecutive frames overlap by half.
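
With these values, frame i covers the interval from 20i ms to 20i + 40 ms. The sketch below maps frame indices to time spans and counts the full frames in a clip. It assumes that the first frame starts at 0 ms and that no padding is applied at the end of a clip, neither of which is specified above.

```python
FRAME_LENGTH_MS = 40
FRAME_SHIFT_MS = 20


def frame_span_ms(index):
    """Return (start_ms, end_ms) of the frame with the given 0-based index."""
    start = index * FRAME_SHIFT_MS
    return start, start + FRAME_LENGTH_MS


def num_full_frames(duration_ms):
    """Number of complete frames in a clip (assumes no end padding)."""
    if duration_ms < FRAME_LENGTH_MS:
        return 0
    return (duration_ms - FRAME_LENGTH_MS) // FRAME_SHIFT_MS + 1


print(frame_span_ms(3))        # (60, 100)
print(num_full_frames(4000))   # 199 frames in a 4-second clip
```

If the annotation files turn out to hold a different number of labels per clip than this estimate, the padding or start-offset assumption is the first thing to revisit.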

Sound demos for unvoiced singing voice separation

    Sound Demos for Unvoiced Singing Voice Separation

Download MIR-1K dataset


Download MIR-1K dataset for MIREX

Relevant publications

[1] Chao-Ling Hsu, DeLiang Wang, Jyh-Shing Roger Jang, and Ke Hu, “A Tandem Algorithm for Singing Pitch Extraction and Voice Separation from Music Accompaniment,” IEEE Trans. Audio, Speech, and Language Processing, 2011 (accepted).

[2] Chao-Ling Hsu and Jyh-Shing Roger Jang, “On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 2, pp. 310-319, 2010.

[3] Chao-Ling Hsu, DeLiang Wang, and Jyh-Shing Roger Jang, “A Trend Estimation Algorithm for Singing Pitch Detection in Musical Recordings,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, Mar. 2011.

[4] Chao-Ling Hsu, Liang-Yu Chen, Jyh-Shing Roger Jang, and Hsing-Ji Li, “Singing Pitch Extraction from Monaural Polyphonic Songs by Contextual Audio Modeling and Singing Harmonic Enhancement,” International Society for Music Information Retrieval Conference (ISMIR), Kobe, Japan, Oct. 2009.

[5] Chao-Ling Hsu and Jyh-Shing Roger Jang, “Singing Pitch Extraction by Voice Vibrato/Tremolo Estimation and Instrument Partial Deletion,” International Society for Music Information Retrieval Conference (ISMIR), Utrecht, Netherlands, Aug. 2010.