
LITIS Rouen Audio scene dataset

                                                  
  [Images: billiard pool hall, kid game hall, bus]


  This audio scene dataset was collected for research purposes, with the aim of studying features for audio scene recognition. The recordings were made with a smartphone in realistic, imperfect conditions,
  and they cover several classes of scenes. Details of the classes and the recording settings are described in our audio scene paper.
 
  Here, we provide the dataset organized into examples of 30 s each. To allow easy comparison of research results, we also make available the 20 train/test splits (80%-20%) of the examples
  as well as the 5-fold 50%-50% splits of each training set into learning and validation sets. For the sake of comparison, it would be great if everybody used these splits and reported the mean average precision.

  • dataset (4.0 GB)   (3026 examples, 19 classes, 30 s per clip) [data]
  • folds for the above data (in Matlab format)  [here] [here in Matlab v7 format]
    • indiceM contains the indices of the 20 training/test splits. The relation between example numbers and wav files is given in the mapping file below.
    • indicevalM contains the 5-fold splits of each training set for cross-validation (indices are given with respect to the 2419 examples of the training set).
    • the folds are also available below in text format [learn] [test] [cv]
  • mapping from filename to example number [here]
    • this file maps the example file names (e.g. avion1, avion2, ..., tubestation200) to the example numbers (1 to 3026) used in the train/test folds. For instance, bus73.wav is example number 187 and train-ter57 is example number 2708.
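The index conventions described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names are hypothetical, the indices are assumed to be 1-based (as is usual for Matlab), and the cross-validation indices are assumed to be positions within the list of training indices, in the order in which that list is stored. The exact layout of the fold files is not described here, so the indices are passed in as plain lists.

```python
from typing import List, Sequence

N_EXAMPLES = 3026  # total number of examples stated on this page


def check_split(train_idx: Sequence[int], test_idx: Sequence[int],
                n_examples: int = N_EXAMPLES) -> None:
    """Raise ValueError if the 1-based indices are out of range or overlap."""
    train, test = set(train_idx), set(test_idx)
    out_of_range = sorted(i for i in train | test if not 1 <= i <= n_examples)
    if out_of_range:
        raise ValueError(f"indices outside 1..{n_examples}: {out_of_range}")
    shared = sorted(train & test)
    if shared:
        raise ValueError(f"examples in both train and test: {shared}")


def cv_to_example_numbers(train_idx: Sequence[int],
                          cv_positions: Sequence[int]) -> List[int]:
    """Map 1-based positions within the training set to example numbers.

    Assumption: position p refers to the p-th entry of train_idx.
    """
    return [train_idx[p - 1] for p in cv_positions]


# Toy data (not taken from the real fold files).
toy_train = [4, 9, 2, 7]
toy_test = [1, 3]
check_split(toy_train, toy_test, n_examples=10)
toy_validation = cv_to_example_numbers(toy_train, [2, 4])
```

With the toy data, positions 2 and 4 of the training list correspond to example numbers 9 and 7, which can then be looked up in the filename mapping.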
 
The classes are organised in the following way:

 #     filename        class
 1     avion           plane
 2     busystreet      busy street
 3     bus             bus
 4     cafe            cafe
 5     car             car
 6     hallgare        train station hall
 7     kidgame         kid game hall
 8     market          market
 9     metro-paris     metro-paris
 10    metro-rouen     metro-rouen
 11    poolhall        billiard pool hall
 12    quietstreet     quiet street
 13    hall            student hall
 14    restaurant      restaurant
 15    ruepietonne     pedestrian street
 16    shop            shop
 17    train-ter       train
 18    train-tgv       high-speed train
 19    tubestation     tubestation
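
For convenience, the class table can be written as a small Python lookup. This is a sketch, not part of the dataset: the helper name `class_of` is hypothetical, and it assumes that every file name is a filename prefix from the table followed by a number and an optional .wav extension (as in bus73.wav or train-ter57).

```python
import re
from typing import Tuple

# (filename prefix, class label), in the order of the table; class numbers are 1..19.
CLASS_TABLE = [
    ("avion", "plane"), ("busystreet", "busy street"), ("bus", "bus"),
    ("cafe", "cafe"), ("car", "car"), ("hallgare", "train station hall"),
    ("kidgame", "kid game hall"), ("market", "market"),
    ("metro-paris", "metro-paris"), ("metro-rouen", "metro-rouen"),
    ("poolhall", "billiard pool hall"), ("quietstreet", "quiet street"),
    ("hall", "student hall"), ("restaurant", "restaurant"),
    ("ruepietonne", "pedestrian street"), ("shop", "shop"),
    ("train-ter", "train"), ("train-tgv", "high-speed train"),
    ("tubestation", "tubestation"),
]
PREFIX_TO_CLASS = {
    prefix: (number, label)
    for number, (prefix, label) in enumerate(CLASS_TABLE, start=1)
}


def class_of(filename: str) -> Tuple[int, str]:
    """Return (class number, class label) for a name such as 'bus73.wav'."""
    stem = filename[:-4] if filename.endswith(".wav") else filename
    # Assumed naming scheme: letters and hyphens, then the clip number.
    match = re.fullmatch(r"([a-z-]+)(\d+)", stem)
    if match is None or match.group(1) not in PREFIX_TO_CLASS:
        raise ValueError(f"unrecognised file name: {filename!r}")
    return PREFIX_TO_CLASS[match.group(1)]


print(class_of("bus73.wav"))    # (3, 'bus')
print(class_of("train-ter57"))  # (17, 'train')
```

Matching the whole prefix rather than searching for substrings keeps hall, hallgare and poolhall apart.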

Note that we are still recording scenes, so the dataset is likely to be extended in the near future.

The classes are defined by the location where the audio clip was recorded, so some examples can be ambiguous. For instance, a quiet street is defined as a street with no or very few pedestrians or cars; in some recordings, however, engines can be heard because cars passed by as the recording started.

Below, we also provide the relation between each example of the dataset and the original audio clip files, which are usually longer than 30 s [here]. If you need other versions (say, a 10-second version) or have any suggestions, do not hesitate to drop me an email.
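
To give an idea of how a longer recording relates to fixed-length examples, the following Python sketch computes the sample ranges of consecutive segments. It is an illustration only: it assumes non-overlapping segments and drops any incomplete segment at the end, which may differ from how the examples of this dataset were actually cut, and the sample rate in the usage lines is a made-up value.

```python
from typing import List, Tuple


def segment_bounds(n_samples: int, sample_rate: int,
                   seconds: float = 30.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of consecutive fixed-length segments.

    Segments do not overlap; a trailing remainder shorter than one
    segment is discarded (both are assumptions of this sketch).
    """
    segment_len = int(round(seconds * sample_rate))
    if segment_len <= 0:
        raise ValueError("segment length must be positive")
    return [(start, start + segment_len)
            for start in range(0, n_samples - segment_len + 1, segment_len)]


# Hypothetical 100-second recording sampled at 1000 Hz.
rate = 1000
thirty_second = segment_bounds(100 * rate, rate, seconds=30.0)
ten_second = segment_bounds(100 * rate, rate, seconds=10.0)
print(len(thirty_second), len(ten_second))  # 3 10
```

Each (start, end) pair can be used to slice the array of samples of the recording.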



If you use this dataset in any of your publications, please cite the following work:
  • A. Rakotomamonjy, G. Gasso, Histogram of gradients of Time-Frequency representations for audio scene detection,  Technical report, HAL, 2014

I intend to keep track of published results. At present, we achieve a mean average precision of 0.914.
Many papers use this dataset. I will shortly list them here and report their results.


This work has been partially supported by the grant ANR 12-BS004

  • fold3026-matrices_Learn.txt (756k), alain rakotomamonjy, Sep 16, 2015, 1:48 PM
  • fold3026-matrices_Test.txt (190k), alain rakotomamonjy, Sep 16, 2015, 1:48 PM
  • fold3026-matrices_v7.mat (381k), alain rakotomamonjy, Sep 16, 2015, 1:52 PM
  • fold3026.mat (2278k), alain rakotomamonjy, Sep 8, 2014, 2:53 PM
  • fold_CV.tar.bz2 (272k), alain rakotomamonjy, Sep 16, 2015, 1:48 PM
  • relation_examples_files.txt (164k), alain rakotomamonjy, Aug 27, 2014, 6:21 AM
  • relation_wav_examples.txt (63k), alain rakotomamonjy, Sep 16, 2015, 1:18 PM