Data Sets

MIL Data Sets for Witness Identification

These data sets were used in experiments comparing the ability of different multiple instance learning (MIL) algorithms to correctly identify witnesses (the positive instances) in positive bags in:

[1] M.-A. Carbonneau, E. Granger, and G. Gagnon, “Witness Identification in Multiple Instance Learning Using Random Subspaces,” in Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), 2016.

Mini-MIAS MIL Data Set of Mammograms:

This data set was created using the images of the mini-MIAS database of mammograms. The database contains images of healthy patients, as well as patients exhibiting one of 6 classes of abnormalities. For each abnormality, an image patch is extracted using the location annotations provided with the database. These patches are positive instances. Negative instances are patches of various sizes extracted from tissue regions that do not intersect with abnormal regions, or from the tissue of healthy patients. Each patient is represented by a bag containing 10 patches. Because negative patches are extracted randomly, 5 versions of the data set are generated. The data set contains a total of 326 subjects, 117 of whom present abnormalities.

Features are extracted from each patch. The feature vector contains the mean, the standard deviation, and a normalized 12-bin histogram of the pixel intensities in the patch. This representation is augmented with the mean local binary pattern (LBP) extracted from a 13×13 pixel grid, the mean of densely extracted SIFT descriptors, and 5 Haralick texture features. The resulting 220-dimensional vectors are reduced to 100 dimensions using principal component analysis (PCA).
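Part of the patch descriptor can be sketched in Python with NumPy. This is a minimal sketch, not the authors' code: it only covers the intensity statistics (mean, standard deviation, normalized 12-bin histogram) and the PCA reduction. The LBP, SIFT, and Haralick components are omitted, the 8-bit intensity range is an assumption, and the function names `intensity_features` and `pca_reduce` are hypothetical.

```python
import numpy as np

def intensity_features(patch: np.ndarray, n_bins: int = 12) -> np.ndarray:
    """Mean, standard deviation and a normalized n_bins histogram of pixel intensities.

    Assumes an 8-bit grayscale patch (values in [0, 255]).
    """
    pixels = patch.astype(np.float64).ravel()
    hist, _ = np.histogram(pixels, bins=n_bins, range=(0.0, 256.0))
    hist = hist / hist.sum()  # normalize so the bins sum to 1
    return np.concatenate(([pixels.mean(), pixels.std()], hist))

def pca_reduce(features: np.ndarray, n_components: int = 100) -> np.ndarray:
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Toy usage with random stand-in data (not mini-MIAS data).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32))
f = intensity_features(patch)     # 14 values: mean, std, 12 histogram bins
X = rng.normal(size=(300, 220))   # stand-in for the full 220-d descriptors
Z = pca_reduce(X)                 # shape (300, 100)
```

In practice, the PCA projection would be fitted on training patches only and then applied to test patches, so that no information leaks from the test set.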

[get the data set]

Letters MIL data set:

This data set is created from the Letter Recognition data set, which contains a total of 20,000 instances of the 26 letters of the English alphabet. Each letter is encoded by a 16-dimensional feature vector; the reader is referred to the original Letter Recognition paper for more details. A MIL version of the data set is created by grouping letters into bags. This allows control over the witness rate (WR) and the number of positive concepts, which, in this context, correspond to different letters.

A first collection of data sets is created by varying the number of positive concepts from 1 to 10. Each time a data set is generated, random letters are designated as positive concepts, and all other letters are negative concepts. All bags contain 10 instances, and positive bags contain 2 instances randomly selected from the positive concepts.

A second collection of data sets is generated to assess the effects of the WR. The positive class is composed of 3 randomly selected concepts. Each bag contains 10 instances, and the number of witnesses in positive bags is determined by the WR.

All data sets contain 100 positive and 100 negative bags. For each configuration, 10 different data sets are generated.
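The bag construction described above can be sketched as follows. This is a hypothetical generator, not the one used to build the published data sets. The helper names, sampling instances without replacement inside a bag (while allowing reuse across bags), and rounding WR × bag size to a witness count are all assumptions. The features and labels in the usage lines are random stand-ins, not the Letter Recognition data.

```python
import string
import numpy as np

def pick_positive_letters(n_concepts, seed=None):
    """Randomly designate n_concepts letters as positive concepts (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.choice(list(string.ascii_uppercase), size=n_concepts, replace=False).tolist()

def witnesses_from_wr(wr, bag_size=10):
    """Number of witnesses in a positive bag for a given witness rate (assumed rounding)."""
    return int(round(wr * bag_size))

def make_mil_bags(X, y, positive_letters, n_witnesses, bag_size=10,
                  n_pos_bags=100, n_neg_bags=100, seed=None):
    """Group labeled letter instances into MIL bags.

    Returns a list of (bag_size, n_features) arrays, the bag labels (1 = positive),
    and per-bag instance labels (1 = witness).
    """
    rng = np.random.default_rng(seed)
    is_pos = np.isin(y, positive_letters)
    pos_idx = np.flatnonzero(is_pos)
    neg_idx = np.flatnonzero(~is_pos)

    bags, bag_labels, instance_labels = [], [], []
    for label in [1] * n_pos_bags + [0] * n_neg_bags:
        n_w = n_witnesses if label == 1 else 0  # negative bags have no witnesses
        # Sample without replacement inside a bag; bags may share instances.
        witnesses = rng.choice(pos_idx, size=n_w, replace=False)
        negatives = rng.choice(neg_idx, size=bag_size - n_w, replace=False)
        idx = rng.permutation(np.concatenate([witnesses, negatives]))
        bags.append(X[idx])
        bag_labels.append(label)
        instance_labels.append(is_pos[idx].astype(int))
    return bags, np.array(bag_labels), instance_labels

# Toy usage mimicking one configuration of the second collection (3 concepts, WR = 0.3).
rng = np.random.default_rng(0)
y = np.repeat(list(string.ascii_uppercase), 50)  # stand-in letter labels
X = rng.normal(size=(y.size, 16))                 # stand-in 16-d features
pos = pick_positive_letters(3, seed=1)
bags, labels, inst = make_mil_bags(X, y, pos, witnesses_from_wr(0.3), seed=2)
```

Under the same assumptions, the first collection corresponds to `witnesses_from_wr(0.2)` (2 witnesses per bag of 10) while the number of letters passed to `pick_positive_letters` varies from 1 to 10.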

[get the data set]

Synthetic Data Sets for MIL

These data sets are used to assess the performance of MIL algorithms at different witness rates, levels of positive class complexity, and numbers of noisy features. They were introduced in:

[2] M.-A. Carbonneau, E. Granger, A. J. Raymond, and G. Gagnon, “Robust Multiple-Instance Learning Ensembles Using Random Subspace Instance Selection,” Pattern Recognition, vol. 58, pp. 83–99, 2016.

Here's the link to get the data:

[get the data set]

or alternatively:

[also the data set]

ÉTS Hockey Event Data Set:

This data set contains footage of two hockey games captured using fixed cameras. Temporal annotations are provided for shots, goals, face-offs, line changes, saves, and checking.

The data set was introduced in:

[3] M.-A. Carbonneau, A. J. Raymond, E. Granger, and G. Gagnon, “Real-Time Visual Play-Break Detection in Sport Events Using a Context Descriptor,” in Proceedings of the 2015 IEEE International Symposium on Circuits and Systems (ISCAS), 2015, pp. 2808–2811.

The data set can be downloaded at:

https://www.etsmtl.ca/Professeurs/ggagnon/Projects/ai-sports