The Avila data set
Download: link (the dataset is also available on UCI)
Data set description
The Avila data set has been extracted from 800 images of the “Avila Bible”, a giant Latin copy of the whole Bible produced during the XII century between Italy and Spain.
Palaeographic analysis of the manuscript has identified 12 copyists, who did not contribute equal numbers of pages.
Each pattern contains 10 features and corresponds to a group of 4 consecutive rows.
The prediction task is to associate each pattern with one of the 12 copyists (labeled A, B, C, D, E, F, G, H, I, W, X, Y).
The data have been normalized using Z-normalization and divided into two sets: a training set containing 10430 samples and a test set containing 10437 samples.
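For readers unfamiliar with Z-normalization, the following is a minimal sketch of the transformation applied to a single feature column. It assumes the population standard deviation is used; this page does not say which statistics the distributed files were normalized with. The helper name z_normalize and the raw values are invented for illustration and are not taken from the data set.

```python
from statistics import mean, pstdev


def z_normalize(column):
    """Return (x - mean) / std for every value in one feature column."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Made-up raw values, NOT taken from the Avila data set.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized = z_normalize(raw)
print(normalized)
```

Since the distributed files are already normalized, this step does not need to be repeated before training a classifier.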
Class distribution (training set)
A: 4286
B: 5
C: 103
D: 352
E: 1095
F: 1961
G: 446
H: 519
I: 831
W: 44
X: 522
Y: 266
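The counts above are strongly unbalanced, which matters when evaluating a classifier. The sketch below, using only the numbers listed, computes the share of the majority class (the accuracy of always predicting it on the training set) and inverse-frequency class weights. The weighting formula is a common generic heuristic and is an assumption here, not something prescribed by the data set authors.

```python
# Training-set class counts, copied from the list above.
train_counts = {
    "A": 4286, "B": 5, "C": 103, "D": 352, "E": 1095, "F": 1961,
    "G": 446, "H": 519, "I": 831, "W": 44, "X": 522, "Y": 266,
}

total = sum(train_counts.values())
majority_label = max(train_counts, key=train_counts.get)
majority_share = train_counts[majority_label] / total

# Inverse-frequency weights: total / (number_of_classes * class_count).
# A generic heuristic for unbalanced classes (an assumption, see above).
class_weights = {
    label: total / (len(train_counts) * count)
    for label, count in train_counts.items()
}

print(f"total samples: {total}")
print(f"majority class {majority_label}: {majority_share:.1%} of the training set")
for label, weight in sorted(class_weights.items()):
    print(f"{label}: weight {weight:.2f}")
```

With such a skewed distribution, per-class metrics are likely to be more informative than overall accuracy alone.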
Note that all the images of the Avila Bible are publicly available here
Attribute description
ID Name
F1 intercolumnar distance
F2 upper margin
F3 lower margin
F4 exploitation
F5 row number
F6 modular ratio
F7 interlinear spacing
F8 weight
F9 peak number
F10 modular ratio / interlinear spacing
Class: A, B, C, D, E, F, G, H, I, W, X, Y
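The attribute list above can be turned into a small record parser. The sketch below assumes each pattern is stored as one comma-separated line holding the 10 feature values in the order F1 to F10, followed by the class label; this page does not specify the file layout, so check it against the downloaded files. The function name parse_pattern and the sample line are invented for illustration.

```python
# Feature IDs and names, copied from the attribute description above.
FEATURES = {
    "F1": "intercolumnar distance",
    "F2": "upper margin",
    "F3": "lower margin",
    "F4": "exploitation",
    "F5": "row number",
    "F6": "modular ratio",
    "F7": "interlinear spacing",
    "F8": "weight",
    "F9": "peak number",
    "F10": "modular ratio / interlinear spacing",
}
CLASS_LABELS = set("ABCDEFGHIWXY")


def parse_pattern(line):
    """Split one comma-separated line into ({feature_id: value}, label)."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != len(FEATURES) + 1:
        raise ValueError(f"expected {len(FEATURES) + 1} fields, got {len(fields)}")
    *values, label = fields
    if label not in CLASS_LABELS:
        raise ValueError(f"unknown class label: {label!r}")
    features = {fid: float(value) for fid, value in zip(FEATURES, values)}
    return features, label


# Invented sample line, NOT a real record from the data set.
sample_line = "0.1,-0.2,0.3,0.0,0.5,-1.0,0.25,0.75,-0.5,1.5,A"
features, label = parse_pattern(sample_line)
print(label, features["F8"])
```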
Citations
If you want to refer to the Avila data set in a publication, please cite the following paper:
C. De Stefano, F. Fontanella, M. Maniaci and A. Scotto di Freca, "A Method for Scribe Distinction in Medieval Manuscripts Using Page Layout Features", Lecture Notes in Computer Science, G. Maino and G. Foresti (eds.), Springer-Verlag, vol. 6978, pp. 393-402.
INFO & HELP
If you need further information or help, do not hesitate to contact me by email at: fontanella at unicas.it