HAMD 22 Competition - Evaluation & Ranking

Evaluation & Ranking

Participants will be asked to train their systems using the provided training and validation sets. They are free to handle the imbalanced nature of the dataset with any mitigation technique they find appropriate for the task, and they may also resize the original images for standardization purposes.

Participants will be asked to submit their trained systems in executable form, together with a comprehensive description of each system, including its architecture and the preprocessing steps adopted. Delivered programs should accept a folder containing a collection of images, read those images, and produce a CSV file with two labeled columns. The first column gives the name of each tested sample (the image name), and the second column gives the system's prediction for that sample. Predictions are values from 1 to 14 indicating the predicted century number of each image, as described by the following table:

For simplicity, the names of the samples in the training and validation sets are coded as six-digit numbers. The two leftmost digits indicate the century number (class value) of the sample; the remaining four digits are a sequential number associated with each sample. The complete structure of the sample file names is depicted in the following figure:
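The naming scheme and the output format described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the organizers' tooling: it assumes file names of the form `CCNNNN` followed by an image extension, and the accepted image extensions, the CSV header names `image_name` and `prediction`, and the helpers `parse_sample_name`, `predict_folder`, and `write_predictions` are all hypothetical. `predict_fn` stands in for a participant's trained model.

```python
import csv
import os
import re
from collections.abc import Callable

# Assumed layout: "CCNNNN" + image extension, where CC is the two-digit
# century (class value) and NNNN is the sample's sequential number.
SAMPLE_NAME = re.compile(r"^(\d{2})(\d{4})$")
# Assumed set of image types a delivered program should pick up.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def parse_sample_name(filename: str) -> tuple[int, int]:
    """Decode (century, sequence_number) from a training/validation file name."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    match = SAMPLE_NAME.match(stem)
    if match is None:
        raise ValueError(f"not a six-digit sample name: {filename!r}")
    century, sequence = int(match.group(1)), int(match.group(2))
    if not 1 <= century <= 14:
        raise ValueError(f"century {century} is outside the 1-14 class range")
    return century, sequence


def predict_folder(folder: str, predict_fn: Callable[[str], int]) -> dict[str, int]:
    """Run a (hypothetical) predict_fn on every image file in folder."""
    predictions = {}
    for name in sorted(os.listdir(folder)):
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
            predictions[name] = int(predict_fn(os.path.join(folder, name)))
    return predictions


def write_predictions(predictions: dict[str, int], csv_path: str) -> None:
    """Write the two labeled columns: image name and predicted century (1-14)."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image_name", "prediction"])  # hypothetical header names
        for name in sorted(predictions):
            writer.writerow([name, predictions[name]])
```

For example, `parse_sample_name("030125.jpg")` returns `(3, 125)`: century 3, sample number 125. Validating the century range while parsing catches mislabeled or renamed files before they reach training.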

The data returned by participants allows the organizing committee to easily retrieve each system's predictions and properly evaluate its performance. All participating systems are evaluated on the same test set, and they are compared and ranked according to two main criteria:

Correct dating of the manuscript samples in the test set, measured with the following metrics:
1. F1-score,
2. MAE (Mean Absolute Error),
3. CS (Cumulative Score),
4. ACC (Classification Accuracy [1]).

Average processing time.
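The rules name the metrics but not their exact definitions, so the following Python sketch uses common definitions as assumptions: F1 macro-averaged over the classes that appear in the ground truth or the predictions, MAE measured in centuries, CS as the fraction of samples whose absolute error is at most a tolerance θ (the default θ = 1 century is an assumed value), and plain accuracy. Processing time is shown as mean wall-clock time per image, which may differ from how the organizers actually measure it; `predict_fn` is a hypothetical stand-in for a submitted system.

```python
import time
from collections.abc import Callable, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted century equals the true century."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average absolute difference between predicted and true century."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cumulative_score(y_true: Sequence[int], y_pred: Sequence[int], tolerance: int = 1) -> float:
    """Fraction of samples with |predicted - true| <= tolerance (assumed default: 1)."""
    return sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for any class seen in either list
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def average_processing_time(predict_fn: Callable[[str], int], image_paths: Sequence[str]) -> float:
    """Mean wall-clock seconds per image for a (hypothetical) predict_fn."""
    start = time.perf_counter()
    for path in image_paths:
        predict_fn(path)
    return (time.perf_counter() - start) / len(image_paths)
```

With `y_true = [1, 2, 3, 3]` and `y_pred = [1, 3, 3, 2]`, accuracy, MAE, and macro F1 are all 0.5, while CS with θ = 1 is 1.0 because every prediction is off by at most one century. This illustrates why MAE and CS complement accuracy for dating: they reward predictions that land close to the true century even when the exact class is missed.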

Particular attention will be paid to comparing how systems date manuscripts with specific text layouts, such as tables, marginal text in different styles, and hand-drawn figures. This makes it possible to identify the strengths and weaknesses of each system.