The evaluation will be carried out by comparing the participants' system outputs to the Ground Truth. Each task has its own evaluation measures, summarized below.
Download the example below: HERE
The script used to evaluate the participants' results is available at: https://git.univ-lr.fr/gchiro01/icdar2017/tree/master
Important notes:
Task 1) Error detection:
The detection task will be evaluated using precision, recall and F-measure, since it reduces to deciding whether each token is truly erroneous or not. The ranking will be based on the F-measure. A small illustrative sketch of how such scores can be computed is given below.
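The following sketch (not the official evaluation script) shows one way to compute precision, recall and F-measure for detection, assuming flagged tokens are represented by their positions in the text; that representation and the function name are assumptions made for the example.

# Minimal sketch of token-level detection scoring (illustrative only).
def detection_scores(predicted_positions, gold_positions):
    """Precision, recall and F-measure over sets of token positions
    flagged as erroneous by the system vs. the Ground Truth."""
    predicted = set(predicted_positions)
    gold = set(gold_positions)
    true_positives = len(predicted & gold)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall > 0 else 0.0)
    return precision, recall, f_measure

# Example: the system flags tokens 3, 7 and 12; the Ground Truth marks 3, 7 and 9.
print(detection_scores({3, 7, 12}, {3, 7, 9}))  # -> (0.667, 0.667, 0.667)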
Task 2) Error correction:
As mentioned earlier, the correction task involves a list of candidate words for each detected error, and it will be evaluated under two different scenarios:
For each token, the chosen metric computes a weighted sum of the Levenshtein distances between the proposed correction candidates and the corresponding token in the Ground Truth; the goal is therefore to minimize this distance summed over all tokens. A sketch of this computation for a single token is given below.
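The sketch below illustrates the metric for one erroneous token, under the assumption that each candidate comes with a weight (for instance a confidence score) supplied by the system; the function names and this weighting scheme are illustrative, not taken from the official script.

# Plain dynamic-programming edit distance between two strings.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Weighted sum of Levenshtein distances for one erroneous token (illustrative).
def weighted_correction_cost(candidates, gold_token):
    """candidates: list of (candidate_string, weight) pairs proposed by the system;
    gold_token: the corresponding token in the Ground Truth."""
    return sum(weight * levenshtein(candidate, gold_token)
               for candidate, weight in candidates)

# Example: two candidates for the OCR error "tbe", Ground Truth "the".
print(weighted_correction_cost([("the", 0.8), ("tee", 0.2)], "the"))  # -> 0.2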