Correction to BioNLP ST Infectious Diseases (ID) task CORE results (19th October 2011)

On 18th October 2011, an error in the original implementation of the Infectious Diseases (ID) task evaluation for the "core" task was discovered.

The error was in the generation of the "core" subset of gold data and submissions. It caused the evaluation to mislabel "core" subset matches between submissions and gold data as false positives/false negatives for some Binding events with additional Site arguments.

Additional arguments are not considered in "core" task evaluation. For ID, this evaluation is implemented using a separate processing stage that removes additional arguments and any duplicate events arising from their removal. However, for Binding events with multiple Theme arguments, one or more of which had a paired Site argument, the core subset generation software in some cases failed to remove all the Site arguments. This caused the primary evaluation script to fail to detect matches between the gold and submitted events for these cases, leading to an underestimate of performance for Binding events as well as of overall performance.
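
As a rough illustration of the processing stage described above, the following Python sketch strips additional arguments and then removes the duplicates that result. It is not the task's actual software: the event representation, the function names (base_role, to_core, core_subset) and the list of additional roles are assumptions made for this example, and nested events and event ID remapping are ignored. Numbered roles such as Theme2 and Site2 are reduced to their base role, so that every Site argument is removed regardless of which Theme it is paired with.

```python
import re

# Assumed set of "additional" (non-core) roles; illustrative only.
ADDITIONAL_ROLES = {"Site", "CSite", "AtLoc", "ToLoc"}


def base_role(role):
    """Strip a trailing numeric suffix, so 'Site2' maps to 'Site'."""
    return re.sub(r"\d+$", "", role)


def to_core(event):
    """Return the event with all additional arguments removed.

    An event is a hypothetical (type, trigger, [(role, target), ...]) tuple.
    Remaining arguments are stored under their base role and sorted, so
    argument order and role numbering do not affect comparison.
    """
    ev_type, trigger, args = event
    core_args = tuple(sorted(
        (base_role(role), target)
        for role, target in args
        if base_role(role) not in ADDITIONAL_ROLES
    ))
    return (ev_type, trigger, core_args)


def core_subset(events):
    """Reduce events to core form and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for event in events:
        core = to_core(event)
        if core not in seen:
            seen.add(core)
            result.append(core)
    return result


# A Binding event with two Themes, only the second of which has a paired Site.
gold = [("Binding", "T9", [("Theme", "T1"), ("Theme2", "T2"), ("Site2", "T4")])]
submitted = [("Binding", "T9", [("Theme", "T1"), ("Theme2", "T2")])]
print(core_subset(gold) == core_subset(submitted))
```

With every Site argument removed on both sides, the gold and submitted events compare equal, which is the match that the faulty subset generation prevented the evaluation script from detecting.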

The corrected overall ID task core results are as follows (absolute F-score difference to originally published results in parentheses):

Additional ID task evaluation results

Results for CORE task, primary evaluation criteria


           gold (match)   answer (match)   recall    prec.   fscore
          ------------------------------------------------------------------------------------
FAUST      1369 (  696)     1049 (  696)    50.84    66.35    57.57   (+0.25% points)
UMass      1369 (  680)     1090 (  680)    49.67    62.39    55.31   (+0.25% points)
Stanford   1369 (  673)     1194 (  673)    49.16    56.37    52.52   (+0.32% points)
ConcordU   1369 (  697)     1607 (  697)    50.91    43.37    46.84   (+0.13% points)
UTurku     1369 (  537)     1076 (  537)    39.23    49.91    43.93   (+0.49% points)
PNNL       1369 (  402)      764 (  402)    29.36    52.62    37.69   (+0.0% points)
PredX      1369 (  324)      921 (  324)    23.67    35.18    28.30   (+0.0% points)
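
The recall, precision and F-score columns above are consistent with the standard definitions computed from the gold, answer and match counts. The following is a minimal sketch for rechecking a row; the function name prf is a hypothetical helper and not part of the official evaluation script.

```python
def prf(gold, answer, match):
    """Return (recall, precision, F-score) as percentages.

    gold   -- number of gold events
    answer -- number of submitted events
    match  -- number of submitted events matching a gold event
    """
    recall = 100.0 * match / gold if gold else 0.0
    precision = 100.0 * match / answer if answer else 0.0
    if precision + recall == 0.0:
        return recall, precision, 0.0
    # F-score is the harmonic mean of precision and recall.
    fscore = 2.0 * precision * recall / (precision + recall)
    return recall, precision, fscore


# FAUST row: gold 1369, answer 1049, match 696.
print("%.2f %.2f %.2f" % prf(1369, 1049, 696))  # prints 50.84 66.35 57.57
```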

Please find the detailed corrected results linked from the ID task page.

The full set of originally reported results is preserved for reference below.

PLEASE NOTE: the results attached below include the error described above for the CORE task results. They are included for reference, but are superseded by the corrected results found on the ID task page.
ID-uncorrected-detailed-results-191011.txt (48k), Sampo Pyysalo, Oct 18, 2011, 10:22 PM