Ten teams participated across all six sub-tasks and submitted a total of 31 runs.
You may check and evaluate your predictions:
The evaluation is described on the BioNLP-ST 2016 Bacteria Biotope page.
You can download all the charts and tables shown below (BB-rel, BB-rel+ner, BB-norm, BB-norm+ner, BB-kb, BB-kb+ner).
Submissions are compared to a simple baseline:
Confidence intervals were obtained by bootstrap resampling (n=100).
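The bootstrap procedure can be sketched as follows. This is a minimal illustration, not the official evaluation code: the resampling unit (one tuple of counts per document), the 95% level, the percentile method, and the names `micro_f1` and `bootstrap_ci` are all assumptions made for the example. Only the number of resamples (n=100) comes from this page.

```python
import random


def micro_f1(docs):
    """Micro-averaged F1 from per-document (tp, fp, fn) counts."""
    tp = sum(d[0] for d in docs)
    fp = sum(d[1] for d in docs)
    fn = sum(d[2] for d in docs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_ci(items, statistic, n_resamples=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed method) for `statistic`."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        # Draw len(items) items with replacement and recompute the score.
        sample = [rng.choice(items) for _ in items]
        stats.append(statistic(sample))
    stats.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return stats[lo_idx], stats[hi_idx]


# Hypothetical per-document counts, for illustration only.
docs = [(3, 1, 1), (2, 0, 2), (5, 2, 0), (1, 1, 1), (4, 0, 1)]
low, high = bootstrap_ci(docs, micro_f1)
```

With only 100 resamples the interval bounds are themselves noisy, so small differences between overlapping intervals should be read with caution.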
7 teams, 11 runs.
Recall, Precision and F1 for both relation types.
Recall, Precision and F1 for Lives_In relations.
The tick on top of each bar indicates the score for relations that do not cross sentence boundaries.
Recall, Precision and F1 for Exhibits relations.
The tick on top of each bar indicates the score for relations that do not cross sentence boundaries.
3 teams, 5 runs.
The Slot Error Rate (SER) is shown instead of F1, because substitution errors are penalized in both Recall and Precision.
SER is an error rate, so lower values are better.
Named-entity boundary accuracy is measured by the Jaccard index.
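The relation between Recall, Precision and SER can be sketched as follows. This assumes the commonly used definition of SER (substitutions + deletions + insertions, divided by the number of reference items) and a graded match score in [0, 1] for each paired reference and prediction. The pairing step itself and the function name `slot_scores` are hypothetical; the exact procedure is the one described on the task page.

```python
def slot_scores(match_scores, n_reference, n_predicted):
    """Recall, Precision and SER from graded match scores.

    match_scores: one similarity in [0, 1] per paired (reference, prediction).
    n_reference:  total number of reference items (must be > 0).
    n_predicted:  total number of predicted items.
    """
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    n_pairs = len(match_scores)
    matches = sum(match_scores)
    # An imperfect pair counts as a partial substitution.
    substitutions = n_pairs - matches
    deletions = n_reference - n_pairs    # reference items with no prediction
    insertions = n_predicted - n_pairs   # predictions with no reference item
    recall = matches / n_reference
    precision = matches / n_predicted if n_predicted else 0.0
    ser = (substitutions + deletions + insertions) / n_reference
    return recall, precision, ser


# Two pairs (one perfect, one half-correct), one missed and one spurious item.
recall, precision, ser = slot_scores([1.0, 0.5], n_reference=3, n_predicted=3)
```

In this example the half-correct pair lowers both Recall and Precision, so an F1 computed from them would count that single error twice, whereas SER counts it once as half a substitution.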
Recall, Precision and SER for both relation types.
Recall, Precision and SER for Lives_In relations where the argument is of type Habitat.
The tick on each bar indicates the gain when entity boundary accuracy is ignored.
Recall, Precision and SER for Lives_In relations where the argument is of type Geographical.
The tick on each bar indicates the gain when entity boundary accuracy is ignored.
Recall, Precision and SER for Exhibits relations.
The tick on each bar indicates the gain when entity boundary accuracy is ignored.
4 teams, 6 runs.
The result is the average distance between predicted and reference normalizations.
For Microorganism entities, strict equality is used.
For Habitat and Phenotype entities, the Wang distance is used (w=0.65).
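The Wang measure can be sketched on a toy ontology as follows. This assumes that "Wang distance" refers to the semantic similarity of Wang et al. (2007), restricted to is-a links with a single weight w=0.65, where 1.0 means identical concepts. The concept names, the `parents` mapping, and the function names are hypothetical and are not taken from OntoBiotope or from the official evaluation tool.

```python
def s_values(concept, parents, w=0.65):
    """S-value of `concept` and each of its ancestors.

    The concept itself gets 1.0; each ancestor gets the best value
    reachable through its descendants, multiplied by w per is-a step.
    """
    values = {concept: 1.0}
    frontier = [concept]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            candidate = w * values[node]
            if candidate > values.get(parent, 0.0):
                values[parent] = candidate
                frontier.append(parent)  # propagate the improved value upward
    return values


def wang_similarity(a, b, parents, w=0.65):
    """Shared-ancestor S-values over the total S-values of both concepts."""
    sa = s_values(a, parents, w)
    sb = s_values(b, parents, w)
    shared = sa.keys() & sb.keys()
    numerator = sum(sa[t] + sb[t] for t in shared)
    return numerator / (sum(sa.values()) + sum(sb.values()))


# Toy is-a hierarchy (child -> list of parents), for illustration only.
parents = {
    "soil": ["environment"],
    "water": ["environment"],
    "environment": ["root"],
}
sim_siblings = wang_similarity("soil", "water", parents)
sim_identical = wang_similarity("soil", "soil", parents)
```

Under these assumptions, predicting a sibling of the reference concept still earns partial credit, whereas strict equality would score it 0.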
Average strict equality of normalizations for Microorganism entities.
Average Wang distance of normalizations for Habitat entities.
Average strict equality of normalizations for Habitat entities.
Average Wang distance of normalizations for Habitat entities. Only normalizations with concepts absent from the training and development sets were considered.
Average Wang distance of normalizations for Phenotype entities.
Average strict equality of normalizations for Phenotype entities.
Average Wang distance of normalizations for Phenotype entities. Only normalizations with concepts absent from the training and development sets were considered.
3 teams, 5 runs.
The Slot Error Rate (SER) is shown instead of F1, because substitution errors are penalized in both Recall and Precision.
SER is an error rate, so lower values are better.
Named-entity boundary accuracy is measured by the Jaccard index.
Recall, Precision, and SER for all entities.
The score for each individual entity is the product of boundary accuracy (Jaccard) and normalization (BB-norm).
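The per-entity score can be sketched as follows. This assumes each entity is a single contiguous character span given as a half-open (start, end) pair, which is a simplification, and that the normalization score (strict equality or Wang) has already been computed. The function names are hypothetical.

```python
def jaccard(span_a, span_b):
    """Jaccard index of two half-open character spans (start, end)."""
    a = set(range(*span_a))
    b = set(range(*span_b))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entity_score(ref_span, pred_span, normalization_score):
    """Boundary accuracy multiplied by the normalization score."""
    return jaccard(ref_span, pred_span) * normalization_score


# Exact boundaries, partially correct normalization.
exact_boundaries = entity_score((0, 10), (0, 10), 0.5)
# Shifted boundaries, correct normalization.
shifted_boundaries = entity_score((0, 10), (5, 15), 1.0)
```

Because the two factors are multiplied, an entity with a perfect normalization but poorly matched boundaries (or the reverse) receives only partial credit.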
Results for Microorganism entities only (Jaccard × Equality).
Results for Habitat entities only (Jaccard × Wang).
Results for Phenotype entities only (Jaccard × Wang).
Boundary accuracy (Jaccard) for Microorganism entities.
Boundary accuracy (Jaccard) for Habitat entities.
Boundary accuracy (Jaccard) for Phenotype entities.
1 team, 2 runs.
The evaluation emulates the capacity of systems to populate databases from a corpus. The pairs of database references (NCBI and OntoBiotope) are evaluated regardless of their text-bound anchors and of how often they are repeated in the corpus.
The Mean References is the average of the Wang similarity (w=0.65) of the OntoBiotope argument.