Participants annotated independently of each other. The evaluation set used higher-quality algorithmic suggestions, the bootstrap control used lower-quality suggestions, and participants were shown no suggestions for the unbiased control.
A constrained clustering process was used to obtain potential nuclear locations from the multi-rater annotations. An Expectation-Maximization (EM) statistical framework was then used to aggregate opinions about each nucleus, taking participant reliability into account. The aggregated opinion of the non-pathologists is called the inferred NP-label; the aggregated opinion of the pathologists is called the inferred P-truth.
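To make the aggregation step concrete, here is a minimal sketch of a Dawid-Skene-style EM aggregator, assuming each candidate nucleus (from the clustering step) receives a categorical class vote from a subset of annotators and each annotator's reliability is modeled as a confusion matrix. The function and variable names (`em_aggregate`, `votes`, the example annotator IDs) are hypothetical; this illustrates the general technique, not the exact implementation used for this dataset.

```python
import numpy as np

def em_aggregate(votes, n_classes, n_iter=50, smoothing=1e-6):
    """Aggregate multi-annotator class votes with EM (Dawid-Skene style).

    votes: dict mapping nucleus_id -> {annotator_id: class_index}.
    Returns (hard labels per nucleus, per-nucleus class posteriors).
    """
    nuclei = sorted(votes)
    annotators = sorted({a for v in votes.values() for a in v})
    a_index = {a: i for i, a in enumerate(annotators)}

    # Initialize per-nucleus class posteriors with majority-vote proportions.
    post = np.zeros((len(nuclei), n_classes))
    for i, n in enumerate(nuclei):
        for c in votes[n].values():
            post[i, c] += 1.0
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per annotator
        # (rows = latent true class, columns = class the annotator reported).
        priors = post.mean(axis=0)
        conf = np.full((len(annotators), n_classes, n_classes), smoothing)
        for i, n in enumerate(nuclei):
            for a, c in votes[n].items():
                conf[a_index[a], :, c] += post[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: update posteriors from priors and annotator reliabilities.
        for i, n in enumerate(nuclei):
            log_p = np.log(priors + smoothing)
            for a, c in votes[n].items():
                log_p += np.log(conf[a_index[a], :, c])
            p = np.exp(log_p - log_p.max())
            post[i] = p / p.sum()

    # The inferred label for each nucleus is its most probable class.
    return {n: int(post[i].argmax()) for i, n in enumerate(nuclei)}, post

# Hypothetical example: three annotators voting on two nuclei, three classes.
example_votes = {"nucleus_1": {"NP.1": 0, "NP.2": 0, "NP.3": 1},
                 "nucleus_2": {"NP.1": 2, "NP.2": 2, "NP.3": 2}}
labels, posteriors = em_aggregate(example_votes, n_classes=3)
```

Because the confusion matrices weight each annotator's votes by how well they agree with the consensus, the inferred NP-label and P-truth are not simple majority votes: reliable participants count for more.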
> Click here to download the raw data (each annotator independently).
> Click here to download the inferred NP-labels.
> Click here to download the inferred P-truth.
40,028 annotations | 1,358 unique nuclei | 530 boundaries
> Click here to download the raw data (each annotator independently).
> Click here to download the inferred NP-labels.
> Click here to download the inferred P-truth.
19,881 annotations | 1,349 unique nuclei | 148 boundaries
> Click here to download the raw data (each annotator independently).
> Click here to download the inferred NP-labels.
> Click here to download the inferred P-truth.
37,434 annotations | 1,569 unique nuclei | 0 boundaries*
* By definition, we did not show participants any algorithmic suggestions in this control experiment. However, we did ask one practicing pathologist (SP.3) to manually trace all boundaries. All nuclear boundaries in FOVs prefixed by "SP.3_#_U-control_#_" are manually traced (1,223 boundaries).
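If you want to separate the manually traced FOVs from the rest after downloading the U-control data, the naming convention above can be matched directly. The helper and the example file names below are hypothetical; only the "SP.3_#_U-control_#_" prefix pattern comes from the note above.

```python
def is_manually_traced(fov_name: str) -> bool:
    # Manually traced U-control FOVs follow the "SP.3_#_U-control_#_" prefix pattern.
    parts = fov_name.split("_")
    return len(parts) >= 3 and parts[0] == "SP.3" and parts[2] == "U-control"

# Hypothetical example file names, for illustration only.
example_fovs = [
    "SP.3_1_U-control_5_slideA",   # traced by SP.3
    "NP.7_2_U-control_9_slideB",   # annotated without manual tracing
]
traced = [f for f in example_fovs if is_manually_traced(f)]
print(traced)
```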