NuCLS: A scalable crowdsourcing approach & dataset for nucleus classification, localization and segmentation in breast cancer

The NuCLS dataset contains over 220,000 labeled nuclei from TCGA breast cancer images. The nuclei were annotated through the collaborative effort of pathologists, pathology residents, and medical students using the Digital Slide Archive. The data can be used to develop and validate algorithms for nuclear detection, classification, and segmentation, or as a resource for developing and evaluating methods for interrater analysis.

Data from both single-rater and multi-rater studies are provided. For the single-rater data, we provide both pathologist-reviewed and uncorrected annotations. For the multi-rater data, we provide annotations generated with and without suggestions from weak segmentation and classification algorithms.

For more details, consult our GigaScience paper or contact us directly with questions.

Related: If you like this work, you will probably be interested in our 2019 region crowdsourcing paper and dataset.

NuCLS figures and tables