[Figure: example tiles from the case and control classes]
The original whole slide images (WSIs) were partitioned into tiles of 1024 × 1024 pixels with three color channels. Each tile was subsequently resized to a uniform resolution of 224 × 224 × 3 for computational analysis. During preprocessing, a region-of-interest (ROI) filtering algorithm was applied to identify and retain tissue-containing tiles while discarding background regions. All automatically selected ROIs underwent visual inspection to confirm annotation accuracy. Tiles demonstrating technical artifacts, including tissue folds, edge effects, or suboptimal scan resolution, were excluded through manual review of WSI patch location maps. Following this multi-stage curation process, 176,945 high-quality tiles were retained per class for downstream analysis.
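The tiling and background-filtering steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the text does not specify the ROI algorithm, so the intensity threshold (`background_level`), the minimum tissue fraction (`min_tissue`), and the choice to drop partial tiles at the slide edge are all hypothetical. The sketch uses only the Python standard library and treats a tile as a nested list of grayscale values; resizing to 224 × 224 × 3 would be done afterwards with an image library of your choice.

```python
from typing import Iterator, Sequence, Tuple

TILE_SIZE = 1024  # tile edge length in pixels, as stated in the text


def tile_origins(width: int, height: int,
                 tile: int = TILE_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield the (x, y) top-left corner of every full tile in a slide.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield x, y


def tissue_fraction(gray_tile: Sequence[Sequence[int]],
                    background_level: int = 220) -> float:
    """Return the fraction of pixels darker than background_level.

    Assumption: background is near-white, so darker pixels count as tissue.
    The threshold value is hypothetical.
    """
    total = sum(len(row) for row in gray_tile)
    if total == 0:
        return 0.0
    tissue = sum(1 for row in gray_tile for px in row if px < background_level)
    return tissue / total


def keep_tile(gray_tile: Sequence[Sequence[int]],
              min_tissue: float = 0.5) -> bool:
    """Keep a tile only if enough of it is tissue (hypothetical cutoff)."""
    return tissue_fraction(gray_tile) >= min_tissue
```

A filter of this kind only approximates the automated ROI step; the visual inspection and manual artifact review described above are not something a threshold can replace.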
Data Management Strategies:
The study included patients without known high-risk clinical factors for colorectal cancer (CRC) whose screening colonoscopies identified tubular adenomas with low-grade dysplasia. A total of 81 patients (41 male, 40 female; age range 54–95 years; mean = 70 years) met the inclusion criteria. None of the biopsies demonstrated histologic features indicative of high-risk progression to CRC at the time of sampling. Patients were stratified into two cohorts: a precancer group and a control group. The precancer group comprised individuals who later developed CRC following screening colonoscopies in which low-grade tubular adenomas were detected. The control group included individuals with no history of CRC despite one or more screening procedures revealing low-grade adenomas. Compared to the precancer group, control patients underwent a greater average number of biopsies and had a longer mean screening interval. On average, individuals in the precancer cohort were 6.86 years older than those in the control cohort. Histologic slides from both groups containing tubular adenomas with low-grade dysplasia were digitized using a Leica Aperio AT2 whole-slide scanner under identical imaging parameters.
Data Download
The dataset is pre-split into training and testing images, each in one of two categories: case and control.
The dataset is available for download. If you are interested, please complete the NPTA Download Request Form.
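Once downloaded, the pre-split images can be indexed with a few lines of Python. The sketch below is a minimal example under assumptions: the folder names (`train`, `test`, `case`, `control`), the file extensions, and the label encoding (1 = case, 0 = control) are hypothetical and should be adjusted to match the archive you actually receive.

```python
from pathlib import Path
from typing import Dict, List, Tuple

SPLITS = ("train", "test")      # assumed split folder names
CLASSES = ("case", "control")   # categories named in the text
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}  # assumed formats


def index_dataset(root: Path) -> Dict[str, List[Tuple[Path, int]]]:
    """Return {split: [(image_path, label), ...]}.

    Assumed layout: root/<split>/<class>/<image file>.
    Assumed labels: 1 for case, 0 for control.
    """
    index: Dict[str, List[Tuple[Path, int]]] = {}
    for split in SPLITS:
        samples: List[Tuple[Path, int]] = []
        for class_name in CLASSES:
            label = 1 if class_name == "case" else 0
            class_dir = root / split / class_name
            if not class_dir.is_dir():
                continue  # tolerate a missing folder rather than fail
            for path in sorted(class_dir.iterdir()):
                if path.suffix.lower() in IMAGE_SUFFIXES:
                    samples.append((path, label))
        index[split] = samples
    return index
```

For example, `index_dataset(Path("NPTA"))["train"]` would return the list of training image paths with their labels, assuming the archive was extracted into a folder named `NPTA`.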
We are profoundly indebted to Dr. Derrick Forchetti, Brandon Combs, and the rest of the South Bend Medical Foundation (SBMF) team for their critical support in providing access to the Neoplastic Tubular Adenomas dataset.
Special acknowledgment is due to the team of expert pathologists involved in this work: Dr. Derrick Forchetti, Dr. Surendra P. Singh, and Dr. Ahmed Rahu. Their meticulous efforts in annotating the dataset and generously sharing their pathological expertise were crucial for accurately understanding the features inherent to the case and control groups.
Finally, we thank Brian Shula for his dedicated support in data curation and preparation, which significantly facilitated the execution of this study.
If you have used our NPTA dataset, we would love to hear from you! For more information about the data, please contact Dr. Derrick Forchetti at dforchetti@sbmf.org.
Please consider sending us a message about your results and any feedback you have.
For additional questions about this dataset or inquiries about commercial use, please contact dforchetti@sbmf.org and sultanaa3@udayton.edu.
Citation Article:
A. Sultana, N. Abouzahra, A. Rahu, B. Shula, B. Combs, D. Forchetti, T. Aspiras, V. Asari, "UltraLight Med-Vision Mamba for Classification of Neoplastic Progression in Tubular Adenomas," NAECON 2025 - IEEE National Aerospace and Electronics Conference, Dayton, OH, USA, 2025, pp. 1-6, doi: 10.1109/NAECON65708.2025.11235447.
If you use this dataset, please also acknowledge the South Bend Medical Foundation (SBMF) in your articles.