Deep Learning-Based Breast Cancer Detection from Histopathology Images

Abstract

Breast cancer is a life-threatening disease affecting millions of women globally. Breast tissue is structurally complex: it is a mixture of fatty, glandular, and connective tissue, all intricately organized. Its composition varies greatly from person to person and even within the same individual over time. This variability can make it difficult to distinguish between normal and abnormal tissue, especially in early-stage cancer. Detecting breast cancer from histopathology images requires extensive expertise from pathologists, who must thoroughly analyze multiple tissue sections to ensure an accurate diagnosis, which makes the process complex and tedious. To address this, many deep learning-based models, especially CNN models, have been proposed; however, some of them suffer from overfitting while others offer limited accuracy. This research aims to enhance breast cancer detection by leveraging advanced deep learning architectures: U-Net for segmentation, ResNet as its encoder to refine the segmentation, and DenseNet for classification. By integrating these models, the study seeks to overcome overfitting while offering high accuracy at comparatively low computational cost. The methodology involves data preprocessing, image annotation, and applying attenuation masks for contrast adjustment. Segmentation will be performed using U-Net and ResNet to accurately delineate cancerous lesions and capture detailed boundaries. The segmented regions will then be classified using a fine-tuned DenseNet model. Evaluation will be conducted using metrics such as overall classification accuracy, F1-score, Intersection over Union (IoU), and Dice coefficient. The BRACS and BreakHis datasets, containing annotated histopathology images, will be utilized for the segmentation and classification tasks, respectively.
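The evaluation metrics named above have standard definitions. The following is a minimal NumPy sketch, assuming binary masks (1 = lesion, 0 = background) for Dice and IoU and binary labels (1 = malignant, 0 = benign) for accuracy and F1; the function names are hypothetical and not part of any released code for this project.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2 * |A and B| / (|A| + |B|) for binary masks (eps avoids 0/0)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """IoU (Jaccard index) = |A and B| / |A or B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))

def accuracy_and_f1(y_pred, y_true):
    """Accuracy and F1 for binary labels where 1 is the positive (malignant) class."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = float(np.mean(y_pred == y_true))
    denom = 2 * tp + fp + fn
    f1 = float(2 * tp / denom) if denom > 0 else 0.0
    return accuracy, f1
```

For the four-class subtype tasks, F1 would typically be computed per class and then averaged (for example, macro-averaged).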

Introduction

Breast cancer is the abnormal growth of cells in the breast and poses a significant health risk globally. It originates in the milk-producing ducts or glandular tissue of the breast and, if left untreated, can metastasize to other parts of the body. Breast cancer has two major types, non-invasive and invasive, which correspond to different stages of the disease. Non-invasive carcinoma originates in the milk ducts and has not yet spread to the surrounding tissue. Breast cancer is called invasive when cancerous cells have broken out into the surrounding breast tissue, from where they can spread to other parts of the body, usually through the lymphatic system. Breast lesions are also divided into subtypes, i.e. normal tissue, pathological benign (PB) lesions, usual ductal hyperplasia (UDH), flat epithelial atypia (FEA), atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS), and invasive carcinoma (IC). Tumors are broadly classified into two groups: benign and malignant. Benign tumors are non-cancerous and are further divided into adenosis, fibroadenoma, phyllodes tumor, and tubular adenoma. Malignant tumors are cancerous and require immediate treatment. They are further divided into ductal carcinoma, lobular carcinoma, mucinous carcinoma, and papillary carcinoma.

Invasive carcinoma is the most common type of breast cancer and has a high fatality rate. Although breast cancer mostly affects women, about 0.5% to 1% of breast cancer cases occur in men. It was estimated that in 2024, 310,720 women and 2,800 men in the United States would be diagnosed with invasive breast cancer. Across both sexes, breast cancer is the second most commonly diagnosed cancer after lung cancer, and in 2022 it was the most frequently diagnosed cancer among women worldwide.

As in many other countries, breast cancer is very common in Pakistan, which is reported to have one of the highest breast cancer rates in Asia. In 2018, 34,066 cases of breast cancer were diagnosed in Pakistan. Breast cancer also carries a high mortality burden: in 2022 alone, it caused around 670,000 deaths globally. Early detection improves the chance of survival; in the United States, the 5-year relative survival rate for breast cancer detected at the localized stage is about 99%.

Breast cancer is detected in several different ways, and image-based diagnosis is the main one. Imaging modalities used to detect breast cancer include mammography, breast ultrasound, histopathology, and MRI. Mammography is widely used around the globe. Findings from mammography and other modalities are usually confirmed through biopsy and histopathology, which serves as the gold standard for breast cancer detection. Histopathology is the microscopic analysis of tissue samples obtained from a biopsy. By examining histopathological images, pathologists can determine the presence, type, grade, and stage of cancerous tumors. However, histopathology often faces challenges due to subjective interpretation and the shortage of experienced pathologists in some regions. Automatic breast cancer detection methods are being developed to overcome these issues. Among deep neural networks, Convolutional Neural Networks (CNNs) have emerged as key tools for breast cancer detection. From histopathology images, CNNs can extract intricate patterns and textures indicative of cancerous cells or other abnormalities within breast tissue. CNN-based architectures such as ResNet, U-Net, AlexNet, VGG-16, and VGG-19 have shown strong performance in various medical imaging tasks, including breast cancer detection across different imaging modalities. These networks stack layers of convolutional filters to extract hierarchical features from medical images, including histopathology images, which enables precise identification and characterization of cancerous lesions.

Our aim is to leverage deep learning architectures for early breast cancer detection from histopathology images. Segmentation is a crucial step in identifying and delineating regions of interest within histopathological images. U-Net's encoder-decoder structure was designed specifically for biomedical image segmentation and enables accurate localization of cancerous lesions. We will perform segmentation using the U-Net architecture, with ResNet employed as its encoder to further enhance the segmentation. ResNet was chosen because its residual learning framework is well suited to capturing intricate details and boundaries of abnormal tissue regions. To train the segmentation model, we will use the BRACS (BReAst Carcinoma Subtyping) dataset [11], which consists of 547 labeled whole-slide images and 4,539 labeled regions of interest collected from 189 patients. Following segmentation, the extracted regions of interest will be classified using CNN-based architectures. To train the classification model, we will use the BreakHis dataset [12], which consists of 2,480 benign and 5,429 malignant samples.

Problem Statement

Development of a novel framework, UtopiaNet, for delineating and segmenting cancer nuclei from histopathology images, followed by classification of these nuclei into different breast cancer types using CNNs. UtopiaNet incorporates U-Net and ResNet to enhance segmentation accuracy, and DenseNet for classification, aiming to achieve higher accuracy with reduced computational requirements.
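One way to pass segmented regions on to the classifier is to crop a bounding box around each connected component of the predicted mask. The sketch below assumes a binary mask aligned pixel-for-pixel with the RGB tile and uses SciPy's `ndimage.label` and `ndimage.find_objects`; the helper name `extract_rois` and its `min_area` and `pad` parameters are hypothetical choices for illustration, not part of the proposal.

```python
import numpy as np
from scipy import ndimage

def extract_rois(image: np.ndarray, mask: np.ndarray, min_area: int = 1, pad: int = 0):
    """Crop one patch per connected component of a binary segmentation mask.

    image: H x W x 3 array; mask: H x W array where nonzero marks lesion pixels.
    Returns a list of (patch, (row_slice, col_slice)) tuples in label order.
    """
    labeled, _ = ndimage.label(mask > 0)  # 4-connected components, labels 1..N
    rois = []
    for index, box in enumerate(ndimage.find_objects(labeled), start=1):
        if box is None:
            continue
        area = int(np.sum(labeled[box] == index))
        if area < min_area:
            continue  # hypothetical filter: skip tiny specks that are likely noise
        rows, cols = box
        # Optionally grow the box by `pad` pixels, clipped to the image bounds.
        r0, r1 = max(rows.start - pad, 0), min(rows.stop + pad, mask.shape[0])
        c0, c1 = max(cols.start - pad, 0), min(cols.stop + pad, mask.shape[1])
        rois.append((image[r0:r1, c0:c1], (slice(r0, r1), slice(c0, c1))))
    return rois
```

Each returned patch would then be resized to the classifier's input size before classification.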

The primary purposes of this research are as follows:

Design of a novel hybrid deep learning framework that combines the strengths of deep neural architectures for the accurate detection of breast cancer from histology images.
Exploration of effective preprocessing techniques and data augmentation strategies, through an ablation study, to enhance the robustness and generalizability of the hybrid model.
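Histopathology tiles have no canonical orientation, so flips and 90-degree rotations are common candidate augmentations to toggle in such an ablation study. The following is a minimal NumPy sketch, assuming an H x W x 3 image and an H x W segmentation mask that must receive exactly the same geometric transform; the function name and the restriction to the eight flip/rotation combinations are illustrative assumptions.

```python
import numpy as np

def random_dihedral(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply one of the 8 flip/rotation combinations to an image and its mask together."""
    k = int(rng.integers(0, 4))  # number of 90-degree rotations (0 to 3)
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.random() < 0.5:  # horizontal flip with probability 0.5
        image = np.flip(image, axis=1)
        mask = np.flip(mask, axis=1)
    # Copy to contiguous memory so downstream frameworks accept the arrays.
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```

Photometric augmentations (for example, stain or color jitter) would be applied to the image only, since they do not change the mask.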

Datasets

Few public datasets exist for the classification and segmentation of breast cancer from histopathology images. We will use BRACS for segmenting the different tumor and lesion types and BreakHis for classifying tumors into different cancer types. BRACS consists of whole-slide images stored in the .svs file format as multi-resolution pyramid structures. Images are annotated with seven lesion classes, i.e. Normal (N), Pathological Benign (PB), Usual Ductal Hyperplasia (UDH), Flat Epithelial Atypia (FEA), Atypical Ductal Hyperplasia (ADH), Ductal Carcinoma in Situ (DCIS), and Invasive Carcinoma (IC).
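Because the BRACS whole-slide images are multi-resolution pyramids, patches are usually read tile by tile at a chosen pyramid level rather than loading an entire slide into memory. The following is a minimal sketch, assuming the OpenSlide Python bindings (`openslide-python`) are installed; the tile size, default level, and the helper names `tile_grid` and `read_tiles` are assumptions. `OpenSlide.read_region` takes its location in level-0 coordinates, so grid positions computed at a coarser level are scaled by that level's downsample factor.

```python
from typing import Iterator, Tuple

def tile_grid(level_width: int, level_height: int, tile: int, downsample: float
              ) -> Iterator[Tuple[Tuple[int, int], Tuple[int, int]]]:
    """Yield (level-0 location, position within the level) for non-overlapping full tiles."""
    for y in range(0, level_height - tile + 1, tile):
        for x in range(0, level_width - tile + 1, tile):
            # read_region expects the top-left corner in level-0 pixel coordinates.
            yield (int(x * downsample), int(y * downsample)), (x, y)

def read_tiles(svs_path: str, level: int = 0, tile: int = 512):
    """Yield (position, RGB PIL tile) pairs from one pyramid level of a .svs slide."""
    import openslide  # imported lazily so tile_grid works without OpenSlide installed
    slide = openslide.OpenSlide(svs_path)
    try:
        width, height = slide.level_dimensions[level]
        downsample = slide.level_downsamples[level]
        for location, position in tile_grid(width, height, tile, downsample):
            region = slide.read_region(location, level, (tile, tile))  # RGBA image
            yield position, region.convert("RGB")
    finally:
        slide.close()
```

Tiles that are mostly background (white glass) would normally be discarded before training.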

The BreakHis dataset will be used for classification. BreakHis consists of histopathology images from 82 patients at 40x, 100x, 200x, and 400x magnification. The images are divided into two main groups, i.e. benign tumors and malignant tumors, each of which is further divided into four subtypes. Benign tumors are divided into adenosis, fibroadenoma, phyllodes tumor, and tubular adenoma. Similarly, malignant tumors are divided into ductal carcinoma, lobular carcinoma, mucinous carcinoma, and papillary carcinoma.
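The two-level label structure above (benign or malignant, then one of four subtypes) can be encoded directly. This sketch assumes the BreakHis file-naming convention `SOB_<B|M>_<subtype code>-<year>-<slide id>-<magnification>-<sequence>.png` with subtype codes A, F, PT, TA (benign) and DC, LC, MC, PC (malignant); verify this against the downloaded files before relying on it. The list names and `parse_breakhis_name` are hypothetical.

```python
BENIGN = ["adenosis", "fibroadenoma", "phyllodes_tumor", "tubular_adenoma"]
MALIGNANT = ["ductal_carcinoma", "lobular_carcinoma", "mucinous_carcinoma", "papillary_carcinoma"]

# Assumed subtype codes used in BreakHis file names.
CODE_TO_SUBTYPE = {
    "A": "adenosis", "F": "fibroadenoma", "PT": "phyllodes_tumor", "TA": "tubular_adenoma",
    "DC": "ductal_carcinoma", "LC": "lobular_carcinoma",
    "MC": "mucinous_carcinoma", "PC": "papillary_carcinoma",
}

def parse_breakhis_name(filename: str):
    """Return (binary label: 0 benign / 1 malignant, subtype index within its group, magnification)."""
    stem = filename.rsplit(".", 1)[0]
    _, tumor_class, rest = stem.split("_", 2)
    if tumor_class not in ("B", "M"):
        raise ValueError(f"unexpected tumor class in {filename!r}")
    code = rest.split("-", 1)[0]
    magnification = int(rest.rsplit("-", 2)[1])  # second-to-last dash-separated field
    group = BENIGN if tumor_class == "B" else MALIGNANT
    # list.index raises ValueError if the subtype code does not match its group.
    return int(tumor_class == "M"), group.index(CODE_TO_SUBTYPE[code]), magnification
```

The binary label feeds the benign/malignant classifier, and the subtype index feeds the four-class classifier for the corresponding group.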

Research Methodology

Pre-processing

We will convert the BRACS classification dataset into a semantic segmentation dataset by annotating the images with segmentation masks. The BRACS annotations used to train our segmentation model are stored in the .qpdata format, which can only be opened in QuPath [29], an open-source software package designed for the analysis of whole-slide images (WSIs) in digital pathology. To convert the .qpdata annotations to binary masks, each whole-slide image will be loaded into QuPath together with its annotations, and a script will export the annotations as binary masks (.png). Attenuation masks will then be applied to the WSIs to modify the contrast of selected regions without affecting the overall structure and integrity of the image: normal tissue areas will be depicted in green and tumor tissue areas in red, and these masks will be blended with the original images using a weighted combination. This adjustment is intended to help the segmentation model analyze and interpret the images accurately.
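The color-coding and weighted blending step described above can be sketched with NumPy. This sketch covers only that blending step, and assumes the exported masks have already been read into integer label arrays with 0 = background, 1 = normal tissue, and 2 = tumor tissue; that label convention, the function name, and the blend weight `alpha` are illustrative assumptions. Annotated pixels become `(1 - alpha) * image + alpha * color`, and background pixels are left unchanged.

```python
import numpy as np

# Hypothetical label convention for the exported masks.
BACKGROUND, NORMAL, TUMOR = 0, 1, 2
COLORS = {NORMAL: (0, 255, 0), TUMOR: (255, 0, 0)}  # RGB: green = normal, red = tumor

def blend_annotation(image: np.ndarray, labels: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Blend a color-coded annotation overlay into an RGB uint8 image."""
    labels = np.asarray(labels)
    out = image.astype(np.float32)
    for label, color in COLORS.items():
        region = labels == label  # boolean H x W mask for this tissue class
        out[region] = (1.0 - alpha) * out[region] + alpha * np.asarray(color, dtype=np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

A small `alpha` keeps the underlying tissue texture visible while still marking the annotated regions.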

Segmentation

To identify and delineate regions of interest, we will employ the U-Net and ResNet architectures, and the annotated images will be fed into the combined network. U-Net's encoder-decoder structure enables accurate localization of cancerous lesions, while ResNet's residual learning framework is well suited to capturing detailed boundaries of abnormal tissue regions. To exploit the strengths of both, we will integrate ResNet as the encoder of the U-Net architecture, which we expect to improve the segmentation.

Classification

After segmentation, the results will be passed to the classifier, which will first perform binary classification of each tumor as benign or malignant. A DenseNet model will then perform four-class subtype classification within the predicted benign or malignant group. We will fine-tune DenseNet on the BreakHis dataset to achieve more accurate classification results.

Methodology Diagram
