Motivation: Lung cancer is one of the leading causes of cancer mortality worldwide, and non–small cell lung cancer (NSCLC) accounts for most cases. NSCLC can be further divided into adenocarcinoma (ACA) and squamous cell carcinoma (SCC). Distinguishing these two subtypes is of great clinical value.
Histopathology images serve as the gold standard for lung cancer diagnosis because they provide a comprehensive view of the disease and its effect on human tissue. This figure shows some representative images of squamous cell carcinoma and adenocarcinoma. Currently, pathologists make diagnostic decisions based on cellular and inter-cellular morphology. Most pathology diagnosis still relies on the subjective opinions of pathologists, and differences in experience among doctors can lead to large interpretation errors or bias. The proposed framework, which focuses on automated quantitative analysis of histopathology images, could reduce the subjectivity in NSCLC diagnosis and support doctors in lung cancer classification.
The process can be decomposed into eight main steps:
After converting the image to grayscale, perform local contrast adjustment so that the dimmer cells can be extracted.
Objects on the image borders can be caused by noise and other artifacts, so eliminate them as well.
Apply a 5-by-5 special adaptive filter to the image to minimize the effect of noise.
Extract the perimeters of cells or cell groups following a binarization step.
After filling image regions and holes, perform morphological opening using a disc kernel.
Remove from the binary image all connected components (cells) that have fewer than 120 pixels.
Visualize the groups by finding their perimeter and overlaying it over the grayscale image.
Finally, apply the watershed algorithm on the image because several cells overlap.
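Two of the steps above, clearing objects that touch the image border and removing connected components with fewer than 120 pixels, can be sketched in plain Python as follows. This is a minimal illustration, not the project's implementation: the function names `label_components` and `clean_mask` are hypothetical, the binary image is assumed to be a list of rows of 0/1 values, and 4-connectivity is an assumption (the source does not state which connectivity was used).

```python
from collections import deque


def label_components(mask):
    """Return connected components of a binary mask as lists of (row, col).

    Assumption: 4-connectivity (up, down, left, right neighbours).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def clean_mask(mask, min_size=120, clear_border=True):
    """Drop components smaller than min_size and, optionally, border-touching ones."""
    rows, cols = len(mask), len(mask[0])
    cleaned = [[0] * cols for _ in range(rows)]
    for pixels in label_components(mask):
        touches_border = any(
            y in (0, rows - 1) or x in (0, cols - 1) for y, x in pixels
        )
        if len(pixels) < min_size or (clear_border and touches_border):
            continue  # treat as noise or a truncated cell
        for y, x in pixels:
            cleaned[y][x] = 1
    return cleaned
```

Calling `clean_mask(binary_image)` with the default `min_size=120` matches the size threshold stated in the steps; a smaller value is convenient when experimenting on toy masks.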
The system consists of 3 modules:
Cell detection and segmentation
The process can be decomposed into eight main steps as shown above.
Feature extraction
Based on the segmented cell boundaries, seven cellular features are extracted for subsequent classification and prediction analysis. The seven features are red channel intensity, green channel intensity, blue channel intensity, gray intensity, red channel area, green channel area, and blue channel area.
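As a rough illustration of the seven features, the sketch below computes them for one segmented cell. Several details are assumptions, since the source does not define them: the intensities are taken as per-channel means, the gray intensity uses the common 0.299/0.587/0.114 luma weights, and each "channel area" is interpreted as the number of cell pixels whose channel value exceeds a hypothetical `threshold`. The function name `cell_features` is also hypothetical.

```python
def cell_features(pixels, threshold=128):
    """Compute seven features for one cell.

    pixels: list of (r, g, b) tuples (0-255) for the pixels inside the cell.
    threshold: hypothetical cut-off used for the per-channel "area" counts.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("cell has no pixels")

    # Mean intensity per colour channel (assumed definition of "intensity").
    mean_r, mean_g, mean_b = (sum(p[ch] for p in pixels) / n for ch in range(3))

    # Mean gray intensity using standard luma weights (assumption).
    gray = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels) / n

    # "Area" per channel: count of pixels above the threshold (assumption).
    area_r, area_g, area_b = (
        sum(1 for p in pixels if p[ch] > threshold) for ch in range(3)
    )

    return {
        "red_intensity": mean_r,
        "green_intensity": mean_g,
        "blue_intensity": mean_b,
        "gray_intensity": gray,
        "red_area": area_r,
        "green_area": area_g,
        "blue_area": area_b,
    }
```

Each cell then yields one seven-element feature vector, which is the input shape a classifier such as an SVM expects.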
Model training and evaluation
Linear and Gaussian kernel SVM classifiers were trained on the extracted features. Precision, recall, and accuracy were used as performance metrics for classification.
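The three metrics can be computed from true and predicted labels as in the sketch below. Treating SCC as the positive class is an assumption made only for illustration, and the function name `classification_metrics` is hypothetical; the SVM training itself is not shown.

```python
def classification_metrics(y_true, y_pred, positive="SCC"):
    """Return (precision, recall, accuracy) for a binary labelling.

    positive: label treated as the positive class (assumed to be "SCC" here).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy
```

Precision is the fraction of predicted positives that are correct, recall is the fraction of actual positives that are found, and accuracy is the fraction of all samples labelled correctly.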