Artificial Intelligence
The accuracy of the conventional diagnostic procedure for Leukemia can be reduced by factors such as the tiredness and emotional state of the expert, because the procedure is performed manually by a hematologist or pathologist. Given the growing number of cases and the importance of early diagnosis for chronic Leukemia, an automated intelligent screening system for chronic Leukemia is needed. The development procedure consists of five main stages, namely image segmentation, feature extraction, feature selection, classification and a Graphical User Interface (GUI). A total of 548 nuclei were extracted from 100 images (50 samples for CML and 50 samples for CLL) and used for the analysis. The procedure begins with image segmentation, which involves Colour Thresholding, Gradient Edge Detection and Convex Area Filtering to remove artefacts and prepare the image for the feature extraction stage. A total of 28 geometrical, colour and textural features were extracted. To improve classifier performance, feature selection was applied to select the dominant features. The Genetic Algorithm (GA), ReliefF (RfF) and Neighbourhood Components Analysis (NCA) algorithms were used for feature selection. Based on the results, the features selected by RfF provided the highest overall accuracy (99.4%). To ensure the reliability and accuracy of the classifier selected for this system, three types of classifier were evaluated in the study. Each classifier was optimized before selection to ensure that it fitted the chronic Leukemia data. A weightage scoring method was used to select the best classifier among k-nearest neighbour (kNN), Support Vector Machine (SVM) and a Multilayer Perceptron (MLP) network trained with the Levenberg Marquardt (LM) algorithm, in order to overcome the problem of uneven contribution of the parameters. The classifier with the highest score was selected for implementation in the screening system. Based on the results, MLP was selected in this study. The last stage was creating the GUI, which was done with GUIDE in MATLAB R2017b.
1. Leukemia is a growing problem globally and in Malaysia.
The Leukemia & Lymphoma Society (LLS) reported that approximately every 3 minutes, one person in the United States (US) is diagnosed with a blood cancer. An estimated combined total of 174,250 people in the US were expected to be diagnosed with Leukemia, lymphoma or myeloma in 2018. In addition, statistics from MIMS Malaysia indicate that blood cancer ranked fourth among cancers in Malaysia in 2016. According to World Life Expectancy, the Leukemia death rate in Malaysia is 4.18 per 100,000, which ranks 75th out of 183 countries.
2. The accuracy of the conventional diagnosis procedure for Leukemia is uncertain.
The conventional early-stage diagnosis is performed manually by eye. The human expert has to screen a thousand blood slide samples under the microscope over a long period, so accuracy can be reduced by factors such as the expert's tiredness or emotional state.
The whole study consists of 5 main steps: Image Acquisition, Image Segmentation, Feature Extraction, Feature Selection and Classification.
Image Acquisition
In this study, the slide images of chronic leukemia were provided by Hospital Universiti Sains Malaysia (HUSM). The slides were examined under a Leica microscope at 40× magnification, captured using an Infinity 2 camera and saved in bitmap format at a resolution of 800×600. A total of 100 slide images were analyzed in this study, 50 for CML and 50 for CLL. A total of 548 cell nuclei were segmented from these slide images, 291 for CML and 257 for CLL.
Image Segmentation
Image segmentation consists of two steps, colour segmentation and filtering. Colour segmentation consists of Colour Thresholding and Gradient Edge Detection, while filtering consists of Convex Area Filtering.
Colour Thresholding
Gradient Edge Detection
Convex Area Filtering
Feature Extraction
Feature extraction is the stage that transforms the large input data into a reduced representation set of features. Different types of input yield different types of feature. In this study, geometrical, colour and textural features were used to distinguish CML from CLL. A total of 28 features were extracted from these categories.
Feature Selection
A feature selection algorithm is used to reduce the dimension of the feature space and improve classification performance. The feature selection algorithms applied in this study were the Genetic Algorithm (GA), the ReliefF algorithm (RfF) and the Neighbourhood Components Analysis (NCA) algorithm.
Optimization of GA
Optimization of RfF
Optimization of NCA
Selection of Feature Selection algorithm
After the arguments and parameters were tuned to their optimal values, all 28 features were fed into each algorithm and the results were recorded accordingly.
Step 1: Select a classifier and feed the selected features into it
Step 2: Record the evaluation parameters (training, testing and overall accuracy, to one decimal place) and tabulate them in Appendix B
Step 3: Repeat Steps 1 and 2 another 9 times
Step 4: Calculate the average accuracies for each set of selected features
Step 5: Compare the accuracies and assign marks by rank (for example, 1 refers to the lowest accuracy while 5 refers to the highest accuracy)
Step 6: Calculate the total score of each feature selection algorithm and select the algorithm with the highest score.
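The ranking and scoring in Steps 4 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the function names, the candidate names set_A to set_C, the tie handling (tied averages share a mark) and all of the numbers are assumptions made up for the example.

```python
from statistics import mean


def rank_scores(values):
    """Give 1 mark to the lowest value, 2 to the next, and so on.

    Tied values share the same mark (an assumption; the source does not
    say how ties are handled).
    """
    distinct = sorted(set(values))
    mark_of = {v: i + 1 for i, v in enumerate(distinct)}
    return [mark_of[v] for v in values]


def select_by_total_score(runs):
    """runs maps candidate name -> {metric name -> list of per-run accuracies}."""
    names = list(runs)
    metrics = list(next(iter(runs.values())))
    totals = dict.fromkeys(names, 0)
    for metric in metrics:
        # Step 4: average over the repeated runs, to one decimal place.
        averages = [round(mean(runs[n][metric]), 1) for n in names]
        # Step 5: convert the averages into rank-based marks.
        for name, mark in zip(names, rank_scores(averages)):
            totals[name] += mark
    # Step 6: the candidate with the highest total score is selected.
    best = max(totals, key=totals.get)
    return best, totals


# Placeholder numbers for illustration only; these are not results from the study.
example_runs = {
    "set_A": {"training": [98.0, 98.4], "testing": [96.0, 96.4], "overall": [97.5, 97.9]},
    "set_B": {"training": [99.0, 99.4], "testing": [98.0, 98.2], "overall": [98.8, 99.0]},
    "set_C": {"training": [98.5, 98.7], "testing": [97.0, 97.6], "overall": [98.1, 98.3]},
}
best, totals = select_by_total_score(example_runs)
```

Summing marks rather than raw accuracies gives each evaluation parameter the same influence on the final choice, whatever its numerical spread.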
Classification
To ensure the performance of the classifier in this system, three types of classifier were chosen for evaluation: k-Nearest Neighbour (kNN), Support Vector Machine (SVM) and Multilayer Perceptron (MLP) network.
Division of Data
Optimization of kNN
Step 1: Run the kNN algorithm with a k-value of 1
Step 2: Record the evaluation parameter (testing accuracy to one decimal place)
Step 3: Repeat Steps 1 and 2 with k-values of 3, 5, 7, 9, 11, 13 and 15
Step 4: Compare the testing accuracy for each k-value
Step 5: Select the k-value with the highest testing accuracy
Optimization of SVM
Step 1: Start the optimization with the first parameter (Box Constraint) at the minimum value in its range
Step 2: Record the evaluation parameter (testing accuracy to one decimal place)
Step 3: Repeat Steps 1 and 2 for the next value in the range (for example, 0.01 for Box Constraint)
Step 4: Compare the testing accuracy for each Box Constraint value
Step 5: Select the value with the highest accuracy
Step 6: Repeat Steps 1 to 5 for the next parameter (for example, the polynomial order if the kernel function is Polynomial)
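The procedure above tunes one parameter at a time while holding the others fixed. A minimal sketch of that search pattern follows. The function one_at_a_time_search, the parameter names, the candidate ranges and the stand-in scoring function toy_evaluate are all hypothetical; in practice evaluate would train an SVM and return its testing accuracy.

```python
import math


def one_at_a_time_search(evaluate, grid, start):
    """Tune each parameter in turn, keeping earlier choices fixed.

    evaluate: callable taking a dict of parameters and returning testing accuracy.
    grid:     dict mapping parameter name -> candidate values, in tuning order.
    start:    dict of initial values for every parameter.
    """
    chosen = dict(start)
    for name, candidates in grid.items():
        scores = {}
        for value in candidates:
            trial = {**chosen, name: value}
            # Record testing accuracy to one decimal place, as in Step 2.
            scores[value] = round(evaluate(trial), 1)
        # Keep the value with the highest accuracy before moving on (Steps 4 to 6).
        chosen[name] = max(scores, key=scores.get)
    return chosen


def toy_evaluate(params):
    """Stand-in for training and testing an SVM; the formula is made up."""
    return (90.0
            - abs(math.log10(params["box_constraint"]))
            - abs(params["polynomial_order"] - 3))


# Assumed candidate ranges; the source does not list the full ranges.
grid = {
    "box_constraint": [0.001, 0.01, 0.1, 1, 10, 100],
    "polynomial_order": [2, 3, 4],
}
start = {"box_constraint": 0.001, "polynomial_order": 2}
best_params = one_at_a_time_search(toy_evaluate, grid, start)
```

This needs far fewer evaluations than a full grid search, at the cost of possibly missing combinations where the parameters interact.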
Optimization of MLP
Step 1: Run the algorithm with 1 hidden node
Step 2: Record the evaluation parameters (testing accuracy to one decimal place, and Mean Square Error (MSE) in exponential form with six decimal places)
Step 3: Repeat Steps 1 and 2 for other numbers of hidden nodes (5, 10, 15, 20, 25, 30)
Step 4: Compare the results and select the number of hidden nodes with high testing accuracy and low MSE
*The same optimization steps apply to the Learning Rate (0.1, 0.2, 0.3, 0.4, 0.5) and the Number of Epochs (1, 5, 10, 15, 20)
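Step 4 combines two criteria, and the source does not state how they are traded off. The sketch below assumes one possible rule: prefer the highest testing accuracy, and use the lowest MSE only to break ties. The function names and the recorded values are placeholders for illustration, not results from the study.

```python
def format_mse(mse):
    """Express MSE in exponential form with six decimal places."""
    return format(mse, ".6e")


def pick_hidden_nodes(results):
    """results maps hidden-node count -> (testing accuracy in %, MSE).

    Assumed rule: highest accuracy first, lowest MSE as the tie-breaker.
    """
    return max(results, key=lambda n: (results[n][0], -results[n][1]))


# Placeholder records for illustration only.
example_results = {
    1: (80.0, 2.5e-2),
    5: (95.0, 4.0e-3),
    10: (98.2, 1.2e-3),
    15: (98.2, 9.0e-4),
    20: (97.3, 1.1e-3),
}
selected = pick_hidden_nodes(example_results)
```

Here 10 and 15 hidden nodes tie on accuracy, so the lower MSE decides in favour of 15. The same function could be reused for the Learning Rate and the Number of Epochs by changing the dictionary keys.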
Evaluation of Classifier
Image Segmentation
Feature Selection
Therefore, the RfF Group 3 feature set was selected and applied.
Classification
MLP was therefore chosen as the classifier.