Application for Classifying Chest X-ray Abnormalities
BY: Vikram Saini
Introduction
Data science is growing for many reasons, including readier access to robust data sets, greater computing power, more data science libraries, and more resources on how to conduct data science, including GitHub repositories and the open-source nature of the field. As the field expands, the utility of data science to various industries, including consulting, finance, and healthcare, is becoming more apparent.
This project focuses on the utility of data science in healthcare: we will look at how a neural network performs at classifying chest x-rays, first into normal and abnormal categories, and subsequently into normal and various specific abnormal conditions. These abnormal categories include Cardiomegaly, Emphysema, Effusion, Hernia, Nodule, Pneumothorax, Atelectasis, Pleural thickening, Mass, Edema, Consolidation, Infiltration, Fibrosis, and Pneumonia.
I chose this project because I studied biology in undergrad, so I have some foundational understanding of the healthcare area. I also believe that data science has the potential to make a significant contribution to healthcare. Specifically, a project like this can help radiologists make more accurate diagnoses and treatment decisions based on x-ray images. It can increase confidence in a doctor's assessments, which is especially important in "grey areas" for both the practitioner and the patient. Overall, projects like this, in conjunction with radiologists' input, can improve patient health outcomes and advance healthcare as a whole. It is important to emphasize that such a project would be a tool to guide and aid a radiologist, and by no means a replacement for one!
Similar Approaches
There are many Kaggle competitions on classification of x-ray images, and those projects were an incredible resource for this one. Stack Overflow was also immensely useful for working through bugs in the code. Some x-ray competitions and projects centered on larger datasets, while others focused on smaller ones. I wanted to work with one of the larger datasets, but after a lot of difficulty setting up the environment and ongoing issues, I had to resort to a smaller one; in fact, I spent a lot of time on this portion of the project before switching. To make the project more user-friendly, I set up a Flask app where a person can input an x-ray image and have it categorized into one of the classes.
Introduction to the Data
2.1 GB total
5606 Chest x-ray images (in .png format)!
Size of 1024 x 1024
CSV file with class labels and patient data
Columns include file name, disease label (class label), patient ID, patient age, patient gender, x-ray orientation, and original image width and height
Data source: National Institutes of Health Chest X-Ray Dataset. <https://www.kaggle.com/nih-chest-xrays/data>
EDA and Model Construction
Data Exploration
First the data was downloaded locally and then loaded. The labels categorizing the images were read from the CSV into a dataframe. The output below shows an x-ray image with its accompanying label. A parameter specifies whether the task is binary classification (normal vs. abnormal) or multiclass classification (which breaks the abnormal category down into specific classes).
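As a hypothetical sketch of this step (the column names follow the NIH CSV; the `make_target` helper and the toy dataframe are my own illustration, not the project's exact code), the binary/multiclass switch could look like:

```python
import pandas as pd

# Toy stand-in for the NIH labels CSV (real code would use pd.read_csv)
labels = pd.DataFrame({
    "Image Index": ["a.png", "b.png", "c.png"],
    "Finding Labels": ["No Finding", "Effusion", "Cardiomegaly|Edema"],
})

def make_target(df, binary=True):
    """Build the classification target from the finding labels."""
    if binary:
        # 0 = normal ("No Finding"), 1 = abnormal (any listed finding)
        return (df["Finding Labels"] != "No Finding").astype(int)
    # Multiclass: split the pipe-separated finding list per image
    return df["Finding Labels"].str.split("|")
```

Here `make_target(labels, binary=True)` yields `[0, 1, 1]` for the three toy rows, while the multiclass branch keeps each image's list of findings.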
Since this is a classification project, it is important that the different classes are roughly equally represented. For example, if 90% of the x-rays were normal and the algorithm simply labeled all the data as normal, it would still achieve an accuracy of 90%, which would inaccurately assess the model. Moreover, since the algorithm would rarely see abnormal cases, it wouldn't have enough data to learn what abnormal looks like. Therefore it is important that the classes are as equally distributed as possible. As it turns out, when broken into binary classification, the data is equally distributed between normal and abnormal.
Based on my research into deploying a Keras neural network model, I wanted to use ReLU as the activation function for most layers, reserving the sigmoid function for the last layer, and to use binary cross-entropy with the Adam optimizer. Avishreekh's GitHub project used ReLU with a sigmoid final layer, binary cross-entropy, and the Adam optimizer, and it was a useful source for the EDA and model-construction part of the project. I used that code through most of this process because it was succinct and I understood what was happening. Avishreekh also applied transfer learning, training MobileNet without sampling, which I did not do. Avishreekh created a generator for the train and validation split and modeled the data in both binary and multilabel form with MobileNet, which I used.
GitHub resource mentioned above: Avishreekh. (2020). Chest-X-Ray-Abnormality-Classification. <https://github.com/avishreekh/Chest-X-Ray-Abnormality-Classification>
However, when classifying the data into multiclass labels, the distribution is imbalanced.
At first the solution to class imbalance may seem simple: for instance, remove data from the classes with more images. However, this is almost never a good idea because we would be losing valuable data. Instead, we weight the under-represented classes more heavily relative to the larger ones. This makes the algorithm focus on getting those heavily weighted classes correct.
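A minimal sketch of inverse-frequency class weighting (using toy labels rather than the actual dataset) might look like this; the resulting dictionary is the shape Keras accepts through the `class_weight` argument of `model.fit`:

```python
import numpy as np

# Toy labels: class 0 has twice as many examples as class 1
y = np.array([0, 0, 0, 0, 1, 1])

classes, counts = np.unique(y, return_counts=True)
# Inverse-frequency weighting: rarer classes receive larger weights
weights = {int(c): len(y) / (len(classes) * n) for c, n in zip(classes, counts)}
# weights == {0: 0.75, 1: 1.5}; usable as model.fit(..., class_weight=weights)
```

The rarer class (1) gets double the weight of the common class (0), so mistakes on it cost the loss function proportionally more.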
In data science, cleansing and preparation is one of the most important steps. In computer science there is the acronym GIGO: garbage in, garbage out. It captures the idea that the output of any data project is only as good as the input; if the input is flawed and steps are not taken to address the issues, the output will be less than ideal, or worse, useless. In this project there wasn't a whole lot to do for data preparation: the data was clean, and I didn't have to worry about handling missing data.
Constructing the model
Image classification is most commonly done with neural networks. The reason is that classification of an image is not based on one specific factor but on the image as a whole. Neural networks are modeled on biological neurons and how they take in one or more inputs and propagate the information onward. It isn't quite that simple, though, because one neuron receives input from many other neurons and then sends its output on to one or more neurons. Another important point is that each neuron responds differently to a given input; this aspect of neurons is mimicked with parameters that can differ from layer to layer within a neural network.
An important step to take before going any further with modeling is to split the data into training and testing sets. This matters because we want to fairly test the model's performance on data it hasn't seen before; otherwise the model would have already learned the data, and the evaluation wouldn't fairly assess its accuracy. Usually 75-80% of the data is used for training and the remaining 20-25% for testing.
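As a sketch of this split (with stand-in arrays rather than the real images), scikit-learn's `train_test_split` handles it in one call; `stratify=y` keeps the class proportions the same in both sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)  # stand-in for image features
y = np.array([0, 1] * 50)           # stand-in binary labels, 50/50 split

# 80/20 train/test split, stratified so each set keeps the 50/50 class mix
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```

With `test_size=0.2` the test set holds 20 of the 100 examples, exactly 10 from each class thanks to stratification.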
Working with Keras
Keras is a popular Python library for machine learning and neural networks because it supports so many different functions and makes machine learning easy to conduct. In Your First Deep Learning Project in Python with Keras Step-by-Step, Brownlee breaks down the steps for creating a neural network with Keras. He notes that the sigmoid function used to serve as the activation in all layers, but it is now usually reserved for the output layer, with ReLU used for the other layers because it leads to better performance. Brownlee also suggests binary_crossentropy for binary classification and Adam as the optimizer, which he describes as a widely used variant of stochastic gradient descent that works well in a variety of settings.
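A minimal sketch of the recipe Brownlee describes (with hypothetical layer sizes, and plain dense layers on flattened features rather than the convolutional MobileNet actually used in this project) might look like:

```python
from tensorflow import keras
from tensorflow.keras import layers

# ReLU hidden layers, a single sigmoid output for binary classification
model = keras.Sequential([
    layers.Input(shape=(64,)),              # stand-in for flattened features
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of "abnormal"
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

The sigmoid output squashes the final value into [0, 1], which pairs naturally with binary cross-entropy as the loss.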
Citation: Jason Brownlee. (2020). Your First Deep Learning Project in Python with Keras Step-by-Step. <https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/>
Working with Flask
A web app is an interactive way for non-programmers to visualize and work with applications. In this project, I used Flask to create a web app in which a user can input a .png chest x-ray file and, based on the Keras model created above, it will classify the image.
This was new to me because, up until now, I had worked mostly in the Jupyter notebook environment, but Flask requires working at the command prompt for at least part of the coding. My first step was to download a project that lets you deploy a Flask app with your own trained model; I found one on GitHub, and it was a useful resource to begin the process. After downloading the files, I went back to my notebook and saved the model as a .h5 file, as instructed by the creator of the GitHub project. I then opened the Anaconda prompt, created a virtual environment for this specific project, and installed the modules listed in the requirements file. After that I changed into the directory where the GitHub files were saved and attempted to run the project with python app.py. I ran into various errors with the installed modules and tried to work around them. I downgraded my Python version, since some of the other modules required older versions of Python, and tried to downgrade other libraries such as NumPy, some of which didn't work. For NumPy, for instance, the downgrade would begin after I searched conda for all available versions, but it would then give a warning that no previously-included files were found matching 'deps...'
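The deployment pattern from the keras-flask-deploy-webapp project can be sketched roughly like this (the route name, the `classify` placeholder, and the JSON response format are my own assumptions for illustration, not the repo's exact code):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
# Real code would load the saved model once at startup, e.g.:
# model = keras.models.load_model("model.h5")

def classify(image_bytes):
    # Placeholder: real code would preprocess the .png and call model.predict
    return "normal"

@app.route("/predict", methods=["POST"])
def predict():
    uploaded = request.files["file"]  # the user's chest x-ray upload
    return jsonify({"label": classify(uploaded.read())})
```

Loading the .h5 model once at startup (rather than per request) is the key design choice: inference stays fast because the weights are already in memory when a request arrives.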
Github resource referenced above: Mtobeiyf. (2020). Keras-flask-deploy-webapp. <https://github.com/mtobeiyf/keras-flask-deploy-webapp>
Execution and Interpretation
Planned Outcome: web application that can classify chest x-ray image input based on neural network model.
Unfortunately, I couldn't get the web application to run, as was the goal.
Model Performance
Accuracy curves are commonly used to assess model performance, and I used them here for that purpose. Dr. Murat also advised me to use the recall metric, because it is more relevant in a medical setting. Where the accuracy metric falls short is when there is a class imbalance: the model can achieve high accuracy simply because one class outweighs the other. For instance, in disease settings far more people don't have a specific disease than do, so even a poor algorithm could achieve a high accuracy rate. Recall focuses on the positive cases and on accuracy in identifying those cases. This is much more useful here, where the emphasis should be on correctly identifying positive cases rather than simply predicting negative ones.
Will Koehrsen. (2018). Beyond Accuracy: Precision and Recall. <https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c>
As we can see above, I got a recall of zero. Given that recall is calculated as true positives / (true positives + false negatives), the number of true positives was most likely zero or very close to it. Generally we want recall to be above 0.5, so in that respect the model's performance needs some work.
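The zero recall can be reproduced in miniature: since recall = TP / (TP + FN), a model that never predicts the positive class scores exactly zero. This toy example (not the project's data) uses scikit-learn:

```python
from sklearn.metrics import recall_score

y_true = [1, 1, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]  # model calls everything negative, so TP = 0

# With no true positives, recall = 0 / (0 + 3) = 0.0
r = recall_score(y_true, y_pred)
```

Note that accuracy on the same toy data would be 2/5, which shows how the two metrics tell different stories when positives are missed.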
Resource: Exsilio Solutions. (2016). Accuracy, Precision, Recall & F1 Score: Interpretation of Performance Measures. <https://blog.exsilio.com/all/accuracy-precision-recall-f1-score-interpretation-of-performance-measures/>
Confusion matrices are also a useful tool for evaluating a model. A confusion matrix breaks down model performance into the following categories: true positives, false positives, true negatives, and false negatives. Dr. Murat also suggested this metric for the project, and I think it is a good way to assess performance.
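A small sketch of reading a confusion matrix with scikit-learn (toy labels, not the project's results); for binary labels, rows are actual classes and columns are predicted classes:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

# Layout for binary labels: [[TN, FP],
#                            [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
```

Here the matrix comes out as [[1, 1], [0, 2]]: one true negative, one false positive, no false negatives, and two true positives.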
The results of the confusion matrix are... well, confusing. The matrix seems to imply a perfect algorithm, but based on the accuracy and recall results we know this is not the case.
Limitations and further work
Due to how computationally expensive it is to run a neural network, it was difficult to iterate on the project and make revisions. Adding more data would help the model's performance, but that would be hard to do given how time-intensive the project already is. A higher-performance machine would make it easier to work with more data, as would using GPUs. It is important to note that more data would require not just more chest x-ray images, but chest x-ray images properly annotated by radiologists.
It would certainly have been more of a successful run if I had been able to classify images with the trained model through the web app. I did learn a lot along the way, so that's a major plus!
Citations
Avishreekh. (2020). Chest-X-Ray-Abnormality-Classification. <https://github.com/avishreekh/Chest-X-Ray-Abnormality-Classification>
Mtobeiyf. (2020). Keras-flask-deploy-webapp. <https://github.com/mtobeiyf/keras-flask-deploy-webapp>
Paul Mooney. (2019). Predicting Pathologies in X-Ray Images. <https://www.kaggle.com/paultimothymooney/predicting-pathologies-in-x-ray-images>
Scott Mader. (2019). Train Simple XRay CNN. <https://www.kaggle.com/kmader/train-simple-xray-cnn>