The first case of the new coronavirus disease was reported in Wuhan, China, in December 2019. Since then, COVID-19, the disease caused by the SARS-CoV-2 virus, has spread all around the world. By February 2021, more than 106 million confirmed cases and more than 2.4 million deaths had been reported worldwide (https://www.worldometers.info/coronavirus/).
This is a highly contagious respiratory disease which can be transmitted to humans via mammals such as bats (Fan et al., 2019). The common symptoms of COVID-19 are fever, cough, shortness of breath, sore throat, headache, and diarrhea (Singhal, 2020). Despite the development and testing of a number of vaccines (Pfizer-BioNTech, Moderna, Sputnik V, CoronaVac), the danger of the spread of the disease is still very high, and we cannot say when, or if, the disease will be eradicated completely. This is why detection of COVID-19 remains extremely important.
Recent studies have shown that the use of imaging technology like X-ray or Computed Tomography (CT) to diagnose and screen COVID-19 can produce sensitive and reliable results (Ng et al., 2020). In this project, I will use only X-ray images because of their higher availability.
Build a deep convolutional neural network (CNN) to detect COVID-19.
Evaluate the effectiveness of state-of-the-art classification algorithms for detecting COVID-19.
Evaluate the effectiveness of using a quantum support vector machine (QSVM) as the activation function in a CNN, compared to Softmax and a support vector machine (SVM).
Fig. 1. Total number of confirmed COVID-19 cases until February 2021. Data from https://www.domo.com/covid19/
Fig. 2. Number of recovered and infected patients and number of deaths until February 2021. Data from https://www.domo.com/covid19/
Many research groups have been working on detecting COVID-19 from X-ray images.
To name just a few, Wang and Wong (Wang et al., 2020) proposed a deep neural network model for COVID-19 detection, called COVID-Net, which obtained 92.4% accuracy in classifying normal, non-COVID pneumonia, and COVID-19 classes. Apostolopoulos et al. (2020) developed a deep learning model using only 224 confirmed COVID-19 images. The proposed model achieved accuracy rates of 98.75% and 93.48% for the two-class and three-class problems, respectively. Ozturk et al. (2020) proposed a deep CNN model based on the DarkNet model. It consists of 17 convolution layers with Leaky ReLU as the activation. The model achieved an accuracy of 98.08% for the binary problem and 87.02% for the multi-class problem. Panwar et al. (2020) proposed a deep learning model, nCOVnet, to detect COVID-19 from X-ray images. The model achieved an accuracy of 97.60% for the two-class problem.
As the past research shows, both transfer learning and the creation of a custom architecture can achieve strong results. This is why in this project I plan to try both. The majority of recent studies used COVID-positive data from Covid-chestxray-dataset (Cohen et al., 2020), which I plan to use in this project as well.
Fig. 3. Flowchart of the project.
For this project, I am using a combination of 2 datasets that are widely used for COVID-19-related tasks. From the Stanford CheXpert dataset (Irvin et al., 2019), I take chest X-ray images of healthy patients; from Covid-chestxray-dataset, I take all COVID-positive chest X-rays. Covid-chestxray-dataset is a recently created dataset of COVID-19 X-ray images and CT scans extracted from academic publications (Cohen et al., 2020).
Covid-chestxray-dataset consists of 930 images. It includes both X-ray images and CT scans of a number of different lung disease types (viral, bacterial, fungal, lipoid, etc.) and different species (COVID-19, SARS, Streptococcus spp., etc.), collected from different screening projections. Because this project addresses COVID-19 detection, I keep only the viral COVID-19 images.
The dataset is not well balanced between X-ray and CT scan images, so I decided to remove all CT scans, leaving only X-ray images in the dataset.
Fig. 4a. The number of X-ray images and CT scans in Covid-chestxray-dataset.
Fig. 4b. The number of X-ray images after cleaning.
Only the Posterior-Anterior (PA), Anterior-Posterior (AP), and Anterior-Posterior Supine (AP Supine) projections were kept because the others are not suitable for our purpose (Minaee et al., 2020).
The following figures illustrate the view distribution before and after cleaning the data.
Fig. 5a. View distribution in Covid-chestxray-dataset.
Fig. 5b. View distribution after cleaning.
The CheXpert dataset (Irvin et al., 2019) is a large public dataset of chest X-ray images from Stanford, consisting of 224,316 samples in 14 categories (No Finding, Pneumonia, Edema, etc.). For this project, only non-COVID images are used.
Fig. 6. View distribution in CheXpert dataset.
Non-COVID images are present in two screening projections, AP and PA, which is sufficient for the project even though they are not well balanced.
Fig. 7a. Age distribution in Covid-chestxray-dataset.
Fig. 7b. Age distribution in CheXpert dataset.
The age distributions in the two datasets differ slightly from each other; however, both look bell-shaped with a small right skew and do not require further normalization.
At the time of downloading, Covid-chestxray-dataset contained 478 COVID-positive images (PA, AP, and AP Supine views). In order to create a well-balanced dataset, I take 478 COVID-negative images from CheXpert. Samples of both classes can be seen in the next images.
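The balancing step described above can be sketched in a few lines. This is only a minimal illustration, not the exact code used in the project: the file names, the size of the CheXpert pool, and the `balance_classes` helper are hypothetical.

```python
import random

def balance_classes(positive_paths, negative_paths, seed=0):
    """Randomly undersample the larger class so both classes have equal size."""
    n = min(len(positive_paths), len(negative_paths))
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    positives = rng.sample(positive_paths, n)
    negatives = rng.sample(negative_paths, n)
    # Label 1 = Covid-positive, 0 = Covid-negative.
    samples = [(p, 1) for p in positives] + [(p, 0) for p in negatives]
    rng.shuffle(samples)
    return samples

# Hypothetical file names standing in for the real image paths.
covid_paths = [f"covid/{i}.png" for i in range(478)]
chexpert_paths = [f"chexpert/{i}.jpg" for i in range(5000)]
dataset = balance_classes(covid_paths, chexpert_paths)
```

With 478 positive images available, the result holds 956 labeled samples, half from each class.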
Fig. 8a. Samples of COVID-positive images in prepared dataset.
Fig. 8b. Samples of COVID-negative images in prepared dataset.
Taking into account the limited quantity of training data, I will use data augmentation techniques to create transformed versions of the images and increase the number of training samples. Specifically, I will use resizing and cropping. At the same time, the existing images can differ in size, contrast, color, and some other characteristics. To avoid this possible problem, I will apply the necessary transformations to the data (resizing, normalization, color jittering, etc.) before training.
In order to provide meaningful and comprehensive results, neural networks require a lot of diverse training data. In our case, the data is very limited, so a practical approach is to create new 'fake' data based on our real data.
The main idea is to transform the existing data without changing the meaning of its content. Examples of such transformations are rotation or shearing of an image. PyTorch provides a number of transforms in torchvision.transforms. In every epoch, the random transformations are re-applied to each batch before it is fed to the model, so the model sees a slightly different version of every image each time, which effectively enlarges the training set.
In this project, I am going to use resizing, color jitter, normalization, center crop, and some others.
Fig. 9. Applied transformations to the data.
The following two figures show how the data look before and after the transformations.
Fig. 10a. Samples of test images before transformation.
Fig. 10b. Samples of test images after transformation.
In the field of medical imaging, a large dataset is often challenging to obtain. Because the number of images currently labeled as COVID-19 is minimal (most of the images in the training data were obtained from medical papers), it is very difficult to train a model from scratch, as it can easily start overfitting (Wang et al., 2020). Therefore, to overcome these obstacles, I will mostly use a transfer learning strategy. I will use models that have already been pre-trained on an extensive labeled dataset for a different task (for example, ImageNet). By fine-tuning the parameters of the pre-trained networks, I will adapt them to the new task. I will fine-tune only the last layer of the convolutional neural network: I remove the fully connected layer at the top of the pre-trained model, add a custom fully connected layer in its place, and then freeze the convolutional layers in front of it so that only the custom fully connected layer is trained (Wang et al., 2020).
In this project, I evaluated the performance of six commonly used models: ResNet18, ResNet50, VGG16, DenseNet121, SqueezeNet, and AlexNet.
The following image shows the flowchart of deep learning models using transfer learning.
Fig. 11. Flowchart of deep models from transfer learning.
Also, I will build a model from scratch to see if it is possible to get close to the results of transfer learning models. I will call it Covid-ResNet.
ResNet-18 is a convolutional neural network that is 18 layers deep. The network has been trained on ImageNet and can classify images into 1000 categories. It has learned rich feature representations for a wide range of images. In this project, I use only 2 output categories. The image input size is 224×224.
Fig. 12. ResNet-18 architecture.
ResNet-50 is a popular convolutional neural network containing 50 layers. The ResNet architecture won the ILSVRC 2015 competition. Its residual structure provides a simpler gradient flow and more effective training (Wang et al., 2020).
Fig. 13. ResNet-50 architecture.
DenseNet-201 is a relatively new and widely used network architecture that is 201 layers deep. It reuses features to achieve better results with fewer parameters. It directly connects all layers to each other in order to ensure maximum information flow between the layers of the network. It has also been pre-trained on the ImageNet image database.
Fig. 14. DenseNet-201 architecture. Image from https://www.researchgate.net/figure/A-visualization-of-the-DenseNet-201-architecture-Each-layer-takes-all-preceding_fig1_344143047
VGG16 participated in the ImageNet Large Scale Visual Recognition Challenge 2014 (ILSVRC2014) and achieved excellent results. The model achieved 92.7% top-5 test accuracy. It consists of 16 layers and was trained for weeks using NVIDIA Titan Black GPUs. The most distinctive thing about VGG16 is that, instead of having a large number of hyperparameters, its authors focused on convolution layers with 3x3 filters and stride 1, always with the same padding, and max-pooling layers with 2x2 filters and stride 2. This arrangement of convolution and max-pooling layers is followed consistently throughout the whole architecture. At the end, it has 3 fully connected (FC) layers followed by a softmax for output.
Fig. 15. VGG16 architecture. Image from https://www.researchgate.net/figure/Fig-A1-The-standard-VGG-16-network-architecture-as-proposed-in-32-Note-that-only_fig3_322512435
SqueezeNet is a convolutional neural network that is 18 layers deep. It was designed as a compact alternative to AlexNet and achieved a 50× reduction in model size while meeting or exceeding the top-1 and top-5 accuracy of AlexNet. Its building block, the Fire module, consists of a squeeze convolution layer (which has only 1×1 filters) feeding into an expand layer that has a mix of 1×1 and 3×3 convolution filters.
Fig. 16. SqueezeNet architecture. Image from https://codelabs.developers.google.com/codelabs/keras-flowers-squeezenet/#6
AlexNet is a classic convolutional neural network architecture. It consists of 8 layers: convolutions, max pooling, and dense layers as the basic building blocks. Grouped convolutions are used in order to fit the model across two GPUs.
Fig. 17. AlexNet architecture. Image from https://livebook.manning.com/book/grokking-deep-learning-for-computer-vision/chapter-5/v-3/34
For this project, I decided to use a residual network whose architecture is shown in the next figure. The model has 2 paths. The short path has only one 1×1 convolution with batch normalization. Fewer operations lead to less chance of noise in the gradient, making it easy to propagate a useful gradient back farther than would otherwise be possible (Raff, 2020). The long path is a bottleneck with 3 convolutional layers. Together the two paths form a skip connection: the input flows through both, and the outputs of the long and short paths are summed to form the main output.
Fig. 18. Covid-ResNet architecture.
B - batch size;
C - number of channels;
W - width;
H - height;
k - kernel size
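A single block with a short and a long path, as described for Covid-ResNet, might be written like this. It is a sketch under assumptions and not the exact architecture from the figure: the bottleneck ratio of 4, the ReLU activations, the odd kernel size, and the class name `BottleneckResidualBlock` are illustrative choices.

```python
import torch
from torch import nn

class BottleneckResidualBlock(nn.Module):
    """Two parallel paths whose outputs are summed (illustrative sketch)."""

    def __init__(self, in_channels, out_channels, k=3):
        super().__init__()
        mid = max(out_channels // 4, 1)  # bottleneck width (assumed ratio)
        # Long path: 1x1 reduce -> kxk convolution -> 1x1 expand.
        self.long_path = nn.Sequential(
            nn.Conv2d(in_channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=k, padding=k // 2, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # Short path: a single 1x1 convolution with batch normalization.
        self.short_path = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.activation = nn.ReLU(inplace=True)

    def forward(self, x):
        # x has shape (B, C, H, W); with an odd k both paths keep H and W.
        return self.activation(self.long_path(x) + self.short_path(x))

block = BottleneckResidualBlock(in_channels=16, out_channels=32)
output = block(torch.randn(4, 16, 28, 28))
```

The 1×1 convolution on the short path lets the number of channels change from C=16 to C=32 so that the two outputs can be added element-wise.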
Fig. 19. Training results after 30 epochs.
Fig. 20. Precision / Recall results after 30 epochs.
0 - Covid-negative
1 - Covid-positive
Fig. 22. Validation accuracy of the models (30 epochs).
Predictions are made on test data (not training or validation data), that is, data the model has never seen before.
Fig. 23a. Application.
Fig. 23b. Application.
The only difference between a (linear) logistic regression and a (linear) SVM is the loss function. If we replace the cross-entropy loss with the hinge loss (torch.nn.MultiMarginLoss), we get an SVM activation.
Fig. 24a. ResNet50 SVM / Resnet50 Logistic activation function.
Fig. 24b. ResNet50 SVM / Resnet50 Logistic activation function.
In the final part of this project, I provide a quick comparison between classical and quantum computing and show quantum computing in action using IBM qubits and their dedicated library, Qiskit. Let's start with the main differences and see why so many countries are investing billions of dollars in quantum technologies right now.
Quantum computers process information differently from classical computers. Instead of relying on transistors, which can only represent either '1' or '0' at any one time, quantum computers use qubits, which can represent both 0 and 1 simultaneously (remember Schrödinger's cat).
Also, the power of a quantum computer can grow exponentially in relation to the number of qubits linked together. This differs from a classical computer, which sees its power increase in direct proportion to the number of transistors. This is one reason why quantum computers could eventually handle some types of calculations much better than classical computers. (https://www.cbinsights.com/research/quantum-computing-classical-computing-comparison-infographic/)
The differences are shown more intuitively in the image on the right.
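To make the exponential growth concrete, here is a small plain-Python illustration (no quantum library involved). It uses one textbook fact: describing the joint state of n qubits takes 2^n amplitudes. The helper name `num_amplitudes` is just for illustration.

```python
import math

# One qubit in an equal superposition: amplitudes for the states |0> and |1>.
amplitudes = [1 / math.sqrt(2), 1 / math.sqrt(2)]
probabilities = [a ** 2 for a in amplitudes]  # chance of measuring 0 or 1

def num_amplitudes(n_qubits):
    """Number of amplitudes needed to describe the joint state of n qubits."""
    return 2 ** n_qubits

sizes = {n: num_amplitudes(n) for n in (1, 2, 10, 30)}
```

Each added qubit doubles the size of the state description, so 30 qubits already correspond to more than a billion amplitudes.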
Quantum SVM algorithm
A key concept in classification methods is that of a kernel. Data cannot typically be separated by a hyperplane in its original space. A common technique used to find such a hyperplane consists of applying a non-linear transformation function to the data. This function is called a feature map, as it transforms the raw features, or measurable properties, of the phenomenon or subject under study. Classifying in this new feature space – and, as a matter of fact, also in any other space, including the raw original one – is nothing more than seeing how close data points are to each other. This is the same as computing the inner product for each pair of data in the set. In fact, we do not need to compute the non-linear feature map for each datum, but only the inner product of each pair of data points in the new feature space. This collection of inner products is called the kernel and it is perfectly possible to have feature maps that are hard to compute but whose kernels are not.
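The point that the kernel can be computed without the feature map is easy to check on a toy example. The sketch below uses a classical degree-2 polynomial kernel as an illustration; it is not the quantum feature map used by QSVM, and the data points are made up.

```python
import math

def feature_map(x):
    """Explicit degree-2 feature map for a 2-D point (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, y):
    """The same inner product, computed without building the feature map."""
    return dot(x, y) ** 2

points = [(1.0, 2.0), (3.0, -1.0), (0.5, 0.5)]  # toy data

# Kernel (Gram) matrix: one inner product per pair of data points.
kernel_matrix = [[poly_kernel(p, q) for q in points] for p in points]
explicit_matrix = [[dot(feature_map(p), feature_map(q)) for q in points]
                   for p in points]
```

Both matrices agree up to rounding, although `poly_kernel` never leaves the original 2-D space. QSVM targets the opposite situation, where neither route is cheap on a classical computer.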
The QSVM algorithm applies to classification problems that require a feature map for which computing the kernel is not efficient classically. This means that the required computational resources are expected to scale exponentially with the size of the problem. QSVM uses a Quantum processor to solve this problem by direct estimation of the kernel in the feature space. The method used falls in the supervised learning category, consisting of a training phase (where the kernel is calculated and the support vectors obtained) and a test or classification phase (where new data without labels is classified according to the solution found in the training phase).
Internally, QSVM will run the binary classification or multiclass classification based on how many classes the data has (https://qiskit.org/documentation/stubs/qiskit.aqua.algorithms.QSVM.html).
To provide an example of the use of QSVM, I decided to take a different dataset (it was not possible to make QSVM work as an activation function, as I had originally planned). For this problem, I am using the well-known Breast Cancer dataset. It has many features, and their correlations can be seen in the next plot.
Fig. 25. Correlation matrix of BC-data features.
It is a linear problem, and to make it work we need to conduct a principal component analysis (PCA), keeping only the 2 most important components of the data.
Fig. 26. PCA on BC-dataset.
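The reduction to 2 components can be sketched with scikit-learn. Two assumptions are made here: that the Breast Cancer data is the Wisconsin dataset shipped with scikit-learn, and that the features are standardized before PCA (a common step that the text does not mention explicitly).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the Wisconsin breast cancer data bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Standardize first so that features with large units do not dominate the PCA.
X_scaled = StandardScaler().fit_transform(X)

# Keep only the two components that explain the most variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Share of the total variance retained by the two components.
explained = pca.explained_variance_ratio_.sum()
```

`X_2d` then has one row per sample and two columns, which can be plotted directly or passed on to a classifier.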
The QSVM algorithm takes the classical support vector machine and runs it on a quantum circuit so that it can be processed efficiently on a quantum computer. For reference, I apply both the classical SVM and the quantum SVM to the same data. The results can be seen in the next plots.
QSVM testing accuracy: 75.0%
CPU times: user 7.43 s, sys: 196 ms, total: 7.62 s
Wall time: 7.53 s
Fig. 27. Kernel matrix QSVM.
Classical SVM testing accuracy: 95.0%
CPU times: user 169 ms, sys: 153 ms, total: 321 ms
Wall time: 211 ms
Fig. 28. Kernel matrix SVM.
As we can see, for this problem the classical computer provided much better and even faster results. However, this does not mean that it is true for every situation. Quantum computing has the potential to solve some problems that are out of reach for classical systems.
Apostolopoulos, I. D., & Mpesiana, T. A. (2020). Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine, 43(2), 635–640.
Ardakani, A. A., Kanafi, A. R., Acharya, U. R., Khadem, N., & Mohammadi, A. (2020). Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks. Computers in Biology and Medicine, 121, 103795.
Cohen, J. P., Morrison, P., & Dao, L. (2020). COVID-19 image data collection. arXiv:2003.11597.
Fan, Y., Zhao, K., Shi, Z.-L., & Zhou, P. (2019). Bat Coronaviruses in China. Viruses, 11(3), 210.
Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., et al. (2019). CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 590–597.
Jaiswal, A., Gianchandani, N., Singh, D., Kumar, V., & Kaur, M. (2020). Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning. Journal of Biomolecular Structure and Dynamics, 1–8.
Minaee, S., Kafieh, R., Sonka, M., Yazdani, S., & Jamalipour Soufi, G. (2020). Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning. Medical Image Analysis, 65, 101794.
Ng, M.-Y., Lee, E. Y., Yang, J., Yang, F., Li, X., Wang, H., et al. (2020). Imaging profile of the COVID-19 infection: Radiologic findings and literature review. Radiology: Cardiothoracic Imaging, 2(1), e200034.
Ozturk, T., Talo, M., Yildirim, E. A., Baloglu, U. B., Yildirim, O., & Rajendra Acharya, U. (2020). Automated detection of COVID-19 cases using deep neural networks with X-ray images. Computers in Biology and Medicine, 121, 103792.
Panwar, H., Gupta, P. K., Siddiqui, M. K., Morales-Menendez, R., & Singh, V. (2020). Application of deep learning for fast detection of COVID-19 in X-Rays using nCOVnet. Chaos, Solitons & Fractals, 138, 109944.
Raff, E. (2020). Data 690 Deep learning lecture notes.
Singhal, T. (2020). A review of coronavirus disease-2019 (COVID-19). Indian Journal of Pediatrics, 87(4), 281–286.
Wang, D., Mo, J., Zhou, G., Xu, L., & Liu, Y. (2020). An efficient mixture of deep and machine learning models for COVID-19 diagnosis in chest X-ray images. PLOS ONE, 15(11), e0242535.
Wang, L., Lin, Z. Q., & Wong, A. (2020). COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Scientific Reports, 10(1).
qiskit.aqua.algorithms.QSVM https://qiskit.org/documentation/stubs/qiskit.aqua.algorithms.QSVM.html
Quantum Computing Vs. Classical Computing In One Graphic https://www.cbinsights.com/research/quantum-computing-classical-computing-comparison-infographic/