After training our classification models, we compared the accuracy, precision, and recall of the five models to determine the best combination of latent features and classification method.
Accuracy is the probability that a sample is classified correctly.
Calculating accuracy: number of correct classifications / total number of classifications
Precision is the probability that a positive prediction is correct.
Calculating precision: number of true positives / (true positives + false positives)
Recall is the probability that an actual positive is classified as positive.
Calculating recall: number of true positives / (true positives + false negatives)
True positives: predicted positives that were actually positive
False positives: predicted positives that were actually negative
False negatives: predicted negatives that were actually positive
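The definitions above can be sketched in a few lines of plain Python. This is only an illustration: the function name `classification_metrics`, the choice of `1` as the positive ('infected') label, and the toy labels are assumptions made for the example, not part of the project code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels.

    `positive` is the label treated as the positive class
    (assumed here to be 1, standing for 'infected').
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive, actually positive
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive, actually negative
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: predicted negative, actually positive
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels for illustration only (not project data)
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)  # 0.75 0.8 0.8
```

In the toy example there are 4 true positives, 1 false positive and 1 false negative, and 6 of the 8 predictions are correct, which gives the three values printed.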
Overall, the five classification models averaged around 98% precision and 94% recall, except when they were trained on the latent features from the autoencoder. Because the autoencoder latent features gave poor results, we didn't use them for any plots or further testing. To further examine the accuracy of our models, we ran one more test on an infected Vero cell (Vero cells are derived from the African green monkey) to see whether the models would predict the cell's condition correctly.
All five of our classification models correctly predicted the condition of the Vero cell sample.
To identify the best combination of model and latent features, we made two plots: one of precision versus recall on the test set, and one of test-set accuracy versus training-set accuracy.
The accuracy graph compared the accuracy on the training set with the accuracy on the test set for all the classification models. The two best results were:
The SVC model with the MyPCA features
The myCNN model with the embeddings
Code for plotting the accuracy graph:
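A minimal sketch of how such a plot could be produced, assuming matplotlib. The function name `plot_train_vs_test_accuracy`, the model labels and the accuracy values are placeholders for illustration; they are not the project's code or its measured results.

```python
import matplotlib.pyplot as plt


def plot_train_vs_test_accuracy(results):
    """Scatter training accuracy against test accuracy.

    `results` maps a label (model + feature set) to a
    (train_accuracy, test_accuracy) pair, both in [0, 1].
    """
    fig, ax = plt.subplots(figsize=(6, 6))
    for label, (train_acc, test_acc) in results.items():
        ax.scatter(train_acc, test_acc, label=label)

    # Diagonal reference line: points far below it suggest overfitting
    lo = min(min(pair) for pair in results.values()) - 0.01
    ax.plot([lo, 1.0], [lo, 1.0], linestyle="--", color="gray",
            label="train = test")

    ax.set_xlabel("Training accuracy")
    ax.set_ylabel("Test accuracy")
    ax.set_title("Training vs. test accuracy by model")
    ax.legend()
    return ax


# Placeholder values only; substitute the accuracies measured for each model
placeholder_results = {
    "model A + features X": (0.99, 0.97),
    "model B + features Y": (0.95, 0.90),
}
ax = plot_train_vs_test_accuracy(placeholder_results)
# plt.show()  # uncomment to display the figure interactively
```

Points close to the dashed diagonal have similar training and test accuracy, so the strongest candidates are those near the diagonal and toward the top-right corner.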
The precision and recall graph compared the precision and recall on the test set for all the classification models. The two best results were:
The SVC model with the MyPCA features
The myCNN model with the embeddings
Code for plotting the precision and recall graph:
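A minimal sketch of one way to draw this plot, assuming matplotlib. The function name `plot_precision_recall`, the model labels and the scores are placeholders for illustration, not the project's code or its measured results.

```python
import matplotlib.pyplot as plt


def plot_precision_recall(scores):
    """Plot one point per model with recall on x and precision on y.

    `scores` maps a label (model + feature set) to a
    (precision, recall) pair measured on the test set.
    """
    fig, ax = plt.subplots(figsize=(6, 6))
    for label, (precision, recall) in scores.items():
        ax.scatter(recall, precision)
        # Label each point directly so no legend lookup is needed
        ax.annotate(label, (recall, precision),
                    textcoords="offset points", xytext=(5, 5))

    ax.set_xlabel("Recall (test set)")
    ax.set_ylabel("Precision (test set)")
    ax.set_title("Precision vs. recall by model")
    return ax


# Placeholder values only; substitute the scores measured for each model
placeholder_scores = {
    "model A + features X": (0.98, 0.95),
    "model B + features Y": (0.93, 0.90),
}
pr_ax = plot_precision_recall(placeholder_scores)
# plt.show()  # uncomment to display the figure interactively
```

In a plot like this, the best classifiers sit closest to the top-right corner, where both precision and recall are high.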
The best combination was the SVC model with the MyPCA latent features (90 features). On the test set, this classifier reached 99% precision, 97% recall and 98% accuracy. Based on these test-set results, and because this classifier gave the best result on both graphs, we decided that the SVC model with the MyPCA latent features was the best classifier for labelling cell images as 'infected' or 'not infected'. It is crucial for COVID-19 testing to be very accurate, and the SVC model with the MyPCA features can provide that accuracy.