AI Telehealth

Project AI TeleHealth

The TeleHealth project, supported by the Digital Systems Design Laboratory of NAU, aims to improve the health industry. Reviewing and diagnosing ailments properly takes time, which limits the number of patients who can be treated effectively in a day. Moreover, in the field of remote medical diagnosis, safeguarding sensitive information is paramount. Project AI TeleHealth is dedicated to developing an AI-assisted diagnostic system that not only enhances diagnostic accuracy and efficiency but also protects the privacy of data transmission.

Our Designed AI TeleHealth System

How did our project start?
Chest X-rays are a crucial tool for diagnosing various illnesses. However, their current use faces limitations in efficiency, particularly during critical situations like pandemics. The delay of 1-2 days between scans and diagnosis can significantly impact patient outcomes. This challenge is particularly evident with diseases like COVID-19, which has affected over 772 million people globally, with over 7 million fatalities. The COVID-19 pandemic resulted in a significant economic burden, with a 3.3 trillion dollar deficit in the US for 2020 and a peak unemployment rate of 14.7%. Medical institutions faced surging demand, leading to longer wait times and exacerbated disparities in access to care. Studies found a concerning 117% increase in wait time disparities. To address these challenges, a critical re-evaluation of current diagnostic approaches is essential. Embracing innovative solutions and integrating technology are key to improving efficiency, minimizing delays, and optimizing healthcare outcomes. The critical need for speed and accuracy in medical diagnosis has driven researchers to explore the transformative potential of artificial intelligence (AI), particularly advanced deep learning networks. 

Artificial Intelligence-Aided Diagnosis

Our Proposed Model: Fine-Tuned Vision Transformer (FT-ViT)
Vision transformer (ViT) models stand apart from CNNs due to their unique mathematical architecture and foundational blocks. The primary distinction between the ViT structure and traditional CNN architectures lies in ViT’s extensive global receptive field, where the receptive field is the region of the input that a computer vision model can take into account. By leveraging a global receptive field, ViT can capture relationships between distant (non-adjacent) image patches, potentially leading to superior performance in image classification tasks. Our proposed model is transfer-learned from the pre-trained vit-base-p16 model. The structure is shown at left.
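
As an illustration, here is a minimal sketch of that transfer-learning setup in PyTorch, using torchvision's vit_b_16 weights as a stand-in for vit-base-p16; the library, the weight choice, and the head replacement shown here are assumptions for illustration rather than our exact training code.

# Minimal sketch: fine-tune a pre-trained ViT-B/16 for 4-class chest X-ray
# classification (stand-in for vit-base-p16; not the project's exact code).
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_CLASSES = 4  # Normal, COVID-19, Viral Pneumonia, Lung Opacity

# Load ImageNet-pre-trained weights as the transfer-learning starting point.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

# Replace the classification head with a new 4-class linear layer.
in_features = model.heads.head.in_features
model.heads.head = nn.Linear(in_features, NUM_CLASSES)

# All parameters stay trainable, so the whole backbone is fine-tuned;
# freezing some encoder blocks is an alternative (see the block study in Table 2).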

Data Set

For a deep learning model to truly learn, it needs a reasonable data set from which to create training, test, and validation sets. In medical fields, this could cover lung ailments, heart diseases, brain trauma, muscle conditions, and more. In this research, we started with the COVID-19 chest X-ray dataset, whose images fall into four classes: Normal, COVID-19, Viral Pneumonia, and Lung Opacity. Our full data set includes more than 20,000 scan images at 299 by 299 resolution and can be altered to fit different learning models. For example, here is the exact breakdown of our dataset with an example of each:
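
For illustration, here is a minimal sketch of how a class-per-folder dataset like this could be split into training, validation, and test sets; the folder layout, split ratios, and random seed are assumptions rather than our exact pipeline.

# Sketch (assumed folder layout): split a class-per-folder chest X-ray dataset
# into train / validation / test sets.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),   # pre-trained models expect 224 x 224 inputs
    transforms.ToTensor(),           # also rescales pixel values to [0.0, 1.0]
])

# Assumed directory structure: data/<class_name>/<image>.png for the four classes.
full_dataset = datasets.ImageFolder("data", transform=transform)

# Example 70 / 15 / 15 split; the exact ratios are an assumption.
n = len(full_dataset)
n_train, n_val = int(0.7 * n), int(0.15 * n)
n_test = n - n_train - n_val
train_set, val_set, test_set = torch.utils.data.random_split(
    full_dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(42),  # reproducible split
)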

Image Preprocessing

It is important to address potential data bias when training machine learning models. In our research, we tried to address this by preprocessing the images in several ways. First, we normalize the original pixel values from [0, 255] to [0.0, 1.0]; this step facilitates training by ensuring all pixel values are on a similar scale. Second, we resize the images to 224 by 224 pixels to match the input requirements of the pre-trained models we employ for transfer learning. Third, we apply gamma correction and segmentation. In short, gamma correction manipulates the brightness of each pixel in the original image, either brightening or darkening it, which can help models better "see" subtle imperfections or cloudiness in the lungs. Segmentation helps the system focus, through masking, on the specific part of the picture it needs to learn from; in this case, the lungs. Here is a graphic that can help better explain both methods.
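
The sketch below illustrates these preprocessing steps (rescaling, resizing, gamma correction, and mask-based segmentation) in Python; the function name, the default gamma value, and the assumption that a lung mask is produced by a separate segmentation model are illustrative choices rather than our exact code.

# Sketch of the preprocessing steps described above.
import numpy as np
from PIL import Image

def preprocess(path, lung_mask=None, gamma=0.8, size=(224, 224)):
    # Load as grayscale and resize to the 224 x 224 input the pre-trained models expect.
    img = Image.open(path).convert("L").resize(size)
    # Normalize pixel values from [0, 255] to [0.0, 1.0].
    x = np.asarray(img, dtype=np.float32) / 255.0
    # Gamma correction: gamma < 1 brightens the image, gamma > 1 darkens it.
    x = np.power(x, gamma)
    # Segmentation: keep only the lung region, masking everything else to zero.
    if lung_mask is not None:
        x = x * lung_mask
    return x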


Other Model Structures We Experimented With

Machine learning models can be viewed as long chains of mathematical formulas combined together. By feeding a dataset into a model, the model learns to carry out the tasks a human intends. Below are the other model structures we experimented with in our research. Broadly, they can be classified as transformer-structured and convolutional-neural-network-structured.

Multiscale Vision Transformer

The Multiscale Vision Transformers (MViT) structure utilizes a hierarchical architecture that progressively increases channel dimensions and decreases spatial resolution, employing self-attention mechanisms that adapt to varying scales and complexities within the data. (MViTv2: Improved Multiscale Vision Transformers for Classification and Detection)




Efficient Vision Transformer (EfficientViT)

EfficientViT is a new family of high-resolution vision models that utilize multi-scale linear attention for dense prediction tasks, offering a more efficient alternative to traditional models, which often require heavy computational resources. (EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction)

EfficientNet-B0

EfficientNet is a series of convolutional neural network models. It employs a unique scaling method that simultaneously adjusts the depth, width, and resolution of the network based on a set of fixed scaling coefficients. This approach lets it operate at a lower computational cost. (EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks)

DenseNet121

DenseNet121 is a convolutional neural network that consists of 121 layers, featuring a densely connected architecture where each layer receives inputs from all preceding layers, facilitating feature reuse and reducing the vanishing gradient problem, making it highly effective for tasks like image classification. (Densely Connected Convolutional Networks)

ResNet50

ResNet50 is a deep convolutional neural network with 50 layers, structured to include residual connections that skip one or more layers, enhancing training by addressing the vanishing gradient problem, and is primarily used for image recognition tasks. (Deep Residual Learning for Image Recognition)
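
For reference, the sketch below shows one way to instantiate the CNN baselines above with ImageNet weights and four-class heads using torchvision; MViTv2 and EfficientViT come from other code bases and are omitted here, and this is illustrative rather than our exact setup.

# Sketch: instantiate the CNN baselines from torchvision with ImageNet weights
# and 4-class classification heads.
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4

resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.fc = nn.Linear(resnet.fc.in_features, NUM_CLASSES)

densenet = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
densenet.classifier = nn.Linear(densenet.classifier.in_features, NUM_CLASSES)

effnet = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
effnet.classifier[1] = nn.Linear(effnet.classifier[1].in_features, NUM_CLASSES)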

Experiments

We employ transfer learning with pre-trained models on the COVID-19 chest X-ray dataset. By tuning the preprocessing method (bare normalization, segmentation, and gamma correction), the learning rate (range [1e-6, 1e-3]), the batch size (16, 32, 64, or 128), and the weight decay (range [1e-6, 1e-1]), we obtain optimized settings (as reported in Table 1) and the corresponding models. Additionally, the cross-entropy function served as the loss function, and AdamW/Adam were used as optimizers for the experimented models. Then, observing the superior performance of ViT-B16 compared with the others in the early stage of experiments, we further fine-tuned it by varying the number of encoder blocks inside. The configurations and results for the different block settings are shown in Table 2.
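
The sketch below outlines this setup (cross-entropy loss, AdamW, and a small hyperparameter grid); the specific grid values and the helper function are illustrative assumptions drawn from the ranges above, not our exact search script.

# Sketch of the fine-tuning setup: cross-entropy loss, AdamW, and a small grid
# over learning rate, batch size, and weight decay.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

search_space = {
    "lr":           [1e-3, 1e-4, 1e-5, 1e-6],                 # within [1e-6, 1e-3]
    "batch_size":   [16, 32, 64, 128],
    "weight_decay": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6],     # within [1e-6, 1e-1]
}

def train_one_setting(model, train_set, lr, batch_size, weight_decay, epochs=30):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    criterion = nn.CrossEntropyLoss()                          # cross-entropy loss
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model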

Result Analysis

To assess the training effectiveness of the experimented models, we monitor the loss function on both the training and validation datasets. 

Convergence Rate: Under the optimized settings reported in Table 1, EfficientNet-B0, ViT-Base-patch8, ViT-Base-patch16, and ViT-Base-patch32 achieve minimum loss within 10 iterations, demonstrating their superior convergence speed. MViTv2 and EfficientNet-B5 achieve minimum loss between 10 and 15 iterations, whereas EfficientViT-B3 requires nearly 30 iterations to reach its minimum loss.

Overfitting: We employ the cross-entropy loss function and implement early stopping with a patience of 30 epochs to mitigate overfitting. As observed in the training and validation loss curves depicted in Fig. 3, the extent of overfitting varies across models. Notably, EfficientViT-B3 exhibits a stable fitting behavior, while EfficientNet models demonstrate a tendency towards relatively unstable overfitting on our experimental dataset.
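
For clarity, here is a minimal sketch of early stopping with a patience of 30 epochs; the train_epoch and evaluate callables are placeholders standing in for one pass over the training set and one pass over the validation set.

# Sketch of early stopping: training halts once the validation loss has not
# improved for `patience` consecutive epochs.
def fit_with_early_stopping(train_epoch, evaluate, max_epochs=300, patience=30):
    best_val_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch()                       # one pass over the training set
        val_loss = evaluate()               # loss on the validation set
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0  # improvement: reset the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                       # no improvement for `patience` epochs
    return best_val_loss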

To evaluate the proposed model's accuracy and sensitivity, and explore potential bias in each specific category, we draw the confusion matrix and ROC curves (along with AUCs).
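
A minimal evaluation sketch using scikit-learn is shown below; the function name and the placeholder inputs y_true (true labels) and probs (the model's softmax outputs) are assumptions for illustration.

# Sketch: confusion matrix plus one-vs-rest ROC curve and AUC for each class.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, auc

classes = ["Normal", "COVID-19", "Viral Pneumonia", "Lung Opacity"]

def evaluate_predictions(y_true, probs):
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    y_pred = np.argmax(probs, axis=1)
    cm = confusion_matrix(y_true, y_pred)          # rows: true class, cols: predicted
    aucs = {}
    for i, name in enumerate(classes):             # one-vs-rest ROC per class
        fpr, tpr, _ = roc_curve((y_true == i).astype(int), probs[:, i])
        aucs[name] = auc(fpr, tpr)
    return cm, aucs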

We visualize the models' ability to extract features from an image by using Grad-CAM. The class activation map (CAM) is a heat-map-like graph that highlights the features the model considers most critical; the red regions are the highlighted parts.
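
Below is a minimal hook-based Grad-CAM sketch for a convolutional target layer; it follows the standard Grad-CAM recipe (gradient-weighted activations) rather than our exact implementation, and transformer backbones would additionally need a token-to-grid reshape that is omitted here.

# Sketch of Grad-CAM with forward/backward hooks on a convolutional layer.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    feats, grads = {}, {}

    def save_feats(module, inp, out):
        feats["value"] = out                        # activations of the target layer

    def save_grads(module, grad_in, grad_out):
        grads["value"] = grad_out[0]                # gradients w.r.t. those activations

    h1 = target_layer.register_forward_hook(save_feats)
    h2 = target_layer.register_full_backward_hook(save_grads)

    model.eval()
    scores = model(image.unsqueeze(0))              # image: (C, H, W)
    scores[0, class_idx].backward()                 # gradient of the chosen class score
    h1.remove(); h2.remove()

    weights = grads["value"].mean(dim=(2, 3), keepdim=True)   # global-average-pool the grads
    cam = F.relu((weights * feats["value"]).sum(dim=1))       # weighted sum over channels
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam.squeeze().detach()                   # heat map; high values mark key regions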

Our Software: Animated2GradCAM -- Interactive Visualization

Observing the need for an intuitive analysis of the image classification model's predictions, we built this software by integrating the powerful Animint2 R package with the Grad-CAM technique.

Quantum-Resistant Encryption/Decryption for Secure Diagnoses

Our project is intended to handle sensitive medical data. For that reason, it is very important to have a method of safeguarding this data. However, with the technology landscape constantly developing, it is also important to take emerging quantum-computing technology into account. For most of the 2000s, technology has relied on the Ron Rivest, Adi Shamir, and Leonard Adleman (RSA) encryption method to safeguard data, which worked for a reasonable amount of time. RSA uses a public key whose modulus, a number more than 300 digits long, is the product of two large secret primes; the private key needed to decrypt a given message is derived from those two prime factors. However, with a quantum computer that can explore many bit states at once, this method has become vulnerable to attack. Therefore, quantum-resistant encryption methods are required to handle the medical data that our project learns from.

Let's discuss how Kyber encryption works. Overall, Kyber is a type of encryption known as lattice-based encryption. To see why it is needed, consider the RSA scheme first. In RSA, every individual has a public key and a private key. Specifically, the public key contains a very large composite number (the modulus), and the private key is derived from the two large prime factors of that modulus. The sender "jumbles" the data to be transmitted using the public key. However, to decrypt that data, you need the prime factors of the modulus, which only the owner possesses. Without the private key, an intercepted message can in principle still be decrypted by factoring the modulus, but this can take the best classical supercomputers months or even years. Sadly, that was only the case until the advent of quantum computers. This encryption scheme is vulnerable to quantum computers because quantum bits can be in a state of superposition. To understand this idea, it helps to take a step back and think about the computers most of us use on a daily basis. For these devices to make decisions, they possess billions of switches that are either on (state 1) or off (state 0). These switches allow the computer to "think" and carry out processes ordered by the user. Given that these bits cannot be on and off simultaneously, a problem that requires 2 bits to compute would need to test 00, 01, 10, and 11 individually. Because RSA involves very large prime factors, classical computers have to spend a significant amount of time testing every private-key possibility, which explains RSA's effectiveness against ordinary computers and even supercomputers. Quantum bits, however, can be in an on and off state at the same time, so with only 32 quantum bits, 4,294,967,296 (2^32) different possibilities can, in principle, be represented at once. With the amazing capabilities of quantum computers, can we even defend sensitive data from quantum-computer interception?
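
To make the roles of the keys concrete, here is a toy RSA example with deliberately tiny primes; real deployments use primes hundreds of digits long, so this is purely illustrative and not a secure configuration.

# Toy RSA example with tiny primes, showing the public modulus and the
# private prime factors described above.
p, q = 61, 53                 # the two secret primes (the private factors)
n = p * q                     # 3233: the public modulus, shared openly
phi = (p - 1) * (q - 1)       # 3120: only computable if you know p and q
e = 17                        # public exponent
d = pow(e, -1, phi)           # 2753: private exponent, the inverse of e mod phi

message = 65
ciphertext = pow(message, e, n)      # encrypt with the public key (e, n) -> 2790
recovered = pow(ciphertext, d, n)    # decrypt with the private key (d, n) -> 65

# An attacker who could factor n back into 61 and 53 could recompute d.
# For moduli hundreds of digits long that is infeasible classically but is
# threatened by quantum computers, which is why schemes like Kyber matter.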

Luckily, there is an encryption method that can effectively defend against quantum computers, known as lattice-based encryption. This is the framework of Kyber 512/768/1024, which is utilized by our system. In comparison to the RSA encryption method discussed earlier, lattice-based encryption takes a different approach: it scrambles data by placing it at a specific point in a lattice and adding noise, so the actual placement of the data is close to, but not exactly on, a lattice point. Each lattice needs a set of vectors to navigate it, which can range from 2 to 4 polynomials depending on the version of Kyber being used. The private key can be seen as the set of vectors that is optimal for navigating an individual's lattice. For the public key, an individual shares their lattice and a deliberately bad set of vectors with the data sender, allowing the sender to assign and "jumble" data to a certain point in the lattice and then send it back. Recovering the data with only the bad vectors requires significantly more computational resources and time, given that these lattices can have more than 100 dimensions depending on the scheme. Below is a set of figures, with the top left representing an efficient set of vectors (private key) and the top right representing an inefficient set of vectors (public key). The figure beneath them represents the travel path to the data given the respective set of vectors. This goes to show the effectiveness of lattice-based encryption in the modern world.
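
To make the "noisy lattice point" idea concrete, here is a toy learning-with-errors (LWE) sketch in the spirit of the schemes Kyber builds on; it is not Kyber itself (which uses structured polynomial lattices, compression, and a key-encapsulation interface), and the dimension and noise values are toy choices so the example stays readable.

# Toy learning-with-errors (LWE) sketch of the noisy-lattice idea behind Kyber.
import numpy as np

rng = np.random.default_rng(0)
q, k = 3329, 16               # modulus (Kyber's q) and a small toy dimension

# Key generation: the secret s is the "good" information; the public key
# (A, b) hides s behind a small random error e (the added noise).
s = rng.integers(0, q, size=k)
A = rng.integers(0, q, size=(k, k))
e = rng.integers(-2, 3, size=k)
b = (A @ s + e) % q

def encrypt(bit):
    r = rng.integers(0, 2, size=k)            # sender's randomness
    u = (A.T @ r) % q
    v = (b @ r + bit * (q // 2)) % q          # plant the bit far from 0 or q
    return u, v

def decrypt(u, v):
    x = (v - s @ u) % q                       # with s, only small noise remains
    return 1 if q // 4 < x < 3 * q // 4 else 0

u, v = encrypt(1)
assert decrypt(u, v) == 1                     # the key holder recovers the bit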