Computational Biology

Identifying Cells Infected with SARS-CoV-2 Using Machine Learning

Scholars: Aaniqa Karmali, Mindy Yu, Xiongxiong Pei and Emma Wang

Objective

COVID-19 emerged in December 2019 and spread quickly across the globe; the WHO declared it a pandemic on March 11, 2020. It has since affected the lives of millions. One of the main challenges with COVID-19 is diagnosis: some people remain asymptomatic while others fall ill quickly, and symptoms vary from person to person. Our objective was to develop a prototype that could streamline the diagnostic process and save healthcare professionals precious time. We built a classifier that identifies images of cells containing active SARS-CoV-2, the virus that causes COVID-19.

The Process

1. Feature Extraction

Using edge feature extraction to identify the image features best suited to the classifier.

2. Dimensionality Reduction

Reducing unnecessary complexity in the dataset while retaining important details, to increase model accuracy.

3. Classification Models

Training classification models on the extracted, dimensionality-reduced features.

4. Results Assessment

Identifying the best model and latent features for classifying cells as infected with active SARS-CoV-2 or not.

All coding was done in Python using Jupyter Notebook.
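
The four steps above can be illustrated with a short sketch. It is not the project's notebook code: the Sobel operator, PCA, logistic regression, the 16x16 synthetic images, and every function name are assumptions chosen for illustration, since this page does not name the specific edge filter, reduction method, or models that were used.

```python
import numpy as np
from scipy import ndimage
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split


def edge_features(image):
    """Step 1: flattened Sobel gradient-magnitude map of one 2-D image."""
    gx = ndimage.sobel(image, axis=0)
    gy = ndimage.sobel(image, axis=1)
    return np.hypot(gx, gy).ravel()


def make_synthetic_images(n_per_class=100, size=16, seed=0):
    """Hypothetical stand-in data: class 1 images contain a bright square."""
    rng = np.random.default_rng(seed)
    images, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            img = rng.normal(0.0, 0.1, (size, size))
            if label == 1:
                img[4:12, 4:12] += 1.0  # sharp boundary produces strong edges
            images.append(img)
            labels.append(label)
    return np.array(images), np.array(labels)


images, y = make_synthetic_images()

# Step 1: feature extraction (one edge-feature vector per image).
X = np.array([edge_features(img) for img in images])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 2: dimensionality reduction, fitted on the training split only.
pca = PCA(n_components=10).fit(X_train)

# Step 3: train a classification model on the reduced features.
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_train), y_train)

# Step 4: assess results on held-out images.
y_pred = clf.predict(pca.transform(X_test))
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"accuracy: {accuracy:.2f}")
print(cm)
```

Fitting the PCA on the training split alone keeps information from the test images out of the reduced features, so the accuracy in step 4 is a fairer estimate. On real multi-channel images, the same feature function could be applied to each channel and the results concatenated.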

The Dataset: RxRx19

This dataset was created by Recursion Pharmaceuticals (Utah) as part of a study to identify potential treatments for COVID-19. The researchers published a paper on identifying such treatments through AI-enabled phenomic analysis of human cells infected with SARS-CoV-2.

The dataset consists of nearly 300,000 5-channel immunofluorescence images of human renal cortical epithelial (HRCE) cells and Vero cells. Each image belongs to one of three disease classes: ‘Active SARS-CoV-2’, ‘UV-Inactivated SARS-CoV-2’, and ‘Mock’. We built and trained our machine learning model to classify both ‘UV-Inactivated SARS-CoV-2’ and ‘Mock’ cells as inactive, and ‘Active SARS-CoV-2’ cells as active.
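
As a rough illustration of how the three disease classes collapse into the two labels described above, the sketch below maps each class name to a binary target. The exact label strings and the helper name `to_binary_label` are assumptions; the spelling in the dataset's metadata file may differ.

```python
from collections import Counter

# Assumed label strings, copied from the class names on this page.
ACTIVE_CLASS = "Active SARS-CoV-2"
INACTIVE_CLASSES = {"UV-Inactivated SARS-CoV-2", "Mock"}


def to_binary_label(disease_condition):
    """Return 1 for active virus, 0 for UV-inactivated virus or mock."""
    if disease_condition == ACTIVE_CLASS:
        return 1
    if disease_condition in INACTIVE_CLASSES:
        return 0
    # Fail loudly on unexpected values rather than silently mislabelling.
    raise ValueError(f"Unknown disease condition: {disease_condition!r}")


# Hypothetical example rows, not real dataset records.
conditions = [
    "Mock",
    "Active SARS-CoV-2",
    "UV-Inactivated SARS-CoV-2",
    "Active SARS-CoV-2",
]
labels = [to_binary_label(c) for c in conditions]
counts = Counter(labels)
print(labels)
print(counts)
```

Counting the labels after mapping is a quick way to check the balance between active and inactive examples before training.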

Dataset Structure

Active vs. Inactive Cells in the Dataset
