COV-Artificial Intelligence server for Covid-19 Drug Repurposing
Kamal Rawal#1, Prashant Singh1, Robin Sinha1, Priya Kumari1, Swarsat Kaushik Nath1, Sukriti Sahai1, Ridhima1, Sweety Dattatraya Shinde1, Nikita Garg1, Preeti P.1, Trapti Sharma1
Amity Institute of Biotechnology, Amity University, Uttar Pradesh, India
#Corresponding Author
Email ID: kamal.rawal@gmail.com
Centre for Computational Biology and Bioinformatics, AIB
Amity University, Noida.
Keywords: Bioinformatics, drug repurposing, artificial intelligence, COVID-19, molecular targets
Supplementary Data Website: https://tinyurl.com/CoV-AI
COV-DRUGX Software Pipeline: http://drugx.kamalrawal.in/drugx/
Abstract
1. Introduction
Over the past few years, the field of artificial intelligence (AI) has moved from largely theoretical studies to real-world applications (Vamathevan et al., 2019). In medicine, AI is currently used in four main ways: to assess the severity of disease even before treatment is administered; to prevent or resolve complications during treatment; to provide assistive technology during treatment procedures or operations and to determine the reasons behind the use of particular instruments or chemicals during treatment; and to develop or extrapolate new uses for instruments or chemicals to improve safety and efficacy (Becker, 2019). Broadly, academic research labs, biotechnology corporations and technology companies have been exploring the use of AI and machine learning (ML) in three key areas: predicting the pharmaceutical properties of molecular compounds and targets for drug discovery, applying pattern recognition and segmentation techniques to medical images, and developing deep-learning techniques for multimodal data sources (Shah et al., 2019).
The general categories of AI/ML-based drug repurposing methods for COVID-19 therapeutics include network-based algorithms, expression-based algorithms and integrated docking simulation algorithms. AI technology has been applied to identify marketed drugs with potential for treating COVID-19 (Ke et al., 2020). A variety of AI techniques, such as artificial neural networks, gradient boosting decision trees and deep neural networks, have also been utilized for COVID-19 vaccine development (Wang et al., 2020; Wong, 2019).
Gysi et al. applied three competing network-repurposing methodologies, i.e., an AI-based algorithm, a diffusion-based algorithm and a proximity-based algorithm, to develop a network that can be used for drug repurposing (Gysi et al., 2021). They designed a graph neural network (GNN) based on a previously developed GNN (Zitnik et al., 2018). The multimodal graph is a heterogeneous graph G = (N, E). Here “N” represents the nodes, which are three types of biomedical entities (drugs, proteins and diseases), whereas “E” represents the edges, which describe four types of interactions: protein-protein interactions, drug-target associations, disease-protein associations and drug-disease indications. The GNN model was optimized and used with four different parameter settings obtained by changing the number of components, the number of neighbours and the minimum distance between the nodes.
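The heterogeneous graph described above can be illustrated with a small data structure. This is only a minimal sketch of the node and edge typing, not the GNN of Gysi et al.; the class name `HeteroGraph`, its methods and the example entries (`drug_A`, `protein_X`) are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass, field

# Node and edge types named in the text. Each edge type links a fixed
# pair of node types.
NODE_TYPES = {"drug", "protein", "disease"}
EDGE_TYPES = {
    "protein-protein": ("protein", "protein"),
    "drug-target": ("drug", "protein"),
    "disease-protein": ("disease", "protein"),
    "drug-disease": ("drug", "disease"),
}


@dataclass
class HeteroGraph:
    """Hypothetical container for a typed graph G = (N, E)."""
    nodes: dict = field(default_factory=dict)  # node name -> node type
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add_node(self, name, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[name] = node_type

    def add_edge(self, source, relation, target):
        expected = EDGE_TYPES[relation]
        actual = (self.nodes[source], self.nodes[target])
        # Assumption: relations are undirected, so either orientation is fine.
        if actual != expected and actual[::-1] != expected:
            raise ValueError(f"{relation} cannot link {actual}")
        self.edges.append((source, relation, target))

    def neighbours(self, name, relation):
        """Return nodes linked to `name` by edges of the given relation."""
        found = []
        for source, rel, target in self.edges:
            if rel != relation:
                continue
            if source == name:
                found.append(target)
            elif target == name:
                found.append(source)
        return found


# Placeholder entries, not data from the cited study.
g = HeteroGraph()
g.add_node("drug_A", "drug")
g.add_node("protein_X", "protein")
g.add_node("COVID-19", "disease")
g.add_edge("drug_A", "drug-target", "protein_X")
g.add_edge("COVID-19", "disease-protein", "protein_X")
```

Typing each edge by the pair of node types it may connect keeps the four interaction categories distinct, which is the property a multimodal graph relies on.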
Further, they calculated the cosine distance of each node (drugs associated with diseases) from the network medicine framework. The cosine distance was calculated using the uniform manifold approximation and projection (UMAP) algorithm. UMAP is a manifold learning technique for nonlinear dimensionality reduction. It relies on three assumptions: the data are uniformly distributed on a Riemannian manifold, the Riemannian metric is locally constant, and the manifold is locally connected (Hozumi et al., 2021). This process produced embeddings for drug and disease nodes in the graph that were predictive of therapeutic indications, and the embeddings were used to construct ranked lists of candidate drugs for COVID-19.
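To make the ranking step concrete, the sketch below computes the cosine distance between a disease embedding and a few drug embeddings and sorts the drugs by that distance. It does not run UMAP; the two-dimensional vectors and the names `covid_vec`, `drug_vecs` and `rank_drugs` are hypothetical stand-ins for embeddings that would come out of the dimensionality reduction.

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_drugs(disease_vec, drug_vecs):
    """Return (drug, distance) pairs sorted from closest to farthest."""
    scored = [(name, cosine_distance(disease_vec, vec))
              for name, vec in drug_vecs.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 2-D embeddings, for illustration only.
covid_vec = [1.0, 0.0]
drug_vecs = {
    "drug_A": [0.0, 1.0],
    "drug_B": [1.0, 0.1],
    "drug_C": [1.0, 1.0],
}
ranking = rank_drugs(covid_vec, drug_vecs)
```

Here `drug_B` points in almost the same direction as the disease vector and therefore ranks first, while the orthogonal `drug_A` ranks last.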
2. Implementation
We extracted the Euclidean distance of each node (drugs associated with diseases) from the network medicine framework data (Gysi et al., 2021), which had been calculated using the UMAP algorithm. Four files of Euclidean distances were obtained, one for each of the four parameter settings listed in Table 1. We treated the four parameter settings as four different models and sorted the drugs in each by distance to obtain their ranks (Supplementary Tables 1-4).
Table 1: The different models of Euclidean distance obtained using different parameters.
Model name    Nearest neighbour nodes    Minimum distance between nodes
Model 1       10                         0.25
Model 2       10                         0.8
Model 3       5                          0.5
Model 4       10                         1
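The treatment of the four parameter settings as four models, each yielding a ranked drug list, can be sketched as follows. The parameter values are those of Table 1; the dictionary keys, the function `rank_by_distance` and the distance values in `example_distances` are hypothetical and are not taken from the supplementary files.

```python
# Parameter settings from Table 1 (key names are illustrative).
MODELS = {
    "Model 1": {"nearest_neighbours": 10, "min_distance": 0.25},
    "Model 2": {"nearest_neighbours": 10, "min_distance": 0.8},
    "Model 3": {"nearest_neighbours": 5, "min_distance": 0.5},
    "Model 4": {"nearest_neighbours": 10, "min_distance": 1.0},
}


def rank_by_distance(distances):
    """Map each drug to its 1-based rank, smallest distance first.

    Ties are broken alphabetically so the result is deterministic
    (an assumption; the source does not state a tie-breaking rule).
    """
    ordered = sorted(distances, key=lambda drug: (distances[drug], drug))
    return {drug: position for position, drug in enumerate(ordered, start=1)}


# Made-up Euclidean distances of three placeholder drugs in each model.
example_distances = {
    "Model 1": {"drug_A": 0.42, "drug_B": 0.17, "drug_C": 0.90},
    "Model 2": {"drug_A": 0.30, "drug_B": 0.35, "drug_C": 0.80},
    "Model 3": {"drug_A": 0.55, "drug_B": 0.20, "drug_C": 0.60},
    "Model 4": {"drug_A": 0.10, "drug_B": 0.25, "drug_C": 0.70},
}

ranks = {model: rank_by_distance(example_distances[model]) for model in MODELS}
```

A drug can hold a different rank in each model (here `drug_A` is second in Model 1 but first in Model 2), which is why the results later combine the four distances into a single value.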
3. Usage
The module identifies the targets of a given drug and checks whether they are associated with COVID-19. Users can provide either drug names (separated by the pipe character in a text file) or the SMILES notation of the drugs (separated by newline characters in a text file).
The module accepts drug names as a query and searches for their targets in the TTD (Therapeutic Target Database) datasets. It then looks up each target retrieved for the drug in the COVID-19 target dataset. If a target obtained from the TTD dataset is found in the COVID-19 dataset, the module assigns the drug a score of 1; otherwise it assigns 0.
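The lookup logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the server's code: the functions `parse_drug_names` and `score_drugs`, the upper-casing of names, and the placeholder mappings `ttd_like` and `covid_targets` are all hypothetical, and real TTD files would need their own parser.

```python
def parse_drug_names(text):
    """Split a pipe-separated query into cleaned, upper-cased drug names."""
    return [name.strip().upper() for name in text.split("|") if name.strip()]


def score_drugs(drug_names, drug_to_targets, covid_targets):
    """Score 1 if any known target of the drug is a COVID-19 target, else 0."""
    scores = {}
    for drug in drug_names:
        targets = drug_to_targets.get(drug, set())
        scores[drug] = 1 if targets & covid_targets else 0
    return scores


# Placeholder data standing in for the TTD and COVID-19 target datasets.
ttd_like = {
    "DRUG_A": {"TARGET_1"},
    "DRUG_B": {"TARGET_2", "TARGET_3"},
}
covid_targets = {"TARGET_3"}

query = "drug_a | drug_b|drug_c"
scores = score_drugs(parse_drug_names(query), ttd_like, covid_targets)
```

In this sketch a drug that is missing from the target mapping (`DRUG_C`) also receives 0, which is one possible convention; the source does not say how unknown drugs are handled.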
4. Result and Discussion
As a case study, we collected three drug datasets: 1,000 FDA-approved drugs, 261 positive drugs and 37 drugs from a machine learning study (Suvarna et al., 2021). The FDA-approved drugs were extracted from the DrugBank database (https://go.drugbank.com/) and used as input for the server (Supplementary Table 5). Analysis of the intermediate result file showed that metronidazole, exemestane, busulfan, dutasteride and econazole were the top five FDA-approved drugs, having the minimum average Euclidean distance (Supplementary Table 6).
The positive drugs dataset (261 drugs) was collected from the literature. These drugs were analysed with this tool and the intermediate file was obtained (Supplementary Table 7). Analysis of the intermediate file showed that pazopanib, artesunate, itraconazole, dutasteride and fingolimod were the top five drugs with the minimum Euclidean distance (Supplementary Table 8).
In another experiment, we extracted 37 drugs from the study reported by Suvarna et al. (2021), who predicted 37 drugs as prognostic markers for COVID-19 using proteomics and machine learning approaches. These drugs were used as input for our server and the predicted intermediate file was collected (Supplementary Table 9). Analysis of the resulting file showed that the top drugs, i.e., those with the minimum average value, included daunorubicin, chloroquine, rapamycin, metformin and dabrafenib (Supplementary Table 10).
The intermediate result file consists of seven columns. The “ID” column gives the serial number of the drug, starting from 0; the “DRUG” column gives the drug name; and the “IN_DATABASE_VALUE” column describes the availability of the drug. The “MODEL” column gives the model serial number, and the “RANK” column gives the rank of the drug in that model. The “DISTANCE” column gives the Euclidean distance of the drug in that model, and the “VALUE” column gives the average of the distances obtained from all four models.
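A minimal sketch of how a file with these seven columns could be assembled is given below. Several details are assumptions rather than facts from the source: one row is written per drug per model, "IN_DATABASE_VALUE" is encoded as 1 or 0, drugs absent from the data get a single row with empty model fields, and the names `build_rows`, `to_csv` and `model_tables` are hypothetical.

```python
import csv
import io

COLUMNS = ["ID", "DRUG", "IN_DATABASE_VALUE", "MODEL", "RANK", "DISTANCE", "VALUE"]


def build_rows(query_drugs, model_tables):
    """Build intermediate rows.

    model_tables maps a model number to {drug: (rank, distance)}.
    """
    rows = []
    for drug_id, drug in enumerate(query_drugs):  # IDs start from 0
        hits = {model: table[drug]
                for model, table in model_tables.items() if drug in table}
        if not hits:
            # Assumed convention for drugs that are not in the data.
            rows.append([drug_id, drug, 0, "", "", "", ""])
            continue
        # VALUE: average distance over the models containing the drug.
        value = sum(distance for _rank, distance in hits.values()) / len(hits)
        for model in sorted(hits):
            rank, distance = hits[model]
            rows.append([drug_id, drug, 1, model, rank, distance, value])
    return rows


def to_csv(rows):
    """Serialise the rows, with a header line, to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up ranks and distances for one placeholder drug in four models.
model_tables = {
    1: {"drug_A": (2, 0.25)},
    2: {"drug_A": (1, 0.5)},
    3: {"drug_A": (3, 0.75)},
    4: {"drug_A": (1, 1.0)},
}
rows = build_rows(["drug_A", "drug_Z"], model_tables)
csv_text = to_csv(rows)
```

With these placeholder numbers, `drug_A` appears in four rows that share the same "VALUE", and `drug_Z`, which is absent from every model, appears once with "IN_DATABASE_VALUE" set to 0.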
The formula to calculate value is as follows:
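Based on the description of the “VALUE” column as the average of the distances obtained from all four models, the formula can be written as below. The symbols are notation introduced here and are not taken from the original: d denotes a drug, m indexes the four models, and D_m(d) is the Euclidean distance of drug d in model m.

```latex
\[
  \mathrm{VALUE}(d) \;=\; \frac{1}{4} \sum_{m=1}^{4} D_m(d)
\]
```

For instance, hypothetical distances of 0.25, 0.5, 0.75 and 1.0 in Models 1 to 4 would give VALUE = (0.25 + 0.5 + 0.75 + 1.0) / 4 = 0.625.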
[Drug names listed at this point in the original layout, caption not recoverable: DOXORUBICIN, DAUNORUBICIN, CISPLATIN, CARBOPLATIN, OXALIPLATIN, MEGESTROL ACETATE, EXEMESTANE, IMIQUIMOD, BETAMETHASONE, FLUTICASONE PROPIONATE, DAUNORUBICIN, ECONAZOLE, TACROLIMUS, ITRACONAZOLE, DOXORUBICIN, EXEMESTANE, HYDROXOCOBALAMIN, MEGESTROL ACETATE, GOSERELIN, PROGESTERONE]
5. Conclusion
The Euclidean distance between COVID-19 and each drug in the decoded embedding space describes how close the drug is to COVID-19: the smaller the Euclidean distance, the closer the drug is to COVID-19. The study found that metronidazole (FDA-approved dataset), itraconazole (positive dataset) and chloroquine (ML dataset) were close to COVID-19.
6. References
Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine learning in drug discovery and development. Nature Reviews Drug Discovery. 2019 Jun;18(6):463-77.
Gysi DM, Do Valle Í, Zitnik M, Ameli A, Gan X, Varol O, Ghiassian SD, Patten JJ, Davey RA, Loscalzo J, Barabási AL. Network medicine framework for identifying drug-repurposing opportunities for COVID-19. Proceedings of the National Academy of Sciences. 2021 May 11;118(19).
Hozumi Y, Wang R, Yin C, Wei GW. UMAP-assisted K-means clustering of large-scale SARS-CoV-2 mutation datasets. Computers in biology and medicine. 2021 Apr 1;131:104264.
Ke YY, Peng TT, Yeh TK, Huang WZ, Chang SE, Wu SH, Hung HC, Hsu TA, Lee SJ, Song JS, Lin WH. Artificial intelligence approach fighting COVID-19 with repurposing drugs. Biomedical journal. 2020 Aug 1;43(4):355-62.
Becker A. Artificial intelligence in medicine: What is it doing for us today? Health Policy and Technology. 2019 Jun 1;8(2):198-205.
Shah P, Kendall F, Khozin S, Goosen R, Hu J, Laramie J, Ringel M, Schork N. Artificial intelligence and machine learning in clinical development: a translational perspective. NPJ digital medicine. 2019 Jul 26;2(1):1-5.
Wang J, Wang H, Wang X, Chang H. Predicting drug-target interactions via FM-DNN learning. Current Bioinformatics. 2020 Jan 1;15(1):68-76.
Wong KK. Optimization in the Design of Natural Structures, Biomaterials, Bioinformatics and Biometric Techniques for Solving Physiological Needs and Ultimate Performance of Bio-devices. Current Bioinformatics. 2019 Jul 1;14(5):374-5.
Zitnik M, Agrawal M, Leskovec J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics. 2018 Jul 1;34(13):i457-66.