Machine Learning approach to repurpose Azacitidine against Covid-19
Lakshmi Priya MK, Robin Sinha1, Preeti P.1, Trapti Sharma1, Kamal Rawal#1
1. Amity Institute of Biotechnology, Amity University Uttar Pradesh, India.
#Corresponding Author
Email ID: kamal.rawal@gmail.com
Centre for Computational Biology and Bioinformatics,
AIB Amity University, Noida
Abstract:
Background: Since the outbreak of the coronavirus in 2019, new strains of the virus have evolved; currently, the Omicron (B.1.1.529) variant is prevalent and spreading rapidly worldwide. Scientists have therefore been working to develop therapeutic drugs that are effective against COVID-19. Because conventional drug development is slow, drug repurposing has been adopted as an emergency alternative. Repurposing strategies can be broadly categorized into 1) network-based, 2) structure-based, and 3) artificial intelligence approaches.
Methods: To evaluate drugs for their efficacy against COVID-19, we constructed the DrugX database, which encompasses 14 modules built using methodologies based on physicochemical characteristics, targets, molecular docking, gene expression, molecular data, and assay information, among others. We used this database to study the characteristics of azacitidine, as well as its interactions with viral and human proteins, in the context of COVID-19-induced changes in expression patterns.
Results and Conclusion: We found that azacitidine interacts with DNMT1, which in turn is predicted to associate with ORF8 of SARS-CoV-2. Azacitidine is also predicted to interact with the SARS-CoV-2 pathway and with genes whose expression is significantly altered during COVID-19.
Keywords: Computer-aided drug design, drug repurposing, artificial intelligence, COVID-19, azacitidine, machine learning.
Introduction:
COVID-19 is caused by the novel SARS-CoV-2, an enveloped, positive-sense, single-stranded RNA betacoronavirus with a genome length of 27–32 kb and distinctive crown-shaped glycoprotein spikes. The virus was first identified in Wuhan, China, in December 2019 [Wang C. et al., 2020] and has since spread worldwide, posing a global public health hazard. More than 313 million cases and 5.5 million deaths had been documented worldwide as of January 12, 2022 [Johns Hopkins University, n.d.]. SARS-CoV-2 belongs to the realm Riboviria, order Nidovirales, suborder Cornidovirineae, family Coronaviridae, subfamily Orthocoronavirinae, genus Betacoronavirus (lineage B) [Chan JF. et al., 2020], subgenus Sarbecovirus, and the species Severe acute respiratory syndrome-related coronavirus [Schoch CL. et al., 2020]. Its overall genome sequence shows 79.5% similarity with that of SARS-CoV, which caused the SARS outbreak of 2002–03 in Guangdong, China [Benvenuto D. et al., 2020]. The SARS-CoV-2 genome encodes four structural proteins, two polyproteins, and six accessory proteins. The spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins are the structural proteins, while the polyproteins ORF1a and ORF1b are proteolytically cleaved to generate 11 and 5 nonstructural proteins, respectively. ORF3, ORF6, ORF7a, ORF7b, ORF8, ORF9b, and ORF10 constitute the accessory proteins (Figure 1) [Brant AC. et al., 2021].
There is an urgent need for drug candidates that can treat a novel disease such as COVID-19. Because coronaviruses (CoVs) can proofread and remove mismatched nucleotides during genome replication and transcription, they have long impeded therapeutic development, which emphasizes the need for new CoV inhibitory targets. Drug repurposing is a valuable strategy for exploring new medical indications for existing approved drugs: because the safety and pharmacological profiles of these drugs have already been established, repurposing carries a lower development risk and takes less time, allowing faster discovery of new therapeutic options. Several anticancer drugs have the potential to improve COVID-19 outcomes through processes similar to those exploited in cancer treatment, such as reducing inflammation, inhibiting cell division, and regulating the host–tumor microenvironment [Borcherding N. et al., 2020].
Cytotoxic T lymphocytes are critical for eliminating virally infected cells and tumor cells, and they rely on HLA class I expression on these cells to direct their attack [Townsend A. et al., 1989]. HLA class I expression is known to be regulated by DNA methyltransferase 1 (DNMT1) [Luo N. et al., 2018]. DNMT1 adds methyl groups to cytosine bases in DNA, a process known as DNA methylation, which is one of the most important epigenetic marks for modulating gene expression [Schübeler D. 2015]. DNA methylation abnormalities play a significant role in many stages of cancer development and in invasion by DNA and RNA pathogens [Wolffe AP. et al., 1999].
Azacitidine (Vidaza®) is a pyrimidine nucleoside analog of cytidine. After phosphorylation, azacitidine can be incorporated into RNA or DNA. When incorporated into tRNA, it inhibits tRNA methyltransferases and interferes with tRNA methylation and processing, which suppresses protein synthesis. When incorporated into DNA, azacitidine forms an adduct to which DNA methyltransferase binds irreversibly; this inhibits DNA methylation and may induce cell differentiation and/or apoptosis [Issa JP. et al., 2005] (Figure 2).
Previously, we developed several machine learning and bioinformatics platforms. These include text mining and network biology-based systems [Jagannadham et al., 2016], vaccine discovery systems [Rawal et al., 2021], and next-generation sequencing analysis systems for cancer and other genomes [Preeti et al., 2021; Rawal et al., 2011; Mandal et al., 2006].
Here, we have built the CoV-DrugX pipeline (http://drugx.kamalrawal.in/drugx/), a drug repurposing tool based on machine learning and deep learning algorithms.
Implementation:
Azacitidine was analyzed through a sequence of modules in the CoV-DrugX pipeline, a drug repurposing tool developed specifically for COVID-19. The pipeline comprises modules based on different computational drug-repositioning strategies, which draw on inputs from multiple databases and on literature-based text mining algorithms. In total, 14 modules were developed and used to analyze the repurposing of azacitidine against COVID-19: drug_circuit, drug_target, drug_dock_human, drug_dock_viral, drug_dock_KG, drug_phenotype, drug_AI_ranking, drug_condition, drug_side_effect, drug_side_effect_neighbors, drug_gene_expression, drug_dl_11, drug_dl_200, and drug_gene_network. Each module takes the query drug as input in SMILES format and returns a score of 0 or 1.
The Drug dl 11 module, developed using a deep learning approach, was trained on a dataset of 262 COVID-19-related drugs obtained through a literature search, using 11 biological properties related to COVID-19 that include mutagenicity, drug-likeness, polar surface area (PSA), and hydrogen-bond donor and acceptor counts. The Drug dl 200 module examines 200 chemoinformatics properties retrieved from RDKit (https://rdkit.org/), grouped into six types of descriptors: surface, partial charge and VSA/charge, count and fragment-based, graph, electrotopological state (E-state) and VSA/E-state, and drug-likeness.
The data source for the Drug side effect module was built from the SIDER (http://sideeffects.embl.de/) and OFFSIDES (http://tatonettilab.org/offsides/) databases. For this module, we assembled 6,123 drug side effects associated with COVID-19, and we also compiled the adverse effects of 3,052 drugs. The module predicts whether a drug is related to COVID-19 based on drug-side effect associations: if a side effect of the query drug is found in the COVID-19 dataset, the module returns a score of 1; otherwise, it returns a score of 0.
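The scoring rule described above can be sketched as a set lookup. This is a minimal illustration and not the CoV-DrugX implementation: the function name, the dictionary layout, and the example drugs and side-effect terms are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple


def side_effect_score(
    drug: str,
    drug_side_effects: Dict[str, Set[str]],
    covid_side_effects: Set[str],
) -> Tuple[int, List[str]]:
    """Return (score, shared side effects) for a query drug.

    The score is 1 if at least one side effect of the drug also appears
    in the COVID-19 side-effect set, and 0 otherwise.
    """
    effects = drug_side_effects.get(drug.lower(), set())
    shared = sorted(effects & covid_side_effects)
    return (1 if shared else 0), shared


# Placeholder data, for illustration only (not taken from SIDER or OFFSIDES).
drug_side_effects = {"drug_a": {"nausea", "fever"}, "drug_b": {"rash"}}
covid_side_effects = {"fever", "cough"}

score_a, shared_a = side_effect_score("Drug_A", drug_side_effects, covid_side_effects)
score_b, shared_b = side_effect_score("drug_b", drug_side_effects, covid_side_effects)
```

A drug that is absent from the lookup table is treated here as having no known side effects and therefore scores 0; whether the pipeline handles missing drugs this way is an assumption.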
The Drug target module relies on the TTD database (http://db.idrblab.net/ttd/), which contains 31,359 drugs and their targets. In addition, 378 targets relevant to COVID-19 were compiled in a separate file based on a literature search; these targets were obtained from drugs that have a functional role in COVID-19 treatment (the positive drug dataset). For a query drug, the module retrieves its targets from the TTD dataset and reports any that are linked to COVID-19. It returns a score of 1 if a target name is found in the COVID-19 dataset; otherwise, it returns a score of 0.
The Drug circuit module draws on the DGIdb database (https://www.dgidb.org/downloads), from which we gathered information on 100,274 genes and their corresponding drugs. In addition, we used 299 circuits and their associated genes and proteins derived from Loucera et al. (2020). The module reports the circuits associated with a query drug that are related to COVID-19. Users can submit either drug names (separated by a pipe character in a text file) or SMILES notations (separated by newline characters in a text file). The module searches the datasets for genes that interact with the query drug, and then checks each gene's associated circuit and functional information, such as host-virus interaction, immune activity, antiviral defense, endocytosis, replication, and energetics. If the query drug has an associated circuit in the dataset, the module returns a score of 1; otherwise, it returns a score of 0.
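The two-step lookup described above (drug to genes, then genes to circuits) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the module's actual code: the function names, the in-memory dictionaries, and the gene and circuit identifiers are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def parse_drug_names(text: str) -> List[str]:
    """Split a pipe-separated list of drug names, dropping empty entries."""
    return [name.strip() for name in text.split("|") if name.strip()]


def circuit_score(
    drug: str,
    drug_genes: Dict[str, Set[str]],
    gene_circuits: Dict[str, Set[str]],
) -> Tuple[int, List[str]]:
    """Return (score, circuits) for a query drug.

    The drug is first mapped to its interacting genes, and each gene is
    then mapped to its circuits. The score is 1 if any circuit is found.
    """
    circuits: Set[str] = set()
    for gene in drug_genes.get(drug.lower(), set()):
        circuits |= gene_circuits.get(gene, set())
    return (1 if circuits else 0), sorted(circuits)


# Placeholder data, for illustration only.
drug_genes = {"drug_a": {"GENE1", "GENE2"}, "drug_b": {"GENE3"}}
gene_circuits = {"GENE1": {"circuit_01"}, "GENE2": {"circuit_01", "circuit_02"}}

results = {
    name: circuit_score(name, drug_genes, gene_circuits)
    for name in parse_drug_names("drug_a | drug_b")
}
```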
The data for the Drug phenotype module come from the WebMD drug reviews dataset (https://www.kaggle.com/rohanharode07/webmd-drug-reviews-dataset). The dataset includes a list of 6,147 genes, their associated drugs, and the phenotypes observed when each drug was prescribed. A total of 2,009 phenotypes observed in COVID-19 were also extracted through a literature search. The module returns the phenotypes of the input drug that are linked to COVID-19, as well as the number of drug phenotypes that match COVID-19 phenotypes.
Data for the Drug gene expression module are derived from two sources: DGIdb, from which the list of drugs and their interacting genes is extracted, and the DrugCentral database, from which drug-target interaction data are extracted. Only the gene and drug names from these two source files are retained in the processed file. A COVID-19 Gene Expression.tsv file comprising data from Blanco et al., 2020 (MSigDB; https://www.gsea-msigdb.org/gsea/msigdb/) and the Maayan Lab (https://maayanlab.cloud/covid19/) was also included. The processed COVID-19 gene expression file provides the gene symbol and the direction of the expression change, that is, whether the gene is upregulated or downregulated. The module reports the genes associated with the query drug together with their direction of regulation. The database is searched for genes connected to the query drug, and a drug that interacts with COVID-19 genes is given a score of 1. The query drug is given a score of 0 if none of its genes is found in the differentially expressed gene (DEG) database, indicating that it does not interact with any COVID-19-related genes.
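The matching step described above can be sketched as a join between a drug's interacting genes and a table of differentially expressed genes. The sketch below is illustrative only: the TSV column names (gene, direction), the function names, and all gene symbols are assumptions, since the layout of the processed files is not specified here.

```python
import csv
import io
from typing import Dict, Set, Tuple

# Placeholder DEG table with hypothetical gene symbols and directions.
DEG_TSV = "gene\tdirection\nGENE1\tup\nGENE2\tdown\n"


def load_deg(tsv_text: str) -> Dict[str, str]:
    """Parse a two-column TSV into a gene -> direction mapping."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["gene"]: row["direction"] for row in reader}


def gene_expression_score(
    drug: str,
    drug_genes: Dict[str, Set[str]],
    deg: Dict[str, str],
) -> Tuple[int, Dict[str, str]]:
    """Return (score, matched genes with direction) for a query drug.

    The score is 1 if at least one gene linked to the drug is present
    in the DEG table, and 0 otherwise.
    """
    genes = drug_genes.get(drug.lower(), set())
    hits = {gene: deg[gene] for gene in sorted(genes) if gene in deg}
    return (1 if hits else 0), hits


deg = load_deg(DEG_TSV)
drug_genes = {"drug_a": {"GENE1", "GENE2", "GENE9"}, "drug_b": {"GENE7"}}

score_a, hits_a = gene_expression_score("drug_a", drug_genes, deg)
score_b, hits_b = gene_expression_score("drug_b", drug_genes, deg)
```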
Docking-based drug repositioning techniques are used in the drug dock human, drug dock viral, and drug dock KG modules. In drug dock viral, we included 23 viral proteins from SARS-CoV-2 for docking. These include the spike protein, membrane protein, envelope small membrane protein, nucleocapsid protein, main protease, papain-like protease, nsp3 (207–379), RNA-dependent RNA polymerase (RdRp, nsp12/7/8 complex), helicase, Nsp14, Nsp15 (endoribonuclease), Nsp10, Nsp16, and Nsp16/10 (2′-O-methyltransferase). In drug dock human, we added the human proteins ACE2 and TMPRSS2 for docking against the query drugs, because they have been established as significant targets for viral entry (Lan J. et al., 2020; Fraser J. Bryan. et al., 2021). Similarly, in drug dock KG we included the protein kinases AAK1, GAK, and JAK1/2, which interact with human proteins and have been shown to play a role in viral endocytosis.
Usage:
The SMILES string of azacitidine (NC1=NC(=O)N(C=N1)[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O) was retrieved from DrugBank and used to predict whether the drug can be repurposed against COVID-19 (Figure 3). The job result is divided into three sections: 1) Query: this section provides job-related information. 2) Results: this section includes the module scores as well as the overall SI and PI scores. The SI score is the sum of all categorical values from the modules, and the PI score is the SI score divided by the total number of tools that generated valid results. 3) Download: this section provides the results of each module in TSV format for the user to download.
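The SI and PI definitions above translate directly into a short calculation. The sketch below assumes, for illustration, that a module that did not produce a valid result is recorded as None; the function name and the module names and scores are hypothetical and are not the values reported for azacitidine.

```python
from typing import Dict, Optional, Tuple


def si_pi(module_scores: Dict[str, Optional[float]]) -> Tuple[float, Optional[float]]:
    """Compute the SI and PI scores from per-module scores.

    SI is the sum of the scores of all modules that returned a valid
    result. PI is SI divided by the number of those modules. PI is None
    when no module returned a valid result.
    """
    valid = [score for score in module_scores.values() if score is not None]
    si = sum(valid)
    pi = si / len(valid) if valid else None
    return si, pi


# Placeholder scores: module_c is assumed to have failed (no valid result).
example_scores = {"module_a": 1, "module_b": 0, "module_c": None, "module_d": 1}
si, pi = si_pi(example_scores)
```

In this example three modules returned valid results, so SI is 2 and PI is 2/3; multiplying PI by 100 would express it as a percentage.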
Results
The CoV-DrugX pipeline was designed as an integrated computational drug repurposing tool for COVID-19. It consists of modules that predict the repurposability of a drug from 11 biological properties and 200 chemoinformatics properties, and that analyze drugs based on conditions related to COVID-19, phenotypes observed in COVID-19, symptoms similar to those of COVID-19, side effects of drugs used to treat and prevent COVID-19, drug targets, and gene expression. Three docking-based approaches have also been implemented in the tool: a human target-based docking module, a viral protein target-based docking module, and a module that uses protein kinases associated with human proteins as targets for the query drugs. The tool also contains a module that calculates the Euclidean distance between the drug and COVID-19, and a module that checks whether the query drug has any association with a SARS-CoV-2 circuit.
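For reference, the Euclidean distance between two feature vectors can be computed as in the sketch below. How the drug and the disease are represented as vectors is not specified here, so the vectors, their length, and the function name are purely hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Placeholder feature vectors, for illustration only.
drug_vector = [1.0, 2.0, 2.0]
disease_vector = [0.0, 0.0, 0.0]
distance = euclidean_distance(drug_vector, disease_vector)
```

A smaller distance indicates that the two vectors are more similar in the chosen feature space.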
The deep learning modules Drug DL 11 and Drug DL 200 were used to calculate the biological and chemoinformatics properties of azacitidine (Tables 3 and 4); azacitidine was predicted to follow Lipinski's rule of five and to have low mutagenicity. The drug does not treat any disorders similar to COVID-19 (Table 5), nor is it associated with any symptoms similar to those of COVID-19 (Table 6). However, it shares 28 phenotypes with the 1952 COVID-19 phenotypic traits (Table 7), and 297 of 1838 side effects of the drug were shared, accounting for 16.6 percent of the side effects (SE) of COVID-19 candidate drugs (Table 8). For azacitidine, 18 genes that are both up- and downregulated, 5 genes that are downregulated, and 7 genes that are upregulated were identified in relation to COVID-19 (Table 9) (Figure 3). The drug is also involved in eight pathways, two of which (TP53 and MYC) govern virus infection and replication in the host (Table 10). The Euclidean distance between the best model of the drug and COVID-19 was calculated as 2.89457 (Table 11). Azacitidine has been reported to bind DNA methyltransferase 1 (DNMT1) (Table 12), which is predicted to interact with the SARS-CoV-2 ORF8 protein [O'Meara MJ. et al., 2020]; ORF8 affects antigen presentation and decreases cytotoxic T lymphocyte (CTL) detection and clearance of virus-infected cells [Zhang Y. et al., 2020] (Figure 4).
When azacitidine was docked against the 23 viral proteins, the average binding energy was -5.7 kcal/mol. Among them, Nsp14 showed the strongest predicted binding, with a binding energy of -7.4 kcal/mol, and Nsp1 the weakest, with a binding energy of -2.9 kcal/mol (Table 13) (Figure 5). The Ser178, Asp179, Val183, and Asn408 residues of Nsp14 interact with azacitidine (Figure 6). Similarly, when the drug was docked against the human proteins (ACE2 and TMPRSS2) and their associated protein kinases (AAK1, JAK1/2, and GAK), the average binding energy was -6.34 kcal/mol. Among these, GAK showed the strongest predicted binding, with a binding energy of -7.3 kcal/mol (Table 14) (Figure 7).
Compiled scores for all 14 modules are reported in Table 15. The final score for each module is either 0 or 1, except for the docking modules, which return a score in the range 0 to 1 that reflects the cumulative score over the proteins docked in each module. The overall score is 50%, which supports considering the drug a COVID-19 repurposable candidate.
Conclusion:
Azacitidine was found to interact with the DNA methyltransferase 1 (DNMT1) protein, which has been linked to the SARS-CoV-2 ORF8 protein. ORF8 has been associated with altered antigen presentation as well as decreased CTL recognition and clearance of virus-infected cells. The drug is also predicted to have binding affinity for viral proteins, human proteins, and their related protein kinases, and to interact with the SARS-CoV-2 pathway. In addition, azacitidine is associated with genes whose expression is significantly altered during COVID-19. Hence, azacitidine can be considered a potential drug candidate for COVID-19.
Supplementary material
Supplementary Tables
Acknowledgment
Dr. Kamal Rawal acknowledges the support provided by SERB, Department of Science and Technology (Grant ID: CVD/2020/000842). The project involved the usage of computational infrastructure (server etc) provided by the Department of Biotechnology (DBT), Ministry of Science and Technology Government of India (Grant ID: BT/PRI7252/BID/7/708/2016) and Robert J. Kleberg Jr. and Helen C. Kleberg Foundation and Baylor College of Medicine, Houston, Texas, USA. We are also thankful to Amity University for the support provided during the conduct of this study.
References
1. Wang C, Horby PW, Hayden FG, Gao GF. A novel coronavirus outbreak of global health concern. The Lancet. 2020 Feb 15;395(10223):470-3.
2. COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). ArcGIS. Johns Hopkins University. https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
3. Chan JF, Kok KH, Zhu Z, Chu H, To KK, Yuan S, Yuen KY. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerging microbes & infections. 2020 Jan 1;9(1):221-36.
4. Schoch CL, Ciufo S, Domrachev M, Hotton CL, Kannan S, Khovanskaya R, Leipe D, Mcveigh R, O’Neill K, Robbertse B, Sharma S. NCBI Taxonomy: a comprehensive update on curation, resources, and tools. Database. 2020 Jan 1;2020.
5. Benvenuto D, Giovanetti M, Ciccozzi A, Spoto S, Angeletti S, Ciccozzi M. The 2019‐new coronavirus epidemic: evidence for virus evolution. Journal of medical virology. 2020 Apr;92(4):455-9.
6. Brant AC, Tian W, Majerciak V, Yang W, Zheng ZM. SARS-CoV-2: from its discovery to genome structure, transcription, and replication. Cell & Bioscience. 2021 Dec;11(1):1-7.
7. Borcherding N, Jethava Y, Vikas P. Repurposing anti-cancer drugs for COVID-19 treatment. Drug Design, Development and Therapy. 2020;14:5045.
8. Townsend A, Bodmer H. Antigen recognition by class I-restricted T lymphocytes. Annual review of immunology. 1989 Apr;7(1):601-24.
9. Luo N, Nixon MJ, Gonzalez-Ericsson PI, Sanchez V, Opalenik SR, Li H, Zahnow CA, Nickels ML, Liu F, Tantawy MN, Sanders ME. DNA methyltransferase inhibition upregulates MHC-I to potentiate cytotoxic T lymphocyte responses in breast cancer. Nature communications. 2018 Jan 16;9(1):1-1.
10. Schübeler D. Function and information content of DNA methylation. Nature. 2015 Jan;517(7534):321-6.
11. Wolffe AP, Matzke MA. Epigenetics: regulation through repression. Science. 1999 Oct 15;286(5439):481-6.
12. Zhang Y, Zhang J, Chen Y, Luo B, Yuan Y, Huang F, Yang T, Yu F, Liu J, Liu B, Song Z. The ORF8 protein of SARS-CoV-2 mediates immune evasion through potently downregulating MHC-I. bioRxiv. 2020 Jan 1.
13. Issa JP, Kantarjian HM, Kirkpatrick P. Azacitidine. Nature reviews. Drug discovery. 2005 Apr 1;4(4):275.
14. Jagannadham J, Jaiswal HK, Agrawal S, Rawal K. Comprehensive map of molecules implicated in obesity. PLoS ONE. 2016;11(2):e0146759. doi:10.1371/journal.pone.0146759.
15. Rawal K, Ramaswamy R. Genome wide analysis of mobile genetic elements insertion sites. Nucleic acids research. 2011 Sep;39(16):6864-78.
16. Mandal P, Rawal K, Ramaswamy R, Bhattacharya A, Bhattacharya S. Identification of insertion hot spots for non-LTR retrotransposons: computational and biochemical application to Entamoeba histolytica. Nucleic acids research. 2006;34(20):5752-63.
17. O’Meara MJ, Guo JZ, Swaney DL, Tummino TA, Hüttenhain R. A SARS-CoV-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing. bioRxiv. 2020.