COVDrugX-Drug Gene Expression for Covid-19 Drug Repurposing
Kamal Rawal#1, Prashant Singh1, Robin Sinha1, Priya Kumari1, Swarsat Kaushik Nath1, Ridhima1, Sukriti Sahai1, Sweety Dattatraya Shinde1, Nikita Garg1, Preeti P.1, Trapti Sharma1
Amity Institute of Biotechnology, Amity University, Uttar Pradesh, India
#Corresponding Author
Email ID: kamal.rawal@gmail.com
Centre for Computational Biology and Bioinformatics, AIB
Amity University, Noida.
Keywords: Bioinformatics, drug repurposing, artificial intelligence, COVID-19, drug gene expression
Supplementary Data Website: https://tinyurl.com/Cov-drug-GE
COV-DRUGX Software Pipeline: http://drugx.kamalrawal.in/cov_gene_expression/
Abstract
SARS-CoV-2 caused the COVID-19 pandemic worldwide, and because its pathogenicity varies from individual to individual, a single drug is unlikely to be sufficient [Saxena, 2020]. The lack of potent drugs against COVID-19 has drawn researchers' attention to the long-standing concept of drug repurposing. Gene expression is closely linked to cellular and organismal phenotype [Pham et al., 2021], and drugs can modulate gene expression, protein formation and other biological functions. In our module, drugs, the COVID-19-related genes they interact with, and the regulation of those genes are studied and scored, including whether each gene is upregulated or downregulated. Here we present a web-enabled tool that ranks candidate COVID-19 drugs based on the underlying gene expression. Users can submit drugs in SMILES format, and the tool returns the list of relevant COVID-19-related targets.
1. Introduction
Over the past decades, developing a de novo drug has typically taken billions of dollars and about 9–12 years (Ashburn & Thor, 2004). A genetic signature is defined by associating a gene set with a specific pattern of expression (Califano et al., 2000). By combining a large collection of gene expression data obtained after drug treatment with a pattern-matching method, gene expression signatures have been effective in recovering 'connections' between genes, drugs and diseases involved in the same biological process (Lamb et al., 2006). After the initial evaluation and identification of lead molecules, gene expression profiling and bioinformatics analysis are particularly important for gaining insight into gene expression patterns. Using signature inversion, that is, by comparing disease signatures with the gene expression signatures of 164 drugs from CMap, researchers repurposed an antiulcer drug for lung cancer and an antiepileptic drug for inflammatory bowel disease (Sirota et al., 2011; Dudley et al., 2011). In turn, this knowledge can be used to improve drugs toward desirable attributes such as disease-free survival, eradication of disease, elimination or minimization of toxic side effects, reduction of undesirable biotransformation, improved distribution (bioavailability), overcoming drug resistance, and improved immune responses. Rational drug design is therefore an integral approach to drug development and discovery (Mandal & Mandal, 2009).
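The idea of signature inversion can be illustrated with a deliberately simplified score. The sketch below is an assumption for illustration only: it scores a drug by the negated Pearson correlation between a disease signature and a drug signature over their shared genes. This is a simpler stand-in for, not a reproduction of, the Kolmogorov-Smirnov-based connectivity score of the Connectivity Map (Lamb et al., 2006). The gene names, fold-change values and the functions `pearson` and `inversion_score` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of floats."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no directional information
    return cov / (sd_x * sd_y)


def inversion_score(disease_sig, drug_sig):
    """Negated correlation over shared genes (simplified, hypothetical score).

    Values near +1 mean the drug moves the shared genes opposite to the
    disease (signature inversion); values near -1 mean it mimics the disease.
    """
    shared = sorted(set(disease_sig) & set(drug_sig))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    disease = [disease_sig[g] for g in shared]
    drug = [drug_sig[g] for g in shared]
    return -pearson(disease, drug)


# Hypothetical log fold-changes, for illustration only.
disease_signature = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 1.5, "GENE_D": -2.0}
drug_signature = {"GENE_A": -1.8, "GENE_B": 0.9, "GENE_C": -1.2, "GENE_D": 2.1}
print(round(inversion_score(disease_signature, drug_signature), 3))
```

A drug whose signature simply copied the disease signature would score -1 under this sketch, which is the opposite of what a repurposing screen looks for.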
2. Implementation
We collected the complete DGIdb (www.dgidb.org) drug-gene interaction dataset (Cotto et al., 2018), which gives a clear picture of drugs and their associated genes. The dataset contained 40,000 genes and 10,000 drugs involved in over 100,000 drug-gene interactions (Supplementary Table 1).
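One way to turn the interaction data into something that can be queried per drug is an index from each drug to its set of target genes. The sketch below assumes interaction records are available as dictionaries with hypothetical `drug_name` and `gene_name` keys; a real DGIdb export is tab-separated and could be read with `csv.DictReader(f, delimiter="\t")`, but its column names should be checked against the release actually used. The sample rows and `build_drug_index` are illustrative, not the server's code.

```python
from collections import defaultdict

# Hypothetical rows standing in for a DGIdb interaction export.
SAMPLE_INTERACTIONS = [
    {"drug_name": "drug_a", "gene_name": "GENE1"},
    {"drug_name": "DRUG_A", "gene_name": "gene2"},
    {"drug_name": "DRUG_A", "gene_name": "GENE1"},  # duplicate interaction
    {"drug_name": "DRUG_B", "gene_name": "GENE3"},
    {"drug_name": "DRUG_C", "gene_name": ""},  # missing gene symbol
]


def build_drug_index(rows, drug_col="drug_name", gene_col="gene_name"):
    """Map each drug name (upper-cased) to the set of its target gene symbols."""
    index = defaultdict(set)
    for row in rows:
        drug = (row.get(drug_col) or "").strip().upper()
        gene = (row.get(gene_col) or "").strip().upper()
        if drug and gene:  # skip rows missing either side of the interaction
            index[drug].add(gene)
    return dict(index)


drug_index = build_drug_index(SAMPLE_INTERACTIONS)
print(sorted(drug_index["DRUG_A"]))  # ['GENE1', 'GENE2']
```

Upper-casing both names is a design choice so that lookups are case-insensitive; using sets removes duplicate interactions reported by more than one source.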
We collected gene sets of differentially expressed genes (DEGs) related to COVID-19 from GSEA (Blanco et al., 2020) and MSigDB. The GSEA data cover the expression of 6,943 genes and the MSigDB data cover the expression of 34,953 genes. From these, a final dataset was created that lists each gene together with its expression status (Supplementary Table 2).
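The combined DEG dataset can be pictured as a map from gene symbol to expression direction. The sketch below is an assumption about how two sources might be merged: a gene reported as upregulated in one source and downregulated in another is labelled "both", which is one plausible origin of the "both" category used in the results but is not stated in the text. `merge_deg_sets` and the sample gene calls are hypothetical.

```python
def merge_deg_sets(*sources):
    """Combine gene -> direction maps ("up" or "down") from several sources.

    Conflicting calls for the same gene become "both" (an assumption).
    """
    merged = {}
    for source in sources:
        for gene, direction in source.items():
            gene = gene.strip().upper()
            if direction not in ("up", "down"):
                raise ValueError(f"unexpected direction {direction!r} for {gene}")
            previous = merged.get(gene)
            if previous is None or previous == direction:
                merged[gene] = direction
            else:
                merged[gene] = "both"  # conflicting reports across sources
    return merged


# Hypothetical DEG calls from two sources, for illustration only.
gsea_degs = {"GENE1": "up", "GENE2": "down"}
msigdb_degs = {"gene2": "up", "GENE4": "down"}
covid_degs = merge_deg_sets(gsea_degs, msigdb_degs)
print(covid_degs)  # {'GENE1': 'up', 'GENE2': 'both', 'GENE4': 'down'}
```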
3. Usage
The module reports the genes associated with the query drug and whether each of these genes is upregulated or downregulated. The genes associated with the query drug are searched in the DEG database, and a drug that interacts with at least one COVID-19-associated gene is assigned a score of 1. If none of its genes is found in the DEG database, the query drug is assigned a score of 0, i.e. it does not interact with any gene associated with COVID-19.
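The binary scoring rule described above can be sketched in a few lines. This assumes hypothetical in-memory stand-ins for the server's databases: `drug_index` maps upper-cased drug names to target-gene sets, and `covid_degs` maps gene symbols to their expression direction. The sketch also returns 0 for a drug absent from the interaction data; how the server actually treats such drugs (for example through its IN_DATABASE_VALUE column) is not reproduced here. `score_query_drug` is a hypothetical name.

```python
def score_query_drug(drug, drug_index, covid_degs):
    """Binary module score: 1 if the drug targets any COVID-19 DEG, else 0.

    Also returns the sorted list of matched COVID-19 genes.
    """
    targets = drug_index.get(drug.strip().upper(), set())
    matched = sorted(targets & set(covid_degs))
    return (1 if matched else 0), matched


drug_index = {"DRUG_A": {"GENE1", "GENE9"}, "DRUG_B": {"GENE7"}}
covid_degs = {"GENE1": "up", "GENE2": "both"}
print(score_query_drug("drug_a", drug_index, covid_degs))  # (1, ['GENE1'])
print(score_query_drug("DRUG_B", drug_index, covid_degs))  # (0, [])
```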
4. Results and Discussion
As a case study, we collected three drug datasets: 1,000 FDA-approved drugs, 261 positive drugs, and 37 drugs from a machine learning study (Suvarna et al., 2021). The FDA-approved drugs were extracted from the DrugBank database (https://go.drugbank.com/) and used as input to the server (Supplementary Table 3). Analysis of the resulting intermediate file showed that cisplatin, methyldopa, doxorubicin, metformin and fluorouracil were the top five FDA-approved drugs associated with COVID-19 DEGs (Supplementary Table 4). The distribution of the FDA-approved drugs revealed that 448 of them were associated with COVID-19 DEGs (Supplementary Figure 1).
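Summaries such as "448 drugs were associated" and "top five drugs" can be derived from the intermediate file's VALUE and NUMBER_OF_COVID19_GENE_TARGETS columns. The text does not state the ranking criterion, so the sketch below assumes drugs are ranked by the number of matched COVID-19 gene targets, with ties broken by name. It also assumes rows are dictionaries of strings, as `csv.DictReader` would produce; `summarise_results` and the sample rows are hypothetical.

```python
def summarise_results(rows, top_n=5):
    """Count drugs with VALUE == 1 and list the top_n by COVID-19 target count.

    The ranking criterion is an assumption; int() converts CSV string fields.
    """
    associated = [r for r in rows if int(r["VALUE"]) == 1]
    ranked = sorted(
        associated,
        key=lambda r: (-int(r["NUMBER_OF_COVID19_GENE_TARGETS"]), r["DRUG"]),
    )
    return len(associated), [r["DRUG"] for r in ranked[:top_n]]


# Hypothetical intermediate-file rows, for illustration only.
rows = [
    {"DRUG": "DRUG_A", "VALUE": "1", "NUMBER_OF_COVID19_GENE_TARGETS": "12"},
    {"DRUG": "DRUG_B", "VALUE": "0", "NUMBER_OF_COVID19_GENE_TARGETS": "0"},
    {"DRUG": "DRUG_C", "VALUE": "1", "NUMBER_OF_COVID19_GENE_TARGETS": "30"},
    {"DRUG": "DRUG_D", "VALUE": "1", "NUMBER_OF_COVID19_GENE_TARGETS": "12"},
]
print(summarise_results(rows, top_n=2))  # (3, ['DRUG_C', 'DRUG_A'])
```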
The positive drugs dataset (261 drugs) was collected from various published studies. These drugs were analysed with the tool and the intermediate file was obtained (Supplementary Table 5). Metformin, quercetin, ethanol, gemcitabine and daunorubicin were the top five drugs found after analysis of the intermediate file (Supplementary Table 6). We plotted the distribution of the number of COVID-19 DEGs each drug interacts with against the number of drugs (Supplementary Figure 2). In total, 105 drugs were found to be associated with COVID-19 DEGs.
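The data behind a distribution plot like the one in Supplementary Figure 2 amount to a tally of how many drugs interact with a given number of COVID-19 DEGs. The sketch below computes that tally from hypothetical intermediate-file rows using the NUMBER_OF_COVID19_GENE_TARGETS column; plotting is left out, and `target_count_distribution` is a hypothetical name.

```python
from collections import Counter


def target_count_distribution(rows):
    """Return {number of COVID-19 DEG targets: number of drugs}, sorted by count."""
    counts = Counter(int(r["NUMBER_OF_COVID19_GENE_TARGETS"]) for r in rows)
    return dict(sorted(counts.items()))


# Hypothetical rows, for illustration only.
rows = [
    {"DRUG": "DRUG_A", "NUMBER_OF_COVID19_GENE_TARGETS": "2"},
    {"DRUG": "DRUG_B", "NUMBER_OF_COVID19_GENE_TARGETS": "0"},
    {"DRUG": "DRUG_C", "NUMBER_OF_COVID19_GENE_TARGETS": "5"},
    {"DRUG": "DRUG_D", "NUMBER_OF_COVID19_GENE_TARGETS": "2"},
]
distribution = target_count_distribution(rows)
print(distribution)  # {0: 1, 2: 2, 5: 1}

# Drugs with at least one COVID-19 DEG target count as "associated".
associated = sum(n for k, n in distribution.items() if k > 0)
print(associated)  # 3
```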
In another experiment, we extracted the 37 drugs reported by Suvarna et al. (2021), who predicted them using a proteomics and machine learning approach for identifying prognostic markers of COVID-19. These drugs were used as input to our server and the resulting intermediate file was collected (Supplementary Table 7). Analysis of this file identified top drugs including metformin, quercetin, daunorubicin, haloperidol and rapamycin, which are related to COVID-19-interacting DEGs (Supplementary Table 8). We also plotted the distribution of the drugs against COVID-19 DEGs (Supplementary Figure 3).
The intermediate results file consists of 18 columns. The “DRUG” column gives the drug name, the “VALUE” column gives the module's prediction (0 or 1), and the “IN_DATABASE_VALUE” column describes the availability of the drug in the database (ranging from -1 to 1). The “NUMBER_OF_GENE_TARGETS” and “NUMBER_OF_COVID19_GENE_TARGETS” columns give the total number of gene targets and the number of matched COVID-19 gene targets, respectively. The “UP_REGULATION”, “DOWN_REGULATION” and “BOTH_REGULATION” columns give the number of target genes that are upregulated, downregulated, or both, respectively. The “UP_SCORE_G”, “DOWN_SCORE_G” and “BOTH_SCORE_G” columns give the probability of the drug's occurrence in the drug-gene interaction dataset, whereas the “UP_SCORE_C”, “DOWN_SCORE_C” and “BOTH_SCORE_C” columns give its occurrence in the COVID-19 drug-gene interaction dataset. Finally, the “UP_REGULATION_GENES”, “DOWN_REGULATION_GENES”, “BOTH_REGULATION_GENES” and “GENES” columns list the upregulated genes, the downregulated genes, the genes that are both up- and downregulated, and the full set of genes, respectively.
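The count and gene-list columns described above can be sketched from a drug's target genes and a gene-to-direction map. This is a minimal sketch under stated assumptions: IN_DATABASE_VALUE and the six score columns are omitted because their formulas are not reproduced here; "GENES" is assumed to list all target genes of the drug; genes are joined with a semicolon (a hypothetical separator); and the direction map keys are assumed to be upper-case symbols. `intermediate_row` is a hypothetical name.

```python
def intermediate_row(drug, targets, deg_direction, sep=";"):
    """Build the count and gene-list columns for one drug.

    deg_direction maps upper-case gene symbols to "up", "down" or "both".
    """
    targets = sorted({g.strip().upper() for g in targets})
    covid = [g for g in targets if g in deg_direction]
    by_dir = {d: [g for g in covid if deg_direction[g] == d]
              for d in ("up", "down", "both")}
    return {
        "DRUG": drug,
        "VALUE": 1 if covid else 0,
        "NUMBER_OF_GENE_TARGETS": len(targets),
        "NUMBER_OF_COVID19_GENE_TARGETS": len(covid),
        "UP_REGULATION": len(by_dir["up"]),
        "DOWN_REGULATION": len(by_dir["down"]),
        "BOTH_REGULATION": len(by_dir["both"]),
        "UP_REGULATION_GENES": sep.join(by_dir["up"]),
        "DOWN_REGULATION_GENES": sep.join(by_dir["down"]),
        "BOTH_REGULATION_GENES": sep.join(by_dir["both"]),
        "GENES": sep.join(targets),
    }


degs = {"GENE1": "up", "GENE2": "down", "GENE3": "both", "GENE4": "up"}
row = intermediate_row("DRUG_A", ["gene1", "GENE2", "GENE3", "GENE5"], degs)
print(row["UP_REGULATION"], row["DOWN_REGULATION"], row["BOTH_REGULATION"])  # 1 1 1
```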
The formula to calculate the Score_G of UP/DOWN/BOTH:
The formula to calculate the Score_C of UP/DOWN/BOTH:
Metformin was found to be the drug interacting with the most COVID-19 DEGs (Supplementary Figure 4). Studies have provided evidence that metformin decreases blood glucose levels and increases insulin sensitivity. It has therefore been proposed that metformin could play a role in inhibiting viral infection, multiplication and maturation, inhibiting translation of viral proteins, regulating viral protein-host protein interactions, and modulating inflammation and the immune response in COVID-19 patients (Samuel et al., 2021).
5. Conclusion
Drug-gene interaction is an important component of most rational drug repositioning approaches. Indeed, various biochemical, physical and mathematical techniques have been designed and optimized to accurately infer links between ligands and genes, or the proteins encoded by these genes. In this work we use the drug-gene interaction database (DGIdb) and COVID-19 DEGs to predict off-target effects, suggest potential cases of drug repurposing, and determine the molecular mechanisms responsible for changes in gene expression. There are a number of ways to identify drug off-targets. Here we have combined experimental data and a systems biology approach to yield a promising tool for better understanding the biological response to a drug.
References: See Supplementary Website: https://sites.google.com/view/drugx-supplementary
Contribution of Authors
This study was conducted under the overall guidance of KR, who contributed to the protocol, the critical evaluation of data and the manuscript. The pipeline was designed, constructed and validated by RS and PS. The manuscript was written by PS, PK and PP. All authors are responsible for the content of the manuscript.
Acknowledgement
We extend our sincere gratitude to Amity University for providing administrative and technical support required in the conduct of this study.
Financial Support and Sponsorship
Dr. Kamal Rawal acknowledges the support provided by SERB, Department of Science and Technology (Grant ID: CVD/2020/000842). The project used computational infrastructure (servers, etc.) provided by the Department of Biotechnology (DBT), Ministry of Science and Technology, Government of India (Grant ID: BT/PRI7252/BID/7/708/2016). SKN, PP, R, SS, SDS, NG, and TS received financial support from grants obtained from the Robert J. Kleberg Jr. and Helen C. Kleberg Foundation and Baylor College of Medicine, Houston, Texas, USA. We are also thankful to Amity University for the support provided during the conduct of this study.
6. References
Saxena, A. (2020). Drug targets for COVID-19 therapeutics: Ongoing global efforts. Journal of Biosciences, 45(1), 1-24.
Pham, T. H., Qiu, Y., Zeng, J., Xie, L., & Zhang, P. (2021). A deep learning framework for high-throughput mechanism-driven phenotype compound screening and its application to COVID-19 drug repurposing. Nature Machine Intelligence, 3(3), 247-257.
Ashburn, T. T., & Thor, K. B. (2004). Drug repositioning: identifying and developing new uses for existing drugs. Nature Reviews Drug Discovery, 3, 673-683.
Cotto, K. C., Wagner, A. H., Feng, Y. Y., Kiwala, S., Coffman, A. C., Spies, G., ... & Griffith, M. (2018). DGIdb 3.0: a redesign and expansion of the drug–gene interaction database. Nucleic Acids Research, 46(D1), D1068-D1073.
Sirota, M., Dudley, J. T., Kim, J., Chiang, A. P., Morgan, A. A., Sweet-Cordero, A., et al. (2011). Discovery and preclinical validation of drug indications using compendia of public gene expression data. Science Translational Medicine, 3(96), 96ra77.
Dudley, J. T., Sirota, M., Shenoy, M., Pai, R. K., Roedder, S., Chiang, A. P., et al. (2011). Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Science Translational Medicine, 3(96), 96ra76.
Califano, A., Stolovitzky, G., & Tu, Y. (2000). Analysis of gene expression microarrays for phenotype classification. In Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB) (Vol. 8, pp. 75-85).
Lamb, J., Crawford, E. D., Peck, D., Modell, J. W., Blat, I. C., Wrobel, M. J., ... & Golub, T. R. (2006). The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science, 313(5795), 1929-1935.
Mandal, S., & Mandal, S. K. (2009). Rational drug design. European Journal of Pharmacology, 625(1-3), 90-100.
Samuel, S. M., Varghese, E., & Büsselberg, D. (2021). Therapeutic potential of metformin in COVID-19: reasoning for its protective role. Trends in Microbiology.