COV-Drug Target interaction server for Covid-19 Drug Repurposing
Kamal Rawal#1, Prashant Singh1, Robin Sinha1, Priya Kumari1, Swarsat Kaushik Nath1, Ridhima1, Sukriti Sahai1, Sweety Dattatraya Shinde1, Nikita Garg1, Preeti P.1, Trapti Sharma1
Amity Institute of Biotechnology, Amity University, Uttar Pradesh, India
#Corresponding Author
Email ID: kamal.rawal@gmail.com
Centre for Computational Biology and Bioinformatics, AIB
Amity University, Noida.
Keywords: Bioinformatics, drug repurposing, artificial intelligence, COVID-19, molecular targets
Supplementary Data Website: https://tinyurl.com/Dl200
COV-DRUGX Software Pipeline: http://drugx.kamalrawal.in/drugfc200/
Abstract
Deep learning has recently demonstrated superior performance over classical methods in computational drug discovery [Öztürk et al., 2018; Lee et al., 2019], owing to its expressive power in extracting, processing and extrapolating patterns in molecular data [Huang et al., 2020]. Various computational methods have recently been proposed to find new therapeutic applications for existing drugs [Hooshmand et al., 2021]. In our module, we apply a deep neural network to the problem of drug repurposing for COVID-19. We study a total of 200 chemical properties grouped into different categories. We present a web-enabled tool that helps rank COVID-19 drugs based on their underlying chemical properties. Users supply drugs in SMILES format, and the tool provides the list of relevant targets related to COVID-19.
1. Introduction
Drug repurposing is the process of finding new therapeutic uses for available drugs. It reduces the time, cost, and risk associated with traditional drug discovery [Li et al., 2016]. Deep learning algorithms allow computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction [LeCun et al., 2015]. Deep learning has been applied against the COVID-19 pandemic and provides directions for future research on COVID-19 [Shorten et al., 2021]. A new generation of DL approaches establishes cause-and-effect relationships that explain why the repurposed drugs are suitable for treating COVID-19 [Yen Lee et al., 2021]. Because the effects of COVID-19 on a patient progress over time, the drugs and methods used for treatment need to be adjusted according to the stage of the illness. Models can help point us in the right direction by identifying repurposed drugs that can safely and effectively treat COVID-19, avoiding a time-consuming and expensive drug development process; however, laboratory experiments are still required to validate the effectiveness of the repurposed drugs [Shorten et al., 2021].
In our study, we have built a deep learning-based module that computes 200 chemoinformatics properties of a drug from its SMILES string and predicts whether or not it is a drug candidate for COVID-19. The chemoinformatics properties fall into different categories: Molecular Operating Environment (MOE)-based surface descriptors (49), partial charge and VSA/charge descriptors (18), count- and fragment-based descriptors (101), graph descriptors (19), electrotopological state (e-state) and VSA/e-state descriptors, descriptors commonly used to assess drug-likeness (24), LogP and VSA/LogP descriptors (13), refractivity-related descriptors (11) and general descriptors (12). The 200 properties, which include the hydrogen donor and acceptor efficiency and the logP value of the molecule, were calculated using RDKit. We used positive and negative datasets to train the model. The positive dataset contains drugs that are associated with COVID-19. Here, we have introduced advanced AI algorithms combined with network medicine for drug repurposing [Yen Lee et al., 2021]. The main purpose of drug repurposing is to extend the therapeutic use of available drugs to a wider medical scope [Zhou et al., 2020]. Computing drug ADME (absorption, distribution, metabolism, excretion) properties helps pharmaceutical companies discard compounds that are not drug-like during the discovery phase [Smith et al., 2020]. Studying cheminformatics properties of drugs can help predict, repurpose, discover and design new COVID-19 therapies using predictive models [Gupta et al., 2021]. Every property has its own significance in deciding whether or not a drug is repurposable for COVID-19. The significance of each property is shown in Supplementary Table 3.
2. IMPLEMENTATION
The DL module of the DrugX pipeline (http://drugx.kamalrawal.in/) was developed using computational tools to evaluate 200 chemoinformatics properties of drugs. Drug-DL-200 is a deep learning-based server that computes the chemoinformatics properties of drugs from their SMILES notations and predicts the repurposability of the drugs against COVID-19. This module shortlists drugs that show the relevant characteristics (features) to qualify as repurposable drugs against COVID-19 [Li et al., 2016]. We evaluated parameters of the chemical structures of molecules to shortlist therapeutically effective medicines for patients. We computed various ADME (absorption, distribution, metabolism, and excretion) properties of these molecules from their molecular structure [Tian et al., 2015]. Early detection of ADME properties through in silico approaches reduces the risk of failure during the clinical phases [Hay et al., 2014]. In silico approaches also make the screening of a large number of available molecules faster.
2.2 DATASET
Positive and negative datasets are evaluated by the DL200 module. We extracted 261 drugs for the positive dataset through text mining. The SMILES strings and SDF files of every drug were collected from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) or the DrugBank database (https://go.drugbank.com/). The positive dataset, with the names of the drugs, their SMILES and their DrugBank IDs obtained from DrugBank or PubChem, is shown in Supplementary Sheet 1.
The DRUG-DL-200 module is a deep learning-based system that computes 200 different chemoinformatics properties of drugs. Based on these properties, it classifies whether a given SMILES corresponds to a drug that is suitable for repurposing against COVID-19. The DL200 module evaluates and shortlists drugs that show the relevant properties (features) to qualify as COVID-19 drug candidates (Supplementary Figure 1). The machine learning algorithm used is a fully connected network (multilayer perceptron).
All four DL models have one input layer, whose number of nodes equals the number of properties in each model. Each model was built with a different number of hidden layers, where each hidden layer consists of one fully connected layer (FCL), one leaky ReLU activation layer, and one batch normalization layer. The output layer consists of one FCL with two nodes representing the two output classes, namely drug candidate and non-drug candidate. We used softmax activation and a constant bias initializer, with the value calculated as log(number of positive/number of negative), to obtain minimal loss at the early stages of training. We also employed the Adam optimizer with exponential learning rate decay and the categorical cross-entropy loss function during training.
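The effect of the bias initializer described above can be illustrated with a minimal sketch in plain Python. The text does not state how the value log(number of positive/number of negative) is distributed over the two output nodes; the sketch assumes it is applied to the positive-class node while the other node keeps a bias of 0. The class counts and the function names `initial_output_bias` and `softmax` are hypothetical and chosen only for illustration.

```python
import math

def initial_output_bias(n_positive, n_negative):
    """Bias for the positive-class output node: log(n_positive / n_negative)."""
    return math.log(n_positive / n_negative)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical class counts, chosen only for illustration.
n_positive, n_negative = 100, 300

# Assumption: negative-class bias is 0, positive-class bias comes from the ratio.
biases = [0.0, initial_output_bias(n_positive, n_negative)]

# Before any training signal reaches the weights, the logits are dominated by
# the biases, so the predicted probabilities match the class proportions
# (roughly 0.75 for the negative class and 0.25 for the positive class here).
print(softmax(biases))
```

Starting from the class prior rather than from equal probabilities keeps the initial cross-entropy loss low on an imbalanced dataset, which is consistent with the stated aim of obtaining minimal loss at the early stages of training.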
Dataset Preparation
We searched Google Scholar for articles related to drug repurposing for COVID-19 using the keyword “drug repurposing AND covid 19” and obtained 24,800 hits. We curated all the papers for drug molecules relevant to COVID-19 and extracted 393 drugs that had been repurposed, or were in the process of being repurposed, for treating COVID-19. We collected these data, along with supporting evidence, by performing deep curation (see S1.XLSX). We label this dataset the positive dataset. Similarly, we created different control or negative datasets (N1, N2, etc.) for the purpose of machine learning. To create a negative dataset, we selected drug molecules that were dissimilar to the members of the positive dataset.
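One way to select negatives by dissimilarity is sketched below. The dissimilarity measure used for the negative datasets is not specified in the text; the sketch assumes Tanimoto similarity between sets of fingerprint bits and a hypothetical cut-off of 0.4. The function names, the toy fingerprints and the cut-off are all illustrative assumptions, not the procedure actually used.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def select_negatives(candidates, positives, max_similarity=0.4):
    """Keep candidates whose similarity to every positive is below the cut-off."""
    selected = []
    for name, bits in candidates.items():
        closest = max((tanimoto(bits, p) for p in positives.values()), default=0.0)
        if closest < max_similarity:
            selected.append(name)
    return selected

# Toy fingerprints: sets of "on" bit indices (hypothetical, not real drugs).
positives = {"pos_a": {1, 2, 3, 4}, "pos_b": {3, 4, 5, 6}}
candidates = {
    "cand_x": {1, 2, 3, 9},     # Tanimoto 3/5 with pos_a, too similar
    "cand_y": {10, 11, 12, 1},  # Tanimoto 1/7 with pos_a, dissimilar
}
print(select_negatives(candidates, positives))
```

In practice the bit sets would come from a molecular fingerprint computed for each SMILES, and the cut-off would need to be chosen and justified for the dataset at hand.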
Model Building
The model is built using Python and Keras. Keras accepts class weights in the form of a dictionary, so we generate the weights in that form. We then concatenate the positive and negative data. The data are split into training and testing sets, and unwanted columns are removed. The training and validation data are standardized using the StandardScaler object of scikit-learn [Pedregosa et al., 2011]. We balanced the imbalanced dataset and built the final model using the best parameters derived through hyperparameter tuning.
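The two preprocessing steps above, class weighting and standardization, can be sketched in plain Python as follows. The weighting formula n_samples / (n_classes * class_count) is an assumed, commonly used heuristic, since the text does not state how the weights were derived. The standardization mirrors the usual z-score transform (population standard deviation, fitted on the training data only). All function names are hypothetical.

```python
from collections import Counter
import math

def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

def fit_standardizer(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # A zero standard deviation (constant column) is replaced by 1.0
    # so that the division below is always defined.
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return means, stds

def standardize(rows, means, stds):
    """Apply (value - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical labels (1 = positive, 0 = negative) and two-feature rows.
labels = [1, 1, 0, 0, 0, 0]
train_rows = [[1.0, 10.0], [3.0, 10.0]]

print(balanced_class_weights(labels))          # the minority class gets the larger weight
means, stds = fit_standardizer(train_rows)
print(standardize(train_rows, means, stds))
```

A dictionary of this shape is what Keras expects for the `class_weight` argument of `Model.fit`. Fitting the scaler on the training rows and reusing the same means and standard deviations for validation data avoids leaking validation statistics into training.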
3. USAGE
The properties used in DL200 include Synthetic Accessibility score, Cycle, Quantitative Estimation of Drug-likeness, Rule of Five, Molecular Weight, ALogP, PSA, NumHDonors, NumHAcceptors, Mutagenicity, and Mutagenicity prediction probability (Supplementary Table 1). The module takes the drug candidate as input in SMILES format, together with the SDF file of the drug, and predicts whether the candidate could be repurposed against COVID-19. The module accepts a drug ID as a query against the datasets, which are very large. It then checks the drug against the COVID-19 properties (Supplementary Table 1). If the drug's properties are found among those obtained from the COVID-19 dataset, the module assigns a score of 1; otherwise it assigns 0. A score of 1 means that the given drug is associated with the COVID-19 target (Supplementary Figure 2). Documentation for running DL200 is provided in Supplementary Doc 1.
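As an illustration of one of the listed properties, the Rule of Five can be checked from four descriptor values using Lipinski's standard thresholds. This is a minimal sketch: how DL200 itself computes or scores the Rule of Five is not described in the text, and the function name and the descriptor values below are hypothetical.

```python
def rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors):
    """Return the names of the Lipinski criteria that a molecule violates."""
    checks = {
        "molecular weight > 500 Da": mol_weight > 500,
        "logP > 5": logp > 5,
        "H-bond donors > 5": h_donors > 5,
        "H-bond acceptors > 10": h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]

# Hypothetical descriptor values, for illustration only.
violations = rule_of_five_violations(
    mol_weight=612.3, logp=4.2, h_donors=3, h_acceptors=11
)
print(violations)
print("passes (at most one violation):", len(violations) <= 1)
```

The descriptor values themselves would normally be computed from the SMILES string with a cheminformatics toolkit before being passed to such a check.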
The Drug DL 200 server takes a drug SMILES notation, drug name or DrugBank ID as input. If the user supplies a drug name or DrugBank ID, the server retrieves the SMILES of the query drug from its internal database; if the user supplies a SMILES string, it is used directly. The server then calculates the 200 cheminformatics properties for the query drug and evaluates its repurposability against COVID-19.
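The input handling described above can be sketched as a small dispatch function. This is an illustrative assumption about the logic, not the server's actual code: `INTERNAL_DB`, `NAME_INDEX` and `resolve_smiles` are hypothetical names, the one-record lookup table stands in for the internal database, and the `DB` plus five digits pattern is the usual form of a DrugBank accession number.

```python
import re

# Hypothetical stand-in for the server's internal database.
INTERNAL_DB = {
    "DB00945": {"name": "aspirin", "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"},
}
NAME_INDEX = {rec["name"]: rec["smiles"] for rec in INTERNAL_DB.values()}
DRUGBANK_ID = re.compile(r"^DB\d{5}$")

def resolve_smiles(query):
    """Return (input_kind, smiles) for a DrugBank ID, drug name, or SMILES."""
    text = query.strip()
    if DRUGBANK_ID.match(text.upper()):
        record = INTERNAL_DB.get(text.upper())
        if record is None:
            raise KeyError(f"unknown DrugBank ID: {text}")
        return "drugbank_id", record["smiles"]
    if text.lower() in NAME_INDEX:
        return "drug_name", NAME_INDEX[text.lower()]
    # Anything else is passed through as a SMILES string; a real server
    # would validate it with a cheminformatics toolkit before use.
    return "smiles", text

for query in ("DB00945", "Aspirin", "CCO"):
    print(query, "->", resolve_smiles(query))
```

Whichever branch is taken, the result is a SMILES string, so the descriptor calculation and the prediction step only need to handle one input type.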
4. RESULTS AND DISCUSSION
As a case study, we selected 261 drugs from the literature to be used as samples (see Supplementary Sheet 1). These drugs were run through the pipeline. The module reports the distribution of drugs receiving scores of 1 and 0 (see Supplementary Sheet 2): 153 drugs received a score of 1 and 106 received a score of 0. That is, 153 drugs were predicted to be usable against COVID-19 and 106 drugs were rejected. All 200 properties of these positive datasets can be seen in Supplementary Sheet 3.
We selected the drug molnupiravir and obtained a score of 1 from the DL200 module, which means that molnupiravir is predicted to be usable as a COVID-19 drug (Supplementary Sheet 4). All 200 properties of molnupiravir were evaluated, as shown in Supplementary Sheet 5. Of the 200 properties, the DL200 module assigned a score of 1 to 103 and a score of 0 to 97 (Supplementary Sheet 5).
5. CONCLUSION
We have computed 200 chemoinformatic properties, such as mutagenicity and drug-likeness (Supplementary Table 1). We computed metrics such as hydrogen donor and acceptor efficiency and the logP value of the molecules to evaluate the performance of the DL models (Table DR; Supplementary Table 2). The module achieved an accuracy of 70%. The positive datasets were evaluated using DL200. A resulting value of 0 means that, according to the DL200 module, the drug cannot be used for COVID-19; a value of 1 means that it can. Results are reported in 0 and 1 format, as shown in Supplementary Sheet 2. The module also writes out the values of each chemoinformatic property; these are intermediate files created in the result folder.
This study was conducted under the overall guidance of KR, who contributed to the protocol, the critical evaluation of the data and the manuscript. The pipeline was designed, constructed and validated by RS and PS. The manuscript was written by SKN, PP, and PK. All the authors are responsible for the content of the manuscript.
Funding
This work was supported by a grant provided by SERB, Department of Science and Technology, Government of India (File Number: CVD/2020/000842). We also acknowledge Robert J. Kleberg, Jr. and Helen C. Kleberg Foundation, USA, Texas Children’s Hospital and Baylor College of Medicine, Houston USA.
Financial Support and Sponsorship
Dr. Kamal Rawal acknowledges the support provided by SERB, Department of Science and Technology (Grant ID: CVD/2020/000842). The project involved usage of computational infrastructure (server etc) provided by the Department of Biotechnology (DBT), Ministry of Science and Technology Government of India (Grant ID: BT/PRI7252/BID/7/708/2016). SKN, PP, R, SS, SDS, NG, and TS have received financial support from grants obtained from Robert J. Kleberg Jr. and Helen C. Kleberg Foundation and Baylor College of Medicine, Houston, Texas, USA. We are also thankful to Amity University for the support provided during the conduct of this study.
All raw data were obtained from open sources, have been cited and deposited in Datasets S1, and are also available on our website. Software Pipeline: http://drugx.kamalrawal.in/
Supplementary Data:
Supplementary Sheet 1 (S1)
Supplementary Sheet 2 (S2)
Supplementary Sheet 3 (S3)
Supplementary Sheet 4 (S4)
Supplementary Sheet 5 (S5)
Supplementary Table 1 (T1)
Supplementary Table 2 (T2)
Supplementary Table 3 (T3)
Supplementary Table 4 (T4)
Supplementary Table 5 (T5)
Supplementary Table 6 (T6)
Supplementary Table 7 (T7)
Supplementary Table 8 (T8)
Supplementary Figure 1 (F1)
Supplementary Figure 2 (F2)
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
REFERENCES
Öztürk, H., Özgür, A., and Ozkirimli, E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics, 34(17):i821–i829, 2018.
Lee, I., Keum, J., and Nam, H. DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Computational Biology, 15(6):e1007129, 2019.
Huang, K., Fu, T., Xiao, C., Glass, L., & Sun, J. (2020). Deeppurpose: a deep learning based drug repurposing toolkit. arXiv preprint arXiv:2004.08919.
Hooshmand, S. A., Zarei Ghobadi, M., Hooshmand, S. E., Azimzadeh Jamalkandi, S., Alavi, S. M., & Masoudi-Nejad, A. (2021). A multimodal deep learning-based drug repurposing approach for treatment of COVID-19. Molecular diversity, 25(3), 1717-1730
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." nature 521.7553 (2015): 436-444.
Shorten, Connor, Taghi M. Khoshgoftaar, and Borko Furht. "Deep Learning applications for COVID-19." Journal of Big Data 8.1 (2021): 1-54.
Lee, Chun Yen, and Yi-Ping Phoebe Chen. "New Insights Into Drug Repurposing for COVID-19 Using Deep Learning." IEEE Transactions on Neural Networks and Learning Systems (2021).
C. Y. Lee and Y. P. Chen, "Descriptive prediction of drug side-effects using a hybrid deep learning model", Int. J. Intell. Syst., vol. 36, no. 6, pp. 2491-2510, Jun. 2021.
R. Gupta, D. Srivastava, M. Sahu, S. Tiwari, R. K. Ambasta and P. Kumar, "Artificial intelligence to deep learning: Machine intelligence approach for drug discovery", Mol. Diversity, vol. 25, no. 3, pp. 1315-1360, Aug. 2021.
Zhou, Yadi, et al. "Artificial intelligence in COVID-19 drug repurposing." The Lancet Digital Health (2020).
Li J, Zheng S, Chen B, Butte AJ, Swamidass SJ, Lu Z (2016) A survey of current trends in computational drug repositioning. Brief Bioinform 17(1):2–12
Smith T, Bushek J, LeClaire A, Prosser T (2020) COVID-19 drug therapy. Elsevier 1(1):1–25
Rawal, Kamal, et al. "Identification of vaccine targets in pathogens and design of a vaccine using computational approaches." Scientific reports 11.1 (2021): 1-25.
Clark, David E., and Stephen D. Pickett. "Computational methods for the prediction of ‘drug-likeness’." Drug discovery today 5.2 (2000): 49-58.
Hay, M., Thomas, D. W., Craighead, J. L., Economides, C. & Rosenthal, J. Clinical development success rates for investigational drugs. Nature Biotechnol. 32, 40–51 (2014).
Tian, S. et al. The application of in silico drug-likeness predictions in pharmaceutical research. Adv Drug Deliv Rev 86, 2–10 (2015).
Ertl, Peter, Bernhard Rohde, and Paul Selzer. "Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties." Journal of medicinal chemistry 43.20 (2000): 3714-3717.
Labute, Paul. "A widely applicable set of descriptors." Journal of Molecular Graphics and Modelling 18.4-5 (2000): 464-477.
Wildman, Scott A., and Gordon M. Crippen. "Prediction of physicochemical parameters by atomic contributions." Journal of chemical information and computer sciences 39.5 (1999): 868-873.
Gasteiger, Johann, and Mario Marsili. "Iterative partial equalization of orbital electronegativity—a rapid access to atomic charges." Tetrahedron 36.22 (1980): 3219-3228.
Chakravarti, Suman K., and Sai Radha Mani Alla. "Descriptor free QSAR modeling using deep learning with long short-term memory neural networks." Frontiers in artificial intelligence 2 (2019): 17.
Salum, Lívia B., and Adriano D. Andricopulo. "Fragment-based QSAR: perspectives in drug design." Molecular diversity 13.3 (2009): 277-285.
Balaban, Alexandru T. "Highly discriminating distance-based topological index." Chemical physics letters 89.5 (1982): 399-404.
Bertz, Steven H. "Convergence, molecular complexity, and synthetic analysis." Journal of the American Chemical Society 104.21 (1982): 5801-5803.
Hall, Lowell H., Brian Mohney, and Lemont B. Kier. "The electrotopological state: structure information at the atomic level for molecular graphs." Journal of chemical information and computer sciences 31.1 (1991): 76-82.
Jia, Chen-Yang, et al. "A drug-likeness toolbox facilitates ADMET study in drug discovery." Drug Discovery Today 25.1 (2020): 248-258.
Chaube, Suryanaman, Sriram Goverapet Srinivasan, and Beena Rai. "Applied machine learning for predicting the lanthanide-ligand binding affinities." Scientific reports 10.1 (2020): 1-11.
Riniker, Sereina, and Gregory A. Landrum. "Similarity maps-a visualization strategy for molecular fingerprints and machine-learning methods." Journal of cheminformatics 5.1 (2013): 1-7.
Hui, David S., et al. "The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health—The latest 2019 novel coronavirus outbreak in Wuhan, China." International journal of infectious diseases 91 (2020): 264-266
Car, Zlatan, et al. "Modeling the spread of COVID-19 infection using a multilayer perceptron." Computational and mathematical methods in medicine 2020 (2020).
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. the Journal of machine Learning research, 12, 2825-2830.