COV-Drug-11 interaction server for Covid-19 Drug Repurposing
Kamal Rawal#1, Prashant Singh1, Robin Sinha1, Priya Kumari1, Swarsat Kaushik Nath1, Ridhima1, Sukriti Sahai1, Sweety Dattatraya Shinde1, Nikita Garg1, Preeti P.1, Trapti Sharma1
Amity Institute of Biotechnology, Amity University, Uttar Pradesh, India
#Corresponding Author
Email ID: kamal.rawal@gmail.com
Centre for Computational Biology and Bioinformatics, AIB
Amity University, Noida.
Keywords: Bioinformatics, drug repurposing, artificial intelligence, COVID-19, SARS-CoV-2, molecular docking.
Supplementary Data Website: https://sites.google.com/view/drugx-supplementary
COV-DRUGX Software Pipeline: http://drugx.kamalrawal.in/drugfc/
Abstract:
De novo drug discovery is time-consuming and expensive [Mahmoud et al., 2020]. Drug repurposing research has benefited greatly from the systematic adoption of computational strategies such as machine learning and deep learning, which span a wide range of methodologies and approaches [Issa et al., 2021]. With experimental drug repositioning methods, it is hard to find new indications among the large number of existing drugs because knowledge of the underlying biological mechanisms is limited. By combining computational approaches into an efficient drug repositioning pipeline, repurposed drugs can be found systematically, more cost-effectively, and on a shorter timeline [Moridi et al., 2019]. To address this, we have built a web-enabled tool based on deep learning that scores a drug on its biological properties. Users submit drugs in SMILES format, and the tool returns scores for 11 different biological properties related to COVID-19.
1. INTRODUCTION
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction [LeCun et al., 2015]. A neural network "layer" is typically composed of a parametric, non-linear transformation of an input. Stacking these transformations forms a statistical data structure capable of mapping high-dimensional inputs to outputs. This mapping is learned by optimizing the parameters [Shorten et al., 2021]. Our tool is a deep learning-based module that computes 200 different chemoinformatics properties of a drug from its SMILES string and predicts whether or not it is a drug candidate for COVID-19. These properties include mutagenicity, the Lipinski Rule of Five, toxicity, and drug-likeness. We used positive and negative datasets to train the model; the positive dataset consists of drugs associated with COVID-19. Here, we have introduced advanced AI algorithms combined with network medicine for drug repurposing [Lee and Chen, 2021]. The main purpose of drug repurposing is to extend the therapeutic use of available drugs to a wider medical scope [Zhou et al., 2020].
Drug repurposing is the process of finding new therapeutic uses for available drugs. It can greatly reduce the time, cost, and risk of the traditional drug discovery process [Li et al., 2016]. Lee and Chen suggest improvements to existing deep learning (DL) approaches for identifying and repurposing drugs to treat this complex disease, including a new generation of DL approaches that can establish a cause-and-effect relationship explaining why the repurposed drugs are suitable for treating COVID-19 [Lee and Chen, 2021]. The effects of COVID-19 on a patient change over time, so the drugs and methods used for treatment need to be adjusted according to the condition of the illness. Models can help point us in the right direction by identifying repurposed drugs that can safely and effectively treat COVID-19, avoiding a time-consuming and expensive drug development process; however, laboratory experiments are still required to validate the effectiveness of the repurposed drugs [Shorten et al., 2021]. This module can predict side effects of a drug using biological information from structured big data [Lee and Chen, 2021]. It can predict, repurpose, discover, and design new COVID-19 therapies using predictive models [Gupta et al., 2021]. Each biological property has its own significance in determining whether a drug is repurposable for COVID-19. The significance of each biological property is shown in Supplementary Table 3.
2. METHODOLOGY
The Drugx pipeline (http://drugx.kamalrawal.in/drugfc) was developed using computational tools to evaluate 11 biological properties of drugs. This module shortlists drugs that show the relevant characteristics (features) to qualify as repurposable drugs for COVID-19 [8].
2.1 DATASET
Positive and negative datasets are visualised by the DL11 module. We extracted 261 drugs for the positive dataset by text mining. The SMILES strings and SDF files of every drug were collected from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) or the DrugBank database (https://go.drugbank.com/). The positive dataset, with the drug names, their SMILES, and their DrugBank or PubChem IDs, is shown in Supplementary Sheet 1. Documentation for running DL11 is given in Supplementary Doc 1.
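As an illustration of how such a dataset could be prepared for training, the following minimal sketch parses a sheet of drug names, SMILES strings, and database IDs and attaches a class label. The column names (name, smiles, source_id), the function load_dataset, and the sample rows are hypothetical placeholders; they are not taken from Supplementary Sheet 1, whose actual layout may differ.

```python
import csv
import io

# Placeholder rows with assumed column names; not data from the paper.
SAMPLE_SHEET = """name,smiles,source_id
example_a,CCO,ID_A
example_b,CC(=O)O,ID_B
example_c,,ID_C
"""


def load_dataset(text, label):
    """Parse CSV text and attach a class label (1 = positive, 0 = negative).

    Rows without a SMILES string are skipped, since the model input
    is derived from the SMILES.
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        smiles = (row.get("smiles") or "").strip()
        if not smiles:
            continue
        records.append({
            "name": row["name"].strip(),
            "smiles": smiles,
            "source_id": row["source_id"].strip(),
            "label": label,
        })
    return records


records = load_dataset(SAMPLE_SHEET, label=1)
print(len(records), "usable records")
```

A negative dataset could be loaded the same way with label=0 and concatenated with the positive records before training.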
2.2 IMPLEMENTATION
The DRUG11 module is a deep learning-based system that computes 11 different biological properties of drugs. Based on these properties, it classifies whether a given SMILES string corresponds to a drug suitable for repurposing against COVID-19. The DL11 module evaluates and shortlists drugs that show the relevant properties (features) to qualify as COVID-19 repurposable drug candidates (Supplementary Figure 1). The learning algorithm is a fully connected network (multilayer perceptron).
All four DL models have one input layer, whose number of nodes equals the number of properties in each model. Each model was built with a different number of hidden layers, where each hidden layer consisted of one fully connected layer (FCL), one leaky ReLU activation layer, and one batch normalization layer. The output layer consists of one FCL with two nodes representing the two output classes, namely drug candidate and non-candidate. We used softmax activation and a constant bias initializer set to log(number of positive / number of negative) to obtain minimal loss at the early stages of training. We also employed the Adam optimizer with exponential learning rate decay and a categorical cross-entropy loss function during training.
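The architecture described above can be illustrated with a minimal plain-Python sketch of one hidden block (FCL, leaky ReLU, batch normalization), a two-node softmax output, the log(positive/negative) bias initializer, and an exponential learning rate schedule. This is not the authors' implementation: the layer sizes, random weights, negative-set size (783), and decay settings are assumptions chosen for illustration, and applying the bias to the candidate logit only is one possible reading of the initializer.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: small negative slope instead of a hard zero."""
    return z if z > 0 else alpha * z


def batch_norm(batch, eps=1e-5):
    """Standardise each feature across the batch (no learned scale/shift)."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n
                 for col, m in zip(columns, means)]
    return [[(v - m) / math.sqrt(var + eps)
             for v, m, var in zip(row, means, variances)]
            for row in batch]


def hidden_block(batch, weights, bias):
    """One hidden block in the order given in the text."""
    activated = [[leaky_relu(z) for z in dense(x, weights, bias)]
                 for x in batch]
    return batch_norm(activated)


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def initial_output_bias(n_pos, n_neg):
    """Bias for the [non-candidate, candidate] logits (assumed layout)."""
    return [0.0, math.log(n_pos / n_neg)]


def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` optimizer steps."""
    return initial_lr * decay_rate ** (step / decay_steps)


random.seed(0)
N_INPUTS, N_HIDDEN = 11, 4          # assumed sizes
batch = [[random.random() for _ in range(N_INPUTS)] for _ in range(3)]
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
            for _ in range(N_HIDDEN)]
hidden = hidden_block(batch, w_hidden, [0.0] * N_HIDDEN)

N_POS, N_NEG = 261, 783             # 261 is from the text; 783 is hypothetical
w_out = [[0.0] * N_HIDDEN for _ in range(2)]   # untrained output weights
b_out = initial_output_bias(N_POS, N_NEG)
probs = [softmax(dense(h, w_out, b_out)) for h in hidden]
print(probs[0])
print(exponential_decay(1e-3, 0.9, 100, 200))
```

With zero output weights, the predicted candidate probability equals the class prior N_POS / (N_POS + N_NEG), which is why this initializer keeps the loss low at the start of training on an imbalanced dataset.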
3. USAGE
The properties used in DL11 include Synthetic Accessibility score, Cycle, Quantitative Estimate of Drug-likeness, Rule of Five, MolecularWeight, ALogP, NumHDonors, NumHAcceptors, Mutagenicity, and Mutagenicity prediction probability (Supplementary Table 1).
The Rule of Five, proposed by Lipinski to assess druggability [Benet et al., 2016], requires a molecular mass below 500 daltons, a logP (a measure of lipophilicity) below 5, no more than 5 hydrogen bond donors, and no more than 10 hydrogen bond acceptors; a commonly cited extension additionally requires a molar refractivity between 40 and 130. The molecular weight of a drug also plays an important role in absorption: an increase in molecular weight decreases absorption of the drug [Chae et al., 2005]. The synthetic accessibility score helps ensure that compounds selected for dose-response validation can be synthesized [Ertl et al., 2009; Gupta et al., 2021]. The cycle property, as used for cancer drugs, describes the period after a drug or combination of drugs enters the body, during which the body is allowed to rest and recover until the next dose is taken.
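The Rule of Five check can be sketched as a small function over precomputed descriptors. The thresholds follow the commonly cited form of Lipinski's rule (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). The names Descriptors, rule_of_five_violations, and passes_rule_of_five, the descriptor values, and the convention of tolerating one violation are assumptions for illustration, not details of the DL11 module.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    molecular_weight: float   # daltons
    logp: float               # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Return the names of the Rule of Five criteria that a molecule violates."""
    violations = []
    if d.molecular_weight > 500:
        violations.append("molecular_weight")
    if d.logp > 5:
        violations.append("logp")
    if d.h_bond_donors > 5:
        violations.append("h_bond_donors")
    if d.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors")
    return violations


def passes_rule_of_five(d, max_violations=1):
    """Tolerating a single violation is a common convention (assumed here)."""
    return len(rule_of_five_violations(d)) <= max_violations


# Illustrative descriptor values, not measurements for any real drug.
small = Descriptors(350.0, 2.5, 2, 5)
large = Descriptors(650.0, 6.2, 6, 12)
print(passes_rule_of_five(small), rule_of_five_violations(small))
print(passes_rule_of_five(large), rule_of_five_violations(large))
```

In practice the descriptor values would be computed from the SMILES string by a cheminformatics toolkit before a check of this kind is applied.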
The module takes a drug candidate in SMILES format, together with the SDF file of the drug, and predicts whether the candidate could be repurposed against COVID-19. The module accepts a drug ID as a query against the datasets, which are very large. It then checks the COVID-19 properties (Supplementary Table 1). If the module finds the drug property in the COVID-19 dataset, it predicts a score of 1; otherwise it gives 0. A score of 1 means that the given drug is associated with the COVID-19 target (Supplementary Figure 2).
4. RESULTS AND DISCUSSION
As a case study, we selected 261 drugs from the literature as samples (see Supplementary Sheet 1) and subjected them to the pipeline. The module reports the distribution of drugs scoring 1 and 0 (see Supplementary Sheet 2).
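The step from model output to the 0/1 scores and their distribution can be sketched as follows. This assumes that the two-node softmax output is ordered as [non-candidate, candidate] and that the class with the higher probability determines the score; the drug identifiers and probabilities are made-up placeholders, not results from Supplementary Sheet 2.

```python
from collections import Counter


def to_score(class_probs):
    """Map [p_non_candidate, p_candidate] to a binary score (1 = candidate)."""
    return 1 if class_probs[1] > class_probs[0] else 0


def score_distribution(predictions):
    """Count how many drugs receive score 1 and how many receive score 0."""
    return Counter(to_score(p) for p in predictions.values())


# Hypothetical model outputs keyed by placeholder drug identifiers.
predictions = {
    "drug_a": [0.20, 0.80],
    "drug_b": [0.70, 0.30],
    "drug_c": [0.45, 0.55],
}
distribution = score_distribution(predictions)
print("score 1:", distribution[1], "score 0:", distribution[0])
```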
5. CONCLUSION
We computed 11 biological properties, such as mutagenicity and drug-likeness (Supplementary Table 1). To evaluate the performance of the DL models, we computed metrics such as accuracy, precision, recall, loss, sensitivity, and specificity (Table DR; Supplementary Table 2). The module achieved an accuracy of 70%. The positive dataset was scored using DL11. A resultant value of 0 means that, according to the DL11 module, the drug cannot be used for COVID-19; a value of 1 means that it can. Results are reported in 0/1 format in Supplementary Sheet 2. The module also writes the values for each biological property as intermediate files in the result folder (Supplementary Sheet 3).
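For reference, the evaluation metrics named above can be derived from the confusion matrix of a binary classifier, as in this minimal sketch. The label vectors are toy values chosen for illustration and do not reproduce the results in Supplementary Table 2; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = candidate, 0 = non-candidate)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Avoid division by zero when a class is absent."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
        "precision": safe_ratio(tp, tp + fp),
        "recall": safe_ratio(tp, tp + fn),        # same quantity as sensitivity
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
report = evaluate(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Recall and sensitivity are the same quantity under two names, so reporting both alongside specificity and precision covers the behaviour on both classes.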
This study was conducted under the overall guidance of KR, who contributed to the protocol, the critical evaluation of data, and the manuscript. The pipeline was designed, constructed, and validated by RS and PS. The manuscript was written by SKN, PP, and PK. All the authors are responsible for the content of the manuscript.
Funding
This work was supported by a grant provided by SERB, Department of Science and Technology, Government of India (File Number: CVD/2020/000842). We also acknowledge the Robert J. Kleberg, Jr. and Helen C. Kleberg Foundation, USA; Texas Children’s Hospital; and Baylor College of Medicine, Houston, USA.
Financial Support and Sponsorship
Dr. Kamal Rawal acknowledges the support provided by SERB, Department of Science and Technology (Grant ID: CVD/2020/000842). The project used computational infrastructure (servers, etc.) provided by the Department of Biotechnology (DBT), Ministry of Science and Technology, Government of India (Grant ID: BT/PRI7252/BID/7/708/2016). SKN, PP, R, SS, SDS, NG, and TS received financial support from grants obtained from the Robert J. Kleberg, Jr. and Helen C. Kleberg Foundation and Baylor College of Medicine, Houston, Texas, USA. We are also thankful to Amity University for the support provided during this study.
Data availability
Supplementary Data:
Supplementary Sheet 1 (S1)
Supplementary Sheet 2 (S2)
Supplementary Sheet 3 (S3)
Supplementary Table 1 (T1)
Supplementary Table 2 (T2)
Supplementary Table 3 (T3)
Supplementary Figure 1 (F1)
Supplementary Figure 2 (F2)
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
REFERENCES
Mahmoud, D. B., Shitu, Z., & Mostafa, A. (2020). Drug repurposing of nitazoxanide: can it be an effective therapy for COVID-19?. Journal of Genetic Engineering and Biotechnology, 18(1), 1-10.
Issa, N. T., Stathias, V., Schürer, S., & Dakshanamurthy, S. (2021, January). Machine and deep learning approaches for cancer drug repurposing. In Seminars in cancer biology (Vol. 68, pp. 132-142). Academic Press.
Moridi, M., Ghadirinia, M., Sharifi-Zarchi, A., & Zare-Mirakabad, F. (2019). The assessment of efficient representation of drug features using deep learning for drug repositioning. BMC bioinformatics, 20(1), 1-11.
Shorten, C., Khoshgoftaar, T. M., & Furht, B. (2021). Deep Learning applications for COVID-19. Journal of Big Data, 8(1), 1-54.
Lee, C. Y., & Chen, Y.-P. P. (2021). New insights into drug repurposing for COVID-19 using deep learning. IEEE Transactions on Neural Networks and Learning Systems.
Lee, C. Y., & Chen, Y. P. (2021). Descriptive prediction of drug side-effects using a hybrid deep learning model. International Journal of Intelligent Systems, 36(6), 2491-2510.
Zhou, Y., et al. (2020). Artificial intelligence in COVID-19 drug repurposing. The Lancet Digital Health.
Li, J., Zheng, S., Chen, B., Butte, A. J., Swamidass, S. J., & Lu, Z. (2016). A survey of current trends in computational drug repositioning. Briefings in Bioinformatics, 17(1), 2-12.
Smith, T., Bushek, J., LeClaire, A., & Prosser, T. (2020). COVID-19 drug therapy. Elsevier, 1(1), 1-25.
Rawal, K., et al. (2021). Identification of vaccine targets in pathogens and design of a vaccine using computational approaches. Scientific Reports, 11(1), 1-25.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Benet, L. Z., Hosey, C. M., Ursu, O., & Oprea, T. I. (2016). BDDCS, the rule of 5 and druggability. Advanced drug delivery reviews, 101, 89-98.
Chae, S. Y., Jang, M. K., & Nah, J. W. (2005). Influence of molecular weight on oral absorption of water soluble chitosans. Journal of controlled release, 102(2), 383-394.
Ertl, P., & Schuffenhauer, A. (2009). Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of cheminformatics, 1(1), 1-11.
Gupta, R., Srivastava, D., Sahu, M., Tiwari, S., Ambasta, R. K., & Kumar, P. (2021). Artificial intelligence to deep learning: Machine intelligence approach for drug discovery. Molecular Diversity, 1-46.
Jia, C. Y., Li, J. Y., Hao, G. F., & Yang, G. F. (2020). A drug-likeness toolbox facilitates ADMET study in drug discovery. Drug Discovery Today, 25(1), 248-258.