We used biological and physicochemical features to describe each protein sequence. Each protein sequence in the training datasets was evaluated for a total of 19 biological and 9,154 physicochemical features. The biological features were annotated using various bioinformatics tools, and the physicochemical features were computed using the ProtR package.
Cut-off value of each biological property used in Vax-ELAN
Results of the bioinformatics tools used to evaluate 19 biological features for the 574 bacterial proteins.
List of the 11 biological features selected after filtering for the bacterial dataset. Welch's t-test was used to compute a p-value for each of the 19 biological features considered initially, and only the 11 features with a p-value below 0.05 were retained for the subsequent parts of our study.
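The filtering rule above (keep a feature only if Welch's t-test gives p < 0.05) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' code: it assumes each feature is a list of per-protein values and that class labels are binary (hypothetically 1 for protective antigens and 0 for non-protective proteins). The names `welch_t_test`, `t_two_sided_p` and `welch_filter` are illustrative. The two-sided p-value is obtained from the regularized incomplete beta function (Numerical Recipes continued fraction); in practice `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same test.

```python
import math
from statistics import mean, variance


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Lentz continued fraction for the incomplete beta function (NR 'betacf')."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), evaluated on whichever side converges faster."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t distribution with df degrees of freedom."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(x, y):
    """Welch's unequal-variance t-test; returns (t statistic, two-sided p)."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    se2 = vx + vy
    if se2 == 0.0:
        # Both groups are constant: equal means carry no signal.
        return (0.0, 1.0) if mx == my else (math.copysign(math.inf, mx - my), 0.0)
    t = (mx - my) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, t_two_sided_p(t, df)


def welch_filter(features, labels, alpha=0.05):
    """Return the names of features whose class-wise Welch p-value is below alpha."""
    selected = []
    for name, values in features.items():
        positives = [v for v, y in zip(values, labels) if y == 1]
        negatives = [v for v, y in zip(values, labels) if y == 0]
        _, p = welch_t_test(positives, negatives)
        if p < alpha:
            selected.append(name)
    return selected


labels = [1] * 5 + [0] * 5
features = {
    "f_informative": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "f_uninformative": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
}
print(welch_filter(features, labels))  # ['f_informative']
```

Welch's variant is used here because it does not assume equal variances in the two classes. The same filter applies unchanged to the physicochemical features.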
Results of the ProtR package used to calculate the values of 9,154 physicochemical features for the bacterial dataset. The physicochemical properties were computed using various functions in ProtR for the 574 protein sequences in the training dataset.
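To illustrate what a sequence-derived physicochemical descriptor looks like, the sketch below computes two of the simplest descriptor families, amino acid composition (20 values) and dipeptide composition (400 values). It is written in the spirit of the `extractAAC` and `extractDC` functions of the CRAN `protr` package, but it is an illustrative Python reimplementation, not the package itself, and it is not claimed to reproduce ProtR's full 9,154-feature set. The function names and the `AAC_`/`DC_` prefixes are hypothetical.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    seq = seq.upper()
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered residue pairs among adjacent positions."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    total = len(seq) - 1  # number of adjacent pairs
    return {pair: n / total for pair, n in counts.items()}


def descriptor_vector(seq):
    """Concatenate both descriptor families into one named feature dictionary."""
    features = {}
    features.update({f"AAC_{k}": v for k, v in amino_acid_composition(seq).items()})
    features.update({f"DC_{k}": v for k, v in dipeptide_composition(seq).items()})
    return features


vec = descriptor_vector("MKTAYIAKQR")
print(len(vec))  # 420
```

Longer descriptor families (for example tripeptide composition, with 8,000 values) follow the same counting pattern, which is why the raw physicochemical feature space is so much larger than the biological one.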
Results of ProtR used to calculate the values of the 1,436 shortlisted physicochemical features (properties) for the bacterial dataset. Of the 9,154 physicochemical properties, 1,436 were significant (p < 0.05; Welch's t-test).
We then combined the 11 biological features and the 1,436 physicochemical properties, giving 1,447 features for further analysis.
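Combining the two filtered feature sets amounts to a per-protein concatenation of columns. The sketch below assumes both tables map a protein identifier to a dictionary of named feature values; the feature names in the example (`signal_peptide`, `tm_helices`, `AAC_A`, and so on) and the function name are hypothetical placeholders. With the bacterial dataset's 11 biological and 1,436 physicochemical features, each merged row would hold 1,447 values.

```python
def combine_features(biological, physicochemical):
    """Merge two per-protein feature tables into one table with the union of columns.

    Both arguments map protein_id -> {feature_name: value}.
    """
    if biological.keys() != physicochemical.keys():
        raise ValueError("both tables must describe the same proteins")
    combined = {}
    for protein_id, bio in biological.items():
        phys = physicochemical[protein_id]
        # Refuse silent overwrites if the two sources share a feature name.
        overlap = bio.keys() & phys.keys()
        if overlap:
            raise ValueError(f"duplicate feature names: {sorted(overlap)}")
        combined[protein_id] = {**bio, **phys}
    return combined


bio = {"P1": {"signal_peptide": 1.0, "tm_helices": 0.0}}
phys = {"P1": {"AAC_A": 0.08, "DC_AA": 0.01, "Moran_1": -0.12}}
merged = combine_features(bio, phys)
print(len(merged["P1"]))  # 5
```

Raising on duplicate names, rather than letting one source overwrite the other, keeps the combined count equal to the sum of the two filtered sets.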
The biological and physicochemical features were also computed for the protozoan, viral, and fungal datasets. The results are as follows:
Results of RV and ProtR (combined) used to calculate the values of the 2,074 shortlisted features (properties) for the protozoan dataset.
Results of RV and ProtR (combined) used to calculate the values of the 1,754 shortlisted features (properties) for the viral dataset.
Results of RV and ProtR (combined) used to calculate the values of the 2,801 shortlisted features (properties) for the fungal dataset.