Experimentally determined bacterial PVCs that evoke an immune response in the host are said to have positive protective potential and are termed Bacterial Protective Antigens (BPAgs). A total of 574 protein sequences from Gram-positive and Gram-negative bacteria were collected. Of these, 100 positive samples (see Table 1) and 100 negative samples (see Table 8) were downloaded from the VaxiJen server (Doytchinova & Flower, 2007). The remaining 284 positive samples and 90 negative samples were obtained through text data mining (see Tables 2-7 and 9). These 90 non-antigen (negative) samples were prepared by randomly selecting proteins from the proteomes of the species from which the positive samples were collected. Finally, a combined table of 384 positive and 190 negative protein samples was generated.
Hisham, Y. and Ashhab, Y., 2018. Identification of cross-protective potential antigens against pathogenic Brucella spp. through combining Pan-genome analysis with reverse vaccinology. Journal of Immunology Research, 2018.
Araújo, C.L., Alves, J., Nogueira, W., Pereira, L.C., Gomide, A.C., Ramos, R., Azevedo, V., Silva, A. and Folador, A., 2019. Prediction of new vaccine targets in the core genome of Corynebacterium pseudotuberculosis through omics approaches and reverse vaccinology. Gene, 702, pp.36-45.
de Sarom, A., Kumar Jaiswal, A., Tiwari, S., de Castro Oliveira, L., Barh, D., Azevedo, V., Jose Oliveira, C. and de Castro Soares, S., 2018. Putative vaccine candidates and drug targets identified by reverse vaccinology and subtractive genomics approaches to control Haemophilus ducreyi, the causative agent of chancroid. Journal of The Royal Society Interface, 15(142), p.20180032.
Vilela Rodrigues, T.C., Jaiswal, A.K., de Sarom, A., de Castro Oliveira, L., Freire Oliveira, C.J., Ghosh, P., Tiwari, S., Miranda, F.M., de Jesus Benevides, L., Ariston de Carvalho Azevedo, V. and de Castro Soares, S., 2019. Reverse vaccinology and subtractive genomics reveal new therapeutic targets against Mycoplasma pneumoniae: a causative agent of pneumonia. Royal Society Open Science, 6(7), p.190907.
Vivona, S., Bernante, F. and Filippini, F., 2006. NERVE: new enhanced reverse vaccinology environment. BMC Biotechnology, 6(1), p.35.
Dalsass, M., Brozzi, A., Medini, D. and Rappuoli, R., 2019. Comparison of open-source reverse vaccinology programs for bacterial vaccine antigen discovery. Frontiers in Immunology, 10, p.113.
Doytchinova, I.A. and Flower, D.R., 2007. VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics, 8(1), p.4.
The protozoan training dataset consists of 468 protein sequences (175 positive sequences and 293 negative/control sequences). The 175 positive sequences were extracted directly from the Protegen database. The 293 negative sequences were randomly selected from various protozoan species in the UniProt database. To ensure that the negative sequences were not similar to the positive sequences, only sequences with a BLAST expectation value (E-value) greater than 3, signifying non-similarity, were selected.
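The E-value filter described above can be sketched as follows. This is only an illustration, not the authors' pipeline: it assumes the candidate negatives were searched against the positive set with BLAST and the hits saved in the standard 12-column tabular format (`-outfmt 6`), where the E-value is the 11th column. The function names, the sequence identifiers, and the choice to keep candidates with no reported hit at all are assumptions made for this example.

```python
import csv
import io

E_VALUE_CUTOFF = 3.0  # threshold stated in the text (E-value > 3)


def best_evalues(blast_tabular):
    """Return the smallest E-value reported for each query sequence.

    Expects BLAST tabular output (outfmt 6): tab-separated rows with the
    query ID in column 1 and the E-value in column 11.
    """
    best = {}
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, evalue = row[0], float(row[10])
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def select_dissimilar(candidate_ids, best, cutoff=E_VALUE_CUTOFF):
    """Keep candidates whose best hit against the positives exceeds the cutoff.

    Assumption: a candidate with no reported hit is treated as dissimilar.
    """
    return [c for c in candidate_ids if best.get(c, float("inf")) > cutoff]


# Hypothetical hits of three candidate negatives against the positive set.
example_hits = io.StringIO(
    "neg1\tpos7\t98.0\t120\t2\t0\t1\t120\t1\t120\t1e-60\t230\n"
    "neg2\tpos3\t31.0\t40\t25\t1\t5\t44\t9\t48\t4.2\t15.4\n"
    "neg2\tpos9\t29.0\t35\t24\t1\t8\t42\t3\t37\t7.9\t14.2\n"
)
best = best_evalues(example_hits)
kept = select_dissimilar(["neg1", "neg2", "neg3"], best)
print(kept)
```

In this toy input, `neg1` has a highly significant hit and is rejected, `neg2` is kept because its best E-value is above the cutoff, and `neg3` is kept because it has no hit.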
The viral training dataset consists of 837 protein sequences (433 positive sequences and 404 negative sequences). The 404 negative sequences were randomly selected from various viral species. The positive and negative sequences were not similar to each other.
The fungal training dataset consists of 1086 protein sequences. The 139 positive sequences were collected through literature mining, and the 947 negative sequences were randomly selected from various fungal species.
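As a quick consistency check, the counts reported for the four training datasets can be tallied as in the sketch below. The numbers are taken directly from the text; the dictionary layout and the `summarize` helper are illustrative assumptions, and the 837-sequence dataset is labeled "viral" because its negative sequences are stated to come from viral species.

```python
# Reported counts per dataset: (positive, negative, stated total).
DATASETS = {
    "bacterial": (384, 190, 574),
    "protozoan": (175, 293, 468),
    "viral": (433, 404, 837),  # label inferred from the source of its negatives
    "fungal": (139, 947, 1086),
}


def summarize(datasets):
    """Check that positives + negatives match the stated total and
    compute the fraction of positive samples in each dataset."""
    summary = {}
    for name, (pos, neg, total) in datasets.items():
        if pos + neg != total:
            raise ValueError(f"{name}: {pos} + {neg} != {total}")
        summary[name] = {
            "positive": pos,
            "negative": neg,
            "total": total,
            "positive_fraction": pos / total,
        }
    return summary


summary = summarize(DATASETS)
for name, stats in summary.items():
    print(
        f"{name:10s} positive={stats['positive']:4d} "
        f"negative={stats['negative']:4d} total={stats['total']:5d} "
        f"positive_fraction={stats['positive_fraction']:.3f}"
    )
```

All four stated totals agree with their positive and negative counts. The positive fraction makes the class balance explicit: the fungal dataset is the most imbalanced, with roughly 13% positive samples.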