Bacterial antigens that have been shown experimentally to evoke an immune response in the host are said to have positive protective potential and are termed Bacterial Protective Antigens (BPAgs). A total of 574 protein sequences from Gram-positive and Gram-negative bacteria were collected (384 positive and 190 negative). Of these, 100 positive and 100 negative samples were downloaded from the VaxiJen server (Doytchinova & Flower, 2007), and the remaining 284 positive and 90 negative samples were collected through text mining. Negative sequences were extracted from the proteomes of the species present in the positive dataset using UniProt [9].
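As a quick consistency check on the bacterial dataset composition, the per-source counts stated above can be tallied as follows. This is only an illustrative sketch: the dictionary layout and the `tally` helper are hypothetical and are not part of the original pipeline.

```python
# Counts taken from the text; the data structure itself is an assumption
# made for illustration only.
bacterial_sources = {
    "VaxiJen": {"positive": 100, "negative": 100},
    "text_mining": {"positive": 284, "negative": 90},
}


def tally(sources):
    """Return (positives, negatives, total) summed over all sources."""
    positives = sum(s["positive"] for s in sources.values())
    negatives = sum(s["negative"] for s in sources.values())
    return positives, negatives, positives + negatives


positives, negatives, total = tally(bacterial_sources)
print(positives, negatives, total)  # 384 190 574
```

The total of 574 matches the figure reported for the bacterial dataset.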
The protozoan training dataset consists of 468 protein sequences (175 positive sequences and 293 negative, or control, sequences). The 175 positive sequences were extracted directly from the Protegen database (Yang et al., 2011). Negative sequences were extracted from the proteomes of the species present in the positive dataset using UniProt [9].
The viral training dataset consists of 837 protein sequences (418 positive sequences and 419 negative sequences). The 419 negative sequences were randomly selected from various viral species. Negative sequences were extracted from the proteomes of the species present in the positive dataset using UniProt [9].
The fungal training dataset consists of 1086 protein sequences. The 139 positive sequences were collected through literature mining. Negative sequences were extracted from the proteomes of the species present in the positive dataset using UniProt [9].
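The text does not spell out how negative sequences were drawn from the proteomes, so the following is only a minimal sketch of one plausible approach: parse a proteome in FASTA format, exclude entries whose sequence matches a known positive, and randomly sample the required number from the remainder. The function names, the exclusion by exact sequence identity, the fixed random seed, and the toy proteome are all assumptions made for illustration.

```python
import random


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may wrap; concatenate them under the last header.
            records[header] += line
    return records


def sample_negatives(proteome, positive_seqs, n, seed=0):
    """Randomly pick n proteome entries whose sequence is not a known positive.

    Exclusion by exact sequence identity is an assumption of this sketch.
    """
    candidates = sorted(h for h, s in proteome.items() if s not in positive_seqs)
    if n > len(candidates):
        raise ValueError("not enough candidate sequences to sample from")
    return random.Random(seed).sample(candidates, n)


# Toy proteome (hypothetical identifiers and sequences).
proteome_fasta = ">P1\nMKT\nAAA\n>P2\nMGG\n>P3\nMLL\n>P4\nMVV\n"
known_positives = {"MGG"}

proteome = parse_fasta(proteome_fasta)
negatives = sample_negatives(proteome, known_positives, n=2)
print(negatives)
```

Fixing the seed keeps the selection reproducible. In practice the proteome would be a FASTA file downloaded from UniProt for each species in the positive dataset rather than an in-memory string.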
References
Hisham, Y. and Ashhab, Y., 2018. Identification of cross-protective potential antigens against pathogenic Brucella spp. through combining Pan-genome analysis with reverse vaccinology. Journal of immunology research, 2018.
Araújo, C.L., Alves, J., Nogueira, W., Pereira, L.C., Gomide, A.C., Ramos, R., Azevedo, V., Silva, A. and Folador, A., 2019. Prediction of new vaccine targets in the core genome of Corynebacterium pseudotuberculosis through omics approaches and reverse vaccinology. Gene, 702, pp.36-45.
de Sarom, A., Kumar Jaiswal, A., Tiwari, S., de Castro Oliveira, L., Barh, D., Azevedo, V., Jose Oliveira, C. and de Castro Soares, S., 2018. Putative vaccine candidates and drug targets identified by reverse vaccinology and subtractive genomics approaches to control Haemophilus ducreyi, the causative agent of chancroid. Journal of The Royal Society Interface, 15(142), p.20180032.
Vilela Rodrigues, T.C., Jaiswal, A.K., de Sarom, A., de Castro Oliveira, L., Freire Oliveira, C.J., Ghosh, P., Tiwari, S., Miranda, F.M., de Jesus Benevides, L., Ariston de Carvalho Azevedo, V. and de Castro Soares, S., 2019. Reverse vaccinology and subtractive genomics reveal new therapeutic targets against Mycoplasma pneumoniae: a causative agent of pneumonia. Royal Society Open Science, 6(7), p.190907.
Vivona, S., Bernante, F. and Filippini, F., 2006. NERVE: new enhanced reverse vaccinology environment. BMC biotechnology, 6(1), p.35.
Dalsass, M., Brozzi, A., Medini, D. and Rappuoli, R., 2019. Comparison of open-source reverse vaccinology programs for bacterial vaccine antigen discovery. Frontiers in immunology, 10, p.113.
Doytchinova, I.A. and Flower, D.R., 2007. VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC bioinformatics, 8(1), p.4.
Yang, B., Sayers, S., Xiang, Z. and He, Y., 2011. Protegen: a web-based protective antigen database and analysis system. Nucleic Acids Research, 39, pp.D1073-D1078.
UniProt Consortium, 2015. UniProt: a hub for protein information. Nucleic Acids Research, 43, pp.D204-D212.