We divided the bacterial, protozoan, viral, and fungal protein sequence datasets into training and testing subsets for use in the deep learning model.
Division of the overall bacterial dataset into training and testing subsets for our deep learning model. Of the 574 protein sequences, 524 (91.29%) were used for training, of which 355 were positive and 169 were negative. The remaining 50 (8.71%) were used for testing, of which 29 were labelled positive and 21 negative by human annotators.
Division of the overall protozoan dataset into training and testing subsets for our deep learning model. Of the 468 protein sequences, 371 (79.27%) were used for training, of which 141 were positive and 230 were negative. The remaining 97 (20.73%) were used for testing, of which 34 were labelled positive and 63 negative by human annotators.
Division of the overall viral dataset into training and testing subsets for our deep learning model. Of the 837 protein sequences, 740 (88.41%) were used for training, of which 377 were positive and 363 were negative. The remaining 97 (11.59%) were used for testing, of which 56 were labelled positive and 41 negative by human annotators.
Division of the overall fungal dataset into training and testing subsets for our deep learning model. Of the 1947 protein sequences, 1850 (95.02%) were used for training, of which 953 were positive and 897 were negative. The remaining 97 (4.98%) were used for testing, of which 47 were labelled positive and 50 negative by human annotators.
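The reported percentages follow directly from the per-class counts. As a minimal sketch of that arithmetic (the names `SPLITS`, `split_summary`, and `SUMMARY` are illustrative and not part of the original pipeline), the subset sizes and training/testing shares can be recomputed as follows:

```python
# Per-class counts copied from the captions above, in the order
# (train_positive, train_negative, test_positive, test_negative).
SPLITS = {
    "bacterial": (355, 169, 29, 21),
    "protozoan": (141, 230, 34, 63),
    "viral": (377, 363, 56, 41),
    "fungal": (953, 897, 47, 50),
}


def split_summary(train_pos, train_neg, test_pos, test_neg):
    """Return subset sizes and their share of the whole dataset (in percent)."""
    n_train = train_pos + train_neg
    n_test = test_pos + test_neg
    total = n_train + n_test
    return {
        "n_train": n_train,
        "n_test": n_test,
        "total": total,
        "train_pct": round(100 * n_train / total, 2),
        "test_pct": round(100 * n_test / total, 2),
    }


# Hypothetical helper table: one summary per pathogen class.
SUMMARY = {name: split_summary(*counts) for name, counts in SPLITS.items()}

if __name__ == "__main__":
    for name, s in SUMMARY.items():
        print(f"{name}: train {s['n_train']} ({s['train_pct']}%), "
              f"test {s['n_test']} ({s['test_pct']}%)")
```

The sketch only re-derives the proportions from the stated counts; it does not reproduce how sequences were assigned to each subset, which the captions do not specify.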