Protein Subcellular Localization

Complementary material for "Multi-Label Protein Subcellular Localization using Gene Ontology Terms"

Downloads:

Results of the 10-fold cross-validation:

The evaluation was performed using 10-fold cross-validation. We used three different strategies to generate the train and test folds: one that splits the data randomly, and two stratification methods proposed by Sechidis et al. (2011), labelset stratification and iterative stratification. The labelset strategy treats the distinct label combinations as disjoint groups within the population and samples so that the proportion of each group is maintained in every fold. The iterative strategy first calculates the desired number of instances for each subset, then examines the instances one at a time so that the algorithm can assign each to an appropriate subset. Both stratification methods are implemented in the utiml R package (Rivolli 2016).

K. Sechidis, G. Tsoumakas, and I. Vlahavas. 2011. On the Stratification of Multi-label Data. European Conference on Machine Learning and Knowledge Discovery in Databases. 145-158.

A. Rivolli. 2016. utiml: Utilities for Multi-Label Learning. R package version 0.1.0.
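
To make the labelset strategy described above more concrete, the following is a minimal Python sketch of the idea. It is not the utiml implementation and is not the code used to produce the results on this page: the function name labelset_stratified_folds, its parameters, and the toy location labels are all hypothetical, chosen only for illustration. The sketch groups instances by their exact label combination and deals the members of each group across the folds in turn, so that every fold receives roughly the same share of each combination.

```python
import random
from collections import defaultdict


def labelset_stratified_folds(label_sets, k=10, seed=0):
    """Assign each instance to one of k folds, keeping labelset proportions.

    label_sets: list with one iterable of labels per instance.
    Returns a list of k lists of instance indices.
    (Hypothetical helper for illustration; not part of utiml.)
    """
    rng = random.Random(seed)

    # Group instance indices by their exact combination of labels.
    groups = defaultdict(list)
    for index, labels in enumerate(label_sets):
        groups[frozenset(labels)].append(index)

    folds = [[] for _ in range(k)]
    next_fold = 0  # carried across groups so that fold sizes stay balanced
    for members in groups.values():
        rng.shuffle(members)  # random order inside each labelset group
        for index in members:
            folds[next_fold].append(index)
            next_fold = (next_fold + 1) % k
    return folds


# Toy data (made up): 20 single-label, 10 two-label, 10 single-label proteins.
toy_labels = (
    [["nucleus"]] * 20
    + [["nucleus", "cytoplasm"]] * 10
    + [["membrane"]] * 10
)
toy_folds = labelset_stratified_folds(toy_labels, k=5, seed=42)
for number, fold in enumerate(toy_folds, start=1):
    print("fold", number, "size", len(fold))
```

Each fold can then serve once as the test set while the remaining folds form the training set. Rare label combinations with fewer instances than folds cannot appear in every fold; the iterative strategy is the one designed to cope better with that situation, and it is not covered by this sketch.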

Virus dataset:

Plants dataset: