Protein complexes exhibit a variety of structures, but these structures are not random, so we can exploit their inherent patterns to identify new complexes.
Topological features are extracted for each positive and negative complex to construct the final training and test datasets, which contain the following 18 features:
To eliminate negative complexes (random walks) that are indistinguishable from known complexes, we remove every negative sample (row) that exactly matches any positive sample (row).
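The filtering step can be illustrated with a minimal sketch, assuming the positive and negative samples are held in pandas DataFrames with identical feature columns. The function name `drop_matching_negatives` and the two columns `n_nodes` and `density` are hypothetical stand-ins for illustration only; they are not the actual feature set or the original implementation.

```python
import pandas as pd


def drop_matching_negatives(pos: pd.DataFrame, neg: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `neg` whose feature vector does not occur in `pos`.

    Assumes both frames share the same feature columns (hypothetical layout).
    """
    cols = list(pos.columns)
    # Collect every positive feature vector as a hashable tuple.
    positive_rows = set(pos[cols].itertuples(index=False, name=None))
    # Keep a negative row only if its full feature vector is absent from the positives.
    keep = [
        row not in positive_rows
        for row in neg[cols].itertuples(index=False, name=None)
    ]
    return neg.loc[keep].reset_index(drop=True)


# Toy data with two hypothetical features instead of the full feature set.
pos = pd.DataFrame({"n_nodes": [3, 4], "density": [1.0, 0.5]})
neg = pd.DataFrame({"n_nodes": [3, 5, 4, 3], "density": [1.0, 0.4, 0.5, 0.9]})

filtered = drop_matching_negatives(pos, neg)
print(filtered)
```

Matching here is exact equality on the whole row, so a negative sample that agrees with a positive sample on only some features is kept. With floating-point features, exact matching is sensitive to rounding; whether a tolerance is appropriate depends on how the features were computed.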