Yeast interactome visualized by John "Scooter" Morris & Alex Pico
Protein complexes take part in many cellular functions, and identifying them will help in understanding several disease states. A wealth of protein interaction data is now available, which has led to the construction of extensive, accurate protein interaction networks. These networks can be mined computationally to identify candidate protein complexes, which can then be characterized experimentally.
State-of-the-art methods primarily employ unsupervised clustering for community detection, generally relying on the assumption that complexes are dense subgraphs of the network. This approach therefore fails to identify complexes whose topology is not dense. As experimental methods make more information about biological complexes available, supervised learning methods show promise in outperforming unsupervised ones, since they directly incorporate information from known complexes into the prediction process.
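To make the density assumption concrete, the following minimal sketch computes the edge density of two subgraphs in a toy network. The protein names, the edge list, and the `density` helper are all hypothetical and chosen only for illustration; they are not taken from any real interactome or from Super.Complex itself.

```python
def density(nodes, edges):
    """Edge density of the subgraph induced by `nodes`:
    (edges present among the nodes) / (edges possible among the nodes)."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Count edges whose both endpoints lie inside the node set.
    present = sum(1 for u, v in edges if u in nodes and v in nodes)
    return present / (n * (n - 1) / 2)

# Hypothetical toy network (undirected, each edge listed once):
# a 4-protein clique and a 5-protein hub-and-spoke complex.
clique = [("A", "B"), ("A", "C"), ("A", "D"),
          ("B", "C"), ("B", "D"), ("C", "D")]
star = [("H", "S1"), ("H", "S2"), ("H", "S3"), ("H", "S4")]
edges = clique + star

clique_density = density(["A", "B", "C", "D"], edges)
star_density = density(["H", "S1", "S2", "S3", "S4"], edges)

print(f"clique density: {clique_density:.2f}")
print(f"star density:   {star_density:.2f}")
```

In this toy case every possible edge of the clique is present, while the hub-and-spoke group has only 4 of its 10 possible edges. A method that searches only for dense subgraphs would tend to overlook the second group even if it were a genuine complex.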
Super.Complex is a pipeline with three stages: extraction of features for complexes; training and selection of the best supervised machine learning model; and sampling of candidate complexes from the network using growth strategies guided by the machine learning model.
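The three stages can be illustrated with a rough sketch, shown below under strong simplifying assumptions. It is not the Super.Complex implementation: the function names (`extract_features`, `toy_score`, `grow_candidate`), the feature set, and the toy network are hypothetical, and `toy_score` is a hand-written stand-in for a trained supervised model. The growth strategy shown is a simple greedy one, which may differ from the strategies the pipeline actually uses.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def extract_features(nodes, adj):
    """Stage 1 (illustrative): topological features of a subgraph."""
    nodes = set(nodes)
    n = len(nodes)
    # Each internal edge is seen from both endpoints, hence // 2.
    m = sum(len(adj[u] & nodes) for u in nodes) // 2
    dens = m / (n * (n - 1) / 2) if n > 1 else 0.0
    return {"n_nodes": n, "n_edges": m, "density": dens}

def toy_score(features):
    """Stage 2 stand-in (assumption): a trained classifier would go here.
    This hand-written rule rewards density plus a size bonus capped at 4."""
    return features["density"] + 0.1 * min(features["n_nodes"], 4)

def grow_candidate(seed, adj, score_fn, max_size=10):
    """Stage 3 (illustrative): greedily add the neighbor that most
    improves the score; stop when no neighbor improves it."""
    current = set(seed)
    best = score_fn(extract_features(current, adj))
    while len(current) < max_size:
        frontier = set().union(*(adj[u] for u in current)) - current
        if not frontier:
            break
        scored = [(score_fn(extract_features(current | {v}, adj)), v)
                  for v in sorted(frontier)]
        new_score, node = max(scored)
        if new_score <= best:
            break
        current.add(node)
        best = new_score
    return current, best

# Hypothetical toy network: a 4-protein clique with a two-protein tail.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("D", "E"), ("E", "F")]
adj = build_adjacency(edges)

candidate, score = grow_candidate({"A", "B"}, adj, toy_score)
print(sorted(candidate), round(score, 2))
```

Starting from the seed edge A-B, the growth step absorbs the rest of the clique and stops before adding the tail, because adding E lowers the score. Replacing `toy_score` with a model trained on known complexes is what would let the same growth loop favor topologies other than dense ones.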