The entire protein interaction network can be used to sample subgraphs as new candidate complexes. Alternatively, a reduced protein interaction network containing only proteins in known complexes can be used, perhaps to evaluate the sampling algorithm better but, more importantly, to improve efficiency.
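A minimal sketch of building the reduced network, assuming the network is stored as a dict mapping protein pairs to edge weights and the known complexes as sets of protein IDs. The function name `reduce_network` is hypothetical. Only edges whose two endpoints both occur in some known complex are kept.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (protein_u, protein_v)

def reduce_network(
    weighted_edges: Dict[Edge, float],
    known_complexes: Iterable[Set[str]],
) -> Dict[Edge, float]:
    """Keep only edges whose two endpoints both belong to some known complex."""
    # Union of all proteins that appear in at least one known complex.
    known_proteins: Set[str] = set().union(*known_complexes)
    return {
        (u, v): w
        for (u, v), w in weighted_edges.items()
        if u in known_proteins and v in known_proteins
    }

edges = {("A", "B"): 0.9, ("B", "C"): 0.7, ("C", "X"): 0.8}
complexes = [{"A", "B", "C"}]
print(reduce_network(edges, complexes))
```

Here protein X belongs to no known complex, so the edge C-X is dropped. Sampling on the smaller edge set presumably means fewer neighbors to scan at each growth step, which is where the efficiency gain would come from.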
Common steps in all algorithms:
This is the simplest growth strategy for building complexes from high-confidence interactions.
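The greedy strategy can be sketched as follows. This is an illustration, not the original implementation: it assumes an adjacency dict `{node: {neighbor: weight}}`, reads "highest edge weight neighbor" as the heaviest edge leaving the current complex, and stops at a hypothetical `max_size`, because the stopping rule is not given here.

```python
from typing import Dict, Set

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}

def grow_greedy_edge(graph: Graph, seed: str, max_size: int) -> Set[str]:
    """Grow a complex from `seed` by always adding the heaviest-edge neighbor."""
    complex_nodes = {seed}
    while len(complex_nodes) < max_size:
        best_node, best_weight = None, float("-inf")
        # Scan every edge that leaves the current complex (sorted for deterministic ties).
        for u in sorted(complex_nodes):
            for v, w in graph.get(u, {}).items():
                if v not in complex_nodes and w > best_weight:
                    best_node, best_weight = v, w
        if best_node is None:  # no outside neighbors left: stop growing
            break
        complex_nodes.add(best_node)
    return complex_nodes

g = {
    "A": {"B": 0.9, "C": 0.4},
    "B": {"A": 0.9, "D": 0.8},
    "C": {"A": 0.4},
    "D": {"B": 0.8, "E": 0.1},
    "E": {"D": 0.1},
}
print(sorted(grow_greedy_edge(g, "A", max_size=3)))
```

Starting from A, the heaviest edge leads to B (0.9), and then from {A, B} to D (0.8), so C is never considered once a heavier edge exists elsewhere.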
Intuitively, the choice of which neighbor to add at each step determines the accuracy of the complexes formed. To improve on the previous strategy, which checks only the neighbor with the highest edge weight, we can check more neighbors to find the best node to add.
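A sketch of checking several neighbors instead of one. The candidate pool (the `k` outside neighbors with the heaviest edges into the complex) and the scoring function (`density_score`, the total internal edge weight divided by the number of node pairs) are both assumptions for illustration; the score used by the actual algorithms may differ.

```python
from itertools import combinations
from typing import Dict, Optional, Set

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}

def density_score(graph: Graph, nodes: Set[str]) -> float:
    """Hypothetical score: total internal edge weight / number of node pairs."""
    if len(nodes) < 2:
        return 0.0
    total = sum(graph.get(u, {}).get(v, 0.0) for u, v in combinations(sorted(nodes), 2))
    n = len(nodes)
    return total / (n * (n - 1) / 2)

def best_neighbor(graph: Graph, complex_nodes: Set[str], k: int) -> Optional[str]:
    """Score the top-k outside neighbors (by edge weight) and return the best one."""
    # Heaviest edge from the complex to each outside neighbor.
    candidates: Dict[str, float] = {}
    for u in complex_nodes:
        for v, w in graph.get(u, {}).items():
            if v not in complex_nodes:
                candidates[v] = max(w, candidates.get(v, float("-inf")))
    if not candidates:
        return None
    top_k = sorted(candidates, key=lambda v: (-candidates[v], v))[:k]
    # Pick the candidate whose addition gives the highest complex score.
    return max(top_k, key=lambda v: (density_score(graph, complex_nodes | {v}), v))

g = {
    "A": {"B": 1.0, "C": 0.9, "D": 0.6},
    "B": {"A": 1.0, "D": 0.6},
    "C": {"A": 0.9},
    "D": {"A": 0.6, "B": 0.6},
}
print(best_neighbor(g, {"A", "B"}, k=1), best_neighbor(g, {"A", "B"}, k=2))
```

In this example the heaviest edge points to C, but D is connected to both A and B, so adding it gives a denser complex (2.2/3 versus 1.9/3). Widening the search from `k=1` to `k=2` changes the choice from C to D.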
The next three algorithms use the following strategy to choose the 'most important neighbor' of a node (the candidate node to add to a complex):
Same as the Metropolis algorithm, except that the Metropolis acceptance probability is replaced by the probability
p = e^((current_score - old_score)/T)
where T = T_old/alpha, starting with T0. Since T must decrease over the iterations, alpha > 1.
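A sketch of this acceptance rule and cooling schedule; all names are hypothetical. It assumes that a move which does not lower the score is always accepted (capping p at 1, which is how the Metropolis rule treats improvements) and that alpha > 1, so that T = T_old/alpha decreases as the annealing analogy requires.

```python
import itertools
import math
import random
from typing import Iterator

def acceptance_probability(current_score: float, old_score: float, T: float) -> float:
    """p = e^((current_score - old_score)/T), capped at 1 so improvements are always accepted."""
    delta = current_score - old_score
    if delta >= 0:
        return 1.0  # e^(delta/T) >= 1 here; capping also avoids overflow
    return math.exp(delta / T)

def temperatures(T0: float, alpha: float) -> Iterator[float]:
    """Cooling schedule T = T_old / alpha, starting at T0 (alpha > 1 makes T decrease)."""
    T = T0
    while True:
        yield T
        T = T / alpha

def accept(current_score: float, old_score: float, T: float, rng: random.Random) -> bool:
    """Accept the new complex with probability acceptance_probability(...)."""
    return rng.random() < acceptance_probability(current_score, old_score, T)

# The same drop in score (0.7 -> 0.5) evaluated at the first three temperatures.
for T in itertools.islice(temperatures(T0=0.1, alpha=2.0), 3):
    print(f"T={T:.3f}  p={acceptance_probability(0.5, 0.7, T):.4f}")
```

With T = 0.1, a score drop of 0.2 is accepted with probability e^(-2) ≈ 0.135; one cooling step later (alpha = 2, T = 0.05) the same drop is accepted with probability e^(-4) ≈ 0.018, so the search explores less as it proceeds.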
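The analogy is to the annealing of metals: just as a metal is cooled slowly so that it settles into a low-energy state, the temperature T slowly decreases with time (here, iterations). As T falls, the probability of accepting a move that lowers the score shrinks, so the algorithm explores less towards the end.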
Complexes with only 2 nodes are removed.
Given an overlap threshold, a complex is removed if it overlaps (shares nodes with) any larger complex by more than the threshold value.
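Both filtering steps can be sketched as below. The overlap measure is not specified here, so this sketch assumes the fraction of the smaller complex's nodes that also belong to the bigger one (`overlap_fraction`, a hypothetical name). Complexes of equal size are not compared with each other.

```python
from typing import List, Set

def overlap_fraction(small: Set[str], big: Set[str]) -> float:
    """Hypothetical overlap measure: share of the smaller complex's nodes found in the bigger one."""
    return len(small & big) / len(small)

def filter_complexes(complexes: List[Set[str]], overlap_threshold: float) -> List[Set[str]]:
    # Step 1: remove complexes with only 2 nodes (this sketch also drops any smaller ones).
    sized = [c for c in complexes if len(c) > 2]
    # Step 2: remove a complex if it overlaps a strictly bigger complex by more than the threshold.
    return [
        c
        for c in sized
        if not any(len(d) > len(c) and overlap_fraction(c, d) > overlap_threshold for d in sized)
    ]

complexes = [{"A", "B"}, {"A", "B", "C", "D"}, {"A", "B", "C"}, {"E", "F", "G"}]
print(filter_complexes(complexes, overlap_threshold=0.5))
```

{A, B} is dropped for having two nodes, and {A, B, C} is dropped because all of its nodes lie inside the bigger complex {A, B, C, D}.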
We compare the set of predicted complexes with known complexes to evaluate the algorithm.
First, we construct a set of reduced protein complexes that contain only proteins present in the known complexes, and retain only those with 3 or more nodes.
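A sketch of this reduction step, under the assumption that it is the predicted complexes that are trimmed to proteins found in the known complexes; `reduce_complexes` is a hypothetical name.

```python
from typing import Iterable, List, Set

def reduce_complexes(predicted: Iterable[Set[str]], known: Iterable[Set[str]]) -> List[Set[str]]:
    """Trim each complex to proteins in the known complexes; keep those with 3 or more nodes."""
    known_proteins: Set[str] = set().union(*known)
    reduced = [c & known_proteins for c in predicted]
    return [c for c in reduced if len(c) >= 3]

predicted = [{"A", "B", "C", "X"}, {"A", "X", "Y"}]
known = [{"A", "B"}, {"C", "D"}]
print(reduce_complexes(predicted, known))
```

The first prediction loses X but keeps three nodes; the second shrinks to {A} and is discarded.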
We compare the predicted complex set and the known complex set, and say that a predicted complex recovers a known complex if,
where p is an input threshold parameter between 0 and 1, and
Precision, recall and the F1 measure are calculated from these matches, i.e.,
Precision = No. of predicted complexes that recover a known complex / No. of predicted complexes
Recall = No. of recovered known complexes / No. of known complexes
F1 = 2 × Precision × Recall / (Precision + Recall)