Consensus

In this project, three potential methods to generate a consensus were explored.

A. Simple Consensus 1 - group by number of domains
This method grouped the methods by the number of domains each one assigned, without taking the domain boundaries into account. In the end, it was judged too simplistic and was not added to the site as a feature.
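
As an illustration only, grouping by domain count can be sketched as below. The function name and the example predictions are hypothetical and do not come from the project code.

```python
from collections import defaultdict


def group_by_domain_count(domain_counts):
    """Group method names by the number of domains each method assigned."""
    groups = defaultdict(list)
    for method, n_domains in domain_counts.items():
        groups[n_domains].append(method)
    return dict(groups)


# Hypothetical predictions for a single chain (not real results).
counts = {"PDP": 2, "NCBI": 2, "DP": 1, "PUU": 3}
groups = group_by_domain_count(counts)
print(groups)
```

Because boundaries are ignored, two methods that both predict two domains fall in the same group even if their cuts are far apart, which is the weakness noted above.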

B. Simple Consensus 2 - group by cut boundary
This method takes into account the boundaries of each cut made by each algorithm, and groups the methods if all the cuts fall within a 20% residue window right along the chain.  This is explained in more detail on the web front end -> consensus page.
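
The exact window definition is given on the consensus page rather than here, so the following is only a rough sketch under stated assumptions: the window is taken to be 20% of the chain length, two methods are taken to agree when they make the same number of cuts and each pair of corresponding cuts lies within that window, and grouping is done greedily against the first member of each group. All names are hypothetical.

```python
def cuts_agree(cuts_a, cuts_b, chain_length, window=0.20):
    """True if both methods make the same number of cuts and every pair of
    corresponding cuts is within window * chain_length residues (assumed)."""
    if len(cuts_a) != len(cuts_b):
        return False
    tolerance = window * chain_length
    return all(abs(a - b) <= tolerance
               for a, b in zip(sorted(cuts_a), sorted(cuts_b)))


def group_by_cuts(method_cuts, chain_length, window=0.20):
    """Greedy grouping: a method joins the first group whose first member
    it agrees with; otherwise it starts a new group."""
    groups = []
    for method, cuts in method_cuts.items():
        for group in groups:
            if cuts_agree(method_cuts[group[0]], cuts, chain_length, window):
                group.append(method)
                break
        else:
            groups.append([method])
    return groups


# Hypothetical cut positions (residue numbers) for a 200-residue chain.
method_cuts = {"PDP": [100], "NCBI": [110], "DP": [], "PUU": [60, 150]}
cut_groups = group_by_cuts(method_cuts, chain_length=200)
print(cut_groups)
```

Under these assumptions PDP and NCBI are grouped together, while DP (no cut) and PUU (two cuts) each form their own group.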

C. Weighted Consensus
This groups the methods in exactly the same way as the previous consensus method (Simple Consensus 2). This time, however, the score for each group is generated differently.
Each algorithm has an initial weighting, which is then increased or reduced based on the method's performance in comparison to the rest of the methods.

Initially, each algorithm is given the following weight:
    • PDP: 84.4%
    • NCBI: 81.9%
    • DP: 78.1%
    • DDomain: 76.5%
    • DHcL: 68.3%
    • PUU: 74.0%
    • Dodis: 40.0%

Once all the methods have been grouped, the weights are adjusted based on three sets of rules. These are:
    • Number of Predicted Domains
        • If the number of domains predicted by PDP and by NCBI is 4 or more, the weight assigned to DP is reduced by 10%, as DP typically predicts fewer domains, especially when the true number of domains is 4 or more.
        • If PUU predicts more domains than PDP and NCBI, the weight of PUU is reduced by 10%. As noted in previous studies (Holland et al. (2006), Partitioning protein structures into domains: why is it so difficult?, JMB 361, 562-590), PUU tends to overcut a protein chain.
        • If PDP predicts five domains or more, the weight of NCBI is reduced by 10%.

    • Number of Fragments per Domain/Chain
        • If three or more methods have at least one fragmented domain (not necessarily the same domain), the weight of all methods that do not predict fragmented domains is reduced by 10%.
        • If NCBI and PDP have no fragmented domains, the weight of all methods that predict fragmented domains is reduced by 10%.

    • Based on Type of Structure
        • If the structure is all alpha-helix (in the DSSP structure definition), and NCBI and PDP disagree on the number of domains in the chain, the weight of PDP is increased by 10%.
        • If the structure is all beta-sheet, and NCBI and PDP disagree on the number of domains, the weight of PDP is increased by 10%.
        • If the structure is all beta-sheet, and NCBI and PDP agree, the weight of both methods is increased by 10%.
        • If the structure is alpha-beta, and NCBI and PDP agree, the weights of all methods that disagree with PDP and NCBI are reduced by 10%.
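
The rules above can be sketched in code as follows. This is not the project's implementation. It assumes that "by 10%" means multiplying the current weight by 0.9 or 1.1, that "disagree with PDP and NCBI" refers to the predicted number of domains, and that the rules are applied in the order listed. The function name, argument layout and structure labels ('alpha', 'beta', 'alpha-beta') are hypothetical.

```python
INITIAL_WEIGHTS = {"PDP": 84.4, "NCBI": 81.9, "DP": 78.1, "DDomain": 76.5,
                   "DHcL": 68.3, "PUU": 74.0, "Dodis": 40.0}


def adjust_weights(n_domains, fragmented, structure_class, step=0.10):
    """n_domains: method -> predicted domain count (must include PDP, NCBI).
    fragmented: method -> True if the method predicts a fragmented domain.
    structure_class: 'alpha', 'beta' or 'alpha-beta' (assumed labels)."""
    w = {m: INITIAL_WEIGHTS[m] for m in n_domains}

    def scale(method, sign):
        # Assumption: a 10% change is multiplicative on the current weight.
        if method in w:
            w[method] *= 1 + sign * step

    pdp, ncbi = n_domains["PDP"], n_domains["NCBI"]

    # Number of predicted domains
    if pdp >= 4 and ncbi >= 4:
        scale("DP", -1)
    if "PUU" in n_domains and n_domains["PUU"] > max(pdp, ncbi):
        scale("PUU", -1)
    if pdp >= 5:
        scale("NCBI", -1)

    # Number of fragments per domain/chain
    frag = [m for m in w if fragmented[m]]
    if len(frag) >= 3:
        for m in w:
            if not fragmented[m]:
                scale(m, -1)
    if not fragmented["PDP"] and not fragmented["NCBI"]:
        for m in frag:
            scale(m, -1)

    # Type of structure
    if structure_class in ("alpha", "beta") and pdp != ncbi:
        scale("PDP", +1)
    if structure_class == "beta" and pdp == ncbi:
        scale("PDP", +1)
        scale("NCBI", +1)
    if structure_class == "alpha-beta" and pdp == ncbi:
        for m in w:
            if n_domains[m] != pdp:
                scale(m, -1)
    return w


# Hypothetical all-beta chain: PUU overcuts, PDP and NCBI agree.
example = adjust_weights(
    n_domains={"PDP": 2, "NCBI": 2, "DP": 1, "PUU": 3},
    fragmented={"PDP": False, "NCBI": False, "DP": False, "PUU": False},
    structure_class="beta",
)
print(example)
```

In this example PUU is downweighted for predicting more domains than PDP and NCBI, PDP and NCBI are both upweighted for agreeing on an all-beta structure, and DP keeps its initial weight.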

Applying these rules produces a final weight for each method. The score of each group is the sum of the weights of the methods in that group divided by the sum of all the weights. Again, the group with the highest score over 0.4 is recognised as the consensus. Should any other group have a score within 0.1 of the highest-scoring group, it is also presented as a potential consensus.
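
A minimal sketch of this scoring step is shown below. The function name, return layout and example weights are hypothetical; the 0.4 threshold and the 0.1 margin are the values quoted above.

```python
def score_groups(groups, weights, threshold=0.4, margin=0.1):
    """Return (scores, consensus_index, alternative_indices).

    Each score is the summed weight of the group's methods divided by the
    summed weight of all methods. consensus_index is None when no group
    scores above the threshold."""
    total = sum(weights.values())
    scores = [sum(weights[m] for m in group) / total for group in groups]
    best = max(scores)
    if best <= threshold:
        return scores, None, []
    best_idx = scores.index(best)
    alternatives = [i for i, s in enumerate(scores)
                    if i != best_idx and best - s <= margin]
    return scores, best_idx, alternatives


# Hypothetical final weights and groups for one chain.
final_weights = {"PDP": 92.84, "NCBI": 90.09, "DP": 78.1, "PUU": 66.6}
method_groups = [["PDP", "NCBI"], ["DP"], ["PUU"]]
scores, consensus, alternatives = score_groups(method_groups, final_weights)
print(scores, consensus, alternatives)
```

Here the PDP/NCBI group holds more than half of the total weight, so it is reported as the consensus, and no other group comes within 0.1 of its score.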