In this project, three potential methods of generating a consensus were explored.

A. Simple Consensus 1 - group by number of domains
This method grouped the methods by the number of domains each one assigned; the positions of the domain boundaries were not taken into account. In the end, this was judged too simplistic, and it was therefore not added to the site as a feature.

B. Simple Consensus 2 - group by cut boundary
This method takes into account the boundary of each cut made by each algorithm, and groups methods together if all of their cuts fall within a 20% residue window along the chain. This is explained in more detail on the Consensus page of the web front end.

C. Weighted Consensus
This method groups the algorithms in exactly the same way as the previous consensus method. The difference lies in how the score for each group is generated. Each algorithm starts with an initial weighting, which is then increased or reduced according to how the method performs in comparison with the other methods. Initially, each algorithm is given the following weight:
If PUU predicts more domains than PDP and NCBI, the weight of PUU is reduced by 10%. As noted in previous studies (Holland et al. (2006), "Partitioning protein structures into domains: why is it so difficult?", J. Mol. Biol. 361, 562-590), PUU tends to overcut a protein chain.
If PDP predicts five or more domains, the weight of NCBI is reduced by 10%.
If neither NCBI nor PDP predicts fragmented domains, the weights of all methods that do predict fragmented domains are reduced by 10%.
If the structure is all beta-sheet and NCBI and PDP disagree on the number of domains, the weight of PDP is increased by 10%.
If the structure is all beta-sheet and NCBI and PDP agree, the weights of both methods are increased by 10%.
If the structure is alpha-beta and NCBI and PDP agree, the weights of all methods that disagree with PDP and NCBI are reduced by 10%.

This produces a final weight for each method, and these weights are summed. The score of each group is then the sum of the weights of the methods in that group divided by the sum of all the weights. Again, the group with the highest score is recognised as the consensus, provided that score is above 0.4. Any group scoring within 0.1 of the highest-scoring group is also presented as a potential consensus.
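The cut-boundary grouping shared by the second and third consensus methods might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the project's own code. It assumes that each method's prediction is a sorted list of cut positions (a hypothetical input format), that only methods with the same number of cuts can be grouped, and that the "20% residue window" is a window 20% of the chain length wide. The function name, the method OtherMethod and all cut positions are invented for illustration.

```python
from typing import Dict, List


def group_by_cut_boundary(
    cuts: Dict[str, List[int]], chain_length: int, window_fraction: float = 0.2
) -> List[List[str]]:
    """Greedily group methods whose corresponding cuts all fall in one window.

    Hypothetical input format: each method maps to a sorted list of cut
    positions (residue index where one domain ends and the next begins).
    Assumption: the window is window_fraction * chain_length residues wide.
    """
    window = window_fraction * chain_length
    groups: List[List[str]] = []
    for method, method_cuts in cuts.items():
        for group in groups:
            candidates = [cuts[m] for m in group] + [method_cuts]
            # Only methods with the same number of cuts (hence domains) can match.
            if len(method_cuts) != len(candidates[0]):
                continue
            # The i-th cuts of every candidate must span no more than `window`.
            if all(
                max(c[i] for c in candidates) - min(c[i] for c in candidates) <= window
                for i in range(len(method_cuts))
            ):
                group.append(method)
                break
        else:
            # No existing group fits, so this method starts a new one.
            groups.append([method])
    return groups


# Invented example: a 300-residue chain, so the window is 60 residues wide.
example = {"PDP": [120], "NCBI": [140], "PUU": [60, 150], "OtherMethod": [200]}
print(group_by_cut_boundary(example, chain_length=300))
# [['PDP', 'NCBI'], ['PUU'], ['OtherMethod']]
```

Because the groups are built greedily, the result can depend on the order in which methods are visited. An actual implementation may resolve such cases differently.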
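The weight-adjustment rules could be applied roughly as in the following Python sketch. The initial weights are not listed in this section, so the starting value of 1.0 per method is only a placeholder. The sketch also assumes that "by 10%" means multiplying by 0.9 or 1.1, and that two methods "agree" when they predict the same number of domains. The function name, the input dictionaries and the structure-class labels "all-beta" and "alpha-beta" are hypothetical.

```python
from typing import Dict

DOWN, UP = 0.9, 1.1  # assumption: "by 10%" means a multiplicative change


def adjust_weights(
    weights: Dict[str, float],
    n_domains: Dict[str, int],
    fragmented: Dict[str, bool],
    structure_class: str,
) -> Dict[str, float]:
    """Return a copy of `weights` with the rule-based adjustments applied.

    Assumptions: every method appears in all three dicts, "agree" means the
    same number of domains, and structure_class is "all-beta", "alpha-beta"
    or another label that triggers no class-specific rule.
    """
    w = dict(weights)
    pdp, ncbi, puu = n_domains["PDP"], n_domains["NCBI"], n_domains["PUU"]

    # PUU tends to overcut: penalise it when it predicts more domains than both.
    if puu > pdp and puu > ncbi:
        w["PUU"] *= DOWN
    # PDP predicting five or more domains downgrades NCBI.
    if pdp >= 5:
        w["NCBI"] *= DOWN
    # If neither NCBI nor PDP has fragmented domains, penalise methods that do.
    if not fragmented["NCBI"] and not fragmented["PDP"]:
        for m in w:
            if fragmented[m]:
                w[m] *= DOWN

    agree = pdp == ncbi
    if structure_class == "all-beta":
        # PDP is boosted whether or not the two agree; NCBI only when they agree.
        w["PDP"] *= UP
        if agree:
            w["NCBI"] *= UP
    elif structure_class == "alpha-beta" and agree:
        # Penalise every method whose domain count differs from PDP/NCBI.
        for m in w:
            if n_domains[m] != pdp:
                w[m] *= DOWN
    return w


start = {"PDP": 1.0, "NCBI": 1.0, "PUU": 1.0}  # placeholder initial weights
final = adjust_weights(
    start,
    n_domains={"PDP": 2, "NCBI": 2, "PUU": 4},
    fragmented={"PDP": False, "NCBI": False, "PUU": True},
    structure_class="alpha-beta",
)
print(final)
```

In this invented example, PUU is penalised three times (it overcuts, it fragments a domain, and it disagrees with PDP and NCBI on an alpha-beta structure), so its weight falls to roughly 0.729.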
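The final scoring step might look like the Python sketch below. A group's score is the sum of its members' final weights divided by the sum of all weights. The best group counts as the consensus only if its score exceeds 0.4, and any group within 0.1 of it is also returned as a potential consensus. The function name and return format are hypothetical. The sketch also assumes that these runner-up groups do not themselves need to exceed 0.4.

```python
from typing import Dict, List, Tuple


def weighted_consensus(
    groups: List[List[str]],
    weights: Dict[str, float],
    threshold: float = 0.4,
    margin: float = 0.1,
) -> Tuple[List[float], List[int]]:
    """Score each group and return (scores, indices of consensus groups).

    A group's score is the sum of its members' final weights divided by the
    sum of all methods' weights. Assumes `groups` is non-empty.
    """
    total = sum(weights.values())
    scores = [sum(weights[m] for m in group) / total for group in groups]
    best = max(scores)
    if best <= threshold:
        # The best group must score above the threshold; otherwise no consensus.
        return scores, []
    # The best group plus any group within `margin` of it, highest score first.
    picked = [i for i, s in enumerate(scores) if best - s <= margin]
    picked.sort(key=lambda i: scores[i], reverse=True)
    return scores, picked


final_weights = {"PDP": 1.0, "NCBI": 1.0, "PUU": 0.5, "OtherMethod": 0.5}
scores, consensus = weighted_consensus(
    [["PDP", "NCBI"], ["PUU", "OtherMethod"]], final_weights
)
print(consensus)  # [0]
```

Here the PDP/NCBI group scores 2/3 and the other group 1/3. Only the first group clears 0.4, and the second is more than 0.1 behind it, so a single consensus is reported.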