Conclusion

We mined the Protein Data Bank (PDB) for high-resolution X-ray crystal structures of proteins and extracted structural features associated with disulfide-bonded residue pairs. We then applied random forest classification to predict whether a disulfide bond exists between a pair of residues, given their pairwise interatomic distances for the N, C, O, Cα and Cβ atoms and their backbone dihedral angles φ and ψ. We evaluated the model on a single residue pair, Q98C + G271C, in the prefusion structure of human cytomegalovirus glycoprotein B (PDB ID 7KDP); this pair is believed to stabilize the prefusion conformation. The model assigned it a probability of 0.8. This result suggests that machine learning techniques could be used to estimate, from structural features alone, the probability that a pair of residues will form a disulfide bond if mutated to cysteine. Such a model could help protein engineers rapidly screen candidate residue pairs and reduce the subjectivity in choosing which pairs to mutate. Ultimately, this would ideally translate into a higher percentage of successful disulfide bond formations, thereby reducing lengthy cloning and laboratory procedures.
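As an illustration of the kind of input summarized above, the following minimal sketch assembles a feature vector for one residue pair. It is not our actual pipeline: the function name `pair_features`, the dictionary layout, the choice of like-atom distances (N to N, Cα to Cα, and so on), and all coordinates and angles are assumptions made purely for the example.

```python
import math

# Atom names follow PDB conventions (CA = C-alpha, CB = C-beta).
ATOMS = ("N", "C", "O", "CA", "CB")


def pair_features(res_a, res_b):
    """Return a flat feature dict for one residue pair.

    Each residue is a dict with:
      "coords": atom name -> (x, y, z) in angstroms
      "phi", "psi": backbone dihedral angles in degrees
    Assumption: one distance per atom type (like-atom pairs only).
    """
    features = {}
    for atom in ATOMS:
        # Euclidean distance between the same atom type in both residues.
        features[f"dist_{atom}"] = math.dist(
            res_a["coords"][atom], res_b["coords"][atom]
        )
    for label, res in (("a", res_a), ("b", res_b)):
        features[f"phi_{label}"] = res["phi"]
        features[f"psi_{label}"] = res["psi"]
    return features


# Made-up coordinates and angles, chosen only so the distances are easy to check.
res_a = {
    "coords": {"N": (0, 0, 0), "CA": (1, 0, 0), "C": (2, 0, 0),
               "O": (2, 1, 0), "CB": (1, 1, 0)},
    "phi": -60.0, "psi": -45.0,
}
res_b = {
    "coords": {"N": (0, 0, 4), "CA": (4, 4, 0), "C": (2, 0, 6),
               "O": (2, 1, 5), "CB": (1, 1, 4)},
    "phi": -120.0, "psi": 130.0,
}

features = pair_features(res_a, res_b)
```

The values of such a dictionary could then be passed as one row to a random forest implementation, for example scikit-learn's `RandomForestClassifier`, whose `predict_proba` method returns class probabilities comparable to the 0.8 reported above.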