Model Building and Validation
Roger S. Rowlett
Gordon & Dorothy Kline Professor, Emeritus
Colgate University Department of Chemistry
Coot is a very powerful and easy-to-use molecular visualization and model building program written by Paul Emsley, University of York, England. Using Coot, it is possible to visualize the quality of fit of models to electron density data, and also to interactively alter the model to better fit the data. The latter of these activities, termed rebuilding, is essential to the refinement process. Although refinement programs are very sophisticated, an automated refinement cannot currently find the best fit of model to data if the model is too far from the correct solution. The purpose of rebuilding is to place the model at a more appropriate starting point so that refinement can do its magic.
Coot is specially designed to integrate well with the CCP4 suite of crystallography programs, so it is especially appropriate if you are using MOSFLM, SCALA, and Refmac. One of the many nice features of Coot is the ability to re-contour electron density maps on the fly using phase and intensity data written out by Refmac. It also has the ability to do very nice real-space refinements of segments of the model.
Coot Shortcuts (cheat sheet)
Figure 1. The Coot graphics window.
The molecule can be manipulated on the screen with the mouse.
It is frequently useful to generate symmetry-related atoms in the displayed model in order to observe interactions at protein-protein interfaces, or to get a more accurate view of an interfacial active site, etc.
Many protein crystals display non-crystallographic symmetry, and this can often be used to advantage in the early rounds of refinement to increase signal to noise. Coot will automatically find non-crystallographic symmetry in your molecule and display overlay traces of symmetric protein chains upon request.
Coot is always a work in progress, and has been known to crash unexpectedly. Fortunately, Coot is pretty good at saving your work as you go along, minimizing the chance that you will lose your work.
Common tasks in rebuilding models to better fit the electron density maps are described here. Typically, after each refinement cycle, the model is inspected for conformity to the electron density, and modified as necessary to make it possible for the refinement program to more easily find the best solution.
Mutating residues
One of the first tasks to complete when a structure is being solved by molecular replacement is to change the mismatched residues in the search model to match the sequence of the target molecule.
Changing single residues
Changing Multiple residues
Mutating to non-standard amino acids
Adjusting side chain conformation
A frequent task in rebuilding is re-orienting side chains in the model to conform to the electron density map.
Adjusting main chain conformation
If you have to adjust the main chain trace, it is unlikely automated methods will work, else it would have been fixed already. Typically, the best way to adjust main chain conformation is by moving individual atoms and then regularizing the final result. To move atoms in the structure,
Regularizing the model
It is often necessary to clean up manual adjustments by regularizing the structure so that it conforms to normal bond angles and lengths. This is especially helpful when adjusting the main chain. To do this,
Writing out a PDB file
To save your structural edits, select File…Save Coordinates from the main menu. You will be prompted for the molecule to save (several may be open in the graphics window at the same time) and be expected to select a filename. Give the filename a .pdb extension to help identify it for later use.
Adding ligands to a protein
Adding ligands and cofactors is ridiculously easy in Coot.
Adding single atoms
Note: In CCP4-6.4.0 and the bundled Coot-0.7.2, a bug affects adding metal atoms: the residue name is left blank. Correct the residue name as soon as the atom is added. The current workaround is to go to Extensions...Modelling...Rename Residue, select the added cofactor, and type in the correct residue name. The residue name is normally the same as the atom name. The alternative workaround is to add metal atoms by using the Get Monomer... dialog from the File menu. This appears to add metal atoms correctly.
Adding molecules from the CCP4 library
Adding novel ligands to a structure
Occasionally, it is necessary to add a ligand to a structure that is not in the CCP4 library. In this case, you will have to create both a .pdb file with atomic coordinates, and a .cif file with the appropriate molecular restraints.
Adding additional amino acids to the structure
As refinement proceeds, you may discover usable electron density at the N- and C-termini of each protein chain that can be fit to the model by adding the appropriate amino acids from the protein sequence. This is another easy Coot task.
Adding water molecules to a model using Coot
Renumbering water molecules in a model using Coot
While Coot is easy to use by exploring the menus and the various toolbars and dialog boxes, it is much more efficient to use with some personal customization to position various screens and dialogs as well as map common tasks to hotkeys. Customizations can be loaded automatically when Coot starts by including a Python-format file named .coot.py (dot-coot-dot-py) in the appropriate location. In Linux, this file should be located in the user's home directory ($HOME, typically /home/username). In Windows, this file should be copied to the C:\CCP4-7\WinCoot directory. With some study of the Coot User Manual it should be possible to adapt these keybinding files to suit your own preferences and workflow.
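As a rough illustration of how such a file can be organized, the sketch below keeps the hotkeys in one table and registers them in a loop. It assumes that Coot's Python scripting exposes add_key_binding(name, key, function) as a global when .coot.py is loaded; the two action functions and the key choices are hypothetical placeholders, not the bindings in the files listed here. Outside Coot the fallback simply records the bindings, so the file can be checked for duplicate keys before use.

```python
# Sketch of a .coot.py fragment. ASSUMPTION: inside Coot, add_key_binding
# (name, key, function) is available as a global. Outside Coot, a fallback
# records the bindings instead so the table can be checked on its own.


def refine_placeholder():
    # Hypothetical action: replace the body with a real Coot scripting call.
    print("refine active residue (placeholder)")


def next_water_placeholder():
    # Hypothetical action: replace the body with a real Coot scripting call.
    print("go to next water (placeholder)")


# (label, key, action). Keys are case-sensitive, so "r" and "R" are distinct.
MY_BINDINGS = [
    ("Refine active residue", "r", refine_placeholder),
    ("Go to next water", "R", next_water_placeholder),
]


def register_bindings(bindings, binder):
    """Call binder(label, key, action) for each entry; reject duplicate keys."""
    seen = {}
    for label, key, action in bindings:
        if key in seen:
            raise ValueError("key %r is bound twice: %r and %r"
                             % (key, seen[key], label))
        seen[key] = label
        binder(label, key, action)
    return seen


# Use Coot's binder when it exists, otherwise record the bindings locally.
recorded = []
binder = globals().get("add_key_binding")
if binder is None:
    binder = lambda label, key, action: recorded.append((label, key))

registered = register_bindings(MY_BINDINGS, binder)
```

Keeping the bindings in a single table makes it easy to see at a glance which keys are taken, and the duplicate check catches a key that has been assigned twice by mistake.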
Roger's Coot customization and key bindings file for Microsoft Surface Pro 6 (rename to .coot.py)
Roger's Coot customization and key bindings file for 4K monitor (rename to .coot.py)
These keybindings open the graphics window and post one or more useful dialogs in a useful arrangement on the screen. In addition, there are a number of customized hotkeys enabled for common rebuilding tasks. The hotkey functions enabled in the files above are listed below in bold type. The default keybindings in Coot are listed in non-bold type. Keybindings are case-sensitive.
The hotkeys discussed here reference the assignments above. The following list should give you an idea of how you can use hotkeys to speed your workflow. While everything you need to do can be executed from the menu bar, toolbar, or various dialog boxes, an experienced Coot-er can work much more quickly with some strategic hotkeys.
To verify the presence of ligands in protein structures, and to prepare figures for publication demonstrating presence of protein ligands, it is typical to create and display "omit maps" in which a difference electron density map (typically Fo-Fc) of the proposed ligand is derived solely from non-ligand atoms. Creating "omit maps" is a bit clumsy in Refmac, but can be accomplished as described below:
(This is the easiest way to generate maps for display in PyMOL.)
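Because an omit map must be calculated from non-ligand atoms only, one common preparatory step is to write a copy of the model with the ligand atoms removed before running refinement. The sketch below does this for PDB-format text using the standard fixed columns (residue name in columns 18-20, chain ID in column 22). The function name and the residue name LIG are hypothetical, and this is only one possible way to prepare the input, not necessarily the procedure intended here.

```python
def strip_ligand(pdb_text, resname, chain=None):
    """Return PDB text without ATOM/HETATM/ANISOU records of one residue type.

    resname: three-letter residue name of the ligand, e.g. "LIG" (hypothetical).
    chain:   optional chain ID; if None, the ligand is removed from all chains.
    """
    kept = []
    for line in pdb_text.splitlines():
        is_atom_record = line.startswith(("ATOM", "HETATM", "ANISOU"))
        if (is_atom_record
                and line[17:20].strip() == resname
                and (chain is None or line[21] == chain)):
            continue  # drop this ligand atom
        kept.append(line)
    return "\n".join(kept) + "\n"
```

A typical use would be to read the refined model, call strip_ligand(text, "LIG"), and save the result under a new filename so the original coordinates are preserved.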
Before a structure is deposited with the Protein Data Bank, it is necessary to evaluate the proposed structure for its quality, including consistency with typical known bond lengths and angles, steric hindrance, and appropriate hydrogen bonding networks. Structures (.pdb files) can be evaluated in CCP4i, by a validation server, or directly in Coot if you are using it for model building.
CCP4 has a validation module that can be run from the CCP4i GUI. To validate a structure file,
PROCHECK, SFCHECK, and other validation programs can be run on web servers instead of CCP4i. The UCLA Institute for Genomics and Proteomics maintains one such server.
Coot itself has an extensive collection of built-in validation tools that are similar to PROCHECK, but have the great advantage of being interactive. You can generally click on any item of interest in the validation tools and be taken to that portion of the model and electron density for inspection. The validation tools can be used whenever you have a molecule loaded into the program. If you also have the latest electron density maps loaded, you can investigate and correct problems, if necessary, immediately. All validation tools are found under the Validation menu of Coot. A brief description of some of these tools follows:
Ramachandran plot
This tool displays a color coded Ramachandran plot of your protein. Allowed regions are magenta, generously allowed regions are yellow, and disallowed regions are gray. Glycine residues (which can frequently appear in less favorable regions of Φ-Ψ space) are denoted by triangles. All other residues are denoted by squares. Click on any symbol to go to that residue in the model/electron density map. Investigate any residues in non-allowed regions, and all non-glycine residues in generously allowed regions, and make corrections if necessary.
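For readers who want to see what is being plotted, the following is a minimal sketch of how the two Ramachandran angles can be computed from atomic coordinates. Φ is the dihedral angle C(i-1)-N(i)-CA(i)-C(i) and Ψ is N(i)-CA(i)-C(i)-N(i+1). The function names are illustrative, and this is not Coot's own implementation.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees for four points given as (x, y, z) tuples."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

Each residue then contributes one (Φ, Ψ) point to the plot. The first residue of a chain has no Φ and the last has no Ψ, so neither appears on the plot.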
Incorrect chiral volumes
This tool will detect residues that have the wrong stereochemistry, that is, amino acids that do not have the L-configuration. Ideally this check reports no errors; any that it does report must be corrected. They usually arise when an L-amino acid is accidentally converted to a D-amino acid during rebuilding.
Check/delete waters
This tool will look for water molecules that do not meet reasonable criteria for electron density and hydrogen bonding distances. Investigate all questionable waters as candidates for deletion.
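The kind of screening involved can be sketched outside Coot as well. The example below reads PDB-format text and flags waters (residue name HOH) that have a high B-factor or no plausible hydrogen-bonding partner, judged by the distance to the nearest non-water N or O atom. The cutoffs (B above 80 Å², distances outside 2.3 to 3.5 Å) are illustrative assumptions. Unlike Coot's tool, this sketch does not look at the electron density at all.

```python
import math


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]),
                        float(line[46:54])),
                "b": float(line[60:66]),
            })
    return atoms


def questionable_waters(atoms, b_max=80.0, d_min=2.3, d_max=3.5):
    """Return (chain, resseq, reasons) for waters that fail the checks.

    The default cutoffs are illustrative assumptions, not fixed rules.
    """
    waters = [a for a in atoms if a["resname"] == "HOH"]
    # Candidate partners: non-water atoms whose name starts with N or O.
    partners = [a for a in atoms
                if a["resname"] != "HOH" and a["name"][:1] in ("N", "O")]
    flagged = []
    for w in waters:
        nearest = min((math.dist(w["xyz"], p["xyz"]) for p in partners),
                      default=float("inf"))
        reasons = []
        if w["b"] > b_max:
            reasons.append("high B-factor")
        if nearest < d_min:
            reasons.append("too close to partner")
        elif nearest > d_max:
            reasons.append("no nearby partner")
        if reasons:
            flagged.append((w["chain"], w["resseq"], reasons))
    return flagged
```

Waters that bond only to other waters will be flagged as having no nearby partner in this simplified version, so the output is a list of candidates for inspection rather than a list of waters to delete.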
Geometry analysis
This tool will look for anomalies in bond lengths, bond angles, and planarity of residues in your model. Residues with acceptable geometries are denoted in green. Residues with significant deviations from standard geometry are flagged in orange or red. Investigate all anomalous residues and make corrections if necessary.
Rotamer analysis
This tool will look for unusual side-chain rotamers. Investigate all flagged residues and make corrections if justified. This tool will frequently identify easily missed Leu and Val residues that are flipped the "wrong" way.
Peptide omega analysis
This tool will look for non-planar or cis-peptide bonds. Cis-peptide bonds are rare in proteins, and unless electron density very clearly suggests otherwise, you should ensure that all peptide bonds in your model are trans.
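The classification behind this check can be sketched in a few lines. The peptide ω angle is the dihedral CA(i)-C(i)-N(i+1)-CA(i+1); values near 180° are trans, values near 0° are cis, and anything in between is a distorted (non-planar) peptide. The 30° tolerances below are assumptions chosen for illustration, not the thresholds Coot uses.

```python
def classify_omega(omega_deg, trans_tol=30.0, cis_tol=30.0):
    """Classify a peptide omega angle (degrees) as trans, cis, or twisted.

    The tolerances are illustrative assumptions.
    """
    # Wrap the angle into the range [-180, 180).
    w = (omega_deg + 180.0) % 360.0 - 180.0
    if abs(abs(w) - 180.0) <= trans_tol:
        return "trans"
    if abs(w) <= cis_tol:
        return "cis"
    return "twisted"
```

For example, classify_omega(-175.0) reports a trans peptide and classify_omega(3.0) a cis peptide, while classify_omega(90.0) falls in the twisted category that deserves a closer look at the electron density.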
GLN and ASN B-factor outliers
This tool will look for Gln and Asn residues that may need to be flipped 180 degrees to better account for electron density. Inspect these outliers, taking note of whether or not the alternate conformation makes more sense with hydrogen bonding partners. In general, it is a good idea to check all Gln and Asn residues for correct orientation and sensible polar contacts.
Once structural problems have been resolved, run one cycle of Refmac without ARP_WATERS. Then repeat structural validation. When all fixable structural anomalies have been addressed, you are ready to submit the structure to the Protein Data Bank.