Model Building and Validation

Roger S. Rowlett

Gordon & Dorothy Kline Professor, Emeritus

Colgate University Department of Chemistry

Using Coot

Coot is a very powerful and easy-to-use molecular visualization and model building program written by Paul Emsley, University of York, England. Using Coot, it is possible to visualize the quality of fit of models to electron density data, and also to interactively alter the model to better fit the data. The latter activity, termed rebuilding, is essential to the refinement process. Although refinement programs are very sophisticated, it is not currently possible for an automated refinement to find the best fit of model to data if the model is too far from the correct solution. The purpose of rebuilding is to move the model to a better starting point from which refinement can do its magic.

Coot is specially designed to integrate well with the CCP4 suite of crystallography programs, so it is especially appropriate if you are using MOSFLM, SCALA, and Refmac. One of the many nice features of Coot is the ability to re-contour electron density maps on the fly using phase and intensity data written out by Refmac. It also has the ability to do very nice real-space refinements of segments of the model.

Coot User manual (PDF, HTML)

Coot Shortcuts (cheat sheet)


Starting Coot and loading a molecule and electron density maps

  • The command for starting Coot is typically aliased to something like coot. Usually, the command is issued from your working directory, so that you do not have to specify complete paths to your data files. Alternatively, Coot can be started from the Results tab of Refmac by clicking on the Coot button next to structure and electron density. This special button loads your current model and electron density map file.
    • Note: for full functionality you must source the CCP4 setup files in Linux to give Coot access to CCP4 dictionaries and functions. The CCP4 source command is typically aliased to a command like ccp4setup.
  • When Coot is started, it always asks you if you want to run an auto-save file that stored the last saved state. Unless you want to start where you left off, you can click on No. If you open Coot with the Coot button in CCP4, it will always open the output files from that job.
  • To load a pdb file, select File…Open Coordinates and choose the appropriate file name. The selected molecule will be loaded and displayed as a stick model.
  • To open a Refmac-generated MTZ file that contains phases and intensities for contourable maps, select File…Auto Open MTZ and choose the appropriate file name. Both 2Fo–Fc (blue) and Fo–Fc (positive = green and negative = red) electron density maps will be automatically displayed.
  • The graphical viewing environment of Coot is shown in Figure 1.

Figure 1. The Coot graphics window.


  • By default, the 2Fo–Fc map is contoured at approximately 1.5σ and the Fo–Fc is contoured at approximately 3.5σ. The contour settings can be changed by rolling the center wheel of the mouse. To select which map will be re-contoured by default, select HID…ScrollWheel…Attach ScrollWheel to which map from the menu.

Navigating and inspecting a molecule in Coot

The molecule can be manipulated on the screen with the mouse.

  • Press the left mouse button and drag to spin the molecule.
  • To zoom, press the right mouse button and drag.
  • To slab (cut away) the molecule, press "f" and the right mouse button and drag up and down.
  • To navigate to a particular atom, select Draw…Go To Atom to open the navigation window and select the desired protein chain, residue, and/or atom.
  • To recenter on an atom visible on the screen, middle-click on it.
  • To navigate to the next residue in the sequence, press the space bar.

Generating symmetry atoms and non-crystallographic symmetry traces

It is frequently useful to generate symmetry-related atoms in the displayed model in order to observe interactions at protein-protein interfaces, or to get a more accurate view of an interfacial active site, etc.

  • To display symmetry atoms, choose Draw…Cell & Symmetry and tick Yes in the Show Symmetry Atoms box.

Many protein crystals display non-crystallographic symmetry, and this can often be used to advantage in the early rounds of refinement to increase signal to noise. Coot will automatically find non-crystallographic symmetry in your molecule and display overlay traces of symmetric protein chains upon request.

  • To display non-crystallographic symmetry, choose Edit…Bond Parameters and tick Yes in the Draw Non-Crystallographic Ghosts box. Non-crystallographic symmetry traces will be overlaid on the A chain of your protein.

Recovering a session after a program crash

Coot is always a work in progress, and has been known to crash unexpectedly. Fortunately, Coot is pretty good at saving your work as you go along, minimizing the chance that you will lose your work.

  • To recover from a program crash up to, but not including, the last program edit:
    • Open Coot and read in the PDB file you were last working on.
    • Select File…Recover Session from the menu.
    • After your PDB file has been updated, you may read in your electron density maps using File…Auto Open MTZ and resume.

Basic Model Building Tasks in Coot

Common tasks in rebuilding models to better fit the electron density maps are described here. Typically, after each refinement cycle, the model is inspected for conformity to the electron density, and modified as necessary to make it possible for the refinement program to more easily find the best solution.

Mutating residues

One of the first tasks to complete when a structure is being solved by molecular replacement is to change the mismatched residues in the search model to match the sequence of the target molecule.

Changing single residues

  • Click on the Simple Mutate tool on the Model/Fit/Refine menu bar
  • Click on the residue you would like to change
  • Choose the amino acid you would like to change it to from the popup menu

Changing multiple residues

  • Choose Calculate…Mutate Residue Range from the menu.
  • In the dialog box choose the protein chain and the residue number or range to be mutated, and type in the one-letter amino acid code(s) for the mutation.
  • If desired, you can autofit the mutated residue to the electron density map upon mutation by checking the appropriate box.

Mutating to non-standard amino acids

  • Choose Extensions...Modelling...Replace Residue
  • In the dialog box, type in the Refmac-compatible 3-letter code for the non-standard amino acid, e.g. CSO
  • The non-standard amino acid will be inserted and real-space refined into place.

Adjusting side chain conformation

A frequent task in rebuilding is re-orienting side chains in the model to conform to the electron density map.

  • Open the refinement task menu by selecting Calculate…Model/Fit/Refine.
  • You now have several options to adjust side chain conformation on the task menu
    • Auto Fit Rotamer will select the best-fitting side-chain rotamer from a library of commonly observed conformations. This may be a good first attempt in some situations.
    • You may also elect to interactively select a rotamer from the library by selecting Rotamers... from the task menu. To further refine this solution automatically, you can select Real Space Refine Zone and then click twice on any atom in the side chain (to define the side chain as the refinement zone). A dialog box will offer you the choice to accept or reject the fit, which will be highlighted in the graphics window.
    • For more precise control over side-chain fitting, Edit Chi Angles should be selected. Click on the side chain of the desired residue, and alter individual chi angles by sliding the mouse back and forth on the graphics screen. The new conformation will be highlighted. To quickly shift between chi angles, you can use the number keys: pressing 1 selects the first chi angle, 2 selects the second, etc. To complete the operation, select Accept or Cancel in the dialog box.

Adjusting main chain conformation

If you have to adjust the main chain trace, it is unlikely automated methods will work, else it would have been fixed already. Typically, the best way to adjust main chain conformation is by moving individual atoms and then regularizing the final result. To move atoms in the structure,

  • Select Rotate/Translate Zone from the Model/Fit/Refine task menu.
  • Click on any atom in the residue you wish to move.
  • To move an entire residue, simply click and drag. To move a single atom, CTRL-click and drag. Coot will automatically make and break atomic connections according to interatomic distance, so proceed cautiously to maintain the correct main- and side-chain connectivity!
  • Select OK to accept changes, or Cancel to abandon.

Regularizing the model

It is often necessary to clean up manual adjustments by regularizing the structure so that it conforms to normal bond angles and lengths. This is especially helpful when adjusting the main chain. To do this,

  • Select Regularize Zone from the task menu.
  • Click on two atoms that bracket the region to be regularized, typically at least one residue on either side of the area in which manual changes were made, so that the changes can be blended into the overall main-chain trace. The protein structure between these points will be regularized.

Writing out a PDB file

To save your structural edits, select File…Save Coordinates from the main menu. You will be prompted for the molecule to save (several may be open in the graphics window at the same time) and be expected to select a filename. Give the filename a .pdb extension to help identify it for later use.

Advanced model building tasks in Coot

Adding ligands to a protein

Adding ligands and cofactors is ridiculously easy in Coot.

Adding single atoms

  • Navigate the pointer (the little pink box) to the place where you would like to add an atom.
  • On the Model/Fit/Refine menu, click on Place atom at pointer.
  • Choose from among the preselected atom types, or click Other and type in an atom type. Examples:
    • zinc = ZN
    • cobalt = CO
    • etc.
    • Please note that atom names are case sensitive, and conform to the CCP4 library abbreviations.
  • In the dropdown menu, select the option to add the new atom to the current molecule.
  • When adding metal ions or monoatomic anions to a protein, you should manually edit the PDB file to reflect the correct atomic charge in the last four columns of the ATOM record. A zinc ion should be entered as "ZN+2" and a chloride ion as "CL-1", etc.
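The manual charge edit described above can also be scripted. The following is a minimal sketch, not part of Coot or CCP4: it assumes fixed-column PDB records in which the residue name sits in columns 18-20, and it writes the element-plus-charge tag (for example "ZN+2") into columns 77-80 as described in the text. The function names, the charge table, and the sample coordinates are hypothetical.

```python
def make_atom_line(record, serial, name, res_name, chain, res_seq,
                   x, y, z, occ, b_factor, element):
    """Build an 80-column PDB ATOM/HETATM record with a blank charge field."""
    fields = [
        "%-6s" % record, "%5d" % serial, " ", "%-4s" % name, " ",
        "%3s" % res_name, " ", chain, "%4d" % res_seq, " " * 4,
        "%8.3f%8.3f%8.3f" % (x, y, z), "%6.2f%6.2f" % (occ, b_factor),
        " " * 10, "%2s" % element, " " * 2,
    ]
    return "".join(fields)


def set_ion_charge(pdb_line, charges):
    """Return pdb_line with columns 77-80 set to element+charge for listed ions.

    charges maps a residue name to a charge string, e.g. {"ZN": "+2"}.
    Lines that are not ATOM/HETATM records, or whose residue name is not
    in the table, are returned unchanged.
    """
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line
    res_name = pdb_line[17:20].strip()
    if res_name not in charges:
        return pdb_line
    newline = "\n" if pdb_line.endswith("\n") else ""
    body = pdb_line.rstrip("\n").ljust(80)
    tag = (res_name + charges[res_name]).rjust(4)[:4]
    return body[:76] + tag + body[80:] + newline


# Hypothetical example records: one zinc ion and one chloride ion.
ION_CHARGES = {"ZN": "+2", "CL": "-1"}
zinc = make_atom_line("HETATM", 2001, "ZN", "ZN", "A", 301,
                      12.345, 23.456, 34.567, 1.00, 25.00, "ZN")
chloride = make_atom_line("HETATM", 2002, "CL", "CL", "A", 302,
                          1.000, 2.000, 3.000, 1.00, 30.00, "CL")
fixed_zinc = set_ion_charge(zinc, ION_CHARGES)
fixed_chloride = set_ion_charge(chloride, ION_CHARGES)
print(fixed_zinc)
print(fixed_chloride)
```

In practice you would apply set_ion_charge to every line of the saved PDB file and write the result to a new file, leaving the original untouched.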

Note: In CCP4-6.4.0 and the bundled Coot-0.7.2, a bug in adding metal atoms leaves the residue name blank. Correct the residue name as soon as the atom is added. One workaround is to go to Extensions...Modelling...Rename Residue, select the added cofactor, and type in the correct residue name, which is normally the same as the atom name. Alternatively, add metal atoms using the Get Monomer... dialog from the File menu, which appears to add them correctly.


Adding molecules from the CCP4 library

  • From the main menu, select File…Get Monomer and enter the three letter code of the desired ligand, cofactor, or metal. A complete list of monomers can be found in the CCP4 documentation.
  • The selected molecule will be placed at the center of the display.
  • Move the ligand to the desired location using Rotate/Translate Zone in the Model/Fit/Refine task menu.
  • The coordinates for the cofactor can be
    • written out as a separate PDB file for manual merging into the protein coordinate file, or
    • appended to the end of any displayed PDB file by selecting Calculate…Merge Molecules.


Adding novel ligands to a structure

Occasionally, it is necessary to add a ligand to a structure that is not in the CCP4 library. In this case, you will have to create both a .pdb file with atomic coordinates and a .cif file with the appropriate molecular restraints.

  • Open the program JLigand in CCP4i
  • Click on the New Ligand button and enter a 3-letter name (all caps) for your ligand, and an initial atom type to start your drawing. Preferably choose a ligand name that is not already in use in the Protein Data Bank, e.g. "IPS" for isopropylsulfonamide.
  • Draw your molecule on the canvas. Use the Help feature for drawing instructions.
  • When your drawing is complete, regularize your structure by clicking on Ligand...Regularize.
  • Save the PDB file that includes all hydrogens (you will throw away the hydrogens anyway during refinement) using File...Save Coordinates.
  • Save the corresponding CIF (restraints) file by choosing File...Save as Monomer
  • In Coot, load the PDB file for your ligand with File...Open Coordinates
  • In Coot, navigate to the positive density corresponding to your ligand and center it over the pointer (pink box)
  • From the menu, select Calculate...Move Molecule Here and select your ligand molecule to move. This will move it close to the desired location.
  • Using Rotate/Translate Zone in the Model/Fit/Refine dialog, reorient the ligand molecule to better fit the density.
  • Merge the ligand into your protein molecule as another chain using Edit...Merge Molecules. Your ligand should appear as another chain of the main molecule with the residue designation you gave it in JLigand. (If it adds the ligand as an additional residue to an existing chain, you can manually edit the PDB file to put it in a separate chain later.)
  • Make any links, if desired, between the ligand and the protein using Extensions...Modelling...Make Link
  • Use File...Import CIF Dictionary and read in the CIF file you generated to add the molecular restraints to the ligand. This will allow you to carry out real-space refinement, regularization, or chi angle adjustments in Coot.
  • Save the edited molecule using File...Save Coordinates.
  • When refining the protein + ligand in Refmac, input the CIF file for your ligand in the Lib in field so that Refmac will use the proper restraints during refinement. You will have to re-import the CIF file into Coot in each new session in order to carry out real-space refinement, regularization, or chi angle adjustments.


Adding additional amino acids to the structure

As refinement proceeds, you may discover usable electron density at the N- and C-termini of each protein chain that can be fit to the model by adding the appropriate amino acids from the protein sequence. This is another easy Coot task.

  • From the Model/Fit/Refine task menu, select Add Terminal Residue… and click on the terminus of the molecule you would like to add to.
  • Coot will add an alanine residue and make its best guess of the appropriate conformation.
  • You may have to mutate the added residue to the correct side chain and adjust its conformation to match the observed electron density.


Adding water molecules to a model using Coot

  • Open the molecule in Coot
  • Go to the Calculate menu and select Other Modelling Tools
  • Select the Find Waters option
  • Change the settings so it "Finds Peaks above" 1.5 sigma (Later on, more waters can be added by lowering the sigma)
  • The default settings for minimum and maximum distance to protein atoms are okay
  • Waters are generally added to the molecule, as opposed to a new "waters" molecule
  • Click on the Find Waters button at the bottom
  • To rename the chain (to S or W for example) after the water molecules have been added, choose Calculate…Change Chain IDs
  • Select the chain you wish to rename, enter a new name, and click Apply New Chain ID


Renumbering water molecules in a model using Coot

  • Open the molecule containing the waters to be numbered in Coot
  • Choose Extensions…Renumber Waters from the menu
  • Confirm the selected molecule is the molecule containing the waters, select OK, and the waters will be renumbered beginning with 1.
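For a saved file, the same renumbering can be reproduced outside Coot. The sketch below is a rough offline equivalent, not Coot's own implementation: it assumes fixed-column PDB records, waters named HOH, and exactly one atom record per water (no hydrogens or alternate conformations). The function names and sample records are hypothetical.

```python
def water_line(serial, res_seq, chain="W"):
    """Build a hypothetical 80-column HETATM record for a water oxygen."""
    fields = [
        "HETATM", "%5d" % serial, " ", " O  ", " ", "HOH", " ", chain,
        "%4d" % res_seq, " " * 4, "%8.3f%8.3f%8.3f" % (1.0, 2.0, 3.0),
        "%6.2f%6.2f" % (1.0, 30.0), " " * 10, " O", " " * 2,
    ]
    return "".join(fields)


def renumber_waters(pdb_lines, start=1):
    """Return a copy of pdb_lines with HOH residues numbered start, start+1, ...

    Assumes one atom record per water, so every HOH record gets a new number.
    The residue number occupies columns 23-26 of the record.
    """
    renumbered = []
    number = start
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20] == "HOH":
            line = line[:22] + "%4d" % number + line[26:]
            number += 1
        renumbered.append(line)
    return renumbered


# Hypothetical waters with scattered residue numbers.
original = [water_line(3001, 517), water_line(3002, 802), water_line(3003, 44)]
result = renumber_waters(original)
for line in result:
    print(line)
```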

Customizing Coot

While Coot is easy to use by exploring the menus and the various toolbars and dialog boxes, it is much more efficient to use with some personal customization to position various screens and dialogs and to map common tasks to hotkeys. Customizations can be loaded automatically when Coot starts by including a Python-format file named .coot.py (dot-coot-dot-py) in the appropriate location. In Linux, this file should be located in the user's home directory ($HOME, typically /home/username). In Windows, this file should be copied to the C:\CCP4-7\WinCoot directory. With some study of the Coot User Manual it should be possible to adapt these keybinding files to suit your own preferences and workflow.

Roger's Coot customization and key bindings file for Microsoft Surface Pro 6 (rename to .coot.py)

Roger's Coot customization and key bindings file for 4K monitor (rename to .coot.py)

These keybindings open the graphics window and post one or more useful dialogs in a useful arrangement on the screen. In addition, there are a number of customized hotkeys enabled for common rebuilding tasks. The hotkey functions enabled in the files above are listed below in bold type. The default keybindings in Coot are listed in non-bold type. Keybindings are case-sensitive.

  • CTRL-g (keyboard-go-to-residue)
  • CTRL-s (quick-save-as)
  • CTRL-i (display residue info)
  • CTRL-z (undo)
  • CTRL-y (redo)
  • a (refine with autozone)
  • b (toggle baton swivel)
  • c (toggle crosshairs/rulers)
  • d (reduce depth of field)
  • f (increase depth of field)
  • u (undo last navigation)
  • i (toggle spin mode)
  • l (toggle label of closest atom)
  • m (zoom out)
  • o (other NCS chain)
  • s (update skeleton)
  • . (up in button list)
  • , (down in button list)
  • r (real space refine current residue)
  • T (Triple refine - real space refine 3 residues around current residue)
  • Q (Quintuple refine - real space refine 5 residues around current residue)
  • j (auto fit rotamer)
  • q (flip peptide)
  • w (add water at pointer)
  • X (delete active residue)
  • e (toggle environment distances)
  • t (toggle pointer distances)
  • G (regularize 3 residues around current residue)
  • J (jiggle fit current residue)
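A hotkey section of a .coot.py file might look something like the sketch below. It is not taken from the files linked above. It assumes that Coot's Python scripting layer provides add_key_binding(name, key, function) and the four helper functions named in the table in the startup script's global namespace; these names are from the 0.8-era scripting interface as best recalled and should be checked against the Coot User Manual for your version. The install_hotkeys helper and the HOTKEYS table are hypothetical.

```python
# Sketch of a hotkey section for a .coot.py file.
# ASSUMPTION: add_key_binding and the function names in HOTKEYS are provided
# by Coot's Python scripting layer; verify them for your Coot version.

HOTKEYS = [
    # (label shown by Coot, key, name of the Coot function to call)
    ("Refine active residue", "r", "refine_active_residue"),
    ("Triple refine", "T", "refine_active_residue_triple"),
    ("Auto fit rotamer", "j", "auto_fit_rotamer_active_residue"),
    ("Flip peptide", "q", "pepflip_active_residue"),
]


def install_hotkeys(namespace, hotkeys=HOTKEYS):
    """Bind each hotkey whose function exists; return the list of bound keys.

    namespace is a dict of available names (globals() inside Coot). If the
    registration function or a target function is missing, that binding is
    skipped rather than raising an error at startup.
    """
    register = namespace.get("add_key_binding")
    bound = []
    if register is None:
        return bound
    for label, key, func_name in hotkeys:
        func = namespace.get(func_name)
        if callable(func):
            register(label, key, func)
            bound.append(key)
    return bound


# Inside Coot this registers the bindings; outside Coot it does nothing.
install_hotkeys(globals())
```

Looking the functions up by name means a renamed or missing function in a different Coot version disables only that one hotkey instead of aborting the whole startup file.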

Suggested uses for hotkeys

The hotkeys discussed here reference the assignments above. The following list should give you an idea of how you can use hotkeys to speed your workflow. While everything you need to do can be executed from the menu bar, toolbar, or various dialog boxes, an experienced Coot-er can work much more quickly with some strategic hotkeys.

  • When scanning through your protein chain looking for things to fix, turning on spin mode (i) can help you visualize electron density in three dimensions. This is helpful if you are not using a 3D display. Toggling the labels (l) can be helpful if they are not displayed, so you can find out where you are, or to make labels disappear when you are trying to see more detail.
  • When fitting residues to observed real-space density, j followed by r, T, or Q is often sufficient to correctly orient a residue side chain. If this fails, it may be necessary to use the Edit Chi Angles dialog before using r, T, or Q. T and Q are useful to take into account the real space fits of neighboring residues. If bond angles get out of whack, or bonds get disconnected, G operating over a selected range of 3-5 residues will put things back in order.
  • When peptide bonds appear to be pointing the wrong way (common in molecular replacement refinements, unfortunately) q followed by T or Q and/or G is a good way to refit the main chain and fix any fallout caused by the main chain realignment.
  • When manually adding and removing waters from difference density--usually after running FindWaters in Refmac or Coot--w and X make it easy to add or delete waters quickly. If you accidentally delete something you didn't want to, use CTRL-z to undo. (It usually, but not always, works.) When examining waters, enabling environment distances (e) will allow you to see if your candidate water is actually close enough to hydrogen bond to anything. If a water is not close to a hydrogen bonding partner, or to another water that is close to a hydrogen bonding partner, you can quickly remove it with X.
  • If you are having trouble stuffing a residue into the proper density using the usual efforts, try J for a random jiggle fit.

Creating Omit Maps for Ligands using Refmac

To verify the presence of ligands in protein structures, and to prepare figures for publication demonstrating presence of protein ligands, it is typical to create and display "omit maps" in which a difference electron density map (typically Fo-Fc) of the proposed ligand is derived solely from non-ligand atoms. Creating "omit maps" is a bit clumsy in Refmac, but can be accomplished as described below:

  • Open the .pdb file in a text editor or in Coot and change the occupancy of the desired ligand(s) to zero. (Alternatively delete the atom records for the desired ligands.) Save this new file with a new name (do not overwrite the original) for later use.
  • Start a Refmac job with the latest .mtz structure factor file in the refinement and the "omit" .pdb file edited above as the input files
  • Choose to do either a "restrained refinement with no prior phase information" or a "rigid body refinement" with 0 cycles. The idea is to generate a set of Refmac sigmaA-weighted map data in the output .mtz file without significantly changing the final, refined structure.
  • Inspect the file in Coot. The difference map should show significant density around the desired ligand(s)
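The first step above, setting the ligand occupancy to zero and saving under a new name, can be scripted instead of edited by hand. This is a minimal sketch under the assumption of fixed-column PDB records, with the residue name in columns 18-20 and the occupancy in columns 55-60. The function names, the ligand code "LIG", and the sample records are hypothetical.

```python
def atom_line(record, serial, name, res_name, chain, res_seq, occ):
    """Build a hypothetical 80-column PDB record with placeholder coordinates."""
    fields = [
        "%-6s" % record, "%5d" % serial, " ", "%-4s" % name, " ",
        "%3s" % res_name, " ", chain, "%4d" % res_seq, " " * 4,
        "%8.3f%8.3f%8.3f" % (10.0, 20.0, 30.0), "%6.2f%6.2f" % (occ, 20.0),
        " " * 10, " C", " " * 2,
    ]
    return "".join(fields)


def zero_occupancy(pdb_lines, ligand_name):
    """Return a copy of pdb_lines with the occupancy of ligand_name set to 0.00.

    Only ATOM/HETATM records whose residue name matches are changed;
    all other lines are passed through untouched.
    """
    edited = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and len(line) >= 60 and line[17:20].strip() == ligand_name:
            line = line[:54] + "%6.2f" % 0.0 + line[60:]
        edited.append(line)
    return edited


# Hypothetical model: one protein atom and one ligand atom.
model = [
    atom_line("ATOM", 1, " CA", "ALA", "A", 1, 1.00),
    atom_line("HETATM", 2, " C1", "LIG", "A", 401, 1.00),
]
omit_model = zero_occupancy(model, "LIG")
for line in omit_model:
    print(line)
```

Writing omit_model to a new file name, rather than overwriting the refined model, matches the advice in the first step.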

Exporting Omit Maps for use in Pymol using Coot

(This is the easiest way to generate maps for display in Pymol.)

  • Select File...Export Map in Coot
  • Choose the map you would like to export
    • Use the FWT-PHWT map for a 2Fo-Fc style map
    • Use the DELFWT-PHDELWT for a Fo-Fc style map
  • Save the map with the .ccp4 extension to use in Pymol.

Exporting Omit Maps for use in Pymol using CCP4

  • Start an FFT job in CCP4i and select "simple map"
  • Use as an input file the "omit" .mtz file created above.
  • Optionally, select to "cover all atoms in PDB file" and use your edited "omit" .pdb file as input
  • To calculate a Fo-Fc map, select DELFWT for F1 and PHDELWT for PHI
  • To calculate a 2Fo-Fc map, select FWT for F1 and PHWT for PHI
  • To use the map in Pymol, add the extension .ccp4 to the output map.
  • Run the job by selecting Run...Run now.

Model Validation

Before a structure is deposited with the Protein Data Bank, it is necessary to evaluate the proposed structure for its quality, including consistency with typical known bond lengths and angles, steric hindrance, and appropriate hydrogen bonding networks. Structures (.pdb files) can be evaluated in CCP4i, by a validation server, or directly in Coot if you are using it for model building.

Using CCP4

CCP4 has a validation module that can be run from the CCP4i GUI. To validate a structure file,

  • Select the Validation and Deposition module from the CCP4i project window
  • Select the task Run Sfcheck and Procheck
  • In the task window,
    • Enter a job title, e.g., validate
    • Enter a coordinate (.pdb) and corresponding structure factor (.mtz) filenames
    • Select an output filename or accept the one provided
  • Start the job by selecting Run…Run Now at the lower left of the task window. The job will be entered into the job list in the CCP4i window, and you can monitor its status.
  • When the job is finished, inspect the Ramachandran plot and the residue properties files for geometry problems:
    • Examine the Ramachandran plot to determine if most residues (except for mostly Gly) are in the preferred Φ and Ψ angle regions. Typically at least 90% of the residues should be in the preferred regions. Investigate any residues other than Gly that are in non-preferred conformations, and make corrections if necessary.
    • Investigate all “bad” results in the residue properties in detail. Residues highlighted in red in the bar graph should be investigated in detail, and corrections made if necessary

Using a validation server

PROCHECK, SFCHECK, and other validation programs can be run on web servers instead of CCP4i. The UCLA Institute for Genomics and Proteomics maintains one such server.

  • Upload the appropriate PDB file to the server and run both PROCHECK and WHAT_CHECK.
  • PROCHECK provides the same information as the CCP4i module described in the previous section. Check the output as described previously.
  • WHAT_CHECK runs a subset of the WHATIF validation suite. Investigate all “bad” results in detail. In particular, examine “bumps” (steric crowding) to see if they are real or simply the result of large B-factors.

Using Coot

Coot itself has an extensive collection of built-in validation tools that are similar to PROCHECK, but have the great advantage of being interactive. You can generally click on any item of interest in the validation tools and be taken to that portion of the model and electron density for inspection. The validation tools can be used whenever you have a molecule loaded into the program. If you also have the latest electron density maps loaded, you can investigate and correct problems, if necessary, immediately. All validation tools are found under the Validation menu of Coot. A brief description of some of these tools follows:

Ramachandran plot

This tool displays a color coded Ramachandran plot of your protein. Allowed regions are magenta, generously allowed regions are yellow, and disallowed regions are gray. Glycine residues (which can frequently appear in less favorable regions of Φ-Ψ space) are denoted by triangles. All other residues are denoted by squares. Click on any symbol to go to that residue in the model/electron density map. Investigate any residues in non-allowed regions, and all non-glycine residues in generously allowed regions, and make corrections if necessary.
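The Φ and Ψ values on this plot are backbone dihedral angles: Φ is defined by the atoms C(i-1), N, CA, C and Ψ by N, CA, C, N(i+1). As an illustration of what is being plotted, the sketch below computes a signed dihedral angle from four points using the standard atan2 formulation. It is not Coot's code, and the function name and test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    """Vector a - b."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for the atom sequence p0-p1-p2-p3.

    For phi, pass C(i-1), N, CA, C; for psi, pass N, CA, C, N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical test geometries sharing the central bond (0,0,0)-(0,0,1).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, quarter)
```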

Incorrect chiral volumes

This tool will detect residues that have the wrong stereochemistry, that is, amino acids that do not have the L-configuration. A finished model should not show any errors in this check. Occasionally, an L-amino acid is accidentally converted to a D-amino acid during rebuilding; all such errors must be corrected.

Check/delete waters

This tool will look for water molecules that do not meet reasonable criteria for electron density and hydrogen bonding distances. Investigate all questionable waters as candidates for deletion.

Geometry analysis

This tool will look for anomalies in bond lengths, bond angles, and planarity of residues in your model. Residues with acceptable geometries are denoted in green. Residues with significant deviations from standard geometry are flagged in orange or red. Investigate all anomalous residues and make corrections if necessary.

Rotamer analysis

This tool will look for unusual side-chain rotamers. Investigate all flagged residues and make corrections if justified. This tool will frequently identify easily missed Leu and Val residues that are flipped the "wrong" way.

Peptide omega analysis

This tool will look for non-planar or cis-peptide bonds. Cis-peptide bonds are rare in proteins, and unless electron density very clearly suggests otherwise, you should ensure that all peptide bonds in your model are trans.

GLN and ASN B-factor outliers

This tool will look for Gln and Asn residues that may need to be flipped 180 degrees to better account for electron density. Inspect these outliers, taking note of whether or not the alternate conformation makes more sense with hydrogen bonding partners. In general, it is a good idea to check all Gln and Asn residues for correct orientation and sensible polar contacts.

Deposition

Once structural problems have been resolved, run one cycle of Refmac without ARP_WATERS. Then repeat structural validation. When all fixable structural anomalies have been addressed, you are ready to submit the structure to the Protein Data Bank.