Phase Solution
Roger S. Rowlett
Gordon & Dorothy Kline Professor, Emeritus
Colgate University Department of Chemistry
The most difficult problem in this modeling process is obtaining information about the phase of the observed reflections. (The intensities are accurately measured in your experimental data set.) In order to produce accurate electron density maps, it is essential to have both accurate intensity and phase information. Approximate phases can be obtained by collecting additional data on heavy atom derivatives of the same protein (multiple isomorphous replacement), by examining anomalous scattering of endogenous heavy atoms in the protein (useful for certain metalloenzymes or selenomethionine-substituted proteins), or by using a starting model derived from a homologous protein (molecular replacement).
The simplest method of obtaining phase estimates for X-ray diffraction data analysis is molecular replacement, which involves building a provisional model of the target protein based on the structure of a highly homologous protein, and placing it in the appropriate orientation in the unit cell. The initial phases are calculated from the positions of all the atoms in the molecular replacement model, and such phases are often sufficient to obtain an electron density map that can be used to refine the structure of the target protein. Two excellent tools for solving structures by molecular replacement are EPMR and Phaser, both of which are detailed here. Phaser is the simpler of the two to use and is the preferred method, as it is both fast and efficient. EPMR may be a good alternative when many search models must be placed in the asymmetric unit and Phaser is unable to find a solution.
For any molecular replacement solution, it is necessary to construct a reasonable molecular replacement search model. Select a molecular replacement protein that is as homologous as possible to the target protein, and examine a sequence alignment of the two proteins. A molecular replacement solution may be possible if the proteins are more than 30% identical. The molecular replacement protein should be modified as follows to make it as similar as possible to the target protein:
The necessary modifications can be made easily using Coot or (even more easily) using CHAINSAW in CCP4. For search models that have extra loops or deletions compared to the target protein, the Phyre server is an excellent way to build a reasonable search model. Upload a target sequence, and Phyre will return an ensemble model from the best available sequence matches in the PDB. Phyre will return a monomer, so you may have to create oligomer search models by aligning this chain with the various chains in an oligomeric version of your best search model in PyMOL or some other program, and combining these orientations in a text file.
Note: for solving the structure of mutant proteins, the ideal search model is an existing solved structure of the wild-type protein. No modifications need be made to the residues of the molecular replacement model in this case.
For the purpose of generating an initial electron density map it is probably wise to remove all cofactors (e.g., coenzymes, metal ions), bound species (e.g., buffers, solvents, ions), and solvent.
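As a rough illustration of this stripping step, the Python sketch below drops HETATM records from PDB-format text; these records cover most cofactors, metal ions, buffer molecules, and waters. The function name strip_heteroatoms and the keep_resnames parameter are hypothetical and are not part of Coot or CCP4. MSE is kept by default on the assumption that selenomethionine residues, which are often stored as HETATM records, belong to the protein chain.

```python
def strip_heteroatoms(pdb_text, keep_resnames=("MSE",)):
    """Return PDB text without HETATM records (cofactors, ions, solvent).

    Residue names listed in keep_resnames are retained. ANISOU records that
    follow a dropped atom, and all CONECT records, are removed as well.
    """
    kept = []
    dropped_last_atom = False
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "HETATM":
            resname = line[17:20].strip()  # residue name, columns 18-20
            dropped_last_atom = resname not in keep_resnames
            if dropped_last_atom:
                continue
        elif record == "ATOM":
            dropped_last_atom = False
        elif record == "ANISOU" and dropped_last_atom:
            continue
        elif record == "CONECT":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python strip_het.py model.pdb > search_model.pdb
    with open(sys.argv[1]) as handle:
        sys.stdout.write(strip_heteroatoms(handle.read()))
```

The result should still be inspected in Coot or a text editor, since some files store modified residues other than MSE as HETATM records.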
Before an electron density map can be generated, it is necessary to place the search model (molecular replacement protein) in the appropriate location of the unit cell. There are a number of programs capable of doing this, but among the best is EPMR, the instructions for which are described here.
The first task is to convert the structure factor file from MTZ format to a format readable by EPMR. This task can be accomplished in the CCP4i environment by choosing the task Convert from MTZ in the Reflection Data Utilities menu. The CCP4i task window for carrying out these actions is shown in Figure 2. You should probably exclude reflections that are marked for Free R calculation.
Figure 2. Convert from MTZ task window. Required fields are highlighted in color. Data fields in MTZ file that are to be converted to user-defined format are listed in the MTZ File Labels section.
EPMR also requires an additional file that contains the unit cell dimensions and the space group number. This file should contain a single line listing the values of a, b, c, α, β, γ, and the International Tables space group number, separated by spaces. The unit cell parameters and space group number can be found in the log file of truncate. Give the file the .cel extension. File 5 is an example for a C2 crystal (space group #5):
File 5
epmr .cel file
232.66 144.73 52.41 90 93.96 90 5
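If the cell parameters are already at hand, a file like File 5 can also be written by a short script rather than typed by hand. The following Python sketch is one way to do this; the function names format_cel_line and write_cel_file are hypothetical, and the only format assumed is the single space-separated line described above.

```python
from pathlib import Path


def format_cel_line(a, b, c, alpha, beta, gamma, space_group_number):
    """Build the single line of an EPMR .cel file.

    Cell lengths are in angstroms, angles in degrees, and the last value is
    the International Tables space group number.
    """
    for angle in (alpha, beta, gamma):
        if not 0 < angle < 180:
            raise ValueError(f"cell angle out of range: {angle}")
    # ".10g" keeps the digits as entered and drops trailing zeros.
    values = [format(v, ".10g") for v in (a, b, c, alpha, beta, gamma)]
    values.append(str(int(space_group_number)))
    return " ".join(values) + "\n"


def write_cel_file(path, *cell_and_space_group):
    """Write the .cel line to the given path."""
    Path(path).write_text(format_cel_line(*cell_and_space_group))


if __name__ == "__main__":
    # Values taken from File 5 (C2 crystal, space group #5).
    print(format_cel_line(232.66, 144.73, 52.41, 90, 93.96, 90, 5), end="")
```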
EPMR uses an efficient evolutionary search algorithm to find one of many good fits of the search model to the reflection data during each trial. The search is repeated for many trials, each starting with a different initial orientation of the search model. The result of the best of these trials is assumed to be (and often is) close to the global best fit, providing a good model for estimating phases and constructing the first electron density map. The program is customizable by including various switches in the command line, some of which are outlined below:
The general format for invoking the program is:
epmr -o filestem filename.cel filename.pdb filename.epmr
where filestem is the stem of the output PDB filename, filename.cel is the unit cell information file, filename.pdb is the molecular replacement search model in PDB format, and filename.epmr is the reflection list file in EPMR format. The command line, which can be quite long, is best put into an executable Linux script file named epmr.sh, an example of which is shown in File 6. The command can be invoked to run in the background by typing epmr.sh & at the prompt.
File 6
A typical EPMR executable file
epmr -m3 -t1.0 -o 3dimer hica08.cel dimer.pdb hica08.epmr > 3dimer.log
The script in File 6 will do an exhaustive search (correlation coefficient of 1.0) to place 3 molecules of dimer.pdb in the unit cell described by hica08.cel, using the hica08.epmr reflection data. The best fits for the three placed dimers will be written out as 3dimer.1.best.pdb, 3dimer.2.best.pdb, and 3dimer.3.best.pdb. The real-time output of the program will be sent to the file 3dimer.log, which can be monitored by using the tail -f command. EPMR, even as efficient as it is, will take a substantial amount of time to find a molecular replacement solution for a large unit cell, especially if multiple molecules must be placed.
A decent molecular replacement solution will have an R-factor no larger than ≈0.45. If R>0.50 it is unlikely that the molecular replacement solution will be useful. If the R-factor is satisfactory, then the packing of molecules placed in the unit cell by EPMR should be examined by loading the file into PyMOL or Coot and enabling display of symmetry mates. If there are no obvious clashes between symmetry mates, and the symmetry-generated molecules pack well into the unit cell with clear solvent channels and no gaps between molecules, you should proceed; otherwise you should re-evaluate your molecular replacement solution and perhaps try again using different conditions.
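For reference, the R-factor quoted here is the usual crystallographic residual, the sum of the absolute differences between observed and calculated structure factor amplitudes divided by the sum of the observed amplitudes. EPMR reports this value itself; the Python sketch below only illustrates the arithmetic and the thresholds given above. It assumes the two amplitude lists are already on a common scale, and the function names and verdict wording are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_obs and f_calc list amplitudes for the same reflections in the
    same order, already placed on a common scale.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and equal in length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def assess_solution(r):
    """Apply the rule-of-thumb thresholds from the text (0.45 and 0.50)."""
    if r <= 0.45:
        return "promising"
    if r > 0.50:
        return "unlikely to be useful"
    return "borderline"


if __name__ == "__main__":
    # Made-up amplitudes, for illustration only.
    r = r_factor([100.0, 50.0, 25.0], [90.0, 60.0, 25.0])
    print(f"R = {r:.3f}: {assess_solution(r)}")
```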
If you have placed several molecules of a search model into the unit cell, they should be consolidated and reformatted before proceeding. First, the files should be concatenated using a text editor; any REMARK records can be removed. Next, the file should be reformatted so that each protein chain has a different SEGID. This can be done in any text editor.
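The same consolidation can be scripted when there are many chains to relabel. The Python sketch below is one possible approach, not part of EPMR or CCP4: it concatenates the placed models, drops REMARK and intermediate END records, and writes a distinct SEGID into columns 73-76 of each ATOM and HETATM record for every chain in every input file. The function name consolidate_models is hypothetical, and the single-letter SEGIDs limit the sketch to 26 chains.

```python
import string


def consolidate_models(pdb_texts):
    """Merge several PDB texts into one, giving each chain its own SEGID."""
    segids = {}                            # (model index, chain ID) -> SEGID
    labels = iter(string.ascii_uppercase)  # A, B, C, ... (26 chains at most)
    merged = []
    for model_index, text in enumerate(pdb_texts):
        for line in text.splitlines():
            record = line[:6].strip()
            if record in ("REMARK", "END"):
                continue
            if record in ("ATOM", "HETATM"):
                key = (model_index, line[21:22])  # chain ID is column 22
                if key not in segids:
                    segids[key] = next(labels)
                padded = line.ljust(80)
                # SEGID occupies columns 73-76 of the record.
                line = padded[:72] + segids[key].ljust(4) + padded[76:]
            merged.append(line)
    merged.append("END")
    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    # File names follow the EPMR output described in the text.
    names = ["3dimer.1.best.pdb", "3dimer.2.best.pdb", "3dimer.3.best.pdb"]
    texts = []
    for name in names:
        with open(name) as handle:
            texts.append(handle.read())
    with open("3dimer.all.pdb", "w") as handle:  # hypothetical output name
        handle.write(consolidate_models(texts))
```

Chain IDs are left untouched here; only the SEGID field changes, so two copies of chain A from different files remain distinguishable by SEGID.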
Phaser is probably the most popular (and powerful) molecular replacement program. Phaser is most conveniently run via CCP4i, and one feature of Phaser can be used to estimate the number of protein molecules present in the asymmetric unit prior to running either Phaser or EPMR.
A utility within Phaser can utilize Matthews Probability calculations to estimate the most likely number of protein molecules within the asymmetric unit of the unit cell. This task can be carried out in the CCP4i interface by the following steps:
Figure 3. Phaser task window set up for Matthews probability estimation. Required fields are highlighted in color.
Example sequence file for a dimer in FASTA format
>2A8D:A
MDKIKQLFANNYSWAQRMKEENSTYFKELADHQTPHYLWIGCSDSRVPAEKLTNLEPGELFVHRNVANQVIHTDFNCLSV
VQYAVDVLKIEHIIICGHTNCGGIHAAMADKDLGLINNWLLHIRDIWFKHGHLLGKLSPEKRADMLTKINVAEQVYNLGR
TSIVKSAWERGQKLSLHGWVYDVNDGFLVDQGVMATSRETLEISYRNAIARLSILDEENILKKDHLENT
>2A8D:B
MDKIKQLFANNYSWAQRMKEENSTYFKELADHQTPHYLWIGCSDSRVPAEKLTNLEPGELFVHRNVANQVIHTDFNCLSV
VQYAVDVLKIEHIIICGHTNCGGIHAAMADKDLGLINNWLLHIRDIWFKHGHLLGKLSPEKRADMLTKINVAEQVYNLGR
TSIVKSAWERGQKLSLHGWVYDVNDGFLVDQGVMATSRETLEISYRNAIARLSILDEENILKKDHLENT
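The quantity behind the Matthews estimate can also be computed by hand as a sanity check: the Matthews coefficient is the unit cell volume divided by the total protein mass in the cell, and the solvent fraction is commonly approximated as 1 - 1.23/VM. The Python sketch below tabulates both for different numbers of chains per asymmetric unit. It does not reproduce Phaser's probability calculation. The function names are hypothetical, the molecular weight is a rough estimate assuming about 110 Da per residue, and the cell is the C2 example from File 5 (four asymmetric units per cell).

```python
import math


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume in cubic angstroms of a general cell; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def matthews_coefficient(volume, n_symops, n_molecules, mol_weight):
    """VM in cubic angstroms per dalton.

    n_symops is the number of asymmetric units per unit cell and n_molecules
    is the number of molecules per asymmetric unit.
    """
    return volume / (n_symops * n_molecules * mol_weight)


def solvent_fraction(vm):
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    volume = unit_cell_volume(232.66, 144.73, 52.41, 90, 93.96, 90)
    n_symops = 4               # space group C2
    residues = 229             # length of one chain in the FASTA file above
    mol_weight = 110.0 * residues  # rough assumption: about 110 Da per residue
    for n in range(1, 9):
        vm = matthews_coefficient(volume, n_symops, n, mol_weight)
        print(f"{n} chains/ASU: VM = {vm:.2f} A^3/Da, "
              f"solvent = {solvent_fraction(vm):.0%}")
```

Values of VM for protein crystals typically fall between roughly 1.7 and 3.5 Å³/Da, so rows far outside that range point to an implausible number of chains.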
Phaser is a fast, highly automated program for finding molecular replacement solutions for multiple protein molecules (search models) in an asymmetric unit. Phaser is conveniently run in the CCP4i environment.
Figure 4. Phaser task window set up for molecular replacement solution. Required fields are highlighted in color.
Molecular replacement is relatively routine when there is a high degree of sequence and structure homology between search model and target protein. (Typical requirements for a successful search are > 30% identity in sequence, and less than 2 Å rms difference in atomic positions.) For difficult cases near the limits of sequence identity or rms difference in atomic positions, a simple Phaser or EPMR search is very unlikely to yield an interpretable electron density map. One possible approach in these borderline cases is a combination of molecular replacement search (Phaser or EPMR), density modification (PARROT), followed by auto-tracing (BUCCANEER). For some cases, this approach, which borrows methodology from experimental phasing, works remarkably well.
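To judge where a search model sits relative to the 30% identity guideline, the identity can be computed directly from a pairwise alignment. The Python sketch below assumes the two sequences have already been aligned by some other program and use "-" for gaps. It counts identity over columns in which neither sequence has a gap, which is one of several conventions in use. The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    identity = percent_identity("ACDE-F", "ACDQGF")
    verdict = "above" if identity > 30.0 else "at or below"
    print(f"{identity:.1f}% identical ({verdict} the 30% guideline)")
```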
The initial MR solution may pack well in the unit cell, and be approximately correctly positioned, but the mean phase error of the resulting electron density map may be quite high. Density modification, especially if non-crystallographic symmetry is available, may significantly improve phases to the point that maps are interpretable. PARROT, which is part of the CCP4 suite, is an excellent option for accomplishing this task. The following steps are typical (see Figure 15 for a PARROT task window):
Figure 15. PARROT task window
The improved electron density map can be inspected in Coot, along with the original MR model. In Coot,
It may be possible to rebuild the original MR search model into the improved density from PARROT, but more than likely, there will be many sequence registration errors and/or ambiguities of sequence alignment to the electron density based on a poly-Ala model. It is likely that a better result can be obtained in less time by autobuilding as much of the protein chain as possible. The CCP4 program BUCCANEER is one effective option. The following steps are typical (see Figure 6 for a BUCCANEER task window):
Figure 6. BUCCANEER task window
The output solution can be inspected in Coot. The output .pdb and .mtz files are found in a subdirectory of the CCP4 project directory labeled "39_buccaneer_pipeline..." where the initial number is the CCP4 job number. The files are named refine.pdb and refine.mtz.
It is important to verify that your molecular replacement solution is sensible. Examining crystal packing can provide important information, including:
Crystal packing can be conveniently examined using Coot:
If the space group and molecular replacement solution are correct,