X-ray Diffraction Data Processing
Roger S. Rowlett
Gordon & Dorothy Kline Professor, Emeritus
Colgate University Department of Chemistry
There are many software suites that can be used to analyze protein X-ray diffraction data. For data collected on an Oxford Diffraction system, integration and scaling in CrysalisPro is recommended. For data collected elsewhere, e.g., at a synchrotron, instructions for integration and scaling using the programs MOSFLM and SCALA are described.
Normally, CrysalisPro will process data during data collection. If you are satisfied with the experiment as it was originally set up, then it is only necessary to import the output .hkl file into the CNS workflow or the output .mtz file into the CCP4 workflow. Note: before the .mtz file can be used in CCP4, it will be necessary to merge the reflection file and convert intensities to structure factor values as described below.
If you need to make major changes to the data processing that CrysalisPro performed during data collection, you can and should completely reprocess (integrate and scale) the data from scratch. Reasons you might want to do this include disregarding certain ranges of frames, altering the resolution limits, or manually assigning the proper space group.
The following instructions describe a typical reprocessing task. If the data has been previously processed to your satisfaction, you can simply load the processed data as described above and begin the finalization process.
To process data from CrysalisPro in CCP4, it is necessary to sort reflections by h, k, l and merge the reflection data without applying scale factors. (The data has already been scaled in CrysalisPro.) This can be easily accomplished in one go in the CCP4i GUI using the program Aimless. Start CCP4i and add your project to the project directory list, if necessary. An alternate method is to use the programs sortmtz and scala, but scala has been deprecated in the latest CCP4 release, so the latter method is not recommended.
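The sort-and-merge operation described above can be illustrated with a minimal Python sketch. This is not how Aimless is implemented; it only shows the idea of ordering observations by h, k, l and combining repeated observations without applying any scale factors. The function name `sort_and_merge`, the tuple layout, and the inverse-variance weighting are assumptions made for illustration, and the sketch assumes the indices have already been reduced to the asymmetric unit (a real merging program also maps symmetry-equivalent reflections together and rejects outliers).

```python
from collections import defaultdict
from math import sqrt


def sort_and_merge(observations):
    """Sort unmerged observations by (h, k, l) and merge repeats.

    Each observation is a hypothetical tuple (h, k, l, intensity, sigma).
    Repeated observations of the same index are combined with an
    inverse-variance weighted mean; no scale factors are applied.
    """
    groups = defaultdict(list)
    for h, k, l, intensity, sigma in observations:
        groups[(h, k, l)].append((intensity, sigma))

    merged = []
    for hkl in sorted(groups):  # sorting tuples orders by h, then k, then l
        weights = [1.0 / (sigma * sigma) for _, sigma in groups[hkl]]
        total = sum(weights)
        mean_i = sum(w * i for w, (i, _) in zip(weights, groups[hkl])) / total
        merged.append((*hkl, mean_i, sqrt(1.0 / total)))
    return merged


if __name__ == "__main__":
    unmerged = [
        (1, 0, 0, 100.0, 10.0),
        (0, 0, 1, 50.0, 5.0),
        (1, 0, 0, 120.0, 10.0),
    ]
    for row in sort_and_merge(unmerged):
        print(row)
```

In this toy input, the two observations of the 1 0 0 reflection are averaged into a single record, and the output is ordered by index.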
Sorting
Note: If you use aimless to process the data, sorting separately with sortmtz is unnecessary. Sorting is required if you process the data in scala.
Merging
There are two ways of completing this task, one using scala, and one using aimless. Aimless is strongly preferred, as scala is no longer updated:
Using SCALA
Using AIMLESS
This is the preferred method for merging data, and it does not require a sorted file for input. You may input your unmerged mtz file from CrysalisPro directly into aimless.
Enter appropriate names for the Crystal name, Project name, and Dataset name. Typically the project name is the protein (e.g., HICA), the crystal name is its variant and identifier (e.g., D44N-001), and the dataset name describes the origin of the data (e.g., all).
MOSFLM is a program for integrating single crystal diffraction data from area detectors, maintained by Harry Powell, Medical Research Council Laboratory of Molecular Biology, Cambridge. The most convenient way to run MOSFLM is through IMOSFLM in the CCP4 GUI. IMOSFLM provides data suitable for scaling in SCALA.
Startup and configuration
Figure 2. IMOSFLM task window.
Figure 3. IMOSFLM display window.
Indexing the first frame
Cell refinement
Integration
Scaling reflection data from MOSFLM
If integration has gone well, you can proceed to scaling the data using Aimless. The most convenient way to use Aimless is through the CCP4i interface. In general, the Aimless default settings are very good, and scaling of data is quite transparent. The following procedure is typical for scaling a single data set. For merging and scaling multiple datasets, see the next section.
Figure 4. Main task window for CCP4i. Tasks are listed in the left pane, jobs in the middle pane, and administration functions in the right pane.
Figure 5. The Aimless task window in CCP4i. Mandatory fields are highlighted in color.
Frequently in protein X-ray crystallography it is necessary to combine several datasets in order to solve a structure. Such situations might include:
Aimless can be used to scale and merge datasets in one go. (This is the preferred approach.) However, it is also possible to sort and merge data manually, and scale using SCALA if desired.
Using SCALA
To merge datasets, the second and subsequent datasets must be renumbered so that batches of reflections (collections of reflections from a frame of data) will have unique, non-conflicting batch numbers. The resulting sorted datasets are then combined and sorted by reflection, and then finally re-scaled to render them consistent with each other.
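The renumbering and combining steps described above can be sketched in Python. This is only an illustration of the bookkeeping, not the CCP4 implementation: the function name `renumber_batches`, the record layout `(batch, h, k, l, intensity)`, and the choice of offsetting each later dataset to the next multiple of 1000 are all assumptions. The sketch assumes batch numbers are positive integers and does not perform the final re-scaling step.

```python
def renumber_batches(datasets, gap=1000):
    """Give every dataset a non-conflicting batch range, then combine.

    `datasets` is a list of datasets; each dataset is a list of
    hypothetical records (batch, h, k, l, intensity). The first dataset
    keeps its batch numbers. Each later dataset is shifted by an offset
    that is a multiple of `gap` and lies above every batch used so far.
    """
    combined = []
    highest = 0
    for index, records in enumerate(datasets):
        offset = 0 if index == 0 else (highest // gap + 1) * gap
        for batch, h, k, l, intensity in records:
            new_batch = batch + offset
            combined.append((new_batch, h, k, l, intensity))
            highest = max(highest, new_batch)

    # Sort the combined list by reflection index, then by batch number.
    combined.sort(key=lambda rec: (rec[1], rec[2], rec[3], rec[0]))
    return combined


if __name__ == "__main__":
    first = [(1, 1, 0, 0, 10.0), (2, 0, 0, 1, 20.0)]
    second = [(1, 1, 0, 0, 12.0), (2, 0, 0, 1, 18.0)]
    for record in renumber_batches([first, second]):
        print(record)
```

After renumbering, batches 1 and 2 of the second dataset become 1001 and 1002, so observations from the two datasets can sit side by side without ambiguity about which frame they came from.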
Sorting and merging intensity data
Figure 6. Sort/Modify/Combine MTZ files task window. Mandatory fields are highlighted in color.
Scaling merged intensity data
Using Aimless
This is the preferred method for merging data, and it does not require a sorted file for input. You may input your unmerged mtz files from CrysalisPro or MOSFLM directly into aimless.
Figure 7. Aimless task window for scaling and merging multiple data sets.
Sometimes the general class of the space group is known, but the exact space group, including its screw axes, is not immediately known, and the data set must later be re-indexed to conform to standard conventions. For example, you may know that a particular crystal belongs to one of the primitive orthorhombic space groups (P222, P212121, P21212, P21221, P22121, P2221, P2212, P2122). Of these space groups, only P222, P212121, P21212, and P2221 are recognized as standard space groups. The others are non-standard variants in which the h, k, l indices have been permuted. To convert one of these non-standard space groups into a standard one, the reflection data indices must be appropriately swapped. For example, to convert reflection data from P22121 to the standard P21212, it is necessary to rearrange the indices hkl into klh. This is conveniently done in CCP4i:
Figure 8. Reindex Reflections task window. Mandatory fields are highlighted in color.
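The index permutation used in the P22121 to P21212 example above can be sketched in Python to make the bookkeeping explicit. This is an illustration only, not the CCP4 reindexing program: the function names `reindex_klh` and `permute_cell` and the tuple layouts are assumptions. The new indices are h' = k, k' = l, l' = h, which corresponds to new axes a' = b, b' = c, c' = a, so the unit cell lengths and angles must be permuted in the same way.

```python
def reindex_klh(reflections):
    """Apply the permutation hkl -> klh to every reflection.

    Each reflection is a hypothetical tuple (h, k, l, ...); any values
    after the indices (intensity, sigma, ...) are carried along unchanged.
    """
    return [(k, l, h, *rest) for h, k, l, *rest in reflections]


def permute_cell(cell):
    """Permute cell parameters to match the new axes a'=b, b'=c, c'=a.

    `cell` is (a, b, c, alpha, beta, gamma). The angles follow the axes:
    alpha' (between b' and c') is the old beta, beta' is the old gamma,
    and gamma' is the old alpha.
    """
    a, b, c, alpha, beta, gamma = cell
    return (b, c, a, beta, gamma, alpha)


if __name__ == "__main__":
    data = [(0, 2, 0, 150.0, 4.0), (3, 0, 0, 80.0, 3.0)]
    print(reindex_klh(data))
    print(permute_cell((40.0, 50.0, 60.0, 90.0, 90.0, 90.0)))
```

With this permutation, the axis that carried the pure two-fold rotation (a in P22121) becomes the new c axis, which is where the two-fold lies in the standard P21212 setting.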