Isomorphous Replacement

CODGAS can be used to generate "native" and "derivative" datasets from a pool of datasets (even from a single soaking experiment). See: Foos, N., Rizk, M. & Nanao, M.H. (2022). Acta Cryst. D78, https://doi.org/10.1107/S2059798322003977.

In many cases, you can run CODGAS as you usually would and then set up SIR(AS) phasing between the groups that CODGAS has identified. In other cases, a modified target function should be used, in which the isomorphous differences are explicitly optimised.
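CODGAS does not know which group is native and which is derivative, so judging by the example output further down (which contains both shelx_12 and shelx_21), every ordered pair of groups becomes a separate SIR(AS) run. The number of phasing jobs therefore grows as n(n-1) with --groups n. The minimal Python sketch below only enumerates those pairs. The directory pattern best_solution_1_group_N and the label shelx_XY (run for pair X, Y stored under group X) are inferred from that example output rather than documented, and sir_pairs is a hypothetical helper, not part of CODGAS.

```python
from itertools import permutations


def sir_pairs(n_groups: int, solution: int = 1) -> list:
    """List every ordered pair of CODGAS groups as a candidate SIR(AS) run.

    Assumption: group directories are named best_solution_<solution>_group_<N>
    and the run for pair (X, Y) is labelled shelx_XY under group X, as in the
    example output. Which member of a pair acts as native is not specified here.
    """
    pairs = []
    for x, y in permutations(range(1, n_groups + 1), 2):
        directory = f"best_solution_{solution}_group_{x}"
        label = f"shelx_{x}{y}"  # ambiguous beyond 9 groups; fine for 3 or 4
        pairs.append((directory, label, f"group {x} vs group {y}"))
    return pairs


if __name__ == "__main__":
    # Compare the workload of the two group counts suggested in the usage notes.
    for groups in (3, 4):
        runs = sir_pairs(groups)
        print(f"--groups {groups}: {len(runs)} ordered pairs")
        for directory, label, description in runs:
            print(f"  {directory}/{label}  ({description})")
```

Run as a script, it prints the 6 candidate runs for --groups 3 and the 12 for --groups 4, which is a quick way to estimate the SLURM load before choosing a group count.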

Usage:

  1. Install the required libraries as described in the SIR CODGAS Installation section.

  2. Either make your own weeded_files.txt file or let CODGAS make it for you, as described in the Quick start section.

  3. Run CODGAS as usual. You can run it with or without explicit optimisation of the isomorphous differences.

    1. Without: run CODGAS with the usual parameters, but change "best_or_total" to "total" (--best_or_total total). It is worth trying 3 or 4 groups (--groups 3 or --groups 4).

    2. With: --sir_iso_weight 100 --best_or_total total --no_file_delete

      1. The last option (--no_file_delete) keeps the temporary directories created during the optimisation (an unfortunate requirement of the way I coded it). They all start with "tmp" and can be deleted afterwards. They are not deleted automatically because they are useful for analysing the progress of the algorithm after the run.

  4. Run SIR(AS) phasing between the best_solution_X_group_Y subdirectories. CODGAS does not "know" which dataset is native and which is derivative, so you must set up SIR between each pair of CODGAS groups. To make this easier, a script that works with SLURM is provided. Run it from the CODGAS top-level directory:

    1. codgas_intermediate_shelx_slurm.pl --find 4 --solvent 0.4 --shelxd_trys 20000

      1. Other useful options include:

        1. --element Gd

        2. --highres 1.3

        3. --build 40 # number of SHELXE autobuild cycles

    2. You can then look for successful runs, for example with:

         find ./ -name "*pdb" -exec grep -Hi CC {} \;

    3. Test data can be found here: lysozyme gadolinium data

      1. Download the XDS_ASCII files.

      2. Run CODGAS:

           python -mscoop -n20 --host localhost -- CODGAS_SIR.py --explicit_random_seed 1619123304 --groups 3 -l96 -n25 -g50 -j1.6 --best_or_total total --output_resolution 1.6

         Note that the random seed is set explicitly here because the GA initialisation is random; fixing it lets you reproduce this example.

      3. Run the SHELX script:

           codgas_intermediate_shelx_slurm.pl --find 1 --solvent 0.4 --element Gd --highres 1 --shelxd_trys 30000 --build 40

      4. Find the successes:

           find ./ -name "*pdb" -exec grep -Hi CC {} \;

           ./best_solution_1_group_2/shelx_23_i.pdb:TITLE shelx_23_i.pdb Cycle 31 CC = 7.60% 25 residues in 3 chains
           ./best_solution_1_group_2/shelx_21_i.pdb:TITLE shelx_21_i.pdb Cycle 5 CC = 5.83% 21 residues in 3 chains
           ./best_solution_1_group_2/shelx_21.pdb:TITLE shelx_21.pdb Cycle 40 CC = 8.01% 31 residues in 4 chains
           ./best_solution_1_group_1/shelx_12.pdb:TITLE shelx_12.pdb Cycle 37 CC = 9.65% 36 residues in 2 chains
           ./best_solution_1_group_1/shelx_13_i.pdb:TITLE shelx_13_i.pdb Cycle 19 CC = 6.14% 21 residues in 3 chains
           ./best_solution_1_group_1/shelx_12_i.pdb:TITLE shelx_12_i.pdb Cycle 14 CC = 4.24% 22 residues in 3 chains
           ./best_solution_1_group_3/shelx_31.pdb:TITLE shelx_31.pdb Cycle 14 CC = 35.49% 113 residues in 5 chains
           ./best_solution_1_group_3/shelx_32.pdb:TITLE shelx_32.pdb Cycle 26 CC = 35.86% 117 residues in 3 chains

         In this example, only the runs in best_solution_1_group_3 (CC around 35%, more than 110 residues traced) look like genuine solutions; all other runs have CC below 10%.
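With more groups the grep output quickly gets long. A small Python sketch like the one below can parse those TITLE lines and sort the runs by CC, best first. It assumes the TITLE format shown above (Cycle ..., CC = ...%, ... residues in ... chains); parse_title, rank_grep_output and rank_pdb_files are hypothetical helper names, not part of CODGAS or SHELX.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches the TITLE line format seen above, e.g.
# "TITLE shelx_31.pdb Cycle 14 CC = 35.49% 113 residues in 5 chains"
TITLE_RE = re.compile(
    r"Cycle\s+(?P<cycle>\d+)\s+CC\s*=\s*(?P<cc>[\d.]+)%\s+"
    r"(?P<residues>\d+)\s+residues\s+in\s+(?P<chains>\d+)\s+chains"
)


def parse_title(path: str, line: str) -> dict | None:
    """Extract cycle, CC (%), residues and chains from one TITLE line."""
    match = TITLE_RE.search(line)
    if match is None:
        return None
    return {
        "path": path,
        "cycle": int(match["cycle"]),
        "cc": float(match["cc"]),
        "residues": int(match["residues"]),
        "chains": int(match["chains"]),
    }


def rank_grep_output(lines: list[str]) -> list[dict]:
    """Parse 'path:TITLE ...' lines from grep -H and sort them by CC, best first."""
    hits = []
    for raw in lines:
        path, sep, rest = raw.partition(":")
        if not sep:
            continue  # not a grep -H line
        hit = parse_title(path, rest)
        if hit is not None:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["cc"], reverse=True)


def rank_pdb_files(top: str = ".") -> list[dict]:
    """Same ranking without grep: read the first TITLE line of every *pdb file."""
    hits = []
    for pdb in Path(top).rglob("*pdb"):
        if not pdb.is_file():
            continue
        with pdb.open(errors="replace") as handle:
            for line in handle:
                if line.startswith("TITLE"):
                    hit = parse_title(str(pdb), line)
                    if hit is not None:
                        hits.append(hit)
                    break  # only the first TITLE line is used
    return sorted(hits, key=lambda h: h["cc"], reverse=True)


if __name__ == "__main__":
    # Run from the CODGAS top-level directory, like the find command above.
    for hit in rank_pdb_files("."):
        print(f"{hit['cc']:6.2f}%  {hit['residues']:4d} residues  {hit['path']}")
```

Ranking by CC alone is a crude filter; the number of residues traced is a useful second check, as the group 3 runs above illustrate.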



Notes:


SHELXC is used to calculate the isomorphous (ISO) term. In some cases, the automatic re-indexing in SHELXC produces spurious values for Riso and related statistics. If any SHELXC log file contains this line:

SIR data re-indexed to improve CC against NAT from NaN% to NaN%

then you have hit this problem, and the ISO term cannot be optimised (CODGAS will still run, though).
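Checking many runs by eye is tedious, so a minimal Python sketch for scanning a CODGAS directory tree for the symptom line above is given below. The log-file pattern *shelxc*.log is an assumption (this page does not say what the SHELXC logs are called), and has_nan_reindex and find_bad_logs are hypothetical helper names.

```python
from __future__ import annotations

import re
from pathlib import Path

# The symptom line quoted above; \s+ tolerates extra padding in the log.
NAN_REINDEX = re.compile(
    r"SIR\s+data\s+re-indexed\s+to\s+improve\s+CC\s+against\s+NAT\s+"
    r"from\s+NaN%\s+to\s+NaN%"
)


def has_nan_reindex(text: str) -> bool:
    """Return True if a SHELXC log contains the NaN re-indexing line."""
    return NAN_REINDEX.search(text) is not None


def find_bad_logs(top: str = ".", pattern: str = "*shelxc*.log") -> list[Path]:
    """Recursively list logs under top that show the NaN re-indexing problem.

    Assumption: SHELXC logs match `pattern`; adjust it to your actual file names.
    """
    bad = []
    for log in sorted(Path(top).rglob(pattern)):
        if log.is_file() and has_nan_reindex(log.read_text(errors="replace")):
            bad.append(log)
    return bad


if __name__ == "__main__":
    for log in find_bad_logs("."):
        print(f"NaN re-indexing reported in {log}")
```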

The only known workaround is to use lower-resolution data. This can be done by cutting the resolution of all datasets or, for example, of the derivative only:

--sir_derivative_resolution 1.6

Note that you can still output higher-resolution data at the end of the CODGAS optimisation with:

--output_resolution 1.4