MCCE advanced tutorial

File flow chart of MCCE program

Files required and written out by the program are illustrated in this chart. Step 1 is file formatting, step 2 is rotamer making, step 3 calculates energy look-up table, and step 4 is Monte Carlo sampling of conformers at each pH or Eh. MCCE program can run any steps providing the prerequisite files exist in the working directory. This file flow chart shows the summary of file dependencies.

Five steps of the mcce program

There are 4 major steps in a mcce calculation. These 4 steps are connected by a few files. The program is designed to run through without stopping although you can stop the program at each step and edit the files to instruct the next step. Here is the summary of the function of these 4 steps.

Program start, initialization

Input files:

Output files: None

The initialization reads in control file "run.prm" and loads parameters. It is the single most important configuration and is loaded every time mcce is invoked regardless of the steps the program will do.

The control file "run.prm" must be in the working directory where mcce program is called. The lines of this file are interpreted by the mcce program as key-value pairs. The string in parentheses is the key and the first string of the same line is the value. For example the line

/usr/local/mcce-1.0                       (MCCE_HOME)

can be viewed as $(MCCE_HOME) = "/usr/local/mcce-1.0". Other lines are interpreted in the same way. The resulting key-value pairs define the mcce environment variables, input and output file names, and flags etc.

The mcce program will load parameter files from the parameter directory defined in "run.prm". The way of defining this directory is composed by two variables: $(MCCE_HOME) + "/param" + $(EPSILON_PROT). Thus for these lines in the run.prm file:

/usr/local/mcce-1.0                           (MCCE_HOME)

8.0 Protein dielectric constant for DelPhi    (EPSILON_PROT)

the path name is "/usr/local/mcce-1.0/param08", in which the dielectric constant is converted to a 2-digit integer. In this directory, mcce program will read in files with extension ".tpl". Files whose names do not end up with ".tpl" will be ignored. This makes it possible to store backup files in the same directory.

The file "extra.tpl" is defined by the variable $(EXTRA) in "run.prm". This file is optional. If the file exists, mcce reads it in with the same way as the files in the parameter directory. This file has the same format as a regular parameter file. The function of this file is

Unlike the parameter directory, this file is intended be customized for individual proteins.

The file "new.tpl" is a parameter file similar to those in the parameter directory except no charges are assigned to any atom. This file is for unrecognized residues or cofactors. In step 1, mcce program treats these cofactors as non-charged atom assemblies and writes out this file for two purposes:

There is no output file from the program initialization, but the initilization creates a parameter database to hold information from the parameter files and dynamically generated parameters by the program. A temporary image file of the database is found in the working directory as "~temp.dbm". While mcce is running, you can not delete this file or start another mcce job in the same directory.

Checklist: modifications to run.prm to remember

Step 1, Formatting pdb file

Input files:

Output files:

Step 1 prepares an extended pdb file suitable to be read into step 2. The input pdb file is in standard pdb format.  It can have alternative side chain positions, but mcce can not process alternative backbone positions. Alternative side chains are treated as side chain conformers. When side chain atoms are missing, mcce will complete the side chain atoms at the torsion minimum.  In this step several things will happen:

The renaming rule file "name.txt" instructs the mcce program to rename atom name, residue name, sequce number, and chain ID. Here is several sample lines in this file:

# Symbol "*" in the first string is a wildcard that matchs any character.

# It means "do not replace" in the second string.

#

# The replace is accumulative in the order of appearing in this file.#

*****HEA******  *****HEM******

*****HEC******  *****HEM******

*CAA*HEM******  *****PAA******      extract PAA from heme

*CBA*HEM******  *****PAA******

*CGA*HEM******  *****PAA******

#   ATOM     70  CB  ASP A  12

The line started with "#" and the line shorter than 30 characters are comment lines. For other lines, the first 30 characters should be two 14-character strings separated by exactly two spaces, and the rest of the line is comment field. A valid line instructs mcce program to replace string 1 by string 2. The mcce program will match this string with position 13 to 26 of a coordinate line in the input pdb file. The symbol "*" is the wild card that matches any character in strings. The replace action is accumulative and order sensitive. For example, The line

HETATM 1683  CAA HEC     1       1.317  -3.987  -1.685  1.00  0.00           C

will be renamed to

HETATM 1683  CAA HEM     1       1.317  -3.987  -1.685  1.00  0.00           C

then

HETATM 1683  CAA PAA     1       1.317  -3.987  -1.685  1.00  0.00           C

Another file step 1 may use is "list_rot.gold". This file defines "hot spots" in a protein where rotamers will be made with a step of 30 degrees in step 2. If a coordinate line of any atom of a residue is present in this file, this residue and residues within 4 angstroms will be flagged to be "hot spots" in file "head1.lst", which will be used by step 2.

The output files of step 1 "acc.res" and "acc.atm" contain the solvent accessible surfaces of residues and atoms. In "acc.res", both absolute value and percentage of the solvent accessibility are listed.

When unrecognized cofactor is encountered in step 1, a parameter file will be created with name as "new.tpl" and a warning message will be issued. The atom connectivity is guessed and all atoms are assumed to have charge 0. This file can be the starting point of making a parameter for a new cofactor. If the program is resumed from step 2 or 3, this file will be read in by step 0 as a parameter file so step 2 and 3 can proceed without step 1.

The file "head1.lst" lists the rotamer making policy of the residues. When $(ROT_SPECIF) in "run.prm" is set to "t", this file will be used as instruction of rotamer making of step 2, otherwise, file "head1.lst" will be ignored. This file can be modified and will be effective if the program resumes from step 2.

The file "step1_out.pdb" is a formatted pdb file, which will be read in by step 2. The mcce extended pdb format contains 3 more fields than standard pdb file. Right after atom coordinates, these three fields are charge, size and rotamer making history.

Step 2, Making rotamers

Input files:

Output files:

Step 2 makes and optimizes rotamers from the structure in "step1_out.pdb". In this step, the rotatable bonds (defined in parameter files) of each residue is rotated by the steps defined in "run.prm". Then the self van der Waals (VDW) potential (interaction among atoms of the same side chain excluding 1-2 and 1-3 interactions, and interaction between the side chain and backbone atoms) is calculated. Side chain rotamers with high self VDW potentials are deleted. Then the side chains are optimized with possible hydrogen bond partners. A number of repackings starting from randomized initial structures (one conformer from one residue) are performed to reduce side chain rotamers to those with low energy local packings. Ionization states are then created and protons are placed on side chains. At the end of side chain rotamer optimization, MD simulations are carried out locally to relax the structure.

The input file step1_out.pdb is the only essential file step 2 will use. You can modify this file to add, delete or edit residues without causing problems as long as the file observes mcce extended pdb format. Sometimes a little editing of this file is necessary, for example, the terminal residues are not always correctly identified by the mcce program due to "broken chains" caused by the missing residues in the pdb file.

The file "head1.lst" provides residue specific rotamer making rule. It will be used only when $(ROT_SPECIF) is set to be "t". The line of this file such as:

NTR A0003_ R t 06 S f  0.0 H t 06 M 000

is interpreted as "Rotate is true and 6 steps per bond, then swing is false (angle is 0 if any), then Hydroxyl relaxation is true and the maximum number of starting conformers per hydroxyl is 6, and the maximum total number of the conformers is not limited". If you want to investigate a specific site in details, change the step 6 to 12. But making 12 steps for many sites (>30) are not recommended because it is expensive in terms of memory and CPU time.

The file "progress.log" is a dynamically updated progress report file. It lists mainly the repacking progress.

The file "rot_stat" is an important file to review the rotamer making history. This file lists the number of rotamers of each residue. It is easy to tell what residues get rotamers and if the total number of rotamers is manageable (a 2000 conformer structure may need one day to run the next two steps, step 3 and 4).

The file hvrot.pdb is a file at the same format as step1_out.pdb and step2_out.pdb but lacks hydrogen atoms. The main use of this file is to rename it to "step1_out.pdb" and run step 2 again with "swing" rotamers instead of "rotate" rotamers. This provides a way to relax the structure by "swinging" the rotatable bond a little and reevaluate the rotamers. This feature is most for advanced users to calibrate hydrogen bond directed rotamer making algorithm and MD relaxation subroutine.

The file "head2.lst" is a summary of the rotamers made in step 2. This file is not used by step 3.

The file "step2_out.pdb" is in the mcce extended pdb format. It is the single file connecting step 2 and step 3.

Step 3, Calculating the energy lookup table

Input files:

Output files:

Step 3 calls PB equation solver, DelPhi, to calculate reaction field energy and electrostatic pairwise interaction. The result is stored as together with van der Waals interactions as one file per conformer. These files have extension "opp" and are located under directory energies. The self-energy terms (not dependent on side chains of other residues) of conformers are listed in file "head3.lst" The progress is dynamically updated is file "progress.log".

The file "progress.log" reports the progress of DelPhi calculation, which can be used to estimate the total time of this step. There will be two parts of DelPhi calculations: the first is pairwise calculation and the second part is the reaction field energy calculation. It takes less time on reaction field energy calculation than on pairwise calculation.

The directory "energies" holds the pairwise conformer interaction lookup table as files with extension '.opp' and starting with the conformer_id, e.g. GLU02B0012_002.opp.

These files are header-less, but following is each column description:

          column #:         1           2          3            4                   5                 6

          description:    conf#   name  corr_el   vdw_pwise    delphi_el    post_bdry_corr_el    (kcal/mol)

The file "head3.lst" contains self energy of each conformer and control flags of step 4. The flag is: "t" for fixed occupancy 0 or 1, or "f" for free to sample. The energy unit is Kcal/mol.

The file "step3_out.pdb" is an extended pdb file with multiple conformers. The conformer number is sorted to be continuous and consistent with the conformer numbers in file "head3.lst" and step 4 output file fort.38. This file is identical to "step2_out.pdb" if "step2_out.pdb" is an unmodified file created by step 2.

Step 4, Simulating pH or Eh titration with Monte Carlo sampling

Input files:

Output files:

Step 4 is a titration simulation by Monte Carlo sampling. The Monte Carlo sampling is performed at specified set of pH/Eh. At each titration point, there will be several (predefined in "run.prm", the default is 6) independent samplings. Each sampling goes through annealing, reducing, and equilibration stages. Statistics of conformer occupancy is only done at equilibration statge. Yifan's Monte Carlo subroutine will check early convergence and quit sampling early to save time. The result is reported as conformer occupancy in file "fort.38".

The file "mc_out" is the progress report of Monte Carlo sampling. It contains running energy tracing which can be used to calculate the average E or enthopy of the system, or verify if the system is trapped at local energy minima. By "grep Sg mc_out", you can find the standard deviation of independent samplings.

Step 5, Formatting output files

Input files:

Output files:

Step 5 calculates residue net charge in file "sum_crg.out" and fitted pKa/Em values in file "pK.out".  If requested, step 5 can display DelPhi potential and dielectric maps when using the following flags

t        display DelPhi potential map                       (DISPLAY_POTENTIAL_MAP)

t        display backbone only                              (ONLY_BACKBONE)

1        specify titration point                            (COLUMN_NUMBER)

To display the phi map go to:  

This file (PDB ID 1brs) was created with DelPhi 7.0 with a slice view of the electrostatic phi map.

Customizing control file - line by line remarks of "run.prm"

The most modified section

Five entries in this section outline a mcce run. One can specify the input pdb file and steps.

prot.pdb                                                    (INPDB)

This lin specifies the input structural file of step 1.

f        step 1: pre-run, pdb-> mcce pdb                    (DO_PREMCCE)

Step 1 converts regular pdb file to mcce extended pdb format. The flag "t" designates "do this step".

f        step 2: make rotatmers                             (DO_ROTAMERS)

Step 2 makes and optimizes rotamers of the protein. The flag "t" designates "do this step".

f        step 3: do energy calculations                     (DO_ENERGY)

Step 3 prepares energy look-up table for conformers. The flag "t" designates "do this step".

f        step 4: monte carlo sampling                       (DO_MONTE)

Step 4 simulates a pH or Eh titration with Monte Carlo sampling. The flag "t" designates "do this step.

The less modified entries

These are additional important controls of the program.

8.0      Protein dielectric constant for DelPhi             (EPSILON_PROT)

Dielectric constant of protein. usually epsilon 8 is good for small soluble protein while 4 is good for big trans-membrane protein (>200 residues). If one wants to use dielectric constant other than 8 and 4, one has to prepare a parameter directory named "param##" where "##" is the dielectric constant as an integer. In this parameter directory, the "RXN" entries of tpl files should be recalibrated for the new dielectric constant.

/usr/local/mcce-1.0/extra.tpl                               (EXTRA)

This line points to a file that contains entries to overwrite those in the parameter directory and provides extra controls over the program.

/usr/local/mcce-1.0/name.txt MCCE renaming rule.            (RENAME_RULES)

This is the renaming rule file. It is sometimes necessary to divide a big group into separate ionizable groups, or merge ligands and cofactors as one group. This can be done by renaming the atoms and residues instructed by this file.

ph       "ph" for pH titration, "eh" for eh titration       (TITR_TYPE)

The type of titration simulation of step 4, pH or Eh.

0.0      Initial pH                                         (TITR_PH0)

The starting pH of titration simulation, or the pH of an Eh titration.

1.0      pH interval                                        (TITR_PHD)

The increment of pH titration (size of step).

0.0      Initial Eh                                         (TITR_EH0)

The starting Eh, or the Eh of a pH titration.

30.0     Eh interval (in mV)                                (TITR_EHD)

The increment of Eh titration (size of step).

15       Number of titration points                         (TITR_STEPS)

Number of titration points in step 4, Monte Carlo simulation.

Customizing step 1, preformatting pdb file

These entries are specifically for step 1 and do not have effect on other steps.

t        Label terminal residues?                           (TERMINALS)

The program will identify N and C terminal residues based on the geometry if the flag is "t". When some residues are missing in the input pdb file, false terminal residues would be labeled at the "breakage". Sometimes one needs to disable this feature to work with a stand alone amino acid.

0.1      cut off water if % SAS exceeds this number         (H2O_SASCUTOFF)

As surface water and salt can be taken into account by the continuum model, these molecues are recursively stripped off if they are exposed with the SAS exceeding this cutoff threshold.

2.0      distance limit of reporting clashes                (CLASH_DISTANCE)

The mcce program reports atom pairs with abnormally short atom-atom distances. This line specify the distance threshold in angstroms. Distance 2.0 is good to find atoms within bond distance and switching 2.0 to 2.4 will report possible ligand sites.

Customizing step 2, rotamer making

Step 2 is rotamer making, These lines are for the rules of making rotammers.

f        Use control file "head1.lst" for rotamer making    (ROT_SPECIF)

This line is the flag of using head1.lst or not. "t" means to use. The initial rotamer placing is controlled by either the following lines or an input file "head1.lst" created by step 1 based on the same rules and an addition file "list_rot.gold" which defines hot spot of the protein. The file "head1.lst" contains residue specific rotamer making rules. One can edit this file, and resume the program from step 2 with the flag "t" to get more initial rotamers certain some residues.

t        Do swap (stereo isotope)                           (ROT_SWAP)

This is the flag to make stereo-isotope of Threonine residues, in which, the OG1 and CG2 are swapped.

t        Do rotate?                                         (PACK)

When the mcce program optimize side chain rotamers, the initial rotamers is rotated from the native conformer. This flag turns on the "rotate" subroutine to rotate the single bonds by given number of steps.

6        number of rotamers in a bond rotation              (ROTATIONS)

The number of steps of a bond rotation. This entry and $(PACK) defined above is used by the same "rotate" subroutine.

f        Do swing?                                          (SWING)

Another way of rotating single bonds is to rotate a predefined angle rather than given number of steps to complete a cycle. "swing" subroutine will do this type of rotamers if the flag is "t". Each single bond will yield two new rotamers which are clock-wise and counter-clock-wise rotamers. The rotamers made here is based on the rotamers given by "rotate" subroutine. Turing on both $(PACK) and $(SWING) is not recommended because the combination of two subroutines creates too many rotamers and the computation is both CPU and memory demanding.

10.0     phi in degrees of swing                            (PHI_SWING)

This is the rotation angle of each "swing". This entry and $(SWING) defined above is used by the same "swing" subroutine.

1.0      SAS Threshold of deleting rotamers                 (SAS_CUTOFF)

Surface residues don't have rotamers because their electrostatic interactions are largely screened by solvent, and rotamers don't help the calculation very much. The residues whose fractional SAS are over the number defined in this line won't get rotamers. To use a smaller number like "0.3" to delete surface side chain rotamers.

10.0     Cutoff of self vdw in kcal/mol                     (VDW_CUTOFF)

After initial rotamers are placed, those with big self VDW potential (VDW within side chain and VDW with backbone) will be deleted. This number is the threshold of such deleting action. First the minimum VDW will be found in side chain conformers of a residue, and then conformers with high self energy are deleted. Only heavy atoms (non Hydrogen) are considered at this stage.

5000     number of repacks                                  (REPACKS)

While self-energy pruning removes most inaccessible side chain conformers, "repacking" finds which conformers are more accessible than others in context of other residues. Starting from random selections of microstates (one conformer from each residue), the protein is repacked with optimizing residues sequentially. Here is the number of starting microstates.

0.03     occupancy cutoff of repacks                        (REPACK_CUTOFF)

If fractional counts of a conformer in the "repacking" is less than this number, the conformer gets removed.

t        h-bond directed rotamer making                     (HDIRECTED)

This is flag of making Hydrogen bond directed rotamers. With rotamers obtained after self energy pruning, the conformers are checked with the conformers of neighboring residues. When O-O and O-N distances fall into 2.5 to 3.5 Angstroms, the pair of conformers is optimized to have O-O or O-N distance closer to 2.9 Angstroms.

36 Limit number of the H bond conformers                    (HDIRLIMT)

When the number of hydrogen bond directed rotamers are over this number, "better" rotamers with more ideal hydrogen bond distance are elected.

0.2      threshold for two conformers being different       (HDIRDIFF)

At the end of the rotamer making, the RMSD of atoms between two side chain rotamers should exceed this number, otherwise one of them will be deleted. This entry is different from the following $(PRUNE_RMSD) entry. Any two conformers are within the distance defined by this line is treated as similar conformers and the second one would be deleted, while two conformes have to satidfy three threshold values $(PRUNE_RMSD), $(PRUNE_ELE) and $(PRUNE_VDW) to be similar confomers.

0.5      Pruning cutoff of RMSD                             (PRUNE_RMSD)

The conformers are pruned by the geometry and pairwise interaction vector at the end of the step 2. $(PRUNE_RMSD) defines the geometry difference. If the maximum of the atom distance between two conformers is above this number, these two conformers are treated as non-similar conformers. When two conformers are similar conformers as defined by $(PRUNE_RMSD), $(PRUNE_ELE) and $(PRUNE_VDW), the second conformer would be deleted.

1.0      Pruning cutoff of eletrostatic pairwise            (PRUNE_ELE)

The conformers are pruned by the geometry and pairwise interaction vector at the end of the step 2. $(PRUNE_ELE) defines the electrostatic vector difference. If any dimension of the electrostatic pairwise interaction vector is different by this number, these two are treated as non-similar conformers. When two conformers are similar conformers as defined by $(PRUNE_RMSD), $(PRUNE_ELE) and $(PRUNE_VDW), the second conformer would be deleted.

8.0      Pruning cutoff of vdw pairwise                     (PRUNE_VDW)

The conformers are pruned by the geometry and pairwise interaction vector at the end of the step 2. $(PRUNE_VDW) defines the electrostatic vector difference. If any dimension of the van der Waals potential pairwise interaction vector is different by this number, these two are treated as non-similar conformers. When two conformers are similar conformers as defined by $(PRUNE_RMSD), $(PRUNE_ELE) and $(PRUNE_VDW), the second conformer would be deleted.

f        Do relaxation on water                             (RELAX_WAT)

Flag of doing water relaxation. Water molecules are given both translational and rotational degrees of freedom in this relaxation process.

3.2   Distance between water and heavy atom to move water (WATER_RELAX_THR)

If the distance between water (O) and heavy atom is bigger than this number, them move the water away.

Customizing step 3, energy lookup table calculation

Step 3 is using DelPhi to compute reaction field energy and electrostatic interaction. This section is set up of DelPhi program.

f        Use SAS + Coulomb's law for ele interaction         (QUICK_ENERGIES)

Step 3 can optionally do the energy lookup table with emprical analytical equations rather than solving PB equation. Using these equations would be very fast (a few minutes vs. a few hours by DelPhi) but the error of calculated pKa would be as large as 3 pH units. If you just want find interesting spots of a big protein or refine rotamers, then try this. To turn the emprical calculation on, mark the flag as "t".

f        Display extended head3.lst                         (HEAD3_EXTENDED)

If this flag is "t", then display the individual interactions with the backbone and save them into head3_extended.lst.

/usr/local/mcce-1.0/bin/delphi DelPhi executable             (DELPHI_EXE)

The location of the DelPhi executable. This line is configured at the time of installation. The DelPhi program in this location is modified to read only unformatted pdb file.

80.0     Solvent dielectric constant for DelPhi              (EPSILON_SOLV)

The dielectric constant of solvent. Although it is editable here, modifying this entry would void the calculation as all other parameters were calibrated at dielectric constant 80.

65       Grids in each DelPhi                               (GRIDS_DELPHI)

The DelPhi grid size. It must be an odd number. More grid points produce more accurate result but costs much more CPU time. Since mcce program uses focusing technique, a larger number is not necessary.

2.0      The target grids per angstrom for DelPhi           (GRIDS_PER_ANG)

The DelPhi focusing run will end up with this targeted resolution. A minimum value of 1.0 grid per angstrom is recommended.

1.4 Radius of the probe (RADIUS_PROBE)

The size of the probe to evaluate solvent accessible surface (SAS) and defines dielectric boundary. If this value is set to be 0, then the surface detected becomes van der Waals surface.

2.0      Ion radius                                         (IONRAD)

Ion radius is a parameter of Poisson-Boltzmann equation solver, DelPhi.

0.15     Salt                                               (SALT)

Salt concentration of the solvent in unit M (Moles/Liter).

f        Reassign charge and radii before delphi            (REASSIGN)

The mcce program will reassign charges and radii to atoms from the parameter files if this flag is "t". Usually, the charges and radii are read in from extended pdb file "step2_out.pdb", in which these parameters are already assigned but editable.

f        Recalculate torsion energy when write out head3.lst (RECALC_TORS)

If this flag is "t", then recalculate torsion energy for all conformers even when step 3 only caculates a part of conformers.

Customizing step 4, Monte Carlo sampling

In this step, the mcce program first loads self energy of conformers from file "head3.lst" and pairwise interaction from directory "energies", then simulate the titration with Monte Carlo sampling.

f        Average the pairwise, "f" uses the smaller         (AVERAGE_PAIRWISE)

The electrostatic pairwise interaction calculated in two directions are slightly different. We can either average them or choose the smaller (absolute) value. As the difference may be from the inaccurate dielectric boundary, using the smaller value reduces the dielectric boundary error.

20.      Warning Threshold of difference in pairwise        (WARN_PAIRWISE)

When the difference of pairwise interaction in two directions is very different, that is, absolute value bigger than 0.5 kcal/mol and percentage difference is bigger than this number, the program will issue warning message.

5.0 Big pairwise threshold to make big list (BIG_PAIRWISE)

Multiple-flips happen on the big list of a residue, which enables proton transfer from site to site, instead of site to solvent. The big list is the neighbor list with big pairwise interaction. The threshold defined here is used to make such big lists for each residue.

-1       Random seed, -1 uses time as random seed           (MONTE_SEED)

Random seed, -1 to use time as the random seed.

298.15   Temperature                                        (MONTE_T)

The temperature of Monte Carlo simulation. As the reference standard free energy (pKa of acid in solution) is at room temperature, using other temperatures may require modifications on other parameters.

3        Number of flips                                    (MONTE_FLIPS)

A new microstate can be created by flipping conformers of one or multiple residues. The strategy we used is to flip one residue and its neighbors so that concerted motion can be properly sampled. The number here is the maximum of flipping sites.

100      Annealing = n_start * confs                        (MONTE_NSTART)

The steps of annealing. An independent Monte Carlo sapling goes through annealing, reducing, and equilibration stages. The sampling starts from a totally random microstate which might be at "high" energy state, and gradually "cooled" down to equilibrated states. The microstates in the annealing stage would be discarded.

1000     Equilibration = n_eq * confs                       (MONTE_NEQ)

The length (steps) of reducing stage. After reducing stage, those conformers who are not occupied or 100% occupied will be given such occupancy and not be sampled.

0.001    Cut-off occupancy of the reduction                 (MONTE_REDUCE)

Some conformers are never selected in the reducing stage, so they will be flaged and not be sampled any more. The cutoff decision is based on the value defined here, a fractional number of the occupancy. The total occupancy of conformers in a residue is 1 unless all of them are fixed.

The following 4 lines are for basic Monte Carlo sampling.

6        Number of independent monte carlo sampling         (MONTE_RUNS)

The number of independent Monte Carlo samplings at one titration point. The standard deviation of these independent runs will be reported for each conformer in progress file "mc_out" and the biggest standard deviation will be output to stdout.

5000     Sampling = n_iter * confs                          (MONTE_NITER)

The number of steps in sampling at the equilibration stage. The conformer occupancy statistics is collected at this stage.

5000     Trace energy each MONTE_TRACE steps, 0 no trace    (MONTE_TRACE)

The interval of tracing the running microstate energy. Every this number of steps, the mcce program will write out the microstate energy to file "mc_out". You can see how this energy fluctuates and decide if the sampling is trapped in local minima. In addition, the average of this energy is the system energy corresponding to Enthalpy H.

1000000  Maximum microstates for analytical solution        (NSTATE_MAX)

If the microstates are less than this number, calculate the conformer occupancy analytically, otherwise, use Monte Carlo sampling..

The following lines are for advanced Monte Carlo sampling. The advanced Monte Carlo sampling uses less time by detecting early convergence.

f        Using Yifan's monte carlo                          (MONTE_ADV_OPT)

The flag to turn on advanced Monte Carlo sampling.

f        Using format from old version                      (MONTE_OLD_INPUT)

Flag it with "t" to load energy terms from old MCCE program.

5000     Min Sampling = n_iter * confs                      (MONTE_NITER_MIN)

The minumal number of steps of Monte Carlo sampling. No termination before this number of steps.

-1       Max Sampling = n_iter * confs(-1 stop @ converged)  (MONTE_NITER_MAX)

The maximal number of steps of Monte Carlo sampling. -1 means ndefinite sampling until converged.

10000    Number of iterations each cycle                  (MONTE_NITER_CYCLE)

One cycle is a set of Monte Carlo runs without book keeping, convergence checking etc., in order to save time.

1000     niter * n_conf check convergence                   (MONTE_NITER_CHK)

Convergence is checked every x cycle. x = (int)(this number * (number of conformer) / $(MONTE_NITER_CYCLE)) + 1

-1       Number of the reduce steps(-1 stop @ converge)     (MONTE_N_REDUCE)

A reduce step is a separate MC run restarting from a different random microstate. Conformers with occupancy < $(MONTE_REDUCE) is fixed to have occupancy 0, therefore no longer sampled. Convergence will be checked again between different reduce runs. New reduce run is initialized until number of reduce run reaches this number ($(MONTE_N_REDUCE) >0) or convergence criteria is satisfied ($(MONTE_N_REDUCE) <0).

0.01     Threshold for convergence                          (MONTE_CONVERGE)

This is convergence criteria both for single Monte Carlo run (explained in $(MONTE_NITER_MAX)) and for reduce steps (explained in $(MONTE_N_REDUCE)). RMSD is obtained by comparing average occupancy at this checking point to average occupancy at last checking point. When RMSD of all active conformer occupancy is smaller than this number, convergence criteria is satisfied.

f        Calculate free energy                              (MONTE_DO_ENERGY)

If this flag is "t", free energy subroutine is called. Caution, free energy calculation is expensive.

Free energy is calculated by non-linear average of energies from all the unique microstates sampled in MC. Therefore unique microstates are collected. And in every MC step the occupied microstate is checked against all collected microstates to see if it is a new unique one.

298.15   Starting temperature for annealing                (ANNEAL_TEMP_START)

Annealing can be run in a different temperature (usually high) before data collection to bring the system out of local minimum. This parameter gives starting temperature of annealing step. Final temperature is $(MONTE_T).

0        Number of steps of annealing                      (ANNEAL_NSTEP)

Each annealling run changes the system temperature by ($(MONTE_T)-$(ANNEAL_TEMP_START))/$(ANNEAL_NSTEP), to bring the temperature linearly to the MC temperature.

1000     Number of iterations for each annealling step    (ANNEAL_NITER_STEP)

Each annealing step runs x iterations. x = (This number) * (Number of active conformers)

Miscellaneous entries

These lines are for advanced use of the program.

/usr/local/mcce-1.0                                       (MCCE_HOME)

The home directory of mcce. It is generally set up by the installation script. The program will use this path to search the parameter directories.

1         delphi start conformer number, 0 based            (DELPHI_START)

You can control to do which conformers in step 3, energy lookup table calculation. Step 3 is the most expensive calculation in mcce program. Sometimes you may want to split to calculation into pieces to take advantage of computer clusters. This number is the starting number of conformers. The conformer list before step 3 can be found in step2_out.pdb.

99999     delphi end conformer number, self included        (DELPHI_END)

The end conformer number of the step 3. To run all conformers, make sure this number is bigger than the number of all conformers.

f        skip delphi in step3 (DEBUG option)                (SKIP_ELE)

Do step 3 without really calling DelPhi when the flag is "t". This if for debug use.

/tmp     delphi temporary file folder, "/tmp" uses node     (DELPHI_FOLDER)

The directory to store DelPhi temporary files for focusing runs. When running mcce on a computer cluster, it is expensive to read and write files to the master node disk. As /tmp directory is always on compute node's local disk, the temporary files are not actually ransferred to the master node and the disk I/O is shared by nodes. Other times, you may want to check the error of DelPhi runs. Then use "./" to place the DelPhi temporary files to the working directory. DelPhi directory is named as delphi_?????? (?????? is a 6 character unique random string) under the directory defined by this line.

t        clean up delphi focusing directory?                (DELPHI_CLEAN)

The flag "t" will remove the DelPhi directory after step 3.

debug.log                                                   (DEBUG_LOG)

The file name of the debug output file. Renaming this file is not recommended.

new.tpl local parameter file for unrecognized res           (NEWTPL)

The file name of temporary parameter file of the unrecognized cofactor. Once a cofactor parameter file is created and put into the permenant parameter directory, you should remove this file, otherwise the duplicated entries in this file void those in parameter directory.

1.7      defalut van der Waals radius, for SAS              (DEFAULT_RADIUS)

This value will be assigned to any atoms without explicitly defined radii in the parameter files.

0.5      factor to 1-4 LJ potontial (1.0 is full)           (FACTOR_14LJ)

VDW scaling factor for 1-4 interaction as suggested by AMBER.

1.0      dielectric constant for Coulomb's law              (EPSILON_COLUMB)

In step 2, optimization was done with Coulomb's law for the electrostatic interactions. This value is the dielectric constant for Coulomb's law for this purpose.

Tricks of using "run.prm"

Detecting ligand

As the ligand bond distance is around 1.8 to 2.4 angstroms, we can take advantage of the clash report in step 1 of the mcce program. If $(CLASH_DISTANCE) is set to be 2.4, then run only step 1. The atom pairs within this distance will be reported.

Updating head3.lst

If you want to use different parameters such as reference pKa and reference reaction field energy in step 4 Monte Carlo sampling, you may edit step 3 output file "head3.lst". Another way to load new parameters and update "head3.lst" is to run step 3 without doing "Delphi". In another situation, if conformers are split into multiple directories to run the step 3, you need to "merge" them and update "head3.lst". To do this, make $(DELPHI_END) a smaller number than $(DELPHI_START) and run step 3 only. The self energy will be reloaded and updated based on the current parameters.

Analyzing MCCE results

head3.lst

File header format:

iConf:conformer ID

CONFORMER: conformer name

FL: flag| f means the conformer is on, t means the conformer is off.

occ: occupancy

crg: charge

Em0: Em in solution

pKa0: pka in solution

ne: # of electrons

nH: # of protons

vdw0: self vdW energy + implicit vdW energy (favorable) with solvent (water)

vdw1:backbone vdW

tors: torsion energy

epol: backbone electrostatic interaction

dsolv: desolvation energy (delphi output energy energy(s) - mcce .tpl rxn energy)

extra: extra energy term

history: to keep track of the conformer

fort.38

This is the file header appearing in fort.38 when run.prm is setup to do a pH titration with 10 data points starting at pH 0:

    ph|0.0| 1.0| 2.0| 3.0| 4.0| 5.0| 6.0| 7.0| 8.0| 9.0|

The first column in the file is conformer list

From the second column till the last is the occupancy for every conformer at each ph/eh value.

sum_crg.out

Similar to fort.38, the first column is the conformer list. And the rest are the charge for every conformer at each ph/eh/ch value.

energies.opp

It's a compressed file containing the energy files. It can be decompressed by command: zopp -x energies. This will generate an energy directory which contains .opp energy files for each conformer. The .opp file will give you the conformer-conformer interaction energy (kcal/mol) between this conformer and all the other conformers in the protein.

Description summary (.opp files do not have headers):

  column #:    1           2          3            4                       5                          6

description:    conf#   name  corr_el   vdw_pwise    post_bdry_corr_el     delphi_el    

Column 1:  this is the conformer number for example 03109

Column 2:  this is the name; for example, ILE01A0139_001.  ILE is the residue name, 01 is the charge (01, 02, 03 means its neutral, +1 is positive one, -1 is negative one).  A                     is the chain identifier, 0139 is the residue number within the chain, 001 is the conformer number. 

Column 4:  this van der Waals pairwise interactions

Column 5:  this is the corrected pairwise. Multiconformer calculations * (single conformer boundary calculations  /  the first residue of the multiconformer boundary calculations)

Column 6:  this is the multiconformer calculations.  Raw data for electrostatic interactions from Delphi 

Column 3:  this is corrected electrostatic, this column goes into step 4 for further calculations.  The corrected electrostatic is calculated by the following:

corrected electrostatic  = multiconformer calculations * (single conformer boundary calculations  /  the first residue of the multiconformer boundary calculations)

However, if the van der Waals interactions are greater than 50 kcal/mol OR the single conformer boundary calculations and the multiconformer boundary calculations are different in sign, then the corrected electrostatic is equal to 

corrected electrostatic = single conformer boundary calculations / 1.5

corrected electrostatic = multiconformer calculations * (single conformer boundary calculations  /  the first charged conformer of the multiconformer boundary calculations) averaged between conf i j and j i or

However, if the van der Waals interactions are greater than 50 kcal/mol, the corrected electrostatic is equal to 

Moreover, if the single conformer boundary calculations and the multiconformer boundary calculations are different in sign, the corrected electrostatic is equal to

corrected electrostatic = multiconformer calculations/1.5 averaged between conf i j and j i or 

corrected electrostatic = multiconformer calculations averaged between conf i j and j i or

Note:  * mark in the opp files means that this is a single conformer boundary to determine the pairwise interactions of the conformer of interest (Ai) with the initial conformer of each other residue (B1).  This conformer that was used to correct for the multiconformer calculations.  That is 

Note: ? mark in opp files means that the van der Waals interactions are greater than 50 kcal/mol

The following is an example of the opp file output, conformer MNC+3G0604_001.opp

acc.atm/acc.res

These two files generated after step1 tell you the solvent accessible surface on a atom base (acc.atm) or residue base (acc.res). In acc.res, the first column of numbers is the absolute SAS while the second column is the relative value.

head1.lst

This is an example of head1.lst:

NTR _-005_ R t 06 S f 0.0 H t 36 M 000

The first two columns are the residue name and ID.

"R t 06": rotation is on and the # of rotamer per each rotatable bond is 6.

"S f 0.0": Option of conformer swing is off.

"H t 36": hydrogen relaxation is on. The # of hydroxyl positions are 36.

"M 000":