MCCE quick start


Introduction to the run.prm control file

MCCE is a command line program. MCCE takes a run control file and a pdb file, and will write out results as well as temporary files in the directory where the MCCE command is invoked. For more details on program flow, consult the MCCE program page. We recommend that you create separate working directories for each protein and run parameter file.

The run control file assigns values to a series of variables that are passed to MCCE, which tell the program how to run. If a line is commented out by placing a hash character (#) at the beginning, MCCE will ignore that line of the run.prm file and assign a default to the commented-out value. Most users will not need to modify settings beyond the name of the input structure, which steps you wish to run, and potentially extra cofactor topologies or custom titration ranges. Here is an example of the most often customized lines of the control file run.prm:

======================================================================
Most modified entries
----------------------------------------------------------------------
Input and Output:
        name of input structure → prot.pdb                                                       (INPDB) 
Steps:
f step 1: pre-run, pdb->mcce pdb (DO_PREMCCE)
f step 2: make rotamers (DO_ROTAMERS)
f step 3: do energy calculations (DO_ENERGY)
f step 4: monte carlo sampling (DO_MONTE)
======================================================================

QUICK, DEFAULT, and FULL: the three standard run control files

There are 3 sample run.prm files included in the MCCE directory. Each one differs in the amount of conformer generation and pruning it does, which affects the accuracy and computing time of the simulation. For details on the difference in performance among the different presets, consult the supplementary information.

QUICK: Performs the most cursory pKA calculation with a minimal number of conformers.

DEFAULT: A compromise between QUICK and FULL, which runs most of the conformer generation steps and predicts more accurate pKa values.

FULL: Best for exploring systems that have significant side chain motions, the FULL parameters do extensive conformer generation and energy calculation.

Running and monitoring your first MCCE job

Downloading and preparing run files

Here is an example of calculating pKa values for hen egg lysozyme (4LZT). First, make a working directory for the protein. Then, run the getpdb script included in the MCCE package, which will automatically download the PDB file from the RCSB database for you:

[user@domain home]$ mkdir 4lzt
[
user@domain home]$ cd 4lzt
[user@domain 4lzt]$ getpdb 4lzt
Inquiring the remote file 4LZT.pdb ...
Saving as 4LZT.pdb ...
Download completed.
[user@domain 4lzt]$

Configuring run control

Copy run.prm.quick to run.prm in your working directory:

[user@domain 4lzt]$ cp /usr/local/mcce-1.0/run.prm.quick run.prm

You will need to edit this run.prm file. In the editor, set the value of (INPDB) to 4LZT.pdb, and turn flags for all 4 steps to "t". You may want to run one step at a time and look at the output and run.log messages for each step.

The Most modified section will look like this:

======================================================================
Most modified entries
----------------------------------------------------------------------
Input and Output:
4LZT.pdb (INPDB)

Steps:
t step 1: pre-run, pdb->mcce pdb (DO_PREMCCE)
t step 2: make rotamers (DO_ROTAMERS)
t step 3: do energy calculations (DO_ENERGY)
t step 4: monte carlo sampling (DO_MONTE)
======================================================================

Running and monitoring the MCCE job

MCCE can be invoked simply by running the mcce executable. The program will take hours to days to finish, so you will want to run the program in the background and redirect the output message to a log file:

[user@domain 4lzt]$ mcce > run.log&
[1]8533
Serious errors will appear in the runtime messages, which you can check if the job fails. The files debug.log and progress.log provide extra warning messages and progress reports, which can be safely ignored most of the time.

Encountering an unknown residue

While running the 4lzt job, you will get the following message in run.log:

Error! premcce_confname(): Can't get conformer list of this residue NO3

This means MCCE did not recognize the cofactor NO3, because there was no data for it in the parameter directory. In cases like this, MCCE continues while substituting a placeholder for the unknown residue. For the purposes of this tutorial, we will ignore this error. Surface ligands are unlikely to significantly affect pKa error, due to the screening effect of the solvent and the salt effect that is restored by the Poisson-Boltzmann calculations from DelPhi.

However, if you encounter this error in your own system, you may want to stop the program and make a residue parameter file (*.tpl), especially if you are treating a buried ligand or something more elaborate than a lone nitrate group. Multiple conformers may be physiologically relevant to your simulation, so having an accurate topology on hand is crucial. We recommend consulting our database of topologies to see if there is an optimized structure available. If not, you can generate a reasonably good topology using a tool such as AmberTools antechamber. See the parameterization page for a list of recommended software.

Understanding the MCCE workflow

For a detailed explanation of what is happening in each step, see the MCCE Program page. This will be a brief overview to help you get started.

Step 1: In this step, MCCE preprocesses the input PDB structure to optimize it for continuum electrostatics, insert custom groups, and connect topologies to each residue/cofactor. It splits off the N-terminal and C-terminal of the protein as independently titratable NTR and CTR residues, respectively. It then renames groups based on the contents of the name.txt file. After calculating solvent accessibility for all atoms and residues and writing those values out to acc.res and acc.atm, it strips off water and salt molecules in the hydration shell with 10% or less surface accessibility, meaning they are far from the protein, and excludes protein groups with more than 30% solvent accessibility from step 2 conformer making. Undefined cofactors are replaced with dummies made up of non-charged atoms with a standard VdW radius of 1.7 Angstroms. This step produces a temporary parameter file, new.tpl, which can be used as a template for inserting custom parameters for new groups. The file acc.res lists the solvent accessibility of residues, and acc.atm lists the solvent accessibility of atoms.

Step 2: This is the step in which MCCE iteratively generates and prunes rotamers that are then written out as a multi-conformation PDB file, step2_out.pdb. The rotamer making statistics are dynamically updated in file rot_stat.

Step 3: The DelPhi runs in step 3 to generate the energy lookup tables are the most time-consuming step in MCCE.  If you look at the bottom of run.log when DelPhi is running, it will look like this:
[user@domain 4lzt]$ tail run.log

Creating connectivity table...
Done

Preparing DelPhi runs ...
3 focusing runs required for this protein.
Done

Computing pairwise from conformer 1 to 281 of 281 total conformers
see progress.log for progress...

Taking a look at progress.log, we see:

[user@domain 4lzt]$ tail progress.log			
Doing pairwise    72 of   281 conformers.    5 seconds
Doing pairwise    73 of   281 conformers.    2 seconds
Doing pairwise    74 of   281 conformers.    4 seconds
Doing pairwise    75 of   281 conformers.    2 seconds
   Doing pairwise    76 of   281 conformers.    3 seconds
   Doing pairwise    77 of   281 conformers.    3 seconds
   Doing pairwise    78 of   281 conformers.    4 seconds
   Doing pairwise    79 of   281 conformers.    4 seconds
   Doing pairwise    80 of   281 conformers.    5 seconds
   Doing pairwise    81 of   281 conformers.
[user@domain 4lzt]$

You can estimate the time remaining in step 3 by multiplying the average time per conformer by the number of conformers times 2, for one pairwise calculation and one reaction field energy calculation. For this protein, step 3 will take about 4 seconds*281 conformers*2 DelPhi's= 2248 seconds or 37 minutes. The output directory of step 3 is "energies" and the output file is "head3.lst". Both are needed as input for step 4.

Step 4: Step 4 is Monte Carlo sampling, which simulates a pH or Eh titration. It writes the progress to file "mc_out". The occupancy of each conformer at each pH is in fort.38. The net charge of each ionizable residue is in "sum_crg.out". The pKa or Ems are in "pK.out". These are obtained by non-linear fitting of the titration curves in file "sum_crg.out".

Analyzing your results

One reason to run simulations is to be able to ask why. To help you, we've included the script mfe.py, which analyzes the factors contributing to the ionization energy of a residue at a specific pH by approximating the interactions between residues using a Mean Field approach. For example, let's run mfe.py on the free energy of the reaction GLU -> GLU- + H+ for Glu 58 at pH 5.078, the midpoint of the titration curve of this residue run. mfe.py takes three arguments: the residue name from pK.out, the pH at which to analyze the Boltzmann distribution of conformers, and the threshold for free energy values to print out. In this case, we are interested in residue GLU-_0035_, pH 5.078, and specific interactions with individual residues over 0.2 kcal/mol. The result explains the free energy of the reaction GLU -> GLU+ at pH 5.078, the midpoint of the titration curve of this residue, where reactant and product are equally favored.

[jmao@metis 4lzt]$ mfe.py GLU-_0035_ 5.078 0.2
Residue GLU-_0035_ pKa/Em=5.078
=================================
Terms          pH     meV    Kcal
---------------------------------
vdw0         0.03    1.98    0.05
vdw1         0.08    4.91    0.12
tors        -0.00   -0.00   -0.00
ebkb        -1.39  -80.41   -1.89
dsol         2.72  157.67    3.71
offset      -0.99  -57.45   -1.35
pH&pK0      -0.33  -19.04   -0.45
Eh&Em0       0.00    0.00    0.00
-TS          0.11    6.35    0.15
residues    -0.15   -8.47   -0.20
*********************************
TOTAL        0.10    5.54    0.13
*********************************
ASP_0048_    0.22   13.02    0.31
ASP_0052_    0.95   54.90    1.29
ARG_0112_   -0.29  -16.64   -0.39
ARG_0114_   -0.24  -13.93   -0.33
=================================
It is expected here that the ionization energy in line TOTAL is close to 0. The small error is caused by mean field approximation, and this error may indicate some degree of coupled ionization or conformational changes with other residues. The breakdown of ionization energy helps in identifying the factors controlling the pKa of the residue. Negative numbers indicate the ionized state is favored, so ASP52 in this protein shifts up the pKa of GLU35 by 0.95 pH unit.

The table lists other energy terms that describe the difference between the contributions from the designated term to the ionized conformers and neutral conformers:

vdw0 is the van der Waals interaction of the side chain of this residue.

vdw1 is the van der Waals interaction of the side chain of this residue to the backbone atoms.

tors is the torsion energy of this residue.

ebkb is the electrostatic interaction of the side chain to the backbone.

dsolv is the loss of solvation energy.

offset is a correction for each residue type defined in extra.tpl.

pH&pK0 is the pH effect, which equals solution pH minus solution pKa of that residue, if the residue is an acid, or solution pKa of that residue minus solution pH, if the residue is a base.

Eh&Em0 is the redox potential effect, which is the current solution potential minus the standard redox potential of the cofactor.

-TS is the entropy effect; as the protonated and deprotonated residues have have different constraints in a protein, the entropy of the two ionization states could be different.

residues refers to the mean field interaction with residues given their conformations at this pH.  This is the term that is pH or Eh dependent.

A mcce web interface analysis tool is provided separately. If it is installed, just point your browser to "http://localhost/cgi-bin/mcce.py" (replace localhost by the host name of mcce server if you run mcce on a remote computer). For more information, refer to section 4, Understanding MCCE Results.

Advanced run control options

If you need to run a specialized simulation, you can change the following parameters:
======================================================================
Less modified entries
----------------------------------------------------------------------
8.0 Protein dielectric constant for DelPhi (EPSILON_PROT)
/usr/local/mcce-1.0/extra08.tpl (EXTRA)
/usr/local/mcce-1.0/name.txt MCCE renaming rule. (RENAME_RULES)

ph "ph" for pH titration, "eh" for eh titration (TITR_TYPE)
15 Number of titration points (TITR_STEPS)
0.0 Initial pH (TITR_PH0)
1.0 pH interval (TITR_PHD)
0.0 Initial Eh (TITR_EH0)
30.0 Eh interval in mV (TITR_EHD)
======================================================================
/usr/local/mcce-1.0                       (MCCE_HOME)

EPSILON_PROT: This is the dielectric constant of the protein that MCCE passes to the DelPhi solver. Normally, ϵ = 8 is good for proteins of fewer than 200 residues or that are soluble; large and transmembrane proteins do better with ϵ = 4. MCCE currently only supports 4 and 8 as dielectric constant values, but this is user extensible by updating the reference reaction field energy for residues at the desired dielectric.

EXTRA: This points to a file with offsets for each residue type calibrated by comparing about 600 calculated pKas and experimental pKas. It is intended to reduce system error from inaccurate solution pKa of residues, different solvent entropy effects, and different polarization of atomic charges between the reference system water and protein material. The EXTRA topology file can also be used to test new parameter files before adding them to the permanent directory, as MCCE treats it at a higher priority than the topologies in the parameter directory. Leave this line blank if you do not intend to use it.

RENAME_RULES: In some cases, it is important to treat some individual residues differently from the standard type in the rest of the protein. For example, you may want to treat a His ligand to a heme differently than other His, or treat the propionic acids on a heme as separate titratable groups. Alternatively, you may want to group binding site residues with a ligand, or split a ligand into several ionizable groups. In this case, you will want to direct MCCE to a rename file, such as the sample file "name.txt" included in the MCCE distribution directory. You will need to change the atom and residue names and sequence number and create a new tpl file for the special residue types.

TITR_TYPE: Specify whether you want to run a pH titration or a redox potential titration (Eh) during the Monte Carlo sampling step.

TITR_STEPS: Specifies the number of sample points measured during a titration.

TITR_PH0/TITR_PHD: The standard pH titration runs from pH 0 to pH 14 with steps of 1 pH unit. This is customizable by changing the values of TITR_PH0 and TITR_PHD. Make sure that TITR_PH0 + (TITR_PHD * TITR_STEPS) equals the end pH you wish to sample.

TITR_EH0/TITR_EHD: The standard Eh titration runs from 0.0 mV to 450.0 mV with intervals of 30.0 mV. If you change these values, make sure that TITR_EH0 + (TITR_EHD * TITR_STEPS) equals the end redox potential you wish to sample.

MCCE_HOME: defines the location of mcce program and the parameter directories. If you have more than one distribution of mcce programs, you may want to switch between the programs and the parameter directories. For now, the only use of this line is to compose the the parameter directory. For example, if this line points to /usr/local/mcce-1.0/ and (EPSILON_PROT) gives dielectric constant 4, then the parameter directory is /usr/local/mcce-1.0/param04/.

Supplementary information

Comparing run times between different MCCE presets

The table below shows a comparison of runs on hen egg lysozyme with the three run.prm presets. When comparing these numbers with experimental pKa values, bigger errors are often seen for the binding sites and functionally important residues marked in bold. These residues often show abnormal pKa values that are hard to calculate. An error of about 1 pH unit on these sites is typical. (back to top)

Presets of run.prm QUICK
DEFAULT
FULL
Number of conformers 281 1421 1826
Step 1 - check protein 1 second 1 second 1 second
Step 2 - make rotamers 0.05 minutes 10 minutes 19 minutes
Step 3 - generate lookup table 20 minutes 303 minutes 456 minutes
Step 4 - run Monte Carlo 1 minute 47 minutes 66 minutes
Overall RMSD of error 0.75 0.72 1.06
Residue Exp. pKa Cal. pKa Cal. pKa Cal. pKa
LYS+0001 10.8 10 10.02 10.44
GLU-0007 2.85 3.57 3.52 3.33
LYS+0013 10.5 11.44 11.46 10.99
ASP-0018 2.66 2.94 2.74 2.88
TYR-0020 10.3 11.13 10.44 11.08
TYR-0023 9.8 10.14 10.18 10.61
LYS+0033 10.4 10.07 9.97 9.82
GLU-0035 6.2 5.21 5.11 5.33
ASP-0048 1.6 2.56 3.09 3.83
ASP-0052 3.68 2.82 3.68 4.83
ASP-0066 0.9 2.58 2.1 3.08
ASP-0087 2.07 2.46 2.41 3.65
LYS+0096 10.8 11.22 11.55 11.19
LYS+0097 10.3 11.12 11.24 10.43
ASP-0101 4.08 4.26 4.25 5.34
LYS+0116 10.2 9.73 9.71 9.79
ASP-0119 3.2 3.62 3.58 4.3
CTR-0129 2.75 2.44 2.46 3.54
Comments