QC/MM with Orca

Contributed by Ramon Crehuet and Marius Retegan for pDynamo versions 1.7.2+.

Introduction

This tutorial explains how to perform hybrid QC/MM calculations with pDynamo using Orca as the external code that runs the QC calculation (with the electrostatic embedding of the MM charges).

Some of these considerations will also be useful for running QC/MM calculations in pDynamo, and even in other codes. However, this is not a tutorial on QC/MM. Here we assume you are familiar with the theory and the concepts behind QC/MM.

Truncation of the system

Non-bonded interactions with the NBModelOrca do not support any kind of truncation or force-switching so that all interactions are calculated. Together with the fact that QC/MM with Orca cannot be performed with periodic boundary conditions (PBC), truncation of the system is necessary. A reasonable approach is to prune the system and keep a spherical shape.

If you have a system with PBC, care must be taken not to remove the PBC before the truncation, otherwise your sphere will contain as vacuum the region of the sphere that was previously in a neighboring cell. Though you can manually turn off PBC, when you purge a system, PBC are turned off automatically, so it is better not to fiddle with the PBC and let pDynamo do the work. You can prune the system the following way:

# Prune the system to a spherical region

atomIndex = system0.sequence.AtomIndex ( "A:PI.801:P" )

core = AtomSelection ( selection = set ( [ atomIndex ] ), system=system0 )

selected = core.Within ( 23.0 ).ByComponent ( )

unselected = ~ selected

system = PruneByAtom ( system0, Selection(selected.selection))

What is the recommended size of the system? As usual, the answer is as large as possible. Here, the limiting factor will be the calculation of the point-charge potential in Orca. Previously this step was not parallelized and this imposed a stringent restriction on the size of the system, but in later releases, this step can also be run in parallel. If your QC region is very large or your QC method is very expensive, your QC calculation will be pretty long, and you can probably afford a larger system, because the point-charge potential will not represent a significant fraction of the Orca run-time.

You can try different sphere sizes and run single point gradient calculations, analyze the Orca output and decide a reasonable choice:

system.Energy(doGradients=True)

pDynamo includes some DFT functionals and basis sets. You may choose Orca for two reasons. For higher speed – in parallel execution, though pDynamo code is getting faster – or because your method of choice is not available in pDynamo. Anyway, DFT pDynamo can work with PBC and force-switching for non-bonded interactions, so that you can run some test calculations for pDynamo before and after pruning the system and see whether you get similar results.

Fixing external atoms

Because your system is no longer complete and has a boundary with the vacuum, you should fix the positions of an external shell so that water does not leak out and the protein remains stable. A value around 3Å is reasonable.

#Fix atoms

atomIndex = system.sequence.AtomIndex ( "A:PI.801:P" )

core = AtomSelection ( selection = set ( [ atomIndex ] ), system = system )

mobile = core.Within ( 20.0 ).ByComponent ( )

fix = ~ mobile

system.DefineFixedAtoms ( Selection(fix.selection) )

Define the QC region

The next step in your setup is to define which atoms will belong to the QC region. A simple way to do that is generating a list of atom labels:

#Define the QC region

labels = [ "A:PI.801:P","A:PI.801:O1","A:PI.801:O2","A:PI.801:O3","A:PI.801:O4","A:PI.801:HO4"]

#labels.extend(["A:ASP.62:OD1", "A:ASP.62:OD2", "A:ASP.62:CG"])

labels.extend(["A:ASP.62:OD1", "A:ASP.62:OD2", "A:ASP.62:CG", "A:ASP.62:CB", "A:ASP.62:HB2", "A:ASP.62:HB3"])

and then converting that to a Python set of indices and then to a pDynamo selection:

print "Number of QC labels: ", len(labels)

atomlist=set()

for label in labels:

index = system.sequence.AtomIndex ( label )

atomlist.add(index)

qcRegion = Selection ( atomlist )

if -1 in atomlist:

print "Problem: Atoms not identified!!"

sys.exit()

print "Number of QC atoms defined: ", len(atomlist)

An important caveat you need to check is that all atom labels are properly recognized. If they are not, their index will be -1, which is what we are checking after making a selection.

A second issue you might run into is the following error:

Total active MM charge is not integral (or zero): 3.900.

What this means is that the total point charges of the MM atoms that comprise the QC selection does not add up to an integer. Because (ideally) a QC calculation needs an integer charge, this would produce a charge unbalance between the system before and after the QC/MM partition.

As an example, if you change the QC region in the example to the commented line, you will get this error. You might naively think that the negative charge of the Asp is located on both oxygen atoms so that including the carboxylate moiety would be enough. However, part of the negative charge – dependent on the force field – leaks out to the neighboring CH2 group. You have two choices here. Decide that you do not want to include the CH2, and then manually reallocate the charges so that the carboxylate has charge -1 and the CH2 has charge 0, or simply introduce the CH2 in the QC region. Of course, this second choice is more physically sound. Usually force fields tend to give integer charges to groups of atoms and residues (though a QC calculations would not result in such distinct frontiers), so that this problem is only encountered when you try to include only a small part of a residue and is easily solved by including a better chemical partition of the residue (that you can also discover by trial and error).

Setting up the QC method

To set up a QC method you need at least to choose a method and a basis set. This is not the place to review QC methods, but we just remind you that the choice of a correct method is crucial. Let us first stress the importance of the method, mainly for people coming from the molecular dynamics (MD) world. When doing MD, the main choice in the potential energy calculation is the force field to use. Those available for studying proteins, such as AMBER, CHARMM, OPLS have been extensively tested, and although they are different and some limitations are known, their behaviors are very similar, at least, much more similar than the differences one finds when comparing different QC methods. In the MD world accuracy is much more related to a sufficient sampling than to the force field. QC methods differ a lot both in their accuracy and in their computational cost, so you need to be careful in your choice of an adequate QC method and an adequate basis set.

Yes, but... which method should I choose?

We insist that QC methods are not a black box, but we can give you some recommendations.

If you want to calculate free energies, you need sampling. Therefore you need a cheap method and your most obvious choice are semi-empirical methods. As pDynamo already has many of them, there is no point in using Orca for that.
If you want more accurate energies, a good compromise is usually DFT. A reasonable basis set for these methods is of double-zeta quality (for example, def2-SVP). The choice of functionals is also important.
- Many functionals come in two versions: pure DFT (PBE, BLYP, ...) and a hybrid version that includes some Hartree-Fock exchange (PBE0, B3LYP, ...). If only because the latter include one more fitting parameter, they tend to give better results.
- The choice of a functional when studying enzyme catalysis is not simple. There are many review papers that show that some functionals are best for energy barriers, or hydrogen-bond description, or charged interactions, or dative bonds, and so. In an active site you will probably find all of these interactions playing relevant roles so you really need to make an educated guess and, if possible, calibrate it with higher level calculations.
- You can significantly increase the speed by using Resolution of the Identity (RI). For pure functionals the Orca keyword is RI. A pure functional with RI is probably the fastest choice. If you chose a hybrid functional, then you have two choices: RIJCOSX or RIJONX. Their scaling with system size and basis sets is different, so you should read the manual and test which one runs faster in your particular setting. Considerations about the quality of the gradients will also apply (see below).
For a negligible extra computational cost you can include a van der Waals correction to the functional. This makes sense in a packed protein environment, and can affect your energetic results.
Remember that the inclusion of some residues in the QC region can have a higher impact than an improvement in the basis set.
You can always do expensive single point calculations on optimized geometries. You have two choices here: improve the QC method and/or enlarge the QC region.
- Our experience is that enlarging the QC region usually gives bumpy non-smooth profiles. A single-point calculation rests on the hypothesis that the optimized geometry is good enough and similar enough to the one that would be obtained with the single-point method. And that may be too much of an assumption for an MM geometry and an MP2 energy, for example.
- When you improve the QC method, Orca is at your help. Orca has linear-scaling versions of powerful methods, such as coupled cluster. MP2 energy are very fast to calculate but not necessarily better than DFT. You also need a larger basis set for these post-Hartree-Fock methods, at least of triple-zeta quality. Our experience with previous versions of Orca is that some high level methods seemed not to properly include point-charge corrections, so test your results. Note that your QC energy will be different with and without point charges (you can test that by manually modifying the input file) because the Hartee-Fock orbitals will be different with and without point charges. But that does not mean the point charges are correctly included in the post-HF calculation.

Triple quoted strings for the options

pDynamo will pass verbatim the contents of the option keyword as the first lines of the Orca input file; from the beginning up to the charge and multiplicity line. Because that file can have several setup lines we find using triple quoted strings a clear way to setup your Orca calculations. Here is an example:

options= \

"""

! PBE0 D3BJ SVP SVP/J TightSCF RIJCOSX GRID4 GRIDX5 NOFINALGRID

%Basis

NewGTO O

"SV(P)+"

end

%pal nprocs 9

end

"""

Define the QC Model

To finalize the setup of the Orca QC model you need to specify the scratch directory, the job name and the command to call Orca. QC calculations can use a lot of disk space for temporary files, so you have to make sure your disk is large enough (probably not an issue for DFT calculations) and its access is fast. Therefore it is better that the disk is local and not on a network file-system.

You can setup the Orca file name to a derivative of your Python script file name with:

jobname = sys.argv[0][:-3]+'-orca'

qcmodel = QCModelORCA ( options, job=jobname, scratch='/s/P_scan_d1-d2', \

command='/usr/local/orca_3_0_0_linux_x86-64/orca', \

deleteJobFiles = False )

system.DefineQCModel ( qcmodel, qcSelection=qmRegion )

Define the non-bonded interactions model

The definition of the non-bonded interaction is very simple:

system.DefineNBModel ( NBModelORCA ( ) )

but it is essential to do it after the QC model has been defined! Otherwise the NB model is ignored and the QC calculation does not see the MM charges. You would see that because your output would look like:

--------------------------------- Summary of Energy Terms --------------------------------

Potential Energy = -2268625.2575 RMS Gradient = None

Fourier Dihedral = 3140.0767 Fourier Out-Of-Plane = 384.9307

Urey-Bradley = 161.4472 Harmonic Angle = 6875.7763

Harmonic Bond = 5675.3599 ORCA QC = -2284862.8483

------------------------------------------------------------------------------------------

instead of:

--------------------------------- Summary of Energy Terms --------------------------------

Potential Energy = -2324912.8448 RMS Gradient = None

Fourier Dihedral = 3140.0767 Fourier Out-Of-Plane = 384.9307

Urey-Bradley = 161.4472 Harmonic Angle = 6875.7763

Harmonic Bond = 5675.3599 MM/MM Elect. = -76292.3377

MM/MM 1-4 Elect. = 25451.4967 MM/MM LJ = -6053.9325

MM/MM 1-4 LJ = 4374.1029 QC/MM LJ = 53.5858

QC/MM 1-4 LJ = 18.2161 ORCA QC = -2288701.5670

------------------------------------------------------------------------------------------

And this lines would be missing from your output:

----------------------------- ORCA NB Model Summary ----------------------------

El. 1-4 Scaling = 0.500000 QC/MM Coupling = RC Coupling

--------------------------------------------------------------------------------

ORCA NB Model State set up.

Can you guess why even the Orca QC energy is different?

Apart from that, you will not get an error message, so be careful!

Setting up the total charge and multiplicity

Even though pDynamo checks that the total MM charge of the atoms that are in the QC region is an integer, it does not set this charge as the total charge of the QC region. By default, pDynamo always sets the charge of the QC regions to 0. And if this is compatible with the multiplicity of the system this can lead to important errors which are very difficult to spot!

If the charge of your system can be obtained from the MM force field, we recommend to automate the setting of the charge with:

totalCharge = sum([system.atoms[i].formalCharge for i in atomlist])

system.electronicState = ElectronicState ( charge = totalCharge, multiplicity = 1 )

If you manually define the charge, you might later decide to include a couple of Lys residues close to your substrate in the QC region and forget to increase the charge by +2, with fatal consequences ...

Even if you want to study a radical or a system with different number of electrons than the “normal” number, you can use this automated method with something like:

system.electronicState = ElectronicState (charge = totalCharge+1, multiplicity = 2)

Use deleteJobFiles = False

This option tells pDynamo whether to delete the input and output files generated with Orca at the end of the calculations. When setting up the system it is a good choice to keep these files, as you might want to check them to see how many cycles the SCF took, or the total running time, or simply if you want to test different setup options and you do not want to run it through pDynamo (unless you want to change the QC region, of course). You can also analyze the results with Molden or another visualization software. It's also good to check that the charge and the multiplicity of the QC region are the ones you expect.

The most common use of keeping the output file, however, is to trace back an error, as the messages reported by pDynamo are usually less informative than looking at the Orca output file. From version 1.8, however, pDynamo will keep the Orca input and output files in case it detects an error, and will rename them by adding “error_” at the beginning.

A third use of deleteJobfile=False is to improve the SCF convergence of a calculation. By default, Orca will look for a gbw file and use the orbitals in it if they are compatible with the current geometry and basis set. So if you have convergence problems you can run a calculation with a smaller basis set first and then by keeping the output files, run a second calculation that takes the orbitals from the converged calculation. Or if you have to run several single point calculations or a chain-of-states calculation, you can read the orbitals, and both accelerate convergence and assure that you are converging to the same electronic state.

The keyword for this option has changed in recent versions of pDynamo. It was previously called QCLEANUP. Because pDynamo will not raise an exception when passing an unknown keyword (that behavior may change in the future), you have to make sure which keyword applies.

Advanced options: playing with the command keyword

Because pDynamo calls whatever command you tell it to run Orca, you can customize a lot the pre-processing or post-processing of the output file. For example, if you need to change the basis set for a particular atom, you need to change that in the coordinates section. This cannot be done through the options keyword. But you could call a script that parses the current Orca input, adds the basis set functions and calls Orca. Then, pDynamo would read the output file directly.

pDynamo will read the energy and the gradient from a engrad file that Orca generates when a gradient calculation is passed. Although you might consider sophisticated approaches, such as calling Gaussian instead of Orca and then generating an engrad file from the Gaussian output one, remember that pDynamo also reads the Orca output file to look for Mulliken and Löwdin charges, therefore your Orca output needs to be there and have a "decent" format.

cclib

cclib is a module for parsing output files from different QC codes, including Orca. It has nothing to do with pDynamo, but one of the advantages of having pDynamo in Python is that you can combine it with other powerful tools.

cclib will parse an Orca output file and it will return lots of numerical data present in the output file as NumPy arrays. Yes, this means that if you use cclib you need also to delve into NumPy, but if you are using Python in a scientific environment you should have done that already, it will really pay back!

cclib can read the Orca single point energy and the orbitals. But it can also return more sophisticated output, such as the overlap matrix, as long as you tell Orca what to print, with these options:

options ="""

! BHLYP SVP SVP/J RIJCOSX PAL2

%output

Print [P_Overlap] 1

Print [P_MOs] 1

end

"""

For example, these lines are part of a code used to calculate the charge of a set of atoms in a QC calculation based on their atomic contribution to a set of molecular orbitals.

def couplings(donor_range, acceptor_range):

# Define the atomic orbtial ranges for the donor and the acceptor atoms

donorao = slice(data.atombasis[donor_range[0]][0], data.atombasis[donor_range[-1]][-1])

acceptorao = slice(data.atombasis[acceptor_range[0]][0], data.atombasis[acceptor_range[-1]][-1])

results =[]

for orb1 in range(data.homos[0]-4, data.homos[0]+1):

for orb2 in range(data.homos[0]-4, data.homos[0]+1):

orbital1= data.mocoeffs[0][orb1]

orbital2= data.mocoeffs[0][orb2]

enes = data.moenergies[0]

csi1d=np.dot(orbital1, data.aooverlaps[:,donorao])

csi1a=np.dot(orbital1, data.aooverlaps[:,acceptorao])

csi2d=np.dot(orbital2, data.aooverlaps[:,donorao])

csi2a=np.dot(orbital2, data.aooverlaps[:,acceptorao])

q1d = np.dot(orbital1[donorao], csi1d)

q1a = np.dot(orbital1[acceptorao], csi1a)

q2d = np.dot(orbital2[donorao], csi2d)

q2a = np.dot(orbital2[acceptorao], csi2a)

The infamous “Line search abandoned”

Many, if not all, users of Orca and pDynamo will have faced this error during an optimization. This error is usually caused by the gradients that Orca produces not being accurate enough (for the Conjugate Gradient algorithm). There are several solutions to this problem.

The first and simplest solution is to use pDynamo version 1.8 or higher, because it has a new Conjugate Gradient method which completely solves this problem. pDynamo version 1.8.0 also introduced several new robust minimizers, such as FIRE and L-BFGS. Our experience is that L-BFGS is both very efficient and very robust, and the convergence is pretty much unaffected by the use of Orca's DFT grid choices.

If for some reason you need to use a previous version, here is some advice.

First, use the latest Orca version. In recent versions, a gradient calculation automatically turns on the TightSCF keyword that is better for getting more accurate gradients (as the gradients are more sensitive to the converged density than the energy). You can also try increasing the DFT grid to Grid4 (or higher). In such cases you don't need a final grid, so also use NoFinalGrid. These settings are more expensive than the default ones, so if you are doing a dynamics you might prefer not to use them. I do not have experience with that, but according to the Orca manual different methods to treat the resolution of the identity (RIJONX, RIJCOSX) give gradients of different quality, so you might also try these. But as these methods have different cost, you have to check what is better, whether to play with the grid sizes, or to change the method. In fact, apart from the DFT grid you can also change the grid for the exchange calculation (GridX keyword) though I have never needed to change it.

If your geometry is far from a minimum (for example, during the first minimization you do on a system), your second choice might be doing a Langevin Dynamics at a low temperature and a high coupling constant. The high coupling constant will ensure that the thermal energy generated during the dynamics is absorbed by the bath and your system will indeed be simulated at a very low temperature. Of course, such a low temperature will lead you to the closest minimum, you shouldn't expect a good sampling with a short dynamics at low temperature.

Examples

In the attached files you will find two examples with a full setup of a QC/MM calculation. For both you will also need to unpack the file system_files.tgz.

Example1.py: a geometry optimization.
Example2.py: a chain of states run.

Page updated

Google Sites

Report abuse