force fieldS

Contributed by Guido Raos for pDynamo version 1.7.2.

Introduction

The pDynamo program comes with a series of molecular mechanics parameter files, which are based on the OPLS-AA, CHARMM and AMBER force fields for biomolecular simulations. Using pDynamo's methods for handling topological information and atomic coordinates, these can be readily used to set up a system and start a simulation.

This approach works smoothly as long as the system is entirely made up of "standard" functional groups (e.g., the twenty natural amino acids and water). This short tutorial provides a simple example of the steps which have to be taken in order to set up a force field simulation of a system containing a non-standard functional group.

A simple example: paracetamol

Suppose we want to simulate N-(4-hydroxyphenyl)acetamide (i.e., paracetamol):

We want to do this with an OPLS-AA-based force field. Many of the atom types within paracetamol will be found also in a protein, so the parameter files in the parameters/forceFields/opls/protein directory should provide a good starting point.

The first step is to prepare a mol file with the necessary structural and topological information:

Paracetamol.mol file:

Paracetamol, MOL file with coordinates from HXACAN19 (CSD)

20 20 0 0 0 0 0 0 0 0 0

-0.3177 3.3226 -0.9678 C 0 0 0 0 0 0

0.9227 3.3430 -1.5728 C 0 0 0 0 0 0

1.9295 2.4871 -1.1046 C 0 0 0 0 0 0

1.6801 1.6333 -0.0511 C 0 0 0 0 0 0

0.4513 1.6303 0.5679 C 0 0 0 0 0 0

-0.5596 2.4809 0.1016 C 0 0 0 0 0 0

-1.5977 4.7312 -2.5566 C 0 0 0 0 0 0

-2.8980 5.5248 -2.6691 C 0 0 0 0 0 0

-1.4028 4.1546 -1.3646 N 0 0 0 0 0 0

2.6830 0.8058 0.3641 O 0 0 0 0 0 0

-0.8034 4.6588 -3.4836 O 0 0 0 0 0 0

1.1397 3.9954 -2.3739 H 0 0 0 0 0 0

2.9193 2.5308 -1.5326 H 0 0 0 0 0 0

0.2400 0.9913 1.4080 H 0 0 0 0 0 0

-1.5173 2.4718 0.5468 H 0 0 0 0 0 0

2.5343 0.4395 1.2432 H 0 0 0 0 0 0

-2.1223 4.2315 -0.7066 H 0 0 0 0 0 0

-3.2244 5.5639 -3.6318 H 0 0 0 0 0 0

-3.5309 5.2491 -2.0532 H 0 0 0 0 0 0

-2.7358 6.4690 -2.4551 H 0 0 0 0 0 0

1 2 4

1 6 4

1 9 1

2 3 4

2 12 1

3 4 4

3 13 1

4 5 4

4 10 1

5 6 4

5 14 1

6 15 1

7 8 1

7 9 1

7 11 2

8 18 1

8 19 1

8 20 1

9 17 1

10 16 1

M END

The comment at the top says that the coordinates come from the HXACAN19 crystal structure, on the Cambridge Structural Database. The original HXACAN19.cif file was converted to mol format, and the connectivity information in second part of the file was edited in order to specify the correct bond orders between each pair of atoms (1=single bond, 2=double bond, 3=triple bond, 4=aromatic bond). This information will be used below, as it determines the "connectivity pattern", which in turn determines the atom types.

We might then try to do the following simple calculation:

"""A simple gas-phase MM minimization of Paracetamol"""

import os.path

from pCore import YAMLPickle, Clone, logFile

from pBabel import MOLFile_ToSystem

from pMolecule import MMModelOPLS, NBModelFull

from pMoleculeScripts import ConjugateGradientMinimize_SystemGeometry

# . Define the MM and NB models.

rootDirectory = os.getenv ( "PDYNAMO_ROOT" )

mmModel = MMModelOPLS( os.path.join( rootDirectory, "parameters/forceFields/opls/protein" ))

nbModel = NBModelFull( )

# . Generate the System.

molecule = MOLFile_ToSystem( "Paracetamol.mol" )

molecule.DefineMMModel( mmModel )

molecule.DefineNBModel( nbModel )

# . Keep these for the RMSD analysis

coord0 = Clone(molecule.coordinates3)

masses = molecule.atoms.GetItemAttributes( "mass" )

# . Minimize the energy of a single molecule in the gas phase

eStart = molecule.Energy()

ConjugateGradientMinimize_SystemGeometry( molecule )

eStop = molecule.Energy()

# . Save the System for later use.

YAMLPickle( "ParacetamolMM-gas.yaml" , molecule )

# . Determine the RMS coordinate deviation between the optimized and unoptimized structures.

molecule.coordinates3.Superimpose( coord0, weights = masses )

rms = coord0.RMSDeviation( molecule.coordinates3, weights = masses )

# . Print the results.

table = logFile.GetTable( columns = [ 30, 30 ] )

table.Start( )

table.Title( "Minimization Results (MM)" )

table.Entry( "Energy Change", alignment = "l" )

table.Entry( "%20.4f" % ( eStop - eStart, ) )

table.Entry( "RMS Coordinate Deviation", alignment = "l" )

table.Entry( "%20.4f" % ( rms, ) )

table.Stop( )

However, when we do this we get the following error message:

Output of python script (1):

----------------------------------------------------------------------------------------------------

Untyped Atoms

----------------------------------------------------------------------------------------------------

C0 C6 N8 O10 H16

----------------------------------------------------------------------------------------------------

Traceback (most recent call last):

....

The program does not "recognize" the pattern formed by the following atoms:

The reason is that the patterns.yaml file in the parameters/forceFields/opls/protein directory contains the definition of a protein-like primary amide, in which the nitrogen in bonded to an aliphatic carbon:

Extract from the original patterns.yaml:

---

# . !MMPatternContainer

Atom Fields:

- Key

- Atomic Number

- Connections

- Formal Charge

- Hydrogens

- Is Aromatic

- Type

- Charge

Bond Fields:

- Atom Key 1

- Atom Key 2

- Type

Label: Protein

Patterns:

# ...

# ...

- Label: Primary Amide

Atom Patterns:

- [ 0, 7, 3, 0, 2, False, "N.PRIMARYAMIDE", 0.0 ]

- [ 1, 6, 3, 0, ., False, C , 0.5 ]

- [ 2, 8, 1, 0, 0, False, O , -0.5 ]

Bond Patterns:

- [ 0, 1, Single ]

- [ 1, 2, Double ]

We need to define the pattern for a primary amide in which the nitrogen is bonded to an aromatic carbon. This can be done by inserting the following lines within patterns.yaml, between those of the primary and secondary aliphatic amides (warning: in general, the order of the definitions can be important!):

Addition to patterns.yaml:

- Label: N-phenylacetamide

Atom Patterns:

- [ 0, 7, 3, 0, 1, False, N , -0.085 ]

- [ 1, 6, 3, 0, ., False, C , 0.500 ]

- [ 2, 8, 1, 0, 0, False, O , -0.500 ]

- [ 3, 6, 3, 0, 0, True, CA , 0.085 ]

Bond Patterns:

- [ 0, 1, Single ]

- [ 1, 2, Double ]

- [ 0, 3, Single ]

Note that the atom types entering the pattern (e.g., "C" for aliphatic carbon, "CA" for aromatic carbon, etc.) are defined in atomTypes.yaml. The hydrogens are implicit, their charges being specified in atomTypes.yaml. The charges of the "heavy atoms" in patterns.yaml include those of the hydrogens attached to them and they override the default values in atomsTypes.yaml.

Hint: in more complicated cases, it might be difficult to understand what is wrong with a molecule's connectivity data. It is possible to print these data, by doing something like:

# ...

for i in indicesOfDesiredAtoms:

atom = system.atoms[i]

print i, atom.__dict__

# ...

With the above changes to atomTypes.yaml, pDynamo "recognizes" all the atoms in Paracetamol.mol. However, we still get an error message when we run the python script:

Output of python script (2):

--------------------------------------------------------------

Missing Force Field Parameters

--------------------------------------------------------------

Fourier Dihedral N CA CA CA

Fourier Dihedral N CA CA HA

Fourier Out-Of-Plane CA CA CA N

Fourier Out-Of-Plane N C CA H

--------------------------------------------------------------

Traceback (most recent call last):

....

These missing parameters must be specified in fourierDihedralParameters.yaml:

# ...

- [ N , CA , CA , CA , 3.6250, 2, 180.0 ]

- [ N , CA , CA , HA , 3.6250, 2, 180.0 ]

# ...

and in fourierOutOfPlaneParameters.yaml:

# ...

- [ CA , CA , CA , N , 1.1000 ]

- [ N , C , CA , H , 1.0000 ]

# ...

Their values can be inferred from Jorgensen's original parameter files in the parameters/forceFields/opls/rawData directory. If these parameters had not been available, we would have been forced to undertake the real parametrization work (starting with ab initio calculations of the atomic charges, bond torsion potentials, etc.). This can be quite lengthy and complex and will not be discussed here.

With all the above changes to the parameter files, our gas-phase minimization proceeds to convergence with fairly satisfactory results, as shown by the small RMS deviation between the starting and final atomic coordinates:

Extract from final output:

....

------------------------------------------------------------

Minimization Results (MM)

------------------------------------------------------------

Energy Change -87.6305

RMS Coordinate Deviation 0.0590

------------------------------------------------------------