MM and QM/MM setup

MM and QM/MM setup for a metalloprotein using GROMACS,  Chemshell and ORCA
(Using the CHARMM forcefield (both in GROMACS and with Chemshell) and the ORCA code as QM code for Chemshell.)

This page contains technical details regarding the GROMACS MM setup and QM/MM setup in Chemshell as performed in our QM/MM study on nitrogenase:
Bardi Benediktsson and Ragnar Bjornsson*, Inorg. Chem. 201756, 13417-13429.


Scripts to perform GROMACS MM setup and MM simulations for a metalloprotein and subsequent Chemshell QM/MM calculations are described on this page and available  here:


The CHARMM36 forcefield is used throughout. Topology and parameter files were downloaded from Alexander McKerell's website both in GROMACS and CHARMM formats and modified.
If you find these scripts useful, please consider citing the article above.



GROMACS setup

Our MM model setup with GROMACS is loosely based on the tutorial written by Justin Lemkul (recommended reading):

http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/01_pdb2gmx.html

Before setting up an MM model in GROMACS of a metalloprotein, a basic topology and parameters for the metal cofactors is required, usually not available in protein forcefields. In the case of nitrogenase, only nonbonded MM parameters were defined. It's a good idea to look at the literature for available topologies/parameters for exotic species.

Atom charges for FeMoco can be derived by a population analysis from a QM calculation. In our case, Natural population analysis charges from a BP86 DFT calculation of the cofactor (described by a polarizable continuum) were used. These charges are added to the merged.rtp inside the charmm36.ff GROMACS directory (see Github page). The values of the charges is not critical as they are not used at all in the QM/MM step (assuming electrostatic embedding QM/MM) if they will be in the QM-region. They are used in the MM optimization/MD steps, however and need to be sensible (otherwise severe artifacts from the MM prep will arise). The charges should add up to the correct integer charge of the group.

The FeMoco entry (ICS residue name) inside merged.rtp:

[ ICS ] ; The name of the residue, in this case ICS is FeMoco of Mo-nitrogenase.

[ atoms ] ; The first column is the atom name (should match PDB atom entry), the second column is the atom type (will determine LJ parameters), the third column is the charge (NOTE, total charges should add up ot the desired integer number) and the fourth column is the chargegroup).

 

MO1 MO    0.3861000 0

FE1 FE    0.9370699 1

FE2 FE    0.65790667 2

FE3 FE    0.65790667 3

FE4 FE    0.65790667 4

FE5 FE    0.62767667 5

FE6 FE    0.62767667 6

FE7 FE    0.62767667 7

S1 SM    -0.5904233 8

S2 SM    -0.5904233 9

S3 SM    -0.5904233 10

S4 SM    -0.4311467 11

S5 SM    -0.4311467 12

S6 SM    -0.4311467 13

S7 SM    -0.4733233 14

S8 SM    -0.4733233 15

S9 SM    -0.4733233 16

CX C     -1.69524   17


The atom type must exist in your parameter file (which is probably not the case if you are working with a protein with a transition metal center) and preferably have at least non-bonded parameters. For iron, you would need to add a line in the MASS section:

MASS    91 FE   55.84700 FE ! iron

Where ’MASS’ indicates that this is the mass section, ’91’ is the entry number in the list, ’FE’ is the designated atomtype, ’55.84700’ is the molar mass, and ’! iron’ is the comment line

Before starting the GROMACS step, it is essential to visualize the protein and have a good look at it. You should check for residues that have multiple conformations (delete as appropriate), check the protonation state of titrable amino acid residues and check whether it would be logical to flip e.g. HIS residues that have both nitrogen and carbon in its functional group 180°. Flipping of Histidine is easily performed by reordering the coordinates in the PDB file of atoms: ND1, CE1, NE2 and CD2. 

If you have a e.g. cysteine bound to a metal ion (common for iron-sulfur clusters) then that cysteine is very likely deprotonated and then you should select a deprotonated residue (modify the PDB file e.g. change CYS to CSD). The CSD residue definition is part of the CHARMM36 forcefield.

If the -his, -glu, -lys, -asp flags (the -his flag is the most important) has been selected then GROMACS will initially ask for the desired protonation site on His (HSE for NE2 protonation, HSD for ND1 protonation, HSP for double protonation). The desired protonation site should be selected. While pKA prediction codes can be used for this task, we prefer visual inspection to check for hydrogen bond patterns that usually make the protonation site easily determined, and also allows one to see if the His residues should be flipped.


Our MM protocol consists of six steps:

1. Manual modification of the PDB file: 

    a) Deletion of extra conformations in residues. Often labelled as A and B. Delete one of them and rename.

    b) Deletion of undesired species (ions, contamination etc.)

    c) Renaming of residues and atoms to match the forcefield. 

    d) Addition of hydrogens or other missing atoms for cofactors or other non-protein species. This is only needed for non-protein species as GROMACS can automatically add hydrogens of all protein residues.


2.      PDB to GROMACS conversion (this is the main headache)

Conversion of the unprotonated protein data bank (PDB) structure into a protonated structure. You can specify whether a titrable amino acid residue should be protonated or not during this step. The CHARMM36.ff forcefield is selected here (that may or may not be present in your version of CHARMM). To make the CHARMM36 forcefield (that will be modified) available to GROMACS you should put the downloaded charmm36.ff forcefield inside a directory and specify that directory by the environment variable: e.g.  export GMXLIB=/home/bjornsson/gromacs-forcefields

You may also want to copy over other Gromacs forcefield files to this same directory (that GMXLIB points to) such as residuetypes.dat that you may have to modify. The GROMACS forcefield files are in /path/to/gromacs-installation/share/gromacs/top.

If you use non-standard amino acid residues such as deprotonated cysteines (CSD residue in CHARMM36) then the residuetypes.dat file needs to contain the entry (otherwise gmx pdb2gmx command will complain):

CSD     Protein


The GROMACS command should be run e.g. like this:

gmx pdb2gmx -f ProteinDataBankFile.pdb -lys -glu -his -o Protein-processed.pdb -ff charmm36 -water tip3


It is helpful go through this step once manually, inspect visually each residue (at least the HIS residues) and compile the list of protonation states. A list of HIS protonation states can then be compiled:

0 0 1 1 0 0 0 0 2 0 0 1   (where 0 means a HSD protonation state (ND1 proton), 1 means a HSE protonation state (NE2 proton) and HSP means a doubly protonated state (ND1 and NE2 protons).

This list can then be fed into the gmx program next time:

echo 0 0 1 1 0 0 0 0 2 0 0 1 |  gmx pdb2gmx -f ProteinDataBankFile.pdb -lys -glu -his -o Protein-processed.pdb -ff charmm36 -water tip3

The hydrogenated protein file (Protein-processed.pdb) should then be visually inspected (e.g. VMD).

Note:  if you have non-standard terminii (other than N and O), e.g. acetyl capping groups (ACE) then you have to account for this. Adding the "-ter" to the gmx command above means that GROMACS will ask what it should do with the start terminus in terms of protonation state. For an acetyl group, choose "None".


3.     Next, there is the creation of a solvation box around the protein and defining the periodic boundary conditions. 

gmx editconf -f Protein-processed.pdb -o Protein-boxed.pdb -c -d 1.0 -bt cubic


4.      Then the box is filled with solvent. 

gmx solvate -cp Protein-boxed.pdb -cs spc216.gro -o Protein-solvated.pdb -p topol.top –n index.ndx


5.      As the protein is likely charged at this stage, excess charge is neutralized by the generation of counter-ions (e.g. Na+ and/or Cl- ) .

gmx grompp -f ions.mdp -c Protein-solvated.pdb -p topol.top –n index.ndx -o ions.tpr

Where the file ions.mdp file contains the run parameters.

 Secondly, the generated binary file, ions.tpr, is used to create the charge neutral system.

gmx genion -s ions.tpr -o Protein-ionated.pdb -p topol.top -pname NA -nname CL -np 40 


6.  Relaxation of the system. Metallocofactors are not relaxed, unless forcefield parameters exist and atoms directly in contact with the metallocofactor are constrained as well. Thus contraints are usually applied by modifying the MDP file e.g. like this:

freezegrps = Protein-H Cofactor  ; Defining which groups are restrained. Here protein-heavy-atoms are constrained and all atoms part of Group cofactor.

freezedim = Y Y Y Y Y Y   ; restrained in X Y Z dimensions, one Y for each dimension of each group to restrain

It's a good idea to first relax only the hydrogen atoms of the protein and water in the system, while keeping metallocofactor groups completely frozen (including their hydrogen atoms). Later, the protein heavy atoms can be relaxed as well (if desired).

To define specific groups then an index file has to first be created: 

gmx_d make_ndx -f xraymodel-solvionated.pdb -o index.ndx

The index file can then be either manually updated or using the tool:

gmx_d make_ndx -f xraymodel-solvionated.pdb -n index.ndx


The following command sets up the minimization:

gmx grompp -f minim.mdp -c Protein-ionated.pdb -p topol.top -o hmin.tpr -n index.ndx

Where the file minim.mdp contains the run parameters.

The minimization is then started (can often be run directly, without job submission):

gmx mdrun -ntomp 1 -ntmpi 1 -v -c Protein-relaxed.pdb -deffnm minim

If there are problems during minimization then this can often solved by visualizing the optimization trajectory. As these are periodic simulations it is usually best to create a new compress trajectory file where additionally solvent motion across boxes has been smoothed out:

gmx trjconv -s minim.tpr -f minim.trr -o minim_noPBC.xtc -pbc mol -center

One can then open the trajectory in VMD like this:

vmd hmin.gro minim_noPBC.xtc


7.      A molecular simulation of the protein in solution. MD is performed for a period of e.g. 5 ns using the canonical ensemble (NVT) and at a random timepoint, the geometries are extracted and are used for preparation of the QM/MM model.

gmx grompp -f nvt.mdp -c Protein-relaxed.pdb -n index -p topol.top -o nvt.tpr

Where the file nvt.mdp contains the run parameters.

To start the MD: 

gmx mdrun -v -deffnm nvt -c Protein-nvt.pdb

 The MD simulation should then be analyzed carefully (e.g. temperature, RMSD etc.) to check that everything is order. 

Analyze RMSD: gmx_d rms -s nvt.tpr -f nvt_noPBC.xtc -o rmsd.xvg -tu ns

Analyze Temperature: gmx_d energy -f nvt.edr -o temperature.xvg

Write compressed trajectory file (accounting for PBC effects): gmx trjconv -s nvt.tpr -f nvt.trr -o nvt_noPBC.xtc -pbc mol -center

Visualize trajectory in VMD: vmd hmin.gro nvt_noPBC.xtc

A classical MD simulation is necessary prior to any QM/MM study. This allows one to check that all force field parameters are in order, that there are no close contacts between atoms etc. It is up to the user to decide, however, what sorts of constraints should be applied to the system during the MD step. If the protein contains multiple exotic groups such as metal cofactors (that usually need to be constrained) then it may be the case that a classical MD simulation is going to perturb the system too much and introduce artifacts prior to the QM/MM. In those cases, constraining the entire protein and cofactors at the X-ray crystal structure position may be a better choice while allowing only the solvent (and ions) to move.

If the MD simulation looks OK a snapshot (here after 5000 ps) can be extracted:

gmx_d trjconv -s nvt.tpr -f nvt.trr -dump 5000 -o snap-5000ps-diff.pdb



Setting up QM/MM model in Chemshell using the CHARMM forcefield

Once the MM(CHARMM) model is set up (using GROMACS protocol like above or using another program), pre-optimized and/or equilibrated, a specific structure can be chosen for QM/MM modelling in Chemshell. 
This requires: converting the final snapshot PDB file to a Chemshell fragment file , creation of a protein-structure file (PSF) file (when using CHARMM force field) and creation of various lists. This is handled by scripts described below.

1. Optional: cut sphericalmodel out of periodic box
It depends on the system whether this should be done or not. If the system is overly large then cutting a 30-40K atom spherical model centered on the active site of interest is a good idea and can be accomplished using VMD. Chemshell will also behave better for systems with number of atoms < 40 K.
If you make a spherical model, make sure to do intelligent cuts by residue. You also may want to include enough counterions for the final model to be neutral.

2. Cut PDB file into separate segments. 
This is necessary for the PSFgen script below.
Currently this step is performed via a rather slow shellscript but it works quite well. The script requires cofactor residues to be identified.

./Chemshell-protein-setup.sh

Modify the first lines of the script for your system.

#Note the number for the last protein segment and the last sol segment.

3. Create PSF file using PSFgen file

First PSFgen needs to be installed: http://www.ks.uiuc.edu/Research/vmd/plugins/psfgen/
The psfgen-template.tcl file then should be downloaded.

Also you need a CHARMM-style topology file that needs to contain the same information on residues and atoms as the modified files of GROMACS did.
Here is the modified nitrogenase topology file: top_all36_prot.rtf as an example.
An example FeMoco topology looks like this in the top_all36_prot.rtf  :

!RB adding ICS
! FeMoco -1 charge
! Fe1 is edge Fe. Fe2-Fe4 are Fe cubane triangle, Fe5-Fe7 are Mo triangle. S1-S3 are Fe cubane sulfur, S4-S6 are Mo cubane sulfur, S7-S9 are bridging-cubane sulfurs.
RESI ICS         -1.00 !
GROUP
ATOM   MO1 MO    0.38
ATOM   FE1 FE    0.94
ATOM   FE2 FE    0.66
ATOM   FE3 FE    0.66
ATOM   FE4 FE    0.66
ATOM   FE5 FE    0.63
ATOM   FE6 FE    0.63
ATOM   FE7 FE    0.63
ATOM   S1 SM    -0.59
ATOM   S2 SM    -0.59
ATOM   S3 SM    -0.59
ATOM   S4 SM    -0.43
ATOM   S5 SM    -0.43
ATOM   S6 SM    -0.43
ATOM   S7 SM    -0.47
ATOM   S8 SM    -0.47
ATOM   S9 SM    -0.47
ATOM   CM C     -1.72

The Fe, MO, SM atom types need to be defined in the parameter file in the nonbonded section (Lennard-Jones). Here Lennard-Jones parameters need to be defined.
Their masses also need definition at the top of the topology and the parameter file.
If appropriate values for these atom types are not known (and no good model system can be found) then the parameters can be set to 0 (but usually better to set to similar values as related chemical groups). This is fine if these atoms will not be engaging in short-range QM and MM interactions, this may be the case if the QM region is set large enough.

Run the program:
psfgen psfgen.tcl >& psfgen.out
Carefully check psfgen.out output for errors

You may get warnings about badly guessed coordinates for the peptide-chain terminii. The peptide-chain terminii should be manually checked in the final PDB file.
Final PDB file: new.pdb
PSF-files: newxplor.psf and new.psf. The "newxplor.psf" file is recommended to use.

The PSF files will be used by Chemshell.

4. Create necessary system lists for Chemshell (Python script)
./chemshell-listprep.py

Running this script will create a file save-new.chm that will contain the following lists:
groups (a list of lists all the CHARMM residue charge groups)
charges ( a list of all the atom charges)
pdbresidues (a list of lists of all the residues and the atom numbers of each residue)
residuegroups (a list of lists of the atoms in each residue)
types (a list of all the atom types of the system)

The save-new.chm file is later sourced by Chemshell during QM/MM calculation.

Setting up QM/MM model in Chemshell using the Amber forcefield (set up via Ambertools)

If an MM model has been set up using the Amber software, pre-optimized and/or equilibrated, a specific structure can be chosen for QM/MM modelling in Chemshell. 
This requires: converting the Amber coordinate file to a Chemshell fragment file and the creation of various lists. This is handled by scripts described below.

1. Optional: cut sphericalmodel out of periodic box
It depends on the system whether this should be done or not. If the system is overly large then cutting a 30-40K atom spherical model centered on the active site of interest is a good idea and can be accomplished using VMD. Chemshell will also behave better for systems with number of atoms < 40 K.
If you make a spherical model, make sure to do intelligent cuts by residue. You also may want to include enough counterions for the final model to be neutral.

Get the Amber coordinate file (file.inpcrd and the Amber topology file file.prmtop) for the system. 
Currently also requires an XPLOR-style PSF file to be created to for defining lists below. Will be changed.

2. Set up various lists for the system. 
Download the file amber-chemshell-prep.chm, the listprep-foramber.py script and the modified parse_amber.tcl file . Run the amber-chemshell-prep script:
chemshell.sh amber-chemshell-prep.chm

This Chemshell script needs to be pointed to the Amber inpcrd file, the prmtop file and a PSF-file by setting variables at the top of script.
The script outputs system.c which is a Chemshell fragment file containing the coordinates and the correct connectivity.
It also outputs the save-new.chm file containing various residues lists (made by Python script). The listprep-foramber.py script needs to be present and a modified parse_amber.tcl file (source by script).

Reason for lists in save-new.chm:
residuegroups: Needed for active region selection in actregiondefine.chm script (see below).
pdbresidues : Needed for HDLC residues used in HDLC QM/MM optimization.
types: List of atom types. Needed for constraints (and for CHARMM interface).

The save-new.chm file just  created is later sourced by Chemshell during QM/MM calculation.

Create QM/MM inputfile (for use with ORCA and the Amber forcefield)
The QM-MM-opt-amber.chm inputfile is then created and looks like this.

#Name of fragment file
set frag system.c

#Chemshell-scripts location. 
#This dir should contain the update ORCA interface (orca-chemsh-withimage-withbs.tcl), the procs.tcl file (various useful functions), the topology and parameter files etc.
set scriptdir /home/bjornsson/QM-MM-Chemshell-scripts/

#Putting fragment file in memory.
fragment $frag old persistent
set numatoms [get_number_of_atoms coords=$frag]
puts "Number of atoms is $numatoms"

#Sourcing various TCl procs
source $scriptdir/procs.tcl

#ORCA path and sourcing of updated ORCA interface.
set orcapath /opt/orca_4.2.0
source $scriptdir/orca-chemsh-withimage-withbs.tcl


#Sourcing file containing pdbresidues and residuegroups
source save-new.chm

#Defining Amber topology file
set ambertop 1mxr_solv.prmtop

# sourcing active and frozen lists. Act list should have been previously created by a script, for example actregiondefine.chm.
source act
puts "Active region is [llength $act] atoms."

#Frozen list defined based on act list
set all [seq 1 to $numatoms]
set frozen [listcomp $all $act]

puts "Frozen region is [llength $frozen] atoms."

# QM REGION atoms, charge and multiplicity. qmatoms file should have been previously created (manually or by script).
source qmatoms

#Checking qmatoms list for duplicates.
checkQMregion $qmatoms
checkQMregion $act
puts "There are [llength $qmatoms] QM atoms and they are $qmatoms"

#Setting charge and multiplicity of the QM region.
set charge 1
set mult 2

###################
#Special BS settings
####################
set brokensym yes
#Multiplicity of High-spin state and Broken-symmetry state. Will override $mult. Comment out if not using broken-symmetry.
set hsmult 10
set bsmult $mult

#Selecting which system atom numbers to flip.
#Will be converted to ORCA inputfile atom numbers by atomnumtoQMregionnum and then converted to comma-sep string.
#247
set atomstoflip {5510}
set spinstofliplist [atomnumtoQMregionnum $qmatoms $atomstoflip]
set spinstoflip [join $spinstofliplist ","]
puts "spinstoflip is $spinstoflip and spinstofliplist is $spinstofliplist"

##################
# ORCA Theory level in simple input line
set orcasimpleinput "! b97-3c"

# ORCA block settings
set orcablocks "
%maxcore 2000

%basis
end

%pal
nprocs 1
end
"
###################################################################

#Setting up X-H and H-H constraints (TIP3) for optimization. Set jobtype to md for MD constraints
set jobtype opt
# To correctly find the TIP3P water the oxygen atom type is defined below. e.g. OT or OW
set waterOtype "OW"
source $scriptdir/constraints-onlytip3.tcl

# Setting mxlist
set mxlist 38000
puts "mxlist is $mxlist"

# Optimisation


dl-find \
  list_option=full coords=$frag active_atoms= $act constraints= $con maxcycle=1000 \
  coordinates=hdlc residues= $pdbresidues maxstep=0.5 result=result.c \
        theory=hybrid : [ list \
           coupling=shift  debug=no qm_region= $qmatoms conn=$frag \
           qm_theory=orca: [ list \
              executable=$orcapath/orca \
              brokensym=$brokensym \
              hsmult=$hsmult \
              bsmult=$bsmult \
              spinstoflip=$spinstoflip \
              charge=$charge \
              mult=$mult \
              orcasimpleinput= $orcasimpleinput \
              orcablocks= $orcablocks ] \
           mm_theory=dl_poly : [ list \
              frozen= $frozen \
              conn= $frag \
              debug=no \
              use_pairlist=no \
              exact_srf=yes \
              mxlist= $mxlist \
              mxexcl=500 \
              cutoff=1000 \
              scale14 = { 1.0 1.0 } \
              amber_prmtop_file=$ambertop  ]]

write_xyz coords=result.c file=result.xyz

times


Create QM/MM inputfile (for use with ORCA and the CHARMM forcefield)
The QM-MM-opt-charmm.chm inputfile is then created and looks like this.


#Name of fragment file
set frag system.c

#Chemshell-scripts location. This dir should contain the update ORCA interface (orca-chemsh-withimage-withbs.tcl), the procs.tcl file (various useful functions), the topology and parameter files etc.
set scriptdir /home/username/Chemshell-scripts

#Putting fragment file in memory.
fragment $frag old persistent
set numatoms [get_number_of_atoms coords=$frag]
puts "Number of atoms is $numatoms"
# Read in file (should be in same directory) for lists. This sets up lists $charges, $groups, $types, $pdbresidues and $residuegroups
source save-new.chm
#PSF file from PSFgen
set psffile new-XPLOR-psffile.psf
#Loading connectivity from PSF file
load_connect_from_psf $frag $psffile
#Sourcing various TCl procs
source $scriptdir/procs.tcl

# Topology
#Here using modified CHARMM36 files. SHould
set topmass $scriptdir/top_all36_prot.rtf
set charmmpar $scriptdir/par_all36_prot.prm

#ORCA path and sourcing of updated ORCA interface.
set orcapath /home/user/orca
source $scriptdir/orca-chemsh-withimage-withbs.tcl

# sourcing active and frozen lists. Act list should have been previously created by a script, for example regiondefine.chm.
source act
puts "Active region is [llength $act] atoms."
#Frozen list defined based on act list
set all [seq 1 to $numatoms]
set frozen [listcomp $all $act]

puts "Frozen region is [llength $frozen] atoms."
# QM REGION atoms, charge and multiplicity. qmatoms file should have been previously created (manually or by script).
source qmatoms
#Checking qmatoms list for duplicates.
checkQMregion $qmatoms
puts "There are [llength $qmatoms] QM atoms and they are $qmatoms"

#Setting charge and multiplicity.
set charge -4
set mult 2


##################
# ORCA Theory level in simple input line
set orcasimpleinput "! TPSSh ZORA ZORA-def2-SVP SARC/J RIJCOSX D3BJ Grid5 FinalGrid6 tightscf slowconv"

# ORCA block settings
set orcablocks "
%maxcore 2000

%basis
NewGTO Fe \"ZORA-def2-TZVP\" end
NewGTO S \"ZORA-def2-TZVP\" end
NewGTO Mo \"old-ZORA-TZVP\" end
addGTO Mo
F 1
1 0.6554500000 1.0000000000
end
end

%pal
nprocs 12
end
"
###################################################################

#Setting up X-H and H-H constraints (TIP3) for optimization. Set jobtype to md for MD constraints
set jobtype opt
source $scriptdir/constraints-onlytip3.tcl

# Setting mxlist
set mxlist 38000
puts "mxlist is $mxlist"

# Optimisation
dl-find \
  list_option=full coords=$frag active_atoms= $act constraints= $con maxcycle=1000 \
  coordinates=hdlc residues= $pdbresidues maxstep=0.5 result=result.c \
        theory=hybrid : [ list \
           coupling=shift  debug=no atom_charges= $charges qm_region= $qmatoms conn=$frag \
           qm_theory=orca: [ list \
              executable=$orcapath/orca \
              charge=$charge \
              mult=$mult \
              orcasimpleinput= $orcasimpleinput \
              orcablocks= $orcablocks ] \
           mm_theory=dl_poly : [ list \
              frozen= $frozen \
              conn= $frag \
              debug=no \
              use_pairlist=no \
              exact_srf=yes \
              mxlist= $mxlist \
              mxexcl=500 \
              cutoff=1000 \
              scale14 = { 1.0 1.0 } \
              use_charmm_psf=yes \
              charmm_psf_file=$psffile \
              atom_types= $types \
              charmm_parameter_file=$charmmpar \
              charmm_mass_file= $topmass ]]

write_xyz coords=result.c file=result.xyz

times


ORCA interface

We use a heavily modified ORCA interface that has to be sourced for the above inputfile to work. The ORCA inputfile (created by Chemshell) is defined via the "orcasimpleinput" and "orcablocks" variables that are then passed on to the updated ORCA interface. This differs completely from the regular ORCA interface in Chemshell.

The new ORCA interface is available at Github page as "orca-chemsh-withimage-withbs.tcl"


Define QM regions and active regions

Define QM region

The QM region is best defined via a simple script that creates a list (qmatoms) that is stored in a file (qmatoms) that is sourced by the Chemshell inputfile.
This is a useful way of documenting the sub-lists of atoms that define the residues.


qmregiondefine.chm:
#Chemshell-script location
set scriptdir /home/user/Chemshell-scripts

source $scriptdir/procs.tcl
####################################################
# SETTING UP QM REGION
###################################################
# QM REGION atom groups defined (system numbers)
set femoco [seq 17776 to 17793]
set hca [seq 17809 to 17829]
set imz [seq 6913 to 6923]
set cys {4176 4177 4178 4179}
set HZIN {37067 37068 37069 37070 37071 37072 37073}
#HZIN and extra proton

#Groups defined
set qmatoms [concat $femoco $hca $imz $cys $HZIN]
#QM region checked for duplicates
checkQMregion $qmatoms
set qmatomsfile [ open qmatoms w]
puts $qmatomsfile "set qmatoms {$qmatoms}"
close $qmatomsfile
puts "There are [llength $qmatoms] QM atoms. Written to file qmatoms."
puts "QM atoms:  $qmatoms"

Once the QM-region is defined it can then be checked by visual inspection by using the script qmedit.sh. The script will create a file called qmregioncoords.xyz that contains the QM region coordinates (in Bohr units). The coordinates can be copy-pasted into e.g. Chemcraft to check whether the QM region looks like expected (note that link atoms will be added by Chemshell later to terminate dangling bonds).

To change the coordinates of the QM region the geometry can be modified in Chemcraft and the updated coordinates in Bohr units can be copy-pasted back into the file qmrecoords.xyz. To update the coordinates in the fragment file (e.g. called system.c), the qmregionupdate.py script is run (requires Python):

qmregionupdate.py system.c   (requires qmregioncoords.xyz file to be present)

Once the QM-region is defined once like above it is usually more convenient to simply copy the qmatoms file between different directories for different calculations of the same system where the QM regions stays the same and the system does not change with respect to number of atoms.



Define active region

The active region is best defined once for the system using a script and the list is then best stored in a file (act) that is copied to every Chemshell-job file using the QM/MM setup. The active region should ideally not be updated as the QM/MM energy function depends on what atoms are active and which are frozen. It is best to define a good active region once (~1000 atoms) and then keep using the same active region throughout the project. However, if atoms are added/deleted to the the system then the act list needs to be updated carefully.
The following script shows how to conveniently create an approximately spherical active region around the active site

actregiondefine.chm:

#############
# ONLY USE IF STARTING NEW SYSTEM AND MISSING ACT FILE
##############

if { $argv == "" } {
puts "\033\[31mNo fragment file given.\033\[0m Do chemshell.bash actregiondefine.chm fragmentfile.c"
exit
}
#Chemshell-script location
set scriptdir /home/ragnarbj/QM-MM/nitrogenase/Chemshell-QM-MM-scripts

#Putting fragment file in memory.
set frag $argv
fragment $frag old persistent
#Colored output
#puts "Why not \033\[34mGooge\033\[0m first ?"
#puts "Why not \033\[34mG\033\[31mo\033\[33mo\033\[34mg\033\[32ml\033\[31me\033\[0m first ?"

source $scriptdir/procs.tcl
##################################################################
# SETTING UP ACTIVE ATOMS, FROZEN ATOMS AND CONSTRAINTS
##################################################################
source save-new.chm
# Set origin atom (atom in the middle of active region). Here carbon of FeMoco
set origin 17793
#Radius of active region (will select whole residues).
set radius 11
#Freezing special residues/atoms by atom numbers.
#Use seq function select range: Example: set frozgr1 [seq 17730 to 17829]
# or define list explictly: set frozgr1 { 17730 17731 17732 etc} .
# Combine by concat to list $extrafrozen (will be added to $frozen by select.tcl)
# IMIA, FE2P:
set frozgr1 [seq 17730 to 17775 ]
# P-cluster:
set frozgr2 [seq 17794 to 17808]
# These are counterions:
set frozgr3 [seq 37042 to 37066 ]
# These are cysteine sulfur atoms in P-cluster:
set frozgr4 {951 2350 9021 9901 8663 1323}

set extrafrozen [concat $frozgr1 $frozgr2 $frozgr3 $frozgr4 ]
# Calling Tcl code that sets up lists $act and $frozen using $origin, $radius and the frozen lists above.
# act and frozen lists are also written to disk
puts "There are [get_number_of_atoms coords=$frag] atoms."
source $scriptdir/select.tcl

delete_object $frag


Analyzing the result of QM/MM optimizations

Once the DL-FIND geometry optimization is done, the final system geometry is usually stored in a Chemshell fragment file called result.c (coordinates in Bohr). This file can also be written as an XMOL XYZ file result.xyz in Angstrom units (currently done using the write_xyz function at the bottom of the inputfile above). Often it may be mainly of interest to inspect the QM region coordinates. They can be found either by creating a new qmregioncoords.xyz file using:
qmedit.sh result.c   and then visualize the QM region coordinates in the newly created qmregioncoords.xyz, e.g. by copy-pasting into Chemcraft.
Alternatively, orca1.inp and orca1.out files are possibly copied back from your jobscript and they will contain the QM region coordinates from the last optimization cycle from Chemshell.

Trajectory files are also useful for visualizing how the geometry optimization proceeded. The path_active.xyz is an XYZ trajectory file that only contains the coordinates of the active region (usually ~1000 atoms). The path.xyz is an XYZ trajectory file for the entire system (much larger size of this file and it may take a while to load). These files are convenient to open in the VMD visualization program but an unfortunate aspect is that XYZ files only contain element information, no residue information at all (convenient for selecting the interesting parts of the system).

To create a PDB file of the final structure one can use the proper-PDBwrite.chm Chemshell script (see Github page):
chemshell-intel.sh proper-PDBwrite.chm result.c new-XPLOR-psffile.psf

This will create a result.pdb file that can be visualized in VMD with residue information available (active site can e.g. be more easily visualized in the Graphical Representations window: resname cofactorresiduename ).

It's also possible to use this PDB file when visualizing the trajectory file. 
This vmd command should work:
vmd result.pdb path.xyz 
where VMD reads the PDB file as a structure file and the path.xyz file for the coordinates of the trajectory. This will only work for the path.xyz file (not path_active.xyz) as the number of atoms in both PDB and trajectory files have to match.

Note that reading in these large files may take some time and also note that if result.pdb is chosen as PDB-file (a PDB file of the original coordinates could alternatively be used instead) then both first and last structures shown in VMD will correspond to the optimized coordinates.



Problems with geometry optimizations

- Sometimes there are problems with the HDLC coordinates system (residue error) and Cartesian coordinates must be used instead => coordinates=angstrom in the inputfile.



Adding/deleting atoms in QM/MM system

Adding new atom/residue to QM/MM system
When adding atoms to the QM/MM system the fragment file, lists and the PSF file needs updating.

Important things:
- Specify atom types for added atoms that make sense. Because the atom type determines the Lennard-Jones interaction between QM and MM regions. For small QM region calculations one can see artifacts.
- Specify residue names that makes sense (resgroups). Create a new one if required but look first in the topology file.


Deleting atom/residue in QM/MM system
When deleting atoms from the QM/MM system one needs to not only delete the atom from the fragment file but all lists need to be updated (as atom numbering changes) and the PSF file. Importantly previous active atoms and QM-atoms lists need to be updated. This is a more errorprone procedure than adding an atom.
Thus this procedure is best handled using the delatoms.chm script but with some caution.

The delatoms.chm script first needs to be manually edited where the atoms to be deleted are specified. Follow the instructions in the file carefully.
If one has an uninterrupted sequence of atoms to delete, e.g. ( 17 18 19 20) then those atoms can be deleted all at once. However, if that is not the case, e.g. (20 34) then each atom needs to be deleted separately by the script and best done starting from the highest number in the list. The reason for the latter is that the atom numbering gets updated otherwise (e.g. removing atom number 20 means that atom 34 becomes atom 33).

Then run the script like this:
chemshell.sh delatoms.chm fragmentfile.c xplor-psf-file.psf

Script will update fragment file (e.g. system.c), list-file (e.g. called save-new.chm) and act file (here called act). 
The QMatom information (e.g. in a file called qmatoms) needs to be updated manually.
 

Important things:
- delatoms script uses PSFgen program that requires an XPLOR-style PSF file as input. The XPLOR-style PSF file should have been created when the GROMACS QM/MM model was created. Script then outputs both CHARMM-style and XPLOR-style PSF-files.
- Make sure that you use the new XPLOR psf file that the script creates in your QM-MM optimizations. Remember that the QM-MM-opt.chm file should be updated to point to the new psf file.
- Note that the script updates the act file and list. 




Using Flip-spin broken-symmetry in QM/MM optimization (ORCA)

If you wish to converge to a broken-symmetry solution and optimize the geometry using that solution then the High-spin multiplicity and Flip-spin information needs to be handed over to ORCA during the first optimization step in Chemshell and then the spin multiplicity is set to the BS-state multiplicity from then on (ORCA will read the GBW file from the last optimization cycle from then on out so no more spin-flipping required).

To enable BS settings then the following information should be in the QM-MM-opt.chm inputfile.


#Setting charge and multiplicity of QM-region. If doing Broken-symmetry, then the multiplicity of the BS solution should be defined as mult, the high-spin multiplictiy is defined separately.
set charge -6
set mult 4
###################
#Special BS settings
####################
set brokensym yes
#Multiplicity of High-spin state and Broken-symmetry state. Will override $mult. Comment out if not using broken-symmetry.
set hsmult 34
set bsmult $mult

#Selecting which system atom numbers to flip (here Fe atoms: Fe2, Fe4 and Fe7 as defined in PDB/PSF)
#Will be converted to ORCA inputfile atom numbers by atomnumtoQMregionnum and then converted to comma-sep string.
set atomstoflip {17778 17780 17783}
set spinstofliplist [atomnumtoQMregionnum $qmatoms $atomstoflip]
set spinstoflip [join $spinstofliplist ","]
puts "spinstoflip is $spinstoflip and spinstofliplist is $spinstofliplist"


The ORCA theory command then needs to look like this:

           qm_theory=orca: [ list \
              executable=$orcapath/orca \
              brokensym=$brokensym \
              hsmult=$hsmult \
              bsmult=$bsmult \
              spinstoflip=$spinstoflip \
              charge=$charge \
              mult=$mult \
              orcasimpleinput= $orcasimpleinput \
              orcablocks= $orcablocks ] \



Adding extra basis set on specific QM atoms (ORCA)
    - Reqiures an ORCA interface file that supports this (orca-chemsh-withimage-withbs.tcl)
    - QM-MM-opt.chm file  (extrabasisatoms, extrabasisatomslist, extrabasisset variables and keywords)


To use, the following line should be specified in the QM-MM-opt.chm inputfile :

#13 aug 2016 new feature:
#Increased basis set on specific atoms in QM region.
#For example: FeMoco carbide. Use full-system atom numbers in list below (will be converted to ORCA QM atom number)
# Note: If no extrabasisatom is wanted, do: set extrabasisatoms {}    DO NOT COMMENT ANY LINE OUT
set extrabasisatoms {17779 17795 17796 17797 17798 17799 17800 17801 17802 17803 17804 17805 17806 17807 17808 17809 17810 17811 17812 17813 17814 17815 36987 36988 36989 36990}
#set extrabasisatoms {}
set extrabasisatomslist [atomnumtoQMregionnum $qmatoms $extrabasisatoms]
puts "extrabasisatomslist is $extrabasisatomslist"
set extrabasisset "ZORA-def2-TZVP"

extrabasisatomslist and extrabasisset Chemshell-ORCA keywords then need to be set inside qm_theory=orca inside the dl-find block in the Chemshell inputfile.


           qm_theory=orca: [ list \
              executable=$orcapath/orca \
              brokensym=$brokensym \
              hsmult=$hsmult \
              bsmult=$bsmult \
              spinstoflip=$spinstoflip \
              extrabasisatomslist= $extrabasisatomslist \
              extrabasisset=$extrabasisset \
              charge=$charge \
              mult=$mult \
              orcasimpleinput= $orcasimpleinput \
              orcablocks= $orcablocks ] \



Comments