MM and QM/MM setup

Do note that while the GROMACS and Chemshell protocols discussed on this page proved highly valuable for the studies on nitrogenase our research group has now switched completely over to using our own open-source program ASH that together with the OpenMM library and ORCA allows much more convenient MM and QM/MM setups for metalloproteins.

See ASH main page

See ASH metalloprotein-tutorial I: rubredoxin

See ASH metalloprotein-tutorial II: ferredoxin

MM and QM/MM setup for a metalloprotein using GROMACS, Chemshell and ORCA

(Using the CHARMM forcefield (both in GROMACS and with Chemshell) and the ORCA code as QM code for Chemshell.)

This page contains technical details regarding the GROMACS MM setup and QM/MM setup in Chemshell as performed in our QM/MM studies on nitrogenase:

QM/MM Study of the Nitrogenase MoFe Protein Resting State: Broken-Symmetry States, Protonation States, and QM Region Convergence in the FeMoco Active Site, Bardi Benediktsson and Ragnar Bjornsson, Inorg. Chem. 2017, 56, 13417-13429.
QM/MM calculations reveal a bridging hydroxo group in a vanadium nitrogenase crystal structure, Bardi Benediktsson, Albert T. Thorhallsson, Ragnar Bjornsson, Chem. Comm., 2018, 54, 7310-7313.
Resolving the structure of the E1 state of Mo Nitrogenase through Mo and Fe K-edge EXAFS and QM/MM calculations, C. Van Stappen, A. Th. Thorhallsson, L. Decamps, R. Bjornsson, S. DeBeer, Chem. Sci. 2019, 10, 9807-9821.
A model for dinitrogen binding in the E4 state of nitrogenase, Albert Th. Thorhallsson, Bardi Benediktsson, Ragnar Bjornsson, Chem. Sci. 2019, 10, 11110-11124.
QM/MM Study of Resting State Vanadium Nitrogenase: Molecular and Electronic Structure of the Iron-Vanadium Cofactor B. Benediktsson, R. Bjornsson*, Inorg. Chem. 2020, 59, 11514-11527.
The E2 state of FeMoco: hydride formation vs. Fe reduction and a mechanism for H2 evolution A. Th. Thorhallsson, R. Bjornsson, Chem. Eur. J. 2021, 27, 16788-16800.
Carbon Monoxide Binding to the Iron-Molybdenum Cofactor of Nitrogenase: A Detailed Quantum Mechanics/Molecular Mechanics Investigation N. B. Spiller, R. Bjornsson, S. DeBeer, F. Neese, Inorg. Chem. 2021, 60, 18031-18047.
Analysis of the geometric and electronic structure of spin-coupled iron-sulfur dimers with broken-symmetry DFT: implications for FeMoco B. Benediktsson, R. Bjornsson, J. Chem. Theory Comput. 2022, 18, 1437-1457..

Scripts to perform GROMACS MM setup and MM simulations for a metalloprotein and subsequent Chemshell QM/MM calculations are described on this page and available here:

Github page.

The CHARMM36 forcefield is used throughout. Topology and parameter files were downloaded from Alexander McKerell's website both in GROMACS and CHARMM formats and modified.

If you find these scripts useful, please consider citing e.g. article 1 above.

GROMACS setup

Our MM model setup with GROMACS is loosely based on the tutorial written by Justin Lemkul (recommended reading):

http://www.mdtutorials.com/gmx/lysozyme/index.html

Before setting up an MM model in GROMACS of a metalloprotein, a basic topology and parameters for the metal cofactors is required, usually not available in protein forcefields. In the case of nitrogenase, only nonbonded MM parameters were defined. It's a good idea to look at the literature for available topologies/parameters for exotic species.

Atom charges for FeMoco can be derived by a population analysis from a QM calculation. In our case, Natural population analysis charges from a BP86 DFT calculation of the cofactor (described by a polarizable continuum) were used. These charges are added to the merged.rtp inside the charmm36.ff GROMACS directory (see Github page). The values of the charges is not critical as they are not used at all in the QM/MM step (assuming electrostatic embedding QM/MM) if they will be in the QM-region. They are used in the MM optimization/MD steps, however and need to be sensible (otherwise severe artifacts from the MM prep will arise). The charges should add up to the correct integer charge of the group.

The FeMoco entry (ICS residue name) inside merged.rtp:

[ ICS ] ; The name of the residue, in this case ICS is FeMoco of Mo-nitrogenase.

[ atoms ] ; The first column is the atom name (should match PDB atom entry), the second column is the atom type (will determine LJ parameters), the third column is the charge (NOTE, total charges should add up ot the desired integer number) and the fourth column is the chargegroup).

MO1 MO 0.3861000 0

FE1 FE 0.9370699 1

FE2 FE 0.65790667 2

FE3 FE 0.65790667 3

FE4 FE 0.65790667 4

FE5 FE 0.62767667 5

FE6 FE 0.62767667 6

FE7 FE 0.62767667 7

S1 SM -0.5904233 8

S2 SM -0.5904233 9

S3 SM -0.5904233 10

S4 SM -0.4311467 11

S5 SM -0.4311467 12

S6 SM -0.4311467 13

S7 SM -0.4733233 14

S8 SM -0.4733233 15

S9 SM -0.4733233 16

CX C -1.69524 17

The atom type must exist in your parameter file (which is probably not the case if you are working with a protein with a transition metal center) and preferably have at least non-bonded parameters. For iron, you would need to add a line in the MASS section:

MASS 91 FE 55.84700 FE ! iron

Where ’MASS’ indicates that this is the mass section, ’91’ is the entry number in the list, ’FE’ is the designated atomtype, ’55.84700’ is the molar mass, and ’! iron’ is the comment line

Before starting the GROMACS step, it is essential to visualize the protein and have a good look at it. You should check for residues that have multiple conformations (delete as appropriate), check the protonation state of titrable amino acid residues and check whether it would be logical to flip e.g. HIS residues that have both nitrogen and carbon in its functional group 180°. Flipping of Histidine is easily performed by reordering the coordinates in the PDB file of atoms: ND1, CE1, NE2 and CD2.

If you have a e.g. cysteine bound to a metal ion (common for iron-sulfur clusters) then that cysteine is very likely deprotonated and then you should select a deprotonated residue (modify the PDB file e.g. change CYS to CSD). The CSD residue definition is part of the CHARMM36 forcefield.

If the -his, -glu, -lys, -asp flags (the -his flag is the most important) has been selected then GROMACS will initially ask for the desired protonation site on His (HSE for NE2 protonation, HSD for ND1 protonation, HSP for double protonation). The desired protonation site should be selected. While pKA prediction codes can be used for this task, we prefer visual inspection to check for hydrogen bond patterns that usually make the protonation site easily determined, and also allows one to see if the His residues should be flipped.

Our MM protocol consists of six steps:

1. Manual modification of the PDB file:

a) Deletion of extra conformations in residues. Often labelled as A and B. Delete one of them and rename.

b) Deletion of undesired species (ions, contamination etc.)

c) Renaming of residues and atoms to match the forcefield.

d) Addition of hydrogens or other missing atoms for cofactors or other non-protein species. This is only needed for non-protein species as GROMACS can automatically add hydrogens of all protein residues.

2. PDB to GROMACS conversion (this is the main headache)

Conversion of the unprotonated protein data bank (PDB) structure into a protonated structure. You can specify whether a titrable amino acid residue should be protonated or not during this step. The CHARMM36.ff forcefield is selected here (that may or may not be present in your version of CHARMM). To make the CHARMM36 forcefield (that will be modified) available to GROMACS you should put the downloaded charmm36.ff forcefield inside a directory and specify that directory by the environment variable: e.g. export GMXLIB=/home/bjornsson/gromacs-forcefields

You may also want to copy over other Gromacs forcefield files to this same directory (that GMXLIB points to) such as residuetypes.dat that you may have to modify. The GROMACS forcefield files are in /path/to/gromacs-installation/share/gromacs/top.

If you use non-standard amino acid residues such as deprotonated cysteines (CSD residue in CHARMM36) then the residuetypes.dat file needs to contain the entry (otherwise gmx pdb2gmx command will complain):

CSD Protein

The GROMACS command should be run e.g. like this:

gmx pdb2gmx -f ProteinDataBankFile.pdb -lys -glu -his -o Protein-processed.pdb -ff charmm36 -water tip3

It is helpful go through this step once manually, inspect visually each residue (at least the HIS residues) and compile the list of protonation states. A list of HIS protonation states can then be compiled:

0 0 1 1 0 0 0 0 2 0 0 1 (where 0 means a HSD protonation state (ND1 proton), 1 means a HSE protonation state (NE2 proton) and HSP means a doubly protonated state (ND1 and NE2 protons).

This list can then be fed into the gmx program next time:

echo 0 0 1 1 0 0 0 0 2 0 0 1 | gmx pdb2gmx -f ProteinDataBankFile.pdb -lys -glu -his -o Protein-processed.pdb -ff charmm36 -water tip3

The hydrogenated protein file (Protein-processed.pdb) should then be visually inspected (e.g. VMD).

Note: if you have non-standard terminii (other than N and O), e.g. acetyl capping groups (ACE) then you have to account for this. Adding the "-ter" to the gmx command above means that GROMACS will ask what it should do with the start terminus in terms of protonation state. For an acetyl group, choose "None".

3. Next, there is the creation of a solvation box around the protein and defining the periodic boundary conditions.

gmx editconf -f Protein-processed.pdb -o Protein-boxed.pdb -c -d 1.0 -bt cubic

4. Then the box is filled with solvent.

gmx solvate -cp Protein-boxed.pdb -cs spc216.gro -o Protein-solvated.pdb -p topol.top –n index.ndx

5. As the protein is likely charged at this stage, excess charge is neutralized by the generation of counter-ions (e.g. Na+ and/or Cl- ) .

gmx grompp -f ions.mdp -c Protein-solvated.pdb -p topol.top –n index.ndx -o ions.tpr

Where the file ions.mdp file contains the run parameters.

Secondly, the generated binary file, ions.tpr, is used to create the charge neutral system.

gmx genion -s ions.tpr -o Protein-ionated.pdb -p topol.top -pname NA -nname CL -np 40

6. Relaxation of the system. Metallocofactors are not relaxed, unless forcefield parameters exist and atoms directly in contact with the metallocofactor are constrained as well. Thus contraints are usually applied by modifying the MDP file e.g. like this:

freezegrps = Protein-H Cofactor ; Defining which groups are restrained. Here protein-heavy-atoms are constrained and all atoms part of Group cofactor.

freezedim = Y Y Y Y Y Y ; restrained in X Y Z dimensions, one Y for each dimension of each group to restrain

It's a good idea to first relax only the hydrogen atoms of the protein and water in the system, while keeping metallocofactor groups completely frozen (including their hydrogen atoms). Later, the protein heavy atoms can be relaxed as well (if desired).

To define specific groups then an index file has to first be created:

gmx_d make_ndx -f xraymodel-solvionated.pdb -o index.ndx

The index file can then be either manually updated or using the tool:

gmx_d make_ndx -f xraymodel-solvionated.pdb -n index.ndx

The following command sets up the minimization:

gmx grompp -f minim.mdp -c Protein-ionated.pdb -p topol.top -o hmin.tpr -n index.ndx

Where the file minim.mdp contains the run parameters.

The minimization is then started (can often be run directly, without job submission):

gmx mdrun -ntomp 1 -ntmpi 1 -v -c Protein-relaxed.pdb -deffnm minim

If there are problems during minimization then this can often solved by visualizing the optimization trajectory. As these are periodic simulations it is usually best to create a new compress trajectory file where additionally solvent motion across boxes has been smoothed out:

gmx trjconv -s minim.tpr -f minim.trr -o minim_noPBC.xtc -pbc mol -center

One can then open the trajectory in VMD like this:

vmd hmin.gro minim_noPBC.xtc

7. A molecular simulation of the protein in solution. MD is performed for a period of e.g. 5 ns using the canonical ensemble (NVT) and at a random timepoint, the geometries are extracted and are used for preparation of the QM/MM model.

NOTE: Often freezing specific water molecules might be necessary, e.g. water molecules near the cofactor. While freezing only the oxygen-atoms might be tempting this is not always possible in GROMACS and probably all 3 atoms of the water molecules probably have to be frozen.

gmx grompp -f nvt.mdp -c Protein-relaxed.pdb -n index -p topol.top -o nvt.tpr

Where the file nvt.mdp contains the run parameters.

To start the MD:

gmx mdrun -v -deffnm nvt -c Protein-nvt.pdb

The MD simulation should then be analyzed carefully (e.g. temperature, RMSD etc.) to check that everything is order.

Analyze RMSD: gmx_d rms -s nvt.tpr -f nvt_noPBC.xtc -o rmsd.xvg -tu ns

Analyze Temperature: gmx_d energy -f nvt.edr -o temperature.xvg

Write compressed trajectory file (accounting for PBC effects): gmx trjconv -s nvt.tpr -f nvt.trr -o nvt_noPBC.xtc -pbc mol -center

Visualize trajectory in VMD: vmd hmin.gro nvt_noPBC.xtc

A classical MD simulation is necessary prior to any QM/MM study. This allows one to check that all force field parameters are in order, that there are no close contacts between atoms etc. It is up to the user to decide, however, what sorts of constraints should be applied to the system during the MD step. If the protein contains multiple exotic groups such as metal cofactors (that usually need to be constrained) then it may be the case that a classical MD simulation is going to perturb the system too much and introduce artifacts prior to the QM/MM. In those cases, constraining the entire protein and cofactors at the X-ray crystal structure position may be a better choice while allowing only the solvent (and ions) to move.

If the MD simulation looks OK a snapshot (here after 5000 ps) can be extracted:

gmx_d trjconv -s nvt.tpr -f nvt.trr -dump 5000 -o snap-5000ps-diff.pdb

Setting up QM/MM model in Chemshell using the CHARMM forcefield

Once the MM(CHARMM) model is set up (using GROMACS protocol like above or using another program), pre-optimized and/or equilibrated, a specific structure can be chosen for QM/MM modelling in Chemshell.

This requires: converting the final snapshot PDB file to a Chemshell fragment file , creation of a protein-structure file (PSF) file (when using CHARMM force field) and creation of various lists. This is handled by scripts described below.

1. Optional: cut sphericalmodel out of periodic box

It depends on the system whether this should be done or not. If the system is overly large then cutting a 30-40K atom spherical model centered on the active site of interest is a good idea and can be accomplished using VMD. Chemshell will also behave better for systems with number of atoms < 40 K.

If you make a spherical model, make sure to do intelligent cuts by residue. You also may want to include enough counterions for the final model to be neutral.

2. Cut PDB file into separate segments.

This is necessary for the PSFgen script below.

Currently this step is performed via a rather slow shellscript but it works quite well. The script requires cofactor residues to be identified.

OLD VERSION: ./Chemshell-protein-setup.sh

NEW VERSION: ./psfcreate.sh

Modify the first lines of the script for your system.

#Note the number for the last protein segment and the last sol segment.

3. Create PSF file using PSFgen file

First PSFgen needs to be installed: http://www.ks.uiuc.edu/Research/vmd/plugins/psfgen/

The psfgen-template.tcl file then should be downloaded.

Also you need a CHARMM-style topology file that needs to contain the same information on residues and atoms as the modified files of GROMACS did.

Here is the modified nitrogenase topology file: top_all36_prot.rtf as an example.

An example FeMoco topology looks like this in the top_all36_prot.rtf :

!RB adding ICS

! FeMoco -1 charge

! Fe1 is edge Fe. Fe2-Fe4 are Fe cubane triangle, Fe5-Fe7 are Mo triangle. S1-S3 are Fe cubane sulfur, S4-S6 are Mo cubane sulfur, S7-S9 are bridging-cubane sulfurs.

RESI ICS -1.00 !

GROUP

ATOM MO1 MO 0.38

ATOM FE1 FE 0.94

ATOM FE2 FE 0.66

ATOM FE3 FE 0.66

ATOM FE4 FE 0.66

ATOM FE5 FE 0.63

ATOM FE6 FE 0.63

ATOM FE7 FE 0.63

ATOM S1 SM -0.59

ATOM S2 SM -0.59

ATOM S3 SM -0.59

ATOM S4 SM -0.43

ATOM S5 SM -0.43

ATOM S6 SM -0.43

ATOM S7 SM -0.47

ATOM S8 SM -0.47

ATOM S9 SM -0.47

ATOM CM C -1.72

The Fe, MO, SM atom types need to be defined in the parameter file in the nonbonded section (Lennard-Jones). Here Lennard-Jones parameters need to be defined.

Their masses also need definition at the top of the topology and the parameter file.

If appropriate values for these atom types are not known (and no good model system can be found) then the parameters can be set to 0 (but usually better to set to similar values as related chemical groups). This is fine if these atoms will not be engaging in short-range QM and MM interactions, this may be the case if the QM region is set large enough.

Run the program:

psfgen psfgen.tcl >& psfgen.out

Carefully check psfgen.out output for errors

You may get warnings about badly guessed coordinates for the peptide-chain terminii. The peptide-chain terminii should be manually checked in the final PDB file.

Final PDB file: new.pdb

PSF-files: newxplor.psf and new.psf. The "newxplor.psf" file is recommended to use.

The PSF files will be used by Chemshell.

4. Create necessary system lists for Chemshell (Python script)

./chemshell-listprep.py

Running this script will create a file save-new.chm that will contain the following lists:

groups (a list of lists all the CHARMM residue charge groups)

charges ( a list of all the atom charges)

pdbresidues (a list of lists of all the residues and the atom numbers of each residue)

residuegroups (a list of lists of the atoms in each residue)

types (a list of all the atom types of the system)

The save-new.chm file is later sourced by Chemshell during QM/MM calculation.

Setting up QM/MM model in Chemshell using the Amber forcefield (set up via Ambertools)

If an MM model has been set up using the Amber software, pre-optimized and/or equilibrated, a specific structure can be chosen for QM/MM modelling in Chemshell.

This requires: converting the Amber coordinate file to a Chemshell fragment file and the creation of various lists. This is handled by scripts described below.

1. Optional: cut sphericalmodel out of periodic box

If you make a spherical model, make sure to do intelligent cuts by residue. You also may want to include enough counterions for the final model to be neutral.

Get the Amber coordinate file (file.inpcrd and the Amber topology file file.prmtop) for the system.

Currently also requires an XPLOR-style PSF file to be created to for defining lists below. Will be changed.

2. Set up various lists for the system.

Download the file amber-chemshell-prep.chm, the listprep-foramber.py script and the modified parse_amber.tcl file . Run the amber-chemshell-prep script:

chemshell.sh amber-chemshell-prep.chm

This Chemshell script needs to be pointed to the Amber inpcrd file, the prmtop file and a PSF-file by setting variables at the top of script.

The script outputs system.c which is a Chemshell fragment file containing the coordinates and the correct connectivity.

It also outputs the save-new.chm file containing various residues lists (made by Python script). The listprep-foramber.py script needs to be present and a modified parse_amber.tcl file (source by script).

Reason for lists in save-new.chm:

residuegroups: Needed for active region selection in actregiondefine.chm script (see below).

pdbresidues : Needed for HDLC residues used in HDLC QM/MM optimization.

types: List of atom types. Needed for constraints (and for CHARMM interface).

The save-new.chm file just created is later sourced by Chemshell during QM/MM calculation.

Create QM/MM inputfile (for use with ORCA and the Amber forcefield)

The QM-MM-opt-amber.chm inputfile is then created and looks like this.

#Name of fragment file

set frag system.c

#Chemshell-scripts location.

#This dir should contain the update ORCA interface (orca-chemsh-withimage-withbs.tcl), the procs.tcl file (various useful functions), the topology and parameter files etc.

set scriptdir /home/bjornsson/QM-MM-Chemshell-scripts/

#Putting fragment file in memory.

fragment $frag old persistent

set numatoms [get_number_of_atoms coords=$frag]

puts "Number of atoms is $numatoms"

#Sourcing various TCl procs

source $scriptdir/procs.tcl

#ORCA path and sourcing of updated ORCA interface.

set orcapath /opt/orca_4.2.0

source $scriptdir/orca-chemsh-withimage-withbs.tcl

#Sourcing file containing pdbresidues and residuegroups

source save-new.chm

#Defining Amber topology file

set ambertop 1mxr_solv.prmtop

# sourcing active and frozen lists. Act list should have been previously created by a script, for example actregiondefine.chm.

source act

puts "Active region is [llength $act] atoms."

#Frozen list defined based on act list

set all [seq 1 to $numatoms]

set frozen [listcomp $all $act]

puts "Frozen region is [llength $frozen] atoms."

# QM REGION atoms, charge and multiplicity. qmatoms file should have been previously created (manually or by script).

source qmatoms

#Checking qmatoms list for duplicates.

checkQMregion $qmatoms

checkQMregion $act

puts "There are [llength $qmatoms] QM atoms and they are $qmatoms"

#Setting charge and multiplicity of the QM region.

set charge 1

set mult 2

###################

#Special BS settings

####################

set brokensym yes

#Multiplicity of High-spin state and Broken-symmetry state. Will override $mult. Comment out if not using broken-symmetry.

set hsmult 10

set bsmult $mult

#Selecting which system atom numbers to flip.

#Will be converted to ORCA inputfile atom numbers by atomnumtoQMregionnum and then converted to comma-sep string.

#247

set atomstoflip {5510}

set spinstofliplist [atomnumtoQMregionnum $qmatoms $atomstoflip]

set spinstoflip [join $spinstofliplist ","]

puts "spinstoflip is $spinstoflip and spinstofliplist is $spinstofliplist"

##################

# ORCA Theory level in simple input line

set orcasimpleinput "! b97-3c"

# ORCA block settings

set orcablocks "

%maxcore 2000

%basis

end

%pal

nprocs 1

end

###################################################################

#Setting up X-H and H-H constraints (TIP3) for optimization. Set jobtype to md for MD constraints

set jobtype opt

# To correctly find the TIP3P water the oxygen atom type is defined below. e.g. OT or OW

set waterOtype "OW"

source $scriptdir/constraints-onlytip3.tcl

# Setting mxlist

set mxlist 38000

puts "mxlist is $mxlist"

# Optimisation

dl-find \

list_option=full coords=$frag active_atoms= $act constraints= $con maxcycle=1000 \

coordinates=hdlc residues= $pdbresidues maxstep=0.5 result=result.c \

theory=hybrid : [ list \

coupling=shift debug=no qm_region= $qmatoms conn=$frag \

qm_theory=orca: [ list \

executable=$orcapath/orca \

brokensym=$brokensym \

hsmult=$hsmult \

bsmult=$bsmult \

spinstoflip=$spinstoflip \

charge=$charge \

mult=$mult \

orcasimpleinput= $orcasimpleinput \

orcablocks= $orcablocks ] \

mm_theory=dl_poly : [ list \

frozen= $frozen \

conn= $frag \

debug=no \

use_pairlist=no \

exact_srf=yes \

mxlist= $mxlist \

mxexcl=500 \

cutoff=1000 \

scale14 = { 1.0 1.0 } \

amber_prmtop_file=$ambertop ]]

write_xyz coords=result.c file=result.xyz

times

Create QM/MM inputfile (for use with ORCA and the CHARMM forcefield)

The QM-MM-opt-charmm.chm inputfile is then created and looks like this.

#Name of fragment file

set frag system.c

#Chemshell-scripts location. This dir should contain the update ORCA interface (orca-chemsh-withimage-withbs.tcl), the procs.tcl file (various useful functions), the topology and parameter files etc.

set scriptdir /home/username/Chemshell-scripts

#Putting fragment file in memory.

fragment $frag old persistent

set numatoms [get_number_of_atoms coords=$frag]

puts "Number of atoms is $numatoms"

# Read in file (should be in same directory) for lists. This sets up lists $charges, $groups, $types, $pdbresidues and $residuegroups

source save-new.chm

#PSF file from PSFgen

set psffile new-XPLOR-psffile.psf

#Loading connectivity from PSF file

load_connect_from_psf $frag $psffile

#Sourcing various TCl procs

source $scriptdir/procs.tcl

# Topology

#Here using modified CHARMM36 files. SHould

set topmass $scriptdir/top_all36_prot.rtf

set charmmpar $scriptdir/par_all36_prot.prm

#ORCA path and sourcing of updated ORCA interface.

set orcapath /home/user/orca

source $scriptdir/orca-chemsh-withimage-withbs.tcl

# sourcing active and frozen lists. Act list should have been previously created by a script, for example regiondefine.chm.

source act

puts "Active region is [llength $act] atoms."

#Frozen list defined based on act list

set all [seq 1 to $numatoms]

set frozen [listcomp $all $act]

puts "Frozen region is [llength $frozen] atoms."

# QM REGION atoms, charge and multiplicity. qmatoms file should have been previously created (manually or by script).

source qmatoms

#Checking qmatoms list for duplicates.

checkQMregion $qmatoms

puts "There are [llength $qmatoms] QM atoms and they are $qmatoms"

#Setting charge and multiplicity.

set charge -4

set mult 2

##################

# ORCA Theory level in simple input line

set orcasimpleinput "! TPSSh ZORA ZORA-def2-SVP SARC/J RIJCOSX D3BJ Grid5 FinalGrid6 tightscf slowconv"

# ORCA block settings

set orcablocks "

%maxcore 2000

%basis

NewGTO Fe \"ZORA-def2-TZVP\" end

NewGTO S \"ZORA-def2-TZVP\" end

NewGTO Mo \"old-ZORA-TZVP\" end

addGTO Mo

F 1

1 0.6554500000 1.0000000000

end

%pal

nprocs 12

end

###################################################################

#Setting up X-H and H-H constraints (TIP3) for optimization. Set jobtype to md for MD constraints

set jobtype opt

source $scriptdir/constraints-onlytip3.tcl

# Setting mxlist

set mxlist 38000

puts "mxlist is $mxlist"

# Optimisation

dl-find \

list_option=full coords=$frag active_atoms= $act constraints= $con maxcycle=1000 \

coordinates=hdlc residues= $pdbresidues maxstep=0.5 result=result.c \

theory=hybrid : [ list \

coupling=shift debug=no atom_charges= $charges qm_region= $qmatoms conn=$frag \

qm_theory=orca: [ list \

executable=$orcapath/orca \

charge=$charge \

mult=$mult \

orcasimpleinput= $orcasimpleinput \

orcablocks= $orcablocks ] \

mm_theory=dl_poly : [ list \

frozen= $frozen \

conn= $frag \

debug=no \

use_pairlist=no \

exact_srf=yes \

mxlist= $mxlist \

mxexcl=500 \

cutoff=1000 \

scale14 = { 1.0 1.0 } \

use_charmm_psf=yes \

charmm_psf_file=$psffile \

atom_types= $types \

charmm_parameter_file=$charmmpar \

charmm_mass_file= $topmass ]]

write_xyz coords=result.c file=result.xyz

times

ORCA interface

We use a heavily modified ORCA interface that has to be sourced for the above inputfile to work. The ORCA inputfile (created by Chemshell) is defined via the "orcasimpleinput" and "orcablocks" variables that are then passed on to the updated ORCA interface. This differs completely from the regular ORCA interface in Chemshell.

The new ORCA interface is available at Github page as "orca-chemsh-withimage-withbs.tcl"

Define QM regions and active regions

Define QM region

The QM region is best defined via a simple script that creates a list (qmatoms) that is stored in a file (qmatoms) that is sourced by the Chemshell inputfile.

This is a useful way of documenting the sub-lists of atoms that define the residues.

qmregiondefine.chm:

#Chemshell-script location

set scriptdir /home/user/Chemshell-scripts

source $scriptdir/procs.tcl

####################################################

# SETTING UP QM REGION

###################################################

# QM REGION atom groups defined (system numbers)

set femoco [seq 17776 to 17793]

set hca [seq 17809 to 17829]

set imz [seq 6913 to 6923]

set cys {4176 4177 4178 4179}

set HZIN {37067 37068 37069 37070 37071 37072 37073}

#HZIN and extra proton

#Groups defined

set qmatoms [concat $femoco $hca $imz $cys $HZIN]

#QM region checked for duplicates

checkQMregion $qmatoms

set qmatomsfile [ open qmatoms w]

puts $qmatomsfile "set qmatoms {$qmatoms}"

close $qmatomsfile

puts "There are [llength $qmatoms] QM atoms. Written to file qmatoms."

puts "QM atoms: $qmatoms"

Once the QM-region is defined it can then be checked by visual inspection by using the script qmedit.sh. The script will create a file called qmregioncoords.xyz that contains the QM region coordinates (in Bohr units). The coordinates can be copy-pasted into e.g. Chemcraft to check whether the QM region looks like expected (note that link atoms will be added by Chemshell later to terminate dangling bonds).

To change the coordinates of the QM region the geometry can be modified in Chemcraft and the updated coordinates in Bohr units can be copy-pasted back into the file qmrecoords.xyz. To update the coordinates in the fragment file (e.g. called system.c), the qmregionupdate.py script is run (requires Python):

qmregionupdate.py system.c (requires qmregioncoords.xyz file to be present)

Once the QM-region is defined once like above it is usually more convenient to simply copy the qmatoms file between different directories for different calculations of the same system where the QM regions stays the same and the system does not change with respect to number of atoms.

Define active region

The active region is best defined once for the system using a script and the list is then best stored in a file (act) that is copied to every Chemshell-job file using the QM/MM setup. The active region should ideally not be updated as the QM/MM energy function depends on what atoms are active and which are frozen. It is best to define a good active region once (~1000 atoms) and then keep using the same active region throughout the project. However, if atoms are added/deleted to the the system then the act list needs to be updated carefully.

The following script shows how to conveniently create an approximately spherical active region around the active site

actregiondefine.chm:

#############

# ONLY USE IF STARTING NEW SYSTEM AND MISSING ACT FILE

##############

if { $argv == "" } {

puts "\033\[31mNo fragment file given.\033\[0m Do chemshell.bash actregiondefine.chm fragmentfile.c"

exit

}

#Chemshell-script location

set scriptdir /home/ragnarbj/QM-MM/nitrogenase/Chemshell-QM-MM-scripts

#Putting fragment file in memory.

set frag $argv

fragment $frag old persistent

#Colored output

#puts "Why not \033\[34mGooge\033\[0m first ?"

#puts "Why not \033\[34mG\033\[31mo\033\[33mo\033\[34mg\033\[32ml\033\[31me\033\[0m first ?"

source $scriptdir/procs.tcl

##################################################################

# SETTING UP ACTIVE ATOMS, FROZEN ATOMS AND CONSTRAINTS

##################################################################

source save-new.chm

# Set origin atom (atom in the middle of active region). Here carbon of FeMoco

set origin 17793

#Radius of active region (will select whole residues).

set radius 11

#Freezing special residues/atoms by atom numbers.

#Use seq function select range: Example: set frozgr1 [seq 17730 to 17829]

# or define list explictly: set frozgr1 { 17730 17731 17732 etc} .

# Combine by concat to list $extrafrozen (will be added to $frozen by select.tcl)

# IMIA, FE2P:

set frozgr1 [seq 17730 to 17775 ]

# P-cluster:

set frozgr2 [seq 17794 to 17808]

# These are counterions:

set frozgr3 [seq 37042 to 37066 ]

# These are cysteine sulfur atoms in P-cluster:

set frozgr4 {951 2350 9021 9901 8663 1323}

set extrafrozen [concat $frozgr1 $frozgr2 $frozgr3 $frozgr4 ]

# Calling Tcl code that sets up lists $act and $frozen using $origin, $radius and the frozen lists above.

# act and frozen lists are also written to disk

puts "There are [get_number_of_atoms coords=$frag] atoms."

source $scriptdir/select.tcl

delete_object $frag

Analyzing the result of QM/MM optimizations

Once the DL-FIND geometry optimization is done, the final system geometry is usually stored in a Chemshell fragment file called result.c (coordinates in Bohr). This file can also be written as an XMOL XYZ file result.xyz in Angstrom units (currently done using the write_xyz function at the bottom of the inputfile above). Often it may be mainly of interest to inspect the QM region coordinates. They can be found either by creating a new qmregioncoords.xyz file using:

qmedit.sh result.c and then visualize the QM region coordinates in the newly created qmregioncoords.xyz, e.g. by copy-pasting into Chemcraft.

Alternatively, orca1.inp and orca1.out files are possibly copied back from your jobscript and they will contain the QM region coordinates from the last optimization cycle from Chemshell.

Trajectory files are also useful for visualizing how the geometry optimization proceeded. The path_active.xyz is an XYZ trajectory file that only contains the coordinates of the active region (usually ~1000 atoms). The path.xyz is an XYZ trajectory file for the entire system (much larger size of this file and it may take a while to load). These files are convenient to open in the VMD visualization program but an unfortunate aspect is that XYZ files only contain element information, no residue information at all (convenient for selecting the interesting parts of the system).

To create a PDB file of the final structure one can use the proper-PDBwrite.chm Chemshell script (see Github page):

chemshell-intel.sh proper-PDBwrite.chm result.c new-XPLOR-psffile.psf

This will create a result.pdb file that can be visualized in VMD with residue information available (active site can e.g. be more easily visualized in the Graphical Representations window: resname cofactorresiduename ).

It's also possible to use this PDB file when visualizing the trajectory file.

This vmd command should work:

vmd result.pdb path.xyz

where VMD reads the PDB file as a structure file and the path.xyz file for the coordinates of the trajectory. This will only work for the path.xyz file (not path_active.xyz) as the number of atoms in both PDB and trajectory files have to match.

Note that reading in these large files may take some time and also note that if result.pdb is chosen as PDB-file (a PDB file of the original coordinates could alternatively be used instead) then both first and last structures shown in VMD will correspond to the optimized coordinates.

Problems with geometry optimizations

- Sometimes there are problems with the HDLC coordinates system (residue error) and Cartesian coordinates must be used instead => coordinates=angstrom in the inputfile.

Adding/deleting atoms in QM/MM system

Adding new atom/residue to QM/MM system

When adding atoms to the QM/MM system the fragment file, lists and the PSF file needs updating.

It is best to not do this manually but instead by using the addatoms.chm script.

See Github script page.

Important things:

- Specify atom types for added atoms that make sense. Because the atom type determines the Lennard-Jones interaction between QM and MM regions. For small QM region calculations one can see artifacts.

- Specify residue names that makes sense (resgroups). Create a new one if required but look first in the topology file.

1. Copy addatoms.chm script to your directory containing the fragment-file you want to modify as well as the other important files of the system (PSF-file, save-new.chm file, act, qmatoms etc.)

2. Modify the relevant lines in the script.

3. Run script like this:

chemshell.sh addatoms.chm fragmentfile.c xplor-psf-file.psf

4. Carefully read the warnings/errors you get when you run the script.

5. Confirm that the atom/atoms have been added to the fragment file and that the list have been updated. Make sure to use these files in future calculations of this new system-setup.

Deleting atom/residue in QM/MM system

When deleting atoms from the QM/MM system one needs to not only delete the atom from the fragment file but all lists need to be updated (as atom numbering changes) and the PSF file. Importantly previous active atoms and QM-atoms lists need to be updated. This is a more errorprone procedure than adding an atom.

Thus this procedure is best handled using the delatoms.chm script but with some caution.

The delatoms.chm script first needs to be manually edited where the atoms to be deleted are specified. Follow the instructions in the file carefully.

If one has an uninterrupted sequence of atoms to delete, e.g. ( 17 18 19 20) then those atoms can be deleted all at once. However, if that is not the case, e.g. (20 34) then each atom needs to be deleted separately by the script and best done starting from the highest number in the list. The reason for the latter is that the atom numbering gets updated otherwise (e.g. removing atom number 20 means that atom 34 becomes atom 33).

Then run the script like this:

chemshell.sh delatoms.chm fragmentfile.c xplor-psf-file.psf

Script will update fragment file (e.g. system.c), list-file (e.g. called save-new.chm) and act file (here called act).

The QMatom information (e.g. in a file called qmatoms) needs to be updated manually.

Important things:

- delatoms script uses PSFgen program that requires an XPLOR-style PSF file as input. The XPLOR-style PSF file should have been created when the GROMACS QM/MM model was created. Script then outputs both CHARMM-style and XPLOR-style PSF-files.

- Make sure that you use the new XPLOR psf file that the script creates in your QM-MM optimizations. Remember that the QM-MM-opt.chm file should be updated to point to the new psf file.

- Note that the script updates the act file and list.

Mutating atom/residue in QM/MM system

If one wants to change an atom into another atom or a chemical group into another (e.g. an SH group into a CO group) this can be accomplished by use of the delatoms.chm and addatoms.chm scripts above by first deleting and then adding.

Another option is possible if the system does not change size during the mutation (e.g. changing S to Se or SH to CO).

In this case it is possible to do the mutation manually by simply changing the files manually:

fragmentfile.c #Change elements here

save-new.chm # Change charge in charges list, change type in types list (this affects the Lennard-Jones interaction and should be done carefully).

xplor-psf-file.psf # Change element, atomtype, mass, etc.

Don't forget to change the charge/mult of the system if that changes during the mutation.

Also don't forget to adjust the coordinates using qmedit.sh/qmregionupdate.py scripts.

Using Flip-spin broken-symmetry in QM/MM optimization (ORCA)

If you wish to converge to a broken-symmetry solution and optimize the geometry using that solution then the High-spin multiplicity and Flip-spin information needs to be handed over to ORCA during the first optimization step in Chemshell and then the spin multiplicity is set to the BS-state multiplicity from then on (ORCA will read the GBW file from the last optimization cycle from then on out so no more spin-flipping required).

To enable BS settings then the following information should be in the QM-MM-opt.chm inputfile.

#Setting charge and multiplicity of QM-region. If doing Broken-symmetry, then the multiplicity of the BS solution should be defined as mult, the high-spin multiplictiy is defined separately.

set charge -6

set mult 4

###################

#Special BS settings

####################

set brokensym yes

#Multiplicity of High-spin state and Broken-symmetry state. Will override $mult. Comment out if not using broken-symmetry.

set hsmult 34

set bsmult $mult

#Selecting which system atom numbers to flip (here Fe atoms: Fe2, Fe4 and Fe7 as defined in PDB/PSF)

#Will be converted to ORCA inputfile atom numbers by atomnumtoQMregionnum and then converted to comma-sep string.

set atomstoflip {17778 17780 17783}

set spinstofliplist [atomnumtoQMregionnum $qmatoms $atomstoflip]

set spinstoflip [join $spinstofliplist ","]

puts "spinstoflip is $spinstoflip and spinstofliplist is $spinstofliplist"

The ORCA theory command then needs to look like this:

qm_theory=orca: [ list \

executable=$orcapath/orca \

brokensym=$brokensym \

hsmult=$hsmult \

bsmult=$bsmult \

spinstoflip=$spinstoflip \

charge=$charge \

mult=$mult \

orcasimpleinput= $orcasimpleinput \

orcablocks= $orcablocks ] \

Restarting a QM/MM optimization

Regular DFT-job

If you want to restart a geometry optimization, e.g. from a failed optimization or a crashed optimization then typically you would take the result.c file (containing the coordinates), rename it to system.c and then restart the job. This will start a new optimization from those coordinates. If the system does not have a complicated electronic structure then this is typically the easiest recommended way. If the electronic structure is complicated, then restarting from the previous orbitals may also be worthwhile (see also Broken-symmetry job below).

- In that case, if the last set of orbitals (from the last optimization step of the previous failed optimization) is available (i.e. copied back from the scratch), usually called orca1.gbw then rename this file to save.gbw.

- Then add restart=true to the qm_theory part. Settin restart=true means that the ORCA-interface will look for a file called "save.gbw" to read orbitals from.

- Make sure the jobscript copies save.gbw to the scratch-dir of the calculation.

Broken-symmetry job

For a complicated broken-symmetry electronic structure it is usually worth it to start from the broken-symmetry orbitals instead of going through the (often expensive) spin-flipping procedure.

- set brokensym equal to no in the inputfile. Any hsmult, bsmult, sloppyHS, sloppyTol and spinstoflip keywords should then be ignored by the ORCA interface (they can still be kept as part of the dl-find command). Only mult (the final spin multiplicity of the BS solution) will be used in the generation of the ORCA inputfile by Chemshell.

- Then add restart=true to the qm_theory=orca so that the ORCA-interface will try to read a GBW-file (instead of generating a guess). The interface searches for a GBW file called save.gbw and thus the GBW file you want needs to be called save.gbw and the jobscript needs to copy this file over (should be the case).

Important: Do not add any MOREAD/moinp keywords yourself to the orca-simpleinput and block keywords. That will mess things up. The ORCA interface completely handles reading of orbitals.

Adding extra basis set on specific QM atoms (ORCA)

- Reqiures an ORCA interface file that supports this (orca-chemsh-withimage-withbs.tcl)

- QM-MM-opt.chm file (extrabasisatoms, extrabasisatomslist, extrabasisset variables and keywords)

To use, the following line should be specified in the QM-MM-opt.chm inputfile :

#13 aug 2016 new feature:

#Increased basis set on specific atoms in QM region.

#For example: FeMoco carbide. Use full-system atom numbers in list below (will be converted to ORCA QM atom number)

# Note: If no extrabasisatom is wanted, do: set extrabasisatoms {} DO NOT COMMENT ANY LINE OUT

set extrabasisatoms {17779 17795 17796 17797 17798 17799 17800 17801 17802 17803 17804 17805 17806 17807 17808 17809 17810 17811 17812 17813 17814 17815 36987 36988 36989 36990}

#set extrabasisatoms {}

set extrabasisatomslist [atomnumtoQMregionnum $qmatoms $extrabasisatoms]

puts "extrabasisatomslist is $extrabasisatomslist"

set extrabasisset "ZORA-def2-TZVP"

extrabasisatomslist and extrabasisset Chemshell-ORCA keywords then need to be set inside qm_theory=orca inside the dl-find block in the Chemshell inputfile.

qm_theory=orca: [ list \

executable=$orcapath/orca \

brokensym=$brokensym \

hsmult=$hsmult \

bsmult=$bsmult \

spinstoflip=$spinstoflip \

extrabasisatomslist= $extrabasisatomslist \

extrabasisset=$extrabasisset \

charge=$charge \

mult=$mult \

orcasimpleinput= $orcasimpleinput \

orcablocks= $orcablocks ] \