TUTORIAL
MD simulation of the sars-CoV-2 Main Protease (monomer)
MD simulation of the sars-CoV-2 Main Protease (monomer)
This is a basic tutorial to run a molecular dynamics simulation for the SARS-CoV-2 Main Protease using PDB ID: 6LU7.
Software: GROMACS 2022
Hardware: iMac: macOS Monterey, 2.3 GHz Dual-Core Intel Core i5, 16 GB 2133 MHz DDR4
Simulation Time: 1 ns
Firstly, we need to download the protein that we are going to use first. In this tutorial, we are using 6LU7. The protein structure can be downloaded at www.rcsb.org/ by searching "6LU7" and download the protein in PDB format.
When the protein is downloaded, we can view the protein structure and visualize it using viewing software like VMD and PyMol.
Next, we need to eliminate the crystal waters by removing the water molecules "HOH". To do so, we can use Notepad++ to delete it manually or by entering the command:
> grep -v HOH 6lu7.pdb > 6LU7_clean.pdb
Once entering the command, the output file which is "6LU7_clean.pdb" will have no water molecules "HOH". This can be checked using Notepad++ or the vi command.
Next, the file is ready to be input into the first GROMACS module, pdb2gmx. The purpose of pdb2gmx is to generate three files which are the topology for the molecule, a position restraint file and a post-processed structure file.
The command to execute pdb2gmx is:
> gmx pdb2gmx -f 6LU7_clean.pdb -o 6LU7_processed.gro -water spce
The result will require us to choose a force field:
Choose number 15 (OPLS-AA) and press enter.
The output from the previous command is topol.top. We can view and examine its contents using Notepad++
We will find the following;
; Include forcefield parameters
#include "oplsaa.ff/forcefield.itp"
[ moleculetype ]
; Name nrexcl
Protein_chain_A 3
[ atoms ]
; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif
; Include water topology
#include "oplsaa.ff/spce.itp"
#ifdef POSRES_WATER
; Position restraint for each water oxygen
[ position_restraints ]
; i funct fcx fcy fcz
1 1 1000 1000 1000
#endif
; Include topology for ions
#include "oplsaa.ff/ions.itp"
[ system ]
; Name
3C-LIKE PROTEINASE; 6 NON-STRUCTURAL PROTEIN 5,NSP5,SARS CORONAVIRUS MAIN PROTEINASE; N-[(5- METHYLISOXAZOL-3-YL)CARBONYL]ALANYL-L-VALYL-N~1~-; 11 ((1R,2Z)-4-(BENZYLOXY)-4-OXO-1-{[(3R)-2- OXOPYRROLIDIN-3-; 12 YL]METHYL}BUT-2-ENYL)-L-LEUCINAMIDE in water
[ molecules ]
; Compound #mols
Protein_chain_A 1
Next, we are going to be simulating a simple aqueous system. We need to define the box dimensions using the editconf module while using a simple cubic box as the unit cell. We can define the box using editconf with the command:
> gmx editconf -f 6LU7_processed.gro -o 6LU7_newbox.gro -c -d 1.0 -bt cubic
As seen in the command, it centers the protein in the box (-c), and places it at least 1.0 nm from the box edge (-d 1.0). The box type is defined as a cube (-bt cubic).
Next, we need to fill it with solvent (water) with the command:
> gmx solvate -cp 6LU7_newbox.gro -cs spc216.gro -o 6LU7_solv.gro -p topol.top
In the topol.top, there will be changes like so; (where solvation is added)
[ molecules ]
; Compound #mols
Protein_chain_A 1
SOL 28002
The previous step made us achieved a solvated system that contains a charged protein. The tool for adding ions called genion where it reads through the topology and replace water molecules with the ions that the user specifies.
The input is called a run input file, which has an extension of .tpr.
Grompp process the coordinate file and topology to generate .tpr. The .tpr file contains all the parameters for all of the atoms in the system.
To produce a .tpr file with grompp, we will need an additional input file, with the extension .mdp which can be downloaded from here www.mdtutorials.com/gmx/lysozyme/Files/ions.mdp (mdp file)
.tpr file can be assembled using the command:
> gmx grompp -f ions.mdp -c 6LU7_solv.gro -p topol.top -o ions.tpr
Next, we are going to pass this file to genion with the command:
> gmx genion -s ions.tpr -o 6LU7_solv_ions.gro -p topol.top -pname NA -nname CL -neutral
The result will require us to choose a group:
Select group 13 "SOL" for embedding ions.
Our [ molecules ] directive should now look like:
[ molecules ]
Compound #mols
Protein_chain_A 1
SOL 28002
NA 4
The process for EM is much like the addition of ions, we will run the energy minimization through the GROMACS MD engine, mdrun.
Assemble the binary input using grompp using the file that can be downloaded here: www.mdtutorials.com/gmx/lysozyme/Files/minim.mdp
Then, the command that we need to run is:
> gmx grompp -f minim.mdp -c 6LU7_solv_ions.gro -p topol.top -o em.tpr
The next command to invoke mdrun to carry out the EM:
> gmx mdrun -v -deffnm em
As a result, the ouput file that we will gain are em.log, em.edr, em.trr and em.gro.
Next, we can analyze any .edr file using the GROMACS energy module:
> gmx energy -f em.edr -o potential.xvg
At the prompt, type "10 0" to select Potential (10); zero (0) terminates input. We will be shown the average of Epot, and a file called "potential.xvg" will be written.
To plot this data, you will need the Xmgrace plotting tool (can be access through Ubuntu with the command "DISPLAY=:0.0 xmgrace" once xmgrace is downloaded.
Now, we are going to begin real dynamics.
In terms of geometry and solvent orientation, EM assured that we had a good starting structure. We must first equilibrate the solvent and ions around the protein before we can begin real dynamics. At this stage, attempting unrestricted dynamics may cause the system to collapse.
We are going to use posre.itp file that pdb2gmx generated in the previous step. Posre.itp apply a position restraining force on the heavy atoms of the protein.
We will conduct equilibrition under an NVT ensemble (constant Number of particles, Volume, and Temperature).
50-100 ps should suffice.
The nvt.mdp file that we are going to use can be accessed here: www.mdtutorials.com/gmx/lysozyme/Files/nvt.mdp
Next, we will run the command and call grompp and mdrun just as we did at the EM step:
> gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr
> gmx mdrun -deffnm nvt
Next, we are going to analyze the temperature progression, using the command:
> gmx energy -f nvt.edr -o temperature.xvg
The result will require us to choose a selection:
Type "16 0" (choose temperature)
Previously, we have stabilized the temperature of the system using NVT equilibrition.
Now, we have to stabilize the pressure and the density of the system.
Equilibration of pressure is conducted under an NPT ensemble, wherein the Number of particles, Pressure, and Temperature are all constant.
The .mdp file (100-ps NPT equilibration) can be downloaded from here: www.mdtutorials.com/gmx/lysozyme/Files/npt.mdp
Next, we need to run the command:
> gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr
> gmx mdrun -deffnm npt
After that, we will analyze the pressure progression, using the command:
> gmx energy -f npt.edr -o pressure.xvg
The result will require us to choose a selection:
Type "18 0" (choose pressure)
Next, we will analyze the density progression, using the command:
> gmx energy -f npt.edr -o density.xvg
The result will require us to choose a selection:
Type "24 0" (choose density)
The system is now well-equilibrated at the appropriate temperature and pressure after completing the two equilibration phases. We're finally ready to release the position restrictions and start collecting data with production MD.
We will run MD Simulation, with the script that can be downloaded here: www.mdtutorials.com/gmx/lysozyme/Files/md.mdp
This script contains "nsteps = 500000 ; 2 * 500000 = 1000 ps (1 ns)" which is used for 1ns
We will run the MD simulation using the command:
> gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md_0_1.tpr
Next, we need to execute the mdrun by using the command:
> gmx mdrun -deffnm md_0_1
BASIC MD ANALYSIS
We should now do the analysis after completing the simulation.
trjconv, is used as a post-processing tool to strip out coordinates, correct for periodicity, or manually alter the trajectory.
The protein will diffuse through the unit cell, and may appear "broken" or may "jump" across to the other side of the box. To fix this, we can use the command:
> gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -o md_0_1_noPBC.xtc -pbc mol -center
The result will require us to choose a selection:
Select 1 ("Protein") as the group to be centered and 0 ("System") for output.
We can also generate RMSD using the command:
> gmx rms -s md_0_1.tpr -f md_0_1_noPBC.xtc -o rmsd.xvg -tu ns
The result will require us to choose a selection:
Choose 4 ("Backbone") for both the least-squares fit and the group for RMSD calculation
If we wish to calculate RMSD relative to the crystal structure, we could do so by using the command:
> gmx rms -s em.tpr -f md_0_1_noPBC.xtc -o rmsd_xtal.xvg -tu ns
choose the same - 4 ("Backbone") for both the least-squares fit and the group )
When plotting together, the results will look like below;
Lastly, we can analyze the radius of gyration. The radius of gyration of a protein is a measure of its compactness.
We can analyze the radius of gyration using the command:
> gmx gyrate -s md_0_1.tpr -f md_0_1_noPBC.xtc -o gyrate.xvg
The result will require us to choose a selection:
Choose group 1 (Protein) for analysis.