Protein molecule simulation

Running simulations

Publications

Workflow templates

PHASE 1 – selecting and preparing the receptor

Step 1.1      Retrieving the receptor's structure (.pdb) file

receptor sources:

    • public repository (e.g.RCSB)
    • local database in Westminster
    • user machine

  Step 1.2      Preparation and standardisation of the receptor's structure (.pdb) file

  • USER
    • extracts the ATOM statements, which contain the coordinates of the receptor, from the structure (.pdb) file
    • selects one of the receptors from the structure (.pdb) file, if there are multiple ones, to be kept in the receptor's structure file
    • deletes the chain information from the ATOM statements  from the structure (.pdb) file(just for standardisation because all input files for the workflow should look the same).
    • finds a solution for NMR derived files, which contain an assembly of many slightly different coordinates.

   Step 1.3      Solvation of the receptor

  • USER adds a dummy CRYST1 statement to the receptor's structure (.pdb) file to set the coordinate system for solvation.
  • At this stage a Molprobity, King and Ramachandran plot are generated of the receptor molecule as it is at this point. Once these have been generated they will be visualisable in the PROSIM visualiser.
  • pdb2gmx further processes the receptor's structure (*.pdb) file. It creates a topology (.top) file, which contains the information (charge, bond length, angles etc.) for atoms included in the receptor's structure (.pdb) file, the extended structure (.gro) file and the include topology (.itp) file. It is essential that the topology (.top) file should contain the same number and brand of atoms which are defined in the receptor's structure (.pdb) file.
  • grompp generates the run input (.tpr) file, which contains all the information (structure & topology) required for Energy Minimalisation. The run input (.tpr) file is also necessary for adding ions to neutralise the receptor. The Energy Minimisation also needs the parameter input  (.mdp) file, which contains the information to guide the Energy Minimisation
  • editconf defines a water box (or volume) for solvation around the receptor changing the CRYST1 statement. The user can specify the distance between the solute and the box at the application configuration stage.
  • genbox merges a pre-equilibrated water box around the defined size of the receptor. The receptor's structure (.pdb) file is merged with the solvent's extended structure (.gro) file, which is inflated to the proper size, given in the CRYST1 statement of the receptor's structure (.pdb) file. Water molecules, which clash with the receptor, are removed.
     Step 1.4    Energy minimalisation and charge minimalisation of the receptor
  • grompp generates the run input (.tpr) file, which contains all the information (structure & topology) required for Energy Minimalisation. The run input (.tpr) file is also necessary for adding ions to neutralise the receptor. The Energy Minimisation also needs the parameter input  (.mdp) file, which contains the information to guide the Energy Minimisation.
  • genion adds the required number of positive (Na+) or negative (Cl-) ions using the information from the topology (.top) file. The update of the topology (.top) file is done automatically but the program asks to which molecule category the ions should be added. The script will automatically select the SOL group.
  • grompp generates the run input (.tpr) file, which contains all the information (structure & topology) required for Energy Minimalisation. The run input (.tpr) file is also necessary for adding ions to neutralise the receptor. The Energy Minimisation also needs the parameter input  (.mdp) file, which contains the information to guide the Energy Minimisation.
  • mdrun minimises the receptor's energy, maximum in 1000 steps, by removing strains from the structure.
  • The script will then filter out the solvent and Na+ ions.

    Step 1.5    Receptor validation

  • PROCHECK (or Molprobity) checks for Ramachandran outliers (residues with weird torsion angles of bonds in the protein main chain) and a test for the clash score (number of atoms which are too near to their neighbours).
  • At this stage a Molprobity, King and Ramachandran plot are generated of the receptor molecule as it is at this point. Once these have been generated they will be visualisable in the PROSIM visualiser.

        Remark:  If the results show deterioration, use the file without minimisation for the next steps.



PHASE 2 – selecting & preparing the ligand

Step 2.1      Retrieving the ligand's structure (.pdb) files

ligand sources:

    • public repository (for many of the potential ligands their structure files have been published e.g. for ligands of the opioid receptor)
    • local database in Westminster
    • user machine
    • building the structure from scratch using a molecule editor (e.g. as a Java applet) and a SMILES code translator.

 Step 2.2      Solvation and charge neutralisation of the ligand

  • At this stage a Molprobity, King and Ramachandran plot are generated of the ligand molecule as it is at this point. Once these have been generated they will be visualisable in the PROSIM visualiser.
  • prodrg creates a new structure (.pdb)  file and an include  topology (.itp) file for the perspective ligand using the of the receptor's topology (.top) file. The ligand’s include topology (.itp) file can be added  to the topology (.top) file of the receptor.
  • The script  adds a dummy CRYST1 statement to allow the coordinate setup.
  • editconf changes the coordinates to define a solvation volume. The user can specify the distance between the solute and the boxat the application configuration stage , e.g. 10 Å (-d 1.0) minimal distance to the receptor.
  • genbox merges the receptor with a pre-minimised solvent volume bringing it to the proper size.

Step 2.3    Energy minimalisation of the ligand

  • grompp creates the run (*.tpr) file needed for charge neutralisation and energy minimalisation. It generates an output of the net charge of the ????, which should be used in the GENION statement.
  • genion adds the required number of positive (Na+) or negative (Cl-) ions using the information from the topology (.top) file. The update of the topology (.top) file is done automatically but the program asks to which molecule category the ions should be added. The script will automatically select the SOL group.
  • grompp includes the counterions into the run input (.tpr) file.
  • mdrun mimises the ligand's energy. It extracts the small molecule lines for further use removing solvent and counterions coordinates. It also stores the ???? include topology (*.itp) file for later use in the molecule between receptor and ligand.
  • The script will then filter out the Na+ and Cl- ions.
  • At this stage a Molprobity, King and Ramachandran plot are generated of the ligand molecule as it is at this point. Once these have been generated they will be visualisable in the PROSIM visualiser.



PHASE 3 – docking ligand to receptor molecule

    Step 3.1   Preparing docking

  • pdb2gmx further processes the receptor's structure (*.pdb) file. It creates a topology (.top) file, which contains the information (charge, bond length, angles etc.) for atoms included in the receptor's structure (.pdb) file, the extended structure (.gro) file and the include topology (.itp) file. It is essential that the topology (.top) file should contain the same number and brand of atoms which are defined in the receptor's structure (.pdb) file.
  • editconf defines a water box (or volume) for solvation around the receptor changing the CRYST1 statement. The user can specify the distance between the solute and the box at the application configuration stage.
  • AutoDock's python script defines the docking parameters and the grid space (centre, number of grid points on each side and grid spacing) for constructing the 3D force field. A good starting point is 90 coordinate points in each space direction with a spacing of 0.375 Å (753571 points in a forcefield defining a cubus with a side length of 34,125 Å). The user can specify the spacing at the application configuration stage. 
  • AutoGrid defines the centre of the docking space giving the receptor and ligand file as outputs of the Phase 2. The user can specify either the  grid-center or instruct the script to use blind docking at the application configuration stage.
  • AutoGrid  computes the force fields for different atoms found in the ligand molecule and stores the force fields in map files..
         Step 3.2   Docking ligand & receptor
  • AutoDock calculates the optimum internal configuration of the ligand and its optimum position to the receptor by giving force fields as input parameters.
  • Remark: Definition of  the optimal technical configuration (fewer parallel runs with more trials in a run or vice versa) and how many trials should be run for the first approach (min 100 but more is better).
  • xxxx generates a graphical clustering output to allow assessment of the docking quality. (A mathematical algorithm should be considered to assess the docking quality).
  • The script selects the 10 best docking results as input for the Molecular Dynamics analysis in Phase 4.

     Note: It should be understood that the remainder of the workflow is performed in 10 threads, with each of the 10 best docking results of Phase 3.


    

PHASE 4 – Refining the ligand-receptor molecule

   Step 4.1   Solvation of the ligand-receptor molecule

  • At this stage a Molprobity, King and Ramachandran plot are generated of each of best molecules as they  are at this point. Once these have been generated they will be visualisable in the PROSIM visualiser.
  • pdb2gmx creates a topology (.top) file for the protein molecule using the receptor's structure (.pdb) file as input.
  • prodrg merges the receptor's structure (.pdb) file with each of the 10 ligands with minimal coordinates.
  • The script adds the ligand's include topology (*.itp) file to the topology (.top) file of the protein molecule.
  • grompp generates a run input (*.tpr) file to relax the system.
  • editconf defines the boundary condition (or the end volume of the solvate-solvent complex) for the protein molecule.
  • genbox merges the protein molecule with an inflated, pre-defined and energy minimised volume of solvent. The program removes clashing lipid or (solvent) molecules.
  • grompp generates a run input (*.tpr) file to relax the system
  • genion adds the required number of positive (Na+) or negative (Cl-) ions using the information from the topology (.top) file. The update of the topology (.top) file is done automatically but the program asks to which molecule category the ions should be added. The script will automatically select the SOL group.
  • grompp & mdrun : At this stage a sequence 4 grompp's and mdruns are performed. A description is needed as to what each mdrun is doing. (HH)
    • Note: Both before and after  the last of these mdrun's, a set of plots are generated, as are described above.

    Step 4.2   Energy minimalisation of the molecule


  • mdrun minimises the energy. First, 10 ps analysis is performed using a position restraint for the protein molecule. This allows the solvent molecule to move into gaps and cavities of the receptor file and the ions may associate with the protein molecule. Next, 100 ps analysis is performed to have the production-level molecular dynamics simulation.

    Step 4.3   Molecular Dynamics analysis

  • mdrun minimises the energy. First, 10 ps analysis is performed using a position restraint for the protein molecule. This allows the solvent molecule to move into gaps and cavities of the receptor file and the ions may associate with the protein molecule. Next, 100 ps analysis is performed to have the production-level molecular dynamics simulation.

    Step 4.4   Molecule trajectory data analysis

  • Generating a graphical output for the energy, temperature, and pressure of the protein molecule.
  • Downloading the compressed form of the trajectory files for further analysis.




Glossary

Amber (Assisted Model Building and Energy Refinement) is package of molecular simulation programs.

AutoDock & AutoGrid is a suite of automated docking tools to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. It consists of two main programs: AutoDock, which performs the docking of the ligand to a set of grids describing the target protein and AutoGrid, which pre-calculates these grids.

GROMACS is a versatile package to perform Molecular Dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the non-bonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

GROMACS features:

·         GROMACS is user-friendly, with topologies and parameter files written in plain text format. There is a lot of consistency checking, and clear error messages are issued when something is wrong.

·         There is no scripting language. All programs use a simple interface with command line options for input and output files. There is also an integrated graphical user interface available for all programs.

·         Both run input files and trajectories are independent of hardware and can be read by any GROMACS version,

·         GROMACS can write coordinates using compression, which provides a very compact way of storing trajectory data. The accuracy can be selected by the user.

·         GROMACS comes with a large selection of flexible tools for trajectory. The output is further provided in the form of finished Xmgr/Grace graphs, with axis labels, legends, etc. already in place.

·         A basic trajectory viewer that only requires standard X libraries is included, and several external visualization tools can read the GROMACS file formats.

·         GROMACS can be run in parallel, using standard MPI communication.

·         The package includes a fully automated topology builder for proteins. Building blocks are available for the 20 standard aminoacid residues as well as some modified ones, the 4 nucleotide and 4 deoxinucleotide resides, several sugars and lipids, and some special groups like hemes and several small molecules.

GROMACS utilities:

editconf       - converts generic structure format to .gro and .pdb format, and

                   - defines the boundary condition for the molecule, i.e. creates the box where the molecule

genbox        - solvates the molecule, i.e. removes the solvent molecule from the box, where the distance between any atom of the molecule and the solvent molecules is less than a pre-defined value,

                   - it needs as input the molecule in a box, the solvent box, the topology of the solute and of the solvent,

                   - it updates the topology including the topology file of the water and adding the number of water molecule,

genion         - replaces solvent molecules by monoatomic ions at the position of the first atoms with the most favorable electrostatic potential or at random.

grompp       - reads a molecular topology file and check its validity,

                   - expands the topology from molecular description to an atomic description,

                   - reads the input parameter, the topology and structure file for Energy Minimalisation and/or Molecular Dynamics analysis and combines them into a single run input file

mdrun         - removes strains, which due to the generation of the hydrogens, and bad Van der Waals contacts may exist, caused by particles that are too close, from the structure through Energy Minimalisation,

                   - create a trajectory file, a structure file of the minimized structure and an energy file with all the energies for every step

pdb2qmx    - reads a structure file, adds hydrogens to the molecule and generates coordinate and topology file, which can be processed by grompp,

                   - checks every residue in the structure file against a database and adds all hydrogens,

                   - generates three files, the topology file, the extended structure file and the include topology file, which is created automatically together with the topology.

prodrg         - takes a structure file and converts it into different topology formats for organic molecules,

GROMACS files:

.gro             - extended structure file, which contains the same information as the basic structure file regarding the positions of the atoms, but the layout is different because hydrogens have been added and units have been converted to nm

.itp              - include topology file of individual components of the topology

.mdp           - simulation parameter input file, which defines the parameters for running the simulation, including time step, type of simulation, electrostatics, Van der Waals, temperature coupling, pressure coupling, etc

.pdb            - structure file of positions of atoms in a molecular structure where coordinates are read from the ATOM and HETATM records, until the file ends or an ENDMDL record is encountered,

.rtp              - residue topology file of default interaction type for the 4 bonded interactions and residue entries, which consist of atoms and optionally bonds, angles, dihedrals and improper dihedrals

.top             - system topology file with information on the atom names, types, masses and charges, as well as a description of bonds, angles, dihedrals, etc.

.tpr              - run input file generated by grompp from .gro, .mdp and .top files

.trr              - trajectory file containing data about atomic positions, velocities, and/or forces