PDB Files

Using a PDB file to set up a system for simulation

Protein Data Bank (PDB) files are almost universally employed in biomolecular simulation. Unfortunately, working with them can be a painful experience. Reasons for this include:

    • Many PDB files do not adhere to the PDB standard.

    • Much information that is necessary when setting up a system for simulation is absent from a PDB file.

    • The experimentally determined structures of proteins and other biomolecules often have gaps or uncertainties. Thus, atoms may be absent (e.g. hydrogens) or may be present several times (e.g. because flexible parts of the molecule were resolved in multiple conformations).
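The last point can be checked before any setup is attempted. The following is a minimal sketch, not part of pDynamo, that scans the fixed-column ATOM/HETATM records of a PDB file for atoms listed under more than one alternate-location indicator and reports whether any hydrogens are present. The function names scan_pdb_atoms and format_atom_line are hypothetical, and the hydrogen test falls back on a crude atom-name heuristic when the element columns are empty.

```python
from collections import defaultdict


def format_atom_line(serial, name, alt_loc, res_name, chain, res_seq, element):
    """Build a fixed-column PDB ATOM record (hypothetical helper for the demo)."""
    return (
        f"ATOM  {serial:5d} {name:<4}{alt_loc}{res_name:>3} {chain}{res_seq:4d}"
        + " " * 4                                   # iCode plus three blank columns
        + f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"         # x, y, z (dummy values)
        + f"{1.0:6.2f}{0.0:6.2f}"                   # occupancy, temperature factor
        + " " * 10
        + f"{element:>2}"                           # element symbol, columns 77-78
    )


def scan_pdb_atoms(lines):
    """Return (atoms with several alternate locations, whether hydrogens occur)."""
    alt_locs = defaultdict(set)
    has_hydrogens = False
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        line = line.rstrip("\n").ljust(80)          # pad short records
        name = line[12:16].strip()                  # atom name, columns 13-16
        alt_loc = line[16]                          # alternate location, column 17
        chain = line[21]                            # chain identifier, column 22
        res_seq = line[22:26].strip()               # residue number, columns 23-26
        i_code = line[26].strip()                   # insertion code, column 27
        element = line[76:78].strip().upper()
        if alt_loc != " ":
            alt_locs[(chain, res_seq + i_code, name)].add(alt_loc)
        # Heuristic: trust the element columns, else look at the atom name.
        if element == "H" or (
            not element and name.lstrip("0123456789").startswith("H")
        ):
            has_hydrogens = True
    duplicated = {
        key: sorted(locs) for key, locs in alt_locs.items() if len(locs) > 1
    }
    return duplicated, has_hydrogens


if __name__ == "__main__":
    records = [
        format_atom_line(1, " CA ", " ", "SER", "A", 12, "C"),
        format_atom_line(2, " CB ", "A", "SER", "A", 12, "C"),
        format_atom_line(3, " CB ", "B", "SER", "A", 12, "C"),
    ]
    print(scan_pdb_atoms(records))
```

For a real file, the lines would be read with open(path) and passed to scan_pdb_atoms directly. A non-empty first result indicates that one conformation has to be chosen when the model is edited.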

Devising a fully automatic scheme for setting up a simulation given a "well-behaved" PDB file and a "simple" system is relatively straightforward. However, most cases are likely to require manual intervention of some sort by the user.

The procedure to follow when setting up a protein system for simulation with pDynamo in the most general (difficult) case is listed below. The order of some of the steps (particularly 4, 5 and 6) can vary, depending on the system being studied.

All the files employed in the different steps of this tutorial are available in the examples/pdbFiles subdirectory of the pDynamo distribution.

Note that this tutorial does not address the important issue of how to determine the accessible protonation states of the system within the PDB file. This can conveniently be done with the pcetk extension to pDynamo.

Preliminaries

Select a PDB file that is appropriate for the objectives of the simulation study.

Step 1

Read the PDB file and write out the PDB model that it contains to a file.

Step 2

Edit the PDB model file so that it conforms to the system that is to be simulated. It is often more reliable to set up different parts of a system separately, in which case the model file from Step 1 is split up into several pieces.

Step 3

Create a system from the edited PDB model file and the original PDB file. The model file, along with the PDB component library supplied with pDynamo, is used to generate the system's atoms and bonds whereas the original PDB file provides atomic coordinates.

Step 3a

If there are errors in generating the system, the most likely cause is that some of the components, links or variants that appear in the PDB model have not been defined in the PDB component library. If so, the missing definitions must be added and Step 3 repeated.

Step 4

Generate an MM model for the system.

Step 4a

Errors in Step 4 are common and usually arise because force field parameters needed to describe some groups in the system are missing. If this is the case, the parameters must be added to the force field definition and Step 4 repeated. To see how to do this, look at the files in the data/pkaProtein subdirectory of the tutorial.

Step 5

Form the complete system by merging together its constituent parts if these were set up separately.

Step 6

Check whether the system has atoms with undefined coordinates, that is, atoms that were absent from the original PDB file. If so, their coordinates must be constructed.
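The idea behind this check can be sketched independently of pDynamo: compare the atoms that each residue should contain with the atoms for which the PDB file actually supplied coordinates. In the sketch below the template dictionary EXPECTED_HEAVY_ATOMS and the function find_undefined_atoms are hypothetical and cover only two residue types. A real setup would take the expected atoms from the system built in Step 3 and would include hydrogens.

```python
# Hypothetical, deliberately tiny template library (heavy atoms only).
EXPECTED_HEAVY_ATOMS = {
    "GLY": ("N", "CA", "C", "O"),
    "ALA": ("N", "CA", "C", "O", "CB"),
}


def find_undefined_atoms(residues, templates=EXPECTED_HEAVY_ATOMS):
    """Map residue labels to the expected atoms that have no coordinates.

    `residues` is a list of (label, residue name, set of atom names that
    have coordinates). Residues without a template are skipped.
    """
    undefined = {}
    for label, res_name, present in residues:
        expected = templates.get(res_name)
        if expected is None:
            continue  # no template: nothing can be said about this residue
        absent = [atom for atom in expected if atom not in present]
        if absent:
            undefined[label] = absent
    return undefined


if __name__ == "__main__":
    residues = [
        ("A:1", "ALA", {"N", "CA", "C", "O"}),   # side-chain CB not resolved
        ("A:2", "GLY", {"N", "CA", "C", "O"}),   # complete
    ]
    print(find_undefined_atoms(residues))
```

Each atom reported here would need coordinates built from the positions of its bonded neighbours before the structure is relaxed in Step 7.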

Step 7

Relax the structure of the vacuum system.

Step 8

Solvate the system using an appropriate solvent, normally water along with some counterions.
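The number of counterions needed to neutralize the system follows from its net charge. The sketch below is only a rough illustration: it assumes the standard charged forms of the ASP, GLU, LYS and ARG side chains, ignores histidine and any unusual protonation states (which this tutorial does not address), and leaves chain termini, ligands and cofactors to the extra_charge argument. The names RESIDUE_CHARGES and counterions_to_neutralize are hypothetical and are not part of pDynamo.

```python
# Assumed side-chain charges for the standard protonation states.
RESIDUE_CHARGES = {"ASP": -1, "GLU": -1, "LYS": +1, "ARG": +1}


def counterions_to_neutralize(residue_names, extra_charge=0):
    """Return (net charge, number of cations, number of anions) for neutrality.

    `extra_charge` accounts for anything not covered by the residue table,
    such as charged termini, ligands or metal ions. Monovalent counterions
    are assumed.
    """
    net = extra_charge + sum(RESIDUE_CHARGES.get(name, 0) for name in residue_names)
    cations = -net if net < 0 else 0   # e.g. Na+ to offset a negative system
    anions = net if net > 0 else 0     # e.g. Cl- to offset a positive system
    return net, cations, anions


if __name__ == "__main__":
    sequence = ["ASP", "GLU", "LYS", "ALA", "GLU"]
    print(counterions_to_neutralize(sequence))
```

In the sketch the system is only neutralized. Adding further ion pairs to reach a target ionic strength is a separate choice that depends on the simulation conditions.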

Step 9

Refine and equilibrate the structure of the solvated system in preparation for subsequent simulation.