Structures & Ensembles

Introduction

Within the CCPN Data Model, three-dimensional coordinate information for macromolecules is stored in the MolStructure package. This package contains a hierarchy of objects, as described below, which match equivalent objects in the MolSystem package; i.e. Chains, Residues and Atoms. Supplementing these are objects that represent the 3D coordinates and the alternate conformations of an ensemble. The objects involved are as follows:

 

Importing and Exporting Structures

Any structures present within a CCPN project will be saved along with that project, so no special attention is needed to write the information to disk. However, a common task is to import a structure from a PDB-format file, such as those held in the RCSB Protein Data Bank. Also, many programs read PDB-format files, so it is useful to be able to export CCPN structure data to a PDB-format file.

First, to try the following examples we will load a CCPN project and find its current NmrProject. Remember to change the project directory path so that it is appropriate for your system:

from memops.general.Io import loadProject
rootProject = loadProject('/home/user/myProjDirName')
nmrProject = rootProject.currentNmrProject

To load a structure (or ensemble of structures) from a PDB format file, you would first import a function that knows how to interpret PDB files:

from ccpnmr.analysis.core.StructureBasic import getStructureFromFile

Then we need a MolSystem to put the chains into. Here we use an existing MolSystem, found by its code, which is appropriate if we know it matches the structure; otherwise a new, empty MolSystem could be made instead. Loading the structural information then fills the MolSystem, creating all of the chains, residues and atoms as required:

molSystem = rootProject.findFirstMolSystem(code='GI')
# or make a new, empty one: molSystem = rootProject.newMolSystem(code='myMolSys')
fileName = "NameOfMyFile.pdb"
structure = getStructureFromFile(molSystem, fileName)
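To give an idea of what reading a PDB file involves, the sketch below is a minimal, stand-alone parser for the fixed-column ATOM and HETATM records, grouping atoms by MODEL. It is an illustration only and is not how getStructureFromFile is implemented; the function name parse_pdb_atoms and its dictionary layout are hypothetical, and it ignores alternate locations, insertion codes and all header records.

```python
# Minimal sketch of reading ATOM/HETATM records from PDB-format lines.
# Hypothetical helper for illustration; not CCPN's getStructureFromFile.


def parse_pdb_atoms(lines):
    """Return {modelNumber: [atom dicts]} from an iterable of PDB lines.

    Files without MODEL records are treated as a single model, number 1.
    """
    models = {}
    current = 1
    for line in lines:
        record = line[0:6].strip()
        if record == 'MODEL':
            # The model serial number follows the record name
            current = int(line[6:].split()[0])
        elif record in ('ATOM', 'HETATM'):
            # Fixed column positions (0-based slices) of the PDB format
            atom = {
                'name': line[12:16].strip(),
                'resName': line[17:20].strip(),
                'chain': line[21].strip(),
                'seqCode': int(line[22:26]),
                'x': float(line[30:38]),
                'y': float(line[38:46]),
                'z': float(line[46:54]),
            }
            models.setdefault(current, []).append(atom)
    return models


example_lines = [
    "MODEL        1",
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   ALA A   1      11.200   6.100  -6.400",
    "ENDMDL",
]
models = parse_pdb_atoms(example_lines)
print(sorted(models.keys()))
```

With a real file the lines can be taken straight from the open file object, since the trailing newline on each line does not affect the column slices.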

To save a structure as a PDB file we need to import and use a different function:

from ccpnmr.analysis.core.StructureBasic import makePdbFromStructure
fileName = "SomeNewFile.pdb"
makePdbFromStructure(fileName, structure)

Note that this method exports only the coordinate information, and not the full header information.
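The PDB format relies on fixed column positions. As a rough illustration of what an exporter has to produce for each atom, here is a sketch that formats a single ATOM record. The function format_atom_line and its arguments are hypothetical, the atom-name alignment rule is a simplification that only holds for atoms with one-letter element symbols, and this is not the code used by makePdbFromStructure.

```python
# Minimal sketch of formatting one PDB ATOM record (columns 1-78).
# Hypothetical helper for illustration; not CCPN's makePdbFromStructure.

# Record name, serial, atom name, altLoc, residue name, chain, residue
# number, insertion code, x, y, z, occupancy, B-factor, element symbol.
PDB_ATOM_FORMAT = ('%-6s%5d %-4s%1s%3s %1s%4d%1s' + ' ' * 3 +
                   '%8.3f%8.3f%8.3f%6.2f%6.2f' + ' ' * 10 + '%2s')


def format_atom_line(serial, atomName, resName, chainCode, seqCode,
                     x, y, z, occupancy=1.0, bFactor=0.0, element=''):
    """Return one ATOM line with a blank altLoc and insertion code."""
    # Simplification: names shorter than four characters start in
    # column 14, which suits atoms with one-letter element symbols.
    if len(atomName) < 4:
        atomName = ' ' + atomName
    return PDB_ATOM_FORMAT % ('ATOM', serial, atomName, '', resName,
                              chainCode, seqCode, '', x, y, z,
                              occupancy, bFactor, element)


line = format_atom_line(1, 'N', 'ALA', 'A', 1, 11.104, 6.134, -6.504,
                        element='N')
print(line)
```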

Chains, Residues, Atoms & Coordinates

Given a structure loaded into a CCPN project, we can begin to navigate around its data structure. First load a structure, as above, then find a Model within the structure. If the structure has only one conformation there will be only one Model, i.e. one set of coordinates, but if the structure represents an ensemble of conformations then there will be many Models.

The following will find an arbitrary Model:

model = structure.findFirstModel()

Or you can select a specific one:

models = structure.sortedModels()
model = models[2]  # the third Model in sorted order
# or, by serial number:
model = structure.findFirstModel(serial=1)

Now, with a Model in hand, we can loop through all of the structure's chains, residues and atoms and print the coordinates for that model. Note that we go through the chains and residues in sorted order. The printed chain code and residue number are kept in the MolStructure objects, but the ccpCode (the three-letter residue code) is held in the MolSystem, so we get it by following the link from the MolStructure residue to the MolSystem residue. Strictly, this link is not stored: the API finds the MolSystem Residue when needed, by following the links to the MolSystem and returning the Residue with the same chain code and seqId as the MolStructure residue.

for chain in structure.sortedCoordChains():
  chainCode = chain.code
  for residue in chain.sortedResidues():
    # MolSystem residue, found via the chain code and seqId
    sysResidue = residue.residue
    resName = '%d%s' % (residue.seqCode, sysResidue.ccpCode)
    for atom in residue.atoms:
      coord = atom.findFirstCoord(model=model)
      if coord is None:
        continue  # no coordinate for this atom in this model
      print chainCode, resName, atom.name,
      print '%.6f %.6f %.6f' % (coord.x, coord.y, coord.z)
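The on-demand link described above can be pictured with a toy model. The classes below are hypothetical plain-Python stand-ins, not CCPN API classes: the structural residue stores only its chain code and seqId, and its residue attribute looks up the matching MolSystem-like residue each time it is read, in the same spirit as residue.residue in the loop.

```python
# Toy stand-ins (hypothetical classes, not the CCPN API) showing a link
# that is not stored but resolved on demand by matching chain code and seqId.


class SysResidue(object):
    def __init__(self, seqId, ccpCode):
        self.seqId = seqId
        self.ccpCode = ccpCode


class SysChain(object):
    def __init__(self, code, residues):
        self.code = code
        # Index the MolSystem-like residues by their seqId
        self.residuesBySeqId = dict((r.seqId, r) for r in residues)


class StructResidue(object):
    def __init__(self, sysChains, chainCode, seqId, seqCode):
        self._sysChains = sysChains  # {chainCode: SysChain}
        self.chainCode = chainCode
        self.seqId = seqId
        self.seqCode = seqCode

    @property
    def residue(self):
        # Nothing is stored: find the residue with the same chain code
        # and seqId every time this attribute is read (None if absent).
        sysChain = self._sysChains.get(self.chainCode)
        if sysChain is None:
            return None
        return sysChain.residuesBySeqId.get(self.seqId)


chains = {'A': SysChain('A', [SysResidue(1, 'Met'), SysResidue(2, 'Gln')])}
structResidue = StructResidue(chains, 'A', 2, 2)
print('%d%s' % (structResidue.seqCode, structResidue.residue.ccpCode))  # 2Gln
```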

To get one specific coordinate, we need to know the route through the chains, residues and atoms. For example, to get the CA coordinate of the third residue of chain "A" in the chosen model, first get the structural chain, which in this case we obtain via the MolSystem chain:

sysChain = molSystem.findFirstChain(code='A')
chain = structure.findFirstCoordChain(chain=sysChain)

Then get the residue, which could be the third in the sorted list:

residues = chain.sortedResidues()
residue = residues[2]
print residue.seqCode

Or get the residue with a specific seqId (its sequential position in the chain):

residue = chain.findFirstResidue(seqId=3)

Finally find the atom and the coordinate:

atomCa = residue.findFirstAtom(name='CA')
coordCa = atomCa.findFirstCoord(model=model)

By way of example, you could now get the backbone N atom of the next residue:

residue2 = chain.findFirstResidue(seqId=4)
atomN = residue2.findFirstAtom(name='N')
coordN = atomN.findFirstCoord(model=model)

And calculate the distance to the CA we already found:

from math import sqrt
dx = coordCa.x - coordN.x
dy = coordCa.y - coordN.y
dz = coordCa.z - coordN.z
dist = sqrt(dx*dx + dy*dy + dz*dz)
print "CA-N Distance: %.4f" % dist

 

High-Level Functions for Structures

With the coordinate information present in the CCPN data model you could write code to perform a vast array of operations; however, some of the more common ones have already been written as CCPN functions. First, import the functions that will be used:

from ccpnmr.analysis.core.StructureBasic import alignStructures, makeEnsemble, getResiduePhiPsi

Here we use the function that takes a list of structures, which may be a single ensemble (a structure with multiple models) or several separate structures, and aligns their coordinates with one another. It returns the aligned structure(s), the convergence error, the RMSD of each structure to the set and a dictionary of RMSD values for each atom:

structures, error, structureRmsds, atomRmsdDict = alignStructures([structure])
print error, structureRmsds
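To show what such RMSD values mean, here is a plain-Python sketch that takes coordinates assumed to be already superimposed and measures each model's RMSD from the mean structure, plus a per-atom RMSD over the models. It does not do the fitting itself (which is typically a least-squares superposition such as the Kabsch algorithm), the helper ensemble_rmsds is hypothetical, and measuring against the mean structure is an assumption: alignStructures may define its RMSDs differently.

```python
from math import sqrt


def ensemble_rmsds(models):
    """RMSDs of an already-superimposed ensemble, measured from its mean.

    models is a list of models, each a list of (x, y, z) tuples with the
    atoms in the same order. Returns (per-model RMSDs, per-atom RMSDs).
    """
    nModels = len(models)
    nAtoms = len(models[0])

    # Mean position of each atom over all models
    mean = [tuple(sum(model[i][k] for model in models) / float(nModels)
                  for k in range(3))
            for i in range(nAtoms)]

    def sq_dev(model, i):
        # Squared distance of atom i in this model from its mean position
        return sum((model[i][k] - mean[i][k]) ** 2 for k in range(3))

    modelRmsds = []
    for model in models:
        total = sum(sq_dev(model, i) for i in range(nAtoms))
        modelRmsds.append(sqrt(total / nAtoms))

    atomRmsds = []
    for i in range(nAtoms):
        total = sum(sq_dev(model, i) for model in models)
        atomRmsds.append(sqrt(total / nModels))

    return modelRmsds, atomRmsds


models = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
          [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
modelRmsds, atomRmsds = ensemble_rmsds(models)
print(modelRmsds)
print(atomRmsds)
```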

If you have separate structures, but wish to combine them into a single structural ensemble, try:

ensemble = makeEnsemble(structures)

The final example gets the average phi and psi backbone torsion (dihedral) angles for a specific structural residue:

phi, psi = getResiduePhiPsi(residue2, inDegrees=True)
print phi, psi
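Phi and psi are ordinary dihedral angles: phi is defined by the atoms C(i-1), N, CA and C of residue i, and psi by N, CA and C of residue i and N(i+1). The sketch below computes a signed dihedral from four (x, y, z) points using the common atan2 formulation. It is an illustration, not necessarily how getResiduePhiPsi works internally, and it handles a single set of coordinates rather than averaging over models; the helper names are hypothetical.

```python
from math import atan2, degrees, sqrt


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees (-180 to 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return degrees(atan2(y, x))


# Rear bond rotated 90 degrees clockwise from the front bond
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
               (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Given the four backbone atom positions of a residue and its neighbour, phi would be dihedral(C_prev, N, CA, C) and psi dihedral(N, CA, C, N_next).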

Getting Structure Data from Atomic Assignments

Finally, since CCPN commonly deals with the assignment of peaks to NMR resonances, and of those resonances to atoms, there are several convenient high-level functions that get structural data from AtomSets. As mentioned in earlier sections, AtomSets are groupings of NMR-equivalent atoms, like the HB* methyl protons of an alanine, or just a single atom such as a protein backbone CA. The following CCPN functions allow you to easily get coordinates, distances and angles for AtomSets, and thus for any Resonances to which they may be assigned. This is often useful when interpreting NOE spectra, where cross peaks are typically only seen between protons within about 5 Å of each other.

These functions are imported thus:

from ccpnmr.analysis.core.StructureBasic import getAtomSetCoords, getAtomSetsDihedral, getAtomSetsDistance

Note that if we were working from a completely new CCPN project, a little more setup would be needed: if the NMR project we are working with has no equivalent AtomSets defined, these must be made first. This matters because AtomSets are what resonances are assigned to, and thus what the above utility functions are designed to work with.

To get the coordinates for the atoms in an atom set, in a given structure (with the option to specify a given model):

# Get a methyl atom set
chain = molSystem.findFirstChain(code='A')
residue = chain.findFirstResidue(ccpCode='Ala')
atomSet = residue.findFirstAtom(name='HB1').atomSet
coords = getAtomSetCoords(atomSet, structure, model=None)
for coord in coords:
  print coord.x, coord.y, coord.z

To get the dihedral angle, in degrees, defined by a list of four atomSets:

residue = chain.findFirstResidue(seqCode=8)
atomSetH = residue.findFirstAtom(name='H').atomSet
atomSetN = residue.findFirstAtom(name='N').atomSet
atomSetCA = residue.findFirstAtom(name='CA').atomSet
atomSetCB = residue.findFirstAtom(name='CB').atomSet
atomSets = [atomSetH, atomSetN, atomSetCA, atomSetCB]
print getAtomSetsDihedral(atomSets, structure, model=model, inDegrees=True)

To get the distance between two groups of atomSets, where in this example the groups come from the assignments of two resonances:

resonance1 = nmrProject.findFirstResonance(serial=1)
resonance2 = nmrProject.findFirstResonance(serial=2)
# resonanceSet is None if a resonance has not been assigned to atoms
atomSets1 = resonance1.resonanceSet.atomSets
atomSets2 = resonance2.resonanceSet.atomSets
dist = getAtomSetsDistance(atomSets1, atomSets2, structure, method='noe')
print dist

Note that above we used the 'noe' method, which calculates the equivalent distance as seen in an NOE experiment by summing the individual atom-atom distances raised to the power of -6.
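Assuming the usual form of this r^-6 summation, the r^-6 terms for every atom pair are added up and the sum is raised to the power -1/6 to give back an effective distance. The sketch below shows this with a hypothetical helper working on plain coordinate tuples; it is not the CCPN implementation, and how getAtomSetsDistance treats multiple models is not covered here.

```python
from math import sqrt


def noe_effective_distance(coords1, coords2):
    """r^-6 summed distance between two groups of (x, y, z) points.

    Computes (sum over all pairs of r_ij ** -6) ** (-1/6). Assumes no two
    points coincide, since a zero distance cannot be raised to -6.
    """
    total = 0.0
    for a in coords1:
        for b in coords2:
            r = sqrt(sum((a[k] - b[k]) ** 2 for k in range(3)))
            total += r ** -6
    return total ** (-1.0 / 6)


# One proton 2 Angstrom from each of two others
print(noe_effective_distance([(0.0, 0.0, 0.0)],
                             [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]))
```

Because the sum is dominated by the shortest distances, the effective distance is never longer than the shortest individual atom-atom distance: in the example, two protons each 2 Å away give about 1.78 Å.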