Structures & Ensembles
Introduction
Within the CCPN Data Model, three-dimensional coordinate information for macromolecules is stored in the MolStructure package. This package contains a hierarchy of objects, as described below, which match equivalent objects in the MolSystem package; i.e. Chains, Residues and Atoms. Supplementing these are objects that represent the 3D coordinates and the alternate conformations of an ensemble. The objects involved are as follows:
StructureEnsemble - This is the 'top' object of the package and contains all of the other objects. A StructureEnsemble represents what you would find in the body of a PDB file; it is a set of atomic coordinates that go together to describe the conformation of a molecule. The StructureEnsemble might contain several models, which represent alternative conformations, but these are always for the same set of atoms. A StructureEnsemble belongs to a MemopsRoot object (a CCPN project) and is identified by its ensembleId number AND a MolSystem. Thus it is mandatory to specify which MolSystem a StructureEnsemble relates to when it is created, and it is only the atoms of this MolSystem that the StructureEnsemble may specify coordinates for. Each StructureEnsemble is stored in its own XML file, so that you can load structures independently of one another.
Chain - This object belongs to a StructureEnsemble and is a container for the structure's Residues. Such a MolStructure.Chain always links to an equivalent MolSystem.Chain. However, there need not be a structural chain for every molecular system chain; for example, you may have a system representing a protein:DNA complex but only have structural information for the protein chain.
Residue - A MolStructure.Residue belongs to a MolStructure.Chain and acts as a container for structural atoms. Like the chains that contain it, a structural residue may be missing, even if a residue is present in the MolSystem. In this way a structure can represent only part of a molecular system, e.g. only the well ordered residues.
Atom - These structural atoms belong to the structure's residues, and link to equivalent atoms in the ensemble's MolSystem. An atom may contain several coordinate records which specify the atom's position in different conformations; in this case each of the coordinates represents a different model.
Coord - This is the coordinate record which belongs to a given atom in a given model. It specifies an atomic location in Cartesian (x,y,z) coordinates and carries further records such as the B-factor.
Model - The Model object belongs to the top-level StructureEnsemble object and serves to group together the coordinates which relate to the same molecular conformation. Sometimes there will only be one model and one set of coordinates.
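To make the containment hierarchy above more concrete, here is a minimal, self-contained sketch of how an ensemble, its atoms and per-model coordinates relate. The class and method names (SketchEnsemble, SketchAtom, setCoord, findCoord, getAtom, newModel) are hypothetical stand-ins invented for illustration; they are not the CCPN API, and the real objects carry far more information.

```python
# Illustrative stand-ins only: these are NOT the CCPN API classes.
from collections import namedtuple

Coord = namedtuple('Coord', ['x', 'y', 'z', 'bFactor'])


class SketchAtom(object):
    """A structural atom holding one coordinate record per model."""

    def __init__(self, name):
        self.name = name
        self.coords = {}  # model serial -> Coord

    def setCoord(self, modelSerial, x, y, z, bFactor=0.0):
        self.coords[modelSerial] = Coord(x, y, z, bFactor)

    def findCoord(self, modelSerial):
        # Returns None when the atom has no coordinate in that model
        return self.coords.get(modelSerial)


class SketchEnsemble(object):
    """Top-level container: chains -> residues -> atoms, plus models."""

    def __init__(self):
        self.modelSerials = []
        # chain code -> {seqCode -> {atom name -> SketchAtom}}
        self.chains = {}

    def newModel(self):
        serial = len(self.modelSerials) + 1
        self.modelSerials.append(serial)
        return serial

    def getAtom(self, chainCode, seqCode, atomName):
        # Create the chain, residue and atom containers on demand
        residues = self.chains.setdefault(chainCode, {})
        atoms = residues.setdefault(seqCode, {})
        if atomName not in atoms:
            atoms[atomName] = SketchAtom(atomName)
        return atoms[atomName]


ensemble = SketchEnsemble()
m1 = ensemble.newModel()
m2 = ensemble.newModel()

# The same atom carries a different position in each model
ca = ensemble.getAtom('A', 3, 'CA')
ca.setCoord(m1, 1.0, 2.0, 3.0)
ca.setCoord(m2, 1.5, 2.5, 3.5)
```

The point of the sketch is that models do not own separate copies of the atoms: there is one set of atoms, and each atom holds one coordinate per model.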
Importing and Exporting Structures
Any structures present within a CCPN project will be saved along with that project, so no special attention is needed to write the information to disk. However, it is a common task to import a structure from a PDB formatted file, as held at the RCSB database. Also many programs read in PDB format files, and so it is useful to be able to export CCPN structure data to a PDB formatted file.
Firstly, to try the following examples we will load a CCPN project and find an NmrProject, remembering to change the location of the project directory so that it is appropriate to your system:
from memops.general.Io import loadProject
rootProject = loadProject('/home/user/myProjDirName')
nmrProject = rootProject.currentNmrProject
To load a structure (or ensemble of structures) from a PDB format file, you would first import a function that knows how to interpret PDB files:
from ccpnmr.analysis.core.StructureBasic import getStructureFromFile
Then we need a MolSystem to put the chains into. Here we look up an existing one by its code, which is appropriate if we know it matches the structure; if required, a new MolSystem could be made instead. We then issue the command to load the structural information; this will fill the MolSystem, creating all of the chains, residues and atoms as required:
molSystem = rootProject.findFirstMolSystem(code='GI')
fileName = "NameOfMyFile.pdb"
structure = getStructureFromFile(molSystem, fileName)
To save a structure as a PDB file we need to import and use a different function:
from ccpnmr.analysis.core.StructureBasic import makePdbFromStructure
fileName = "SomeNewFile.pdb"
makePdbFromStructure(fileName, structure)
Note that this function exports only the coordinate information, and not the full header information.
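For orientation, the coordinate information in a PDB file is held in fixed-width ATOM records. The following is a small sketch of how one such line can be formatted using the standard PDB column layout. The function name formatPdbAtomLine and its arguments are hypothetical; this is not how makePdbFromStructure is implemented, and the atom name placement is deliberately simplified (names shorter than four characters start in column 14, which is wrong for two-letter elements such as calcium ions).

```python
def formatPdbAtomLine(serial, atomName, resName, chainCode, seqCode,
                      x, y, z, occupancy=1.0, bFactor=0.0, element=''):
    """Return one fixed-width PDB ATOM record (no trailing newline)."""
    # Simplification: short atom names are shifted to start in column 14
    if len(atomName) < 4:
        atomName = ' ' + atomName

    # Columns: record name 1-6, serial 7-11, atom name 13-16,
    # residue name 18-20, chain 22, residue number 23-26,
    # x 31-38, y 39-46, z 47-54, occupancy 55-60, B-factor 61-66,
    # element 77-78 (right-justified)
    return ('ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f'
            '          %2s' % (serial, atomName, resName, chainCode, seqCode,
                               x, y, z, occupancy, bFactor, element))


line = formatPdbAtomLine(1, 'CA', 'ALA', 'A', 3, 1.0, 2.0, 3.0, element='C')
print(line)
```

Because the format is column-based rather than whitespace-separated, the field widths matter more than the exact values when checking an exported file by eye.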
Chains, Residues, Atoms & Coordinates
Given a structure loaded into a CCPN project, we can begin to navigate around its data structure. Firstly load a structure, as above, then find a Model within the structure. If the structure only has one conformation there will be only one Model, i.e. one set of coordinates, but if the structure represents an ensemble of conformations then there will be many Models.
The following will find an arbitrary Model:
model = structure.findFirstModel()
Or you can select a specific one:
models = structure.sortedModels()
model = models[2]  # third model in the sorted list; needs at least three models
# or
model = structure.findFirstModel(serial=1)
Now with a Model in hand we can loop through all of the structure's chains, residues and atoms to print the coordinates which relate to that model. Note that we will go through the chains and residues in a sorted order. The printed chain and residue codes are kept in the MolStructure objects, but the ccpCode (or 3-letter code) is in the MolSystem, so we get it by following the link from the MolStructure residue to the MolSystem residue. Strictly speaking this link is not stored; the API finds the object when needed by following the links to the MolSystem and returning the Residue with the same chainCode and seqId as the MolStructure residue.
for chain in structure.sortedCoordChains():
    chainCode = chain.code
    for residue in chain.sortedResidues():
        # Follow the derived link to the MolSystem residue to get the ccpCode
        sysResidue = residue.residue
        resName = '%d%s' % (residue.seqCode, sysResidue.ccpCode)
        for atom in residue.atoms:
            # The coordinate record of this atom in the chosen model
            coord = atom.findFirstCoord(model=model)
            print chainCode, resName, atom.name,
            print '%.6f %.6f %.6f' % (coord.x, coord.y, coord.z)
To get one specific coordinate, we need to know the route through chains, residues and atoms. For example, suppose we want the CA coordinate for the third residue of chain "A" in the selected model. First get the structural chain, which in this case we will obtain using the MolSystem chain:
sysChain = molSystem.findFirstChain(code='A')
chain = structure.findFirstCoordChain(chain=sysChain)
Then get the residue, which could be the third in the sorted list:
residues = chain.sortedResidues()
residue = residues[2]
print residue.seqCode
Or a residue with a specific sequence number:
residue = chain.findFirstResidue(seqId=3)
Finally find the atom and the coordinate:
atomCa = residue.findFirstAtom(name='CA')
coordCa = atomCa.findFirstCoord(model=model)
By way of example, you could now get the backbone N atom of the next residue:
residue2 = chain.findFirstResidue(seqId=4)
atomN = residue2.findFirstAtom(name='N')
coordN = atomN.findFirstCoord(model=model)
And calculate the distance to the CA we already found:
from math import sqrt
dx = coordCa.x-coordN.x
dy = coordCa.y-coordN.y
dz = coordCa.z-coordN.z
dist = sqrt(dx*dx+dy*dy+dz*dz)
print "CA-N Distance: %.4f" % dist
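Since a structure may cover only part of a molecular system, a lookup for an atom or coordinate can come back empty. As a hedged sketch, the distance calculation above could be wrapped in a small helper that tolerates missing coordinates. The helper name coordDistance and the Point stand-in are hypothetical and only mimic objects with x, y and z attributes; they are not part of CCPN.

```python
from collections import namedtuple
from math import sqrt

# Stand-in for any object with x, y, z attributes (e.g. a coordinate record)
Point = namedtuple('Point', ['x', 'y', 'z'])


def coordDistance(coordA, coordB):
    """Euclidean distance between two coordinates, or None if one is missing."""
    if coordA is None or coordB is None:
        return None

    dx = coordA.x - coordB.x
    dy = coordA.y - coordB.y
    dz = coordA.z - coordB.z
    return sqrt(dx * dx + dy * dy + dz * dz)


dist = coordDistance(Point(0.0, 0.0, 0.0), Point(3.0, 4.0, 0.0))
print('Distance: %.4f' % dist)
```

Returning None rather than raising an error lets a loop over many atom pairs skip the ones that have no structural data.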
High-Level Functions for Structures
With the coordinate information present in the CCPN data model you could write code to perform a vast array of functions; however, some of the more common operations have already been written as CCPN functions. Firstly import the functions that will be used:
from ccpnmr.analysis.core.StructureBasic import alignStructures, makeEnsemble, getResiduePhiPsi
Here we use the function that takes an ensemble (a structure with multiple models) or a list of separate structures and aligns their coordinates with one another. The function takes a list of structures so that multiple structures can be aligned. It gives back the aligned structure(s), the convergence error, the RMSD of each structure to the set and a dictionary of RMSD values for each atom.
structures, error, structureRmsds, atomRmsdDict = alignStructures([structure])
print error, structureRmsds
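To illustrate what a per-structure RMSD to the set means, here is a simplified sketch that computes the RMSD of each model to the mean coordinates. It assumes the models are already superimposed and list their atoms in the same order; it performs no alignment itself and is not the algorithm used by alignStructures. The function name modelRmsdsToMean is hypothetical.

```python
from math import sqrt


def modelRmsdsToMean(models):
    """RMSD of each model to the mean structure.

    models is a list of models, each a list of (x, y, z) tuples
    given in the same atom order. No superposition is done here.
    """
    nModels = len(models)
    nAtoms = len(models[0])

    # Mean position of every atom across all models
    meanCoords = []
    for i in range(nAtoms):
        mean = tuple(sum(model[i][k] for model in models) / float(nModels)
                     for k in range(3))
        meanCoords.append(mean)

    rmsds = []
    for model in models:
        sumSq = 0.0
        for (x, y, z), (mx, my, mz) in zip(model, meanCoords):
            sumSq += (x - mx) ** 2 + (y - my) ** 2 + (z - mz) ** 2
        rmsds.append(sqrt(sumSq / nAtoms))

    return rmsds


# Two models of a two-atom system, displaced by 2 units along y
exampleModels = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                 [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]]
exampleRmsds = modelRmsdsToMean(exampleModels)
```

In this example every atom lies one unit from its mean position, so both models have an RMSD of 1.0.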
If you have separate structures, but wish to combine them into a single structural ensemble, try:
ensemble = makeEnsemble(structures)
The final example is to get the average Phi and Psi backbone torsion (dihedral) angles for a specific structural residue:
phi, psi = getResiduePhiPsi(residue2, inDegrees=True)
print phi, psi
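A word of caution on averaging angles across models: a plain arithmetic mean breaks down near the +/-180 degree boundary (the mean of 170 and -170 would come out as 0 rather than 180). One common remedy is a circular mean, sketched below. The source does not state how getResiduePhiPsi averages internally, so this is only an illustration of the general technique, and the name meanAngleDegrees is hypothetical.

```python
from math import atan2, cos, sin, degrees, radians


def meanAngleDegrees(angles):
    """Circular mean of a list of angles given in degrees.

    Each angle is treated as a unit vector; the mean angle is the
    direction of the vector sum, in the range -180 to 180.
    """
    sumSin = sum(sin(radians(a)) for a in angles)
    sumCos = sum(cos(radians(a)) for a in angles)
    return degrees(atan2(sumSin, sumCos))


meanSimple = meanAngleDegrees([10.0, 30.0])
meanWrapped = meanAngleDegrees([170.0, -170.0])
```

Here meanSimple comes out at 20 degrees, while meanWrapped lies at the 180 degree boundary (its sign may be either, since +180 and -180 describe the same angle).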
Getting Structure Data from Atomic Assignments
Finally, since CCPN commonly deals with the assignment of peaks to NMR resonances, and the assignment of those NMR resonances to atoms, there are several convenient high-level functions that get structural data from AtomSets. As mentioned in earlier sections, AtomSets are groupings of NMR-equivalent atoms, like the HB* methyl in an Alanine or just a plain protein backbone CA. The following CCPN functions allow you to easily get coordinates, distances and angles for AtomSets, and thus for any Resonances to which they may be assigned. This is often useful when interpreting NOE spectra, where the atoms that give rise to a peak are typically within 5 Angstrom of each other.
These functions are imported thus:
from ccpnmr.analysis.core.StructureBasic import getAtomSetCoords, getAtomSetsDihedral, getAtomSetsDistance
Note that if we were working from a completely new CCPN project we would have to perform a little more setup; if the NMR project we are working with has no equivalent AtomSets defined, then these must be made first. Remember that these AtomSets are what resonances are assigned to, and thus what the above utility functions are designed to work with.
To get the coordinates for the atoms in an atom set, in a given structure (with the option to specify a given model):
# Get a methyl atom set
chain = molSystem.findFirstChain(code='A')
residue = chain.findFirstResidue(ccpCode='Ala')
atomSet = residue.findFirstAtom(name='HB1').atomSet
coords = getAtomSetCoords(atomSet, structure, model=None)
for coord in coords:
    print coord.x, coord.y, coord.z
To get the dihedral angle, in degrees, between a list of four atomSets:
residue = chain.findFirstResidue(seqCode=8)
atomSetH = residue.findFirstAtom(name='H').atomSet
atomSetN = residue.findFirstAtom(name='N').atomSet
atomSetCA = residue.findFirstAtom(name='CA').atomSet
atomSetCB = residue.findFirstAtom(name='CB').atomSet
atomSets = [atomSetH, atomSetN, atomSetCA, atomSetCB]
print getAtomSetsDihedral(atomSets, structure, model=model, inDegrees=True)
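For readers curious about the underlying geometry, a dihedral angle is defined by four points, and can be computed with the standard atan2 formulation sketched below. This is a generic illustration working on plain (x, y, z) tuples; it is not the implementation of getAtomSetsDihedral, and the helper names are hypothetical.

```python
from math import atan2, degrees, sqrt


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedralDegrees(p1, p2, p3, p4):
    """Dihedral angle (degrees, -180 to 180) defined by four points."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)

    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4

    y = sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return degrees(atan2(y, x))


# Trans (180), cis (0) and a +90 degree arrangement
transAngle = dihedralDegrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cisAngle = dihedralDegrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
rightAngle = dihedralDegrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
```

Using atan2 rather than an arccosine of the normals' dot product gives the sign of the angle directly and avoids numerical trouble near 0 and 180 degrees.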
To get the distance between two groups of atomSets, where in this example each group comes from the assignment of one of two resonances:
resonance1 = nmrProject.findFirstResonance(serial=1)
resonance2 = nmrProject.findFirstResonance(serial=2)
atomSets1 = resonance1.resonanceSet.atomSets
atomSets2 = resonance2.resonanceSet.atomSets
getAtomSetsDistance(atomSets1, atomSets2, structure, method='noe')
Note that above we used the 'noe' method, which calculates the distance that would be equivalent in an NOE experiment by summing the individual atom-to-atom distances raised to the power of -6.
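As an illustration of the r^-6 summation idea, the sketch below combines several atom-to-atom distances into a single effective distance by summing r^-6 and taking the -1/6 power of the result. Whether the CCPN function applies any further normalisation (for example dividing by the number of atom pairs) is not stated here, so treat this as a generic sketch; the name noeSumDistance is hypothetical.

```python
def noeSumDistance(distances):
    """Effective distance from an r^-6 sum over individual distances.

    distances is a list of positive atom-to-atom distances. Short
    distances dominate, so the result never exceeds the shortest one.
    """
    total = sum(float(r) ** -6 for r in distances)
    return total ** (-1.0 / 6.0)


single = noeSumDistance([2.0])
pair = noeSumDistance([2.0, 2.0])
mixed = noeSumDistance([2.5, 6.0])
```

For a single distance the result is that distance unchanged; adding further contributions can only shorten the effective distance, and distant atoms contribute very little.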