Molecular Description

Introduction

Within the CCPN data model there are several packages that deal with molecular information. Here we focus on the parts most commonly encountered when working with NMR data: chemical elements, chemical compounds, sequence templates (molecules) and assignable atoms (molecular systems).

Chemical Elements

The descriptions of the chemical elements are stored as ChemElement objects under an umbrella ChemElementStore object. In principle there could be several ChemElementStores, for example if you wanted data sets with different records of physical data, such as natural isotope abundances.

To access chemical element information first obtain the current ChemElementStore via the root project, and then find the ChemElement within that:

chemElementStore = rootProject.currentChemElementStore
carbon = chemElementStore.findFirstChemElement(symbol='C')
print carbon.name, carbon.mass
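The findFirst... and findAll... lookups behave like attribute filters over a collection of child objects. As a rough mental model only, here is a standalone Python 3 sketch of that behaviour. The Element class, the find_first and find_all helpers and the store contents are hypothetical stand-ins, not the CCPN implementation; we also assume, as the sketch does, that a failed findFirst lookup gives None.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a ChemElement record (frozen, so hashable)."""
    symbol: str
    name: str
    atomNumber: int
    mass: float


def find_all(objects, **attrs):
    """Return the set of objects whose attributes equal all the given values."""
    return {obj for obj in objects
            if all(getattr(obj, key) == value for key, value in attrs.items())}


def find_first(objects, **attrs):
    """Return one matching object, or None if nothing matches."""
    for obj in objects:
        if all(getattr(obj, key) == value for key, value in attrs.items()):
            return obj
    return None


# A tiny hypothetical "store" of element records (standard atomic weights).
store = [
    Element('H', 'Hydrogen', 1, 1.008),
    Element('C', 'Carbon', 6, 12.011),
    Element('N', 'Nitrogen', 7, 14.007),
]

carbon = find_first(store, symbol='C')
print(carbon.name, carbon.mass)        # Carbon 12.011
print(find_first(store, symbol='Xx'))  # None
```

When the attributes given identify a unique object, as the element symbol does here, it does not matter which match counts as "first".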

Because this is all reference information, it is distributed together with the code and is always available; note that we did not have to load anything into the root project explicitly. Given a chemical element object, an isotope may be found within it, for example by specifying its mass number:

c13 = carbon.findFirstIsotope(massNumber=13)
print c13.gyroMagneticRatio
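The gyromagnetic ratio is what ties an isotope to its NMR frequency through the Larmor relation, nu = |gamma| B0 / 2pi. The standalone Python 3 sketch below applies it using literature values for 1H and 13C. It assumes gamma is in rad s^-1 T^-1; the units of the value stored in the gyroMagneticRatio attribute are not stated here, so check them before reusing this with CCPN data.

```python
import math

# Literature gyromagnetic ratios in rad s^-1 T^-1 (units assumed for this
# sketch; check the units of the stored gyroMagneticRatio before reuse).
GAMMA = {
    '1H': 2.6752218744e8,
    '13C': 6.728284e7,
}


def larmor_mhz(gamma, field_tesla):
    """Larmor frequency nu = |gamma| * B0 / (2 * pi), returned in MHz."""
    return abs(gamma) * field_tesla / (2 * math.pi) / 1e6


b0 = 14.1  # tesla, roughly a "600 MHz" spectrometer
for isotope, gamma in GAMMA.items():
    print(isotope, round(larmor_mhz(gamma, b0), 1))
# prints: 1H 600.3, then 13C 151.0
```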


Chemical Compounds

The chemical compound templates (ChemComp objects) are found directly under the root project, i.e. they are its children. To get and list all available compounds, use:

availChemComps = rootProject.chemComps
for chemComp in availChemComps:
    print chemComp.molType, chemComp.ccpCode

Note that this returns an immutable, unordered collection of objects (a Python frozenset), so the compounds come out in no particular order, with molecule types and compound codes mixed together. To get a reliably ordered list instead, do:

orderedChemComps = rootProject.sortedChemComps()
for chemComp in orderedChemComps:
    print chemComp.molType, chemComp.ccpCode
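Why sorting helps can be seen with plain tuples. In this standalone Python 3 sketch each compound is represented only by a hypothetical (molType, ccpCode) pair; it models the sort key, not the ChemComp objects, and uses Python's own tuple ordering, which sortedChemComps() is not guaranteed to match exactly.

```python
# Hypothetical compound keys: (molType, ccpCode) pairs only.
comp_keys = frozenset([
    ('protein', 'Arg'), ('RNA', 'A'), ('DNA', 'C'),
    ('protein', 'Ala'), ('DNA', 'A'),
])

# Iteration order over a frozenset is arbitrary; sorting the tuples
# orders by molType first, then by ccpCode within each molType.
# Python compares strings by code point, so 'DNA' and 'RNA' sort
# before the lowercase 'protein'.
ordered = sorted(comp_keys)
for mol_type, ccp_code in ordered:
    print(mol_type, ccp_code)
```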

The list is sorted by both molecule type and compound code. Both attributes are needed because only together do they uniquely identify a compound: compare ("DNA", "A") with ("RNA", "A"). To get only the protein compounds, you could do:

aminoAcids = rootProject.findAllChemComps(molType='protein')

Note that this findAll call returns a Python set. To get just the amino acid arginine and list the atoms of its standard form, do:

arg = rootProject.findFirstChemComp(molType='protein', ccpCode='Arg')
standardArg = arg.findFirstChemCompVar(isDefaultVar=True)
for chemAtom in standardArg.chemAtoms:
    print chemAtom.name

Above we specify two keys to get a unique compound: the molecule type and the compound code. Note that the ccpCode attribute is case sensitive and that the available molType values are 'protein', 'DNA', 'RNA', 'carbohydrate' and 'other'. Once we have a compound we can get a specific variant (ChemCompVar), defined by its chain linking type and protonation state. Above we simply get the default arginine: the common non-terminal protein form. To see all the forms of arginine (in an ordered list), do:

for chemCompVar in arg.sortedChemCompVars():
    print chemCompVar.linking, chemCompVar.descriptor

Here, for the linking attribute, 'start', 'middle' and 'end' refer to positions within a polypeptide chain, while 'none' refers to free, unlinked arginine. The descriptor attribute states the protonation state of each arginine variant.
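Since a variant is picked out by a linking type together with a descriptor, one compound can offer several protonation forms at the same chain position. The standalone Python 3 sketch below groups hypothetical variant keys by linking; 'std' and 'alt' are placeholder descriptors, not the strings CCPN actually uses.

```python
from collections import defaultdict

# Hypothetical variant keys for one compound: (linking, descriptor).
# 'std' and 'alt' are placeholder descriptors, not CCPN's real strings.
arg_variant_keys = {
    ('none', 'std'),
    ('start', 'std'),
    ('middle', 'std'),
    ('middle', 'alt'),
    ('end', 'std'),
}

# Group the available descriptors by chain position (linking).
by_linking = defaultdict(list)
for linking, descriptor in sorted(arg_variant_keys):
    by_linking[linking].append(descriptor)

for linking in ('start', 'middle', 'end', 'none'):
    print(linking, by_linking[linking])
```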

Sequence Templates: Molecules

An empty molecule, which will subsequently be filled with a sequence, may be made directly from the root project:

molecule1 = rootProject.newMolecule(name='MyMol')

A molecular sequence may be specified for this molecule using a list of residue codes (ccpCodes). To do this we import a utility function, addMolResidues from ccp.util.Molecule, which knows how to link a residue sequence into a linear polymer; otherwise we would have to specify all of the links individually.

from ccp.util.Molecule import addMolResidues
seq = ['Gln', 'Trp', 'Glu', 'Arg', 'Thr', 'Tyr']
addMolResidues(molecule1, 'protein', seq)

Note that the residue codes in the sequence must match the available ChemComp compound templates. Now our molecule is populated with residue specifications (MolResidues) which we can loop through from the molecule:

for molRes in molecule1.sortedMolResidues():
    print molRes.seqCode, molRes.ccpCode, molRes.linking
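Conceptually, linking a sequence into a linear polymer means numbering the residues and giving the first one 'start' linking, the last 'end' and the rest 'middle'. The standalone Python 3 sketch below models that bookkeeping, including the check that every code has a template. build_residues is a hypothetical helper, not part of ccp.util.Molecule; numbering from 1 and treating a lone residue as 'none' are assumptions of the sketch.

```python
def build_residues(seq, known_codes, start=1):
    """Hypothetical sketch: number residues and assign chain linking.

    Returns (seqCode, ccpCode, linking) tuples. Raises ValueError if a
    code has no matching template in known_codes.
    """
    unknown = [code for code in seq if code not in known_codes]
    if unknown:
        raise ValueError('No template for: %s' % ', '.join(unknown))
    residues = []
    last = len(seq) - 1
    for i, code in enumerate(seq):
        if last == 0:
            linking = 'none'      # a lone residue is unlinked
        elif i == 0:
            linking = 'start'
        elif i == last:
            linking = 'end'
        else:
            linking = 'middle'
        residues.append((start + i, code, linking))
    return residues


known = {'Gln', 'Trp', 'Glu', 'Arg', 'Thr', 'Tyr'}
seq = ['Gln', 'Trp', 'Glu', 'Arg', 'Thr', 'Tyr']
for seq_code, ccp_code, linking in build_residues(seq, known):
    print(seq_code, ccp_code, linking)
```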

Now that the molecule has residues, all of the attributes that derive from them are filled in. For example, to get the molecule's mass do:

print molecule1.molecularMass
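As a rough cross-check on a value like molecularMass, the average mass of a linear peptide can be estimated by summing free amino-acid masses and subtracting one water per peptide bond. This standalone Python 3 sketch uses that textbook approximation; it is not necessarily how CCPN computes the attribute (it ignores protonation state and terminal variants), and its mass table covers only this example's residues.

```python
# Average masses (Da) of the free amino acids in the example sequence.
FREE_AA_MASS = {
    'Gln': 146.14, 'Trp': 204.23, 'Glu': 147.13,
    'Arg': 174.20, 'Thr': 119.12, 'Tyr': 181.19,
}
WATER = 18.015  # average mass of the H2O lost per peptide bond


def peptide_mass(seq):
    """Average mass of a linear peptide: residue sum minus one water per bond."""
    if not seq:
        return 0.0
    return sum(FREE_AA_MASS[code] for code in seq) - WATER * (len(seq) - 1)


seq = ['Gln', 'Trp', 'Glu', 'Arg', 'Thr', 'Tyr']
print(round(peptide_mass(seq), 1))  # 881.9
```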

Assignable Atoms: Molecular Systems

Lastly, we can use our molecule specification, with its contained residues, to build a molecular system representing a complex of two chains, chainA and chainB. These chains have the same amino acid sequence but contain distinct sets of atoms. Any NMR assignments are made to the atoms within these chains, so the two chains can be assigned separately.

molSystem = rootProject.newMolSystem(code='MS1')
chainA = molSystem.newChain(code='A', molecule=molecule1)
chainB = molSystem.newChain(code='B', molecule=molecule1)
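The point of building chains is that one shared sequence template yields separate, independently assignable atoms. A standalone Python 3 model of the idea follows; the Atom and Chain classes, the abbreviated atom list and the shift value are all hypothetical and only illustrate the relationship, not the CCPN classes.

```python
class Atom:
    """Hypothetical per-chain atom that can carry an assignment."""

    def __init__(self, name):
        self.name = name
        self.shift = None  # assigned chemical shift (ppm), if any


class Chain:
    """Hypothetical chain built from a shared atom-name template."""

    def __init__(self, code, template):
        self.code = code
        self.template = template
        # Each chain creates its own Atom objects from the same template.
        self.atoms = {name: Atom(name) for name in template}


template = ['N', 'H', 'CA', 'HA']  # shared, abbreviated atom-name template
chain_a = Chain('A', template)
chain_b = Chain('B', template)

chain_a.atoms['CA'].shift = 55.2   # assign an atom in chain A only
print(chain_a.atoms['CA'].shift, chain_b.atoms['CA'].shift)  # 55.2 None
print(chain_a.template is chain_b.template)                  # True
```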

With the molecular system constructed we can then query its chains, residues and atoms:

for chain in molSystem.sortedChains():
    chainCode = chain.code
    for residue in chain.sortedResidues():
        resName = '%d%s' % (residue.seqCode, residue.ccpCode)
        for atom in residue.atoms:
            print chainCode, resName, atom.name

The initial molecule description was just a (sequence) template of residue specifications (MolResidues), but once we make a chain within a molecular system, the sequence is combined with the ChemComp descriptions to give a full description of all the atoms.
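That combination step can be pictured as expanding each residue of each chain into the atoms of its compound. In this standalone Python 3 toy model the atom lists are hypothetical and heavily abbreviated (heavy atoms only for Gln, truncated for Trp), and linking-dependent differences such as terminal atoms are ignored.

```python
# Hypothetical, abbreviated atom lists standing in for ChemComp data.
TEMPLATE_ATOMS = {
    'Gln': ['N', 'CA', 'C', 'O', 'CB', 'CG', 'CD', 'OE1', 'NE2'],
    'Trp': ['N', 'CA', 'C', 'O', 'CB'],
}


def expand(chain_codes, seq, start=1):
    """Combine a sequence with per-compound atom lists into atom labels."""
    labels = []
    for chain_code in chain_codes:
        for offset, ccp_code in enumerate(seq):
            res_name = '%d%s' % (start + offset, ccp_code)
            for atom_name in TEMPLATE_ATOMS[ccp_code]:
                labels.append((chain_code, res_name, atom_name))
    return labels


labels = expand(['A', 'B'], ['Gln', 'Trp'])
print(len(labels))  # 28
print(labels[0])    # ('A', '1Gln', 'N')
```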