Molecular Description
Introduction
Within the CCPN data model there are several packages that deal with molecular information. Here we focus on the following packages, which are commonly encountered when dealing with NMR information.
ChemElement - This package stores chemical element, i.e. periodic table, information. Contained within the package are Isotope objects which are frequently used with NMR data, e.g. by linking to the observed NMR resonances. This package is reference data and is not normally modifiable by the user.
ChemComp - The templates for the covalent construction of individual chemical compounds, in terms of atoms and bonds etc. Each type of compound is stored in a separate XML file and often describes several variants of the compound, e.g. charged and uncharged Lysine. The most commonly used ChemComps contain templates for the amino acid and nucleic acid residues. This data is reference information and cannot normally be modified by the user.
Molecule - This package stores how specific versions of component chemical compounds are linked into larger molecules, for example to specify the sequence and protonation state of a protein. Molecule sequences need not be linear, and thus can be used to describe branching arrangements. Also, a molecule may refer to only one compound, like a glucose monomer. Molecule information is fully editable by the user.
MolSystem - This package represents all of the distinct atoms that are available (e.g. for resonance assignment) within a given sample or molecular complex. The objects in this package are created by combining molecule sequences with chemical compound templates. For example, to make a molecular system representing a protein homodimer, you would use one molecular sequence and the amino acid (ChemComp) templates to make two chains, where each chain contains its own set of distinct atoms.
Chemical Elements
The descriptions of the chemical elements are stored as ChemElement objects under an umbrella ChemElementStore object. There could in theory be multiple ChemElementStores if, for example, you wanted to have data sets with different records of the physical data, such as natural isotope abundances.
To access chemical element information, first obtain the current ChemElementStore via the root project, and then find the ChemElement within that:
chemElementStore = rootProject.currentChemElementStore
carbon = chemElementStore.findFirstChemElement(symbol='C')
print carbon.name, carbon.mass
Because this is all reference information, it is distributed together with the code and is always available. Note that we did not have to specifically load any information into the root project. Given a chemical element object, an isotope may be found within it, for example by specifying:
c13 = carbon.findFirstIsotope(massNumber=13)
print c13.gyroMagneticRatio
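As an illustration of what a gyromagnetic ratio is used for in NMR, the following minimal sketch computes resonance (Larmor) frequencies from the relation nu = gamma * B0 / (2 * pi). It does not use the CCPN API: the two ratios are literature values hard-coded in rad s^-1 T^-1, the function name larmor_frequency_hz and the field strength are chosen for illustration only, and the unit in which the gyroMagneticRatio attribute is stored should be checked before substituting it here.

```python
import math

# Literature gyromagnetic ratios in rad s^-1 T^-1. These are hard-coded
# assumptions for the sketch, not values read from a ChemElementStore.
GAMMA_1H = 2.6752e8
GAMMA_13C = 6.7283e7


def larmor_frequency_hz(gamma, field_tesla):
    """Return the resonance frequency nu = gamma * B0 / (2 * pi) in Hz."""
    return gamma * field_tesla / (2.0 * math.pi)


b0 = 14.1  # tesla; an illustrative field, close to a "600 MHz" magnet
nu_1h = larmor_frequency_hz(GAMMA_1H, b0)
nu_13c = larmor_frequency_hz(GAMMA_13C, b0)

print('1H  %.1f MHz' % (nu_1h / 1e6))
print('13C %.1f MHz' % (nu_13c / 1e6))
```

The ratio of the two frequencies depends only on the two gyromagnetic ratios, which is why 13C resonates at roughly a quarter of the 1H frequency in any magnet.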
Chemical Compounds
The chemical compound templates are found directly under the root project (i.e. they are children of it). To get and list all available compounds issue:
availChemComps = rootProject.chemComps
for chemComp in availChemComps:
    print chemComp.molType, chemComp.ccpCode
Note that this returns an immutable unordered collection of objects (a Python frozenset), i.e. the chemical compound codes and molecule types are all mixed up. To get a reliably ordered list instead do:
orderedChemComps = rootProject.sortedChemComps()
for chemComp in orderedChemComps:
    print chemComp.molType, chemComp.ccpCode
Note that the list is sorted by both molecule type and compound code. In this instance molType and ccpCode together uniquely identify the compound; compare "DNA", "A" with "RNA", "A". To get only protein compounds you could do:
aminoAcids = rootProject.findAllChemComps(molType='protein')
Note that the "findAll" call returns a Python set object. To get only the amino acid arginine and list the atoms of its standard form do:
arg = rootProject.findFirstChemComp(molType='protein', ccpCode='Arg')
standardArg = arg.findFirstChemCompVar(isDefaultVar=True)
for chemAtom in standardArg.chemAtoms:
    print chemAtom.name
Above we specify two keys to get a unique compound: molecule type and code. Note that the ccpCode attribute is case sensitive and that the available molType values are 'protein', 'DNA', 'RNA', 'carbohydrate' and 'other'. Once we have a compound we can get a specific variant (ChemCompVar), defined by protonation state and chain linking type. Above we just get the default arginine, the common non-terminal protein form. To see all the forms of arginine (in an ordered list) do:
for chemCompVar in arg.sortedChemCompVars():
    print chemCompVar.linking, chemCompVar.descriptor
Here, for the linking attribute, 'start', 'middle' and 'end' refer to positions within a polypeptide chain, while 'none' refers to free, unlinked arginine. The descriptor attribute states the protonation state of each arginine variant.
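Returning to the points made earlier in this section about ordering and identity, the following sketch uses plain Python tuples, not the CCPN API, to show why an unordered collection needs sorting and why molType and ccpCode must be taken together. The (molType, ccpCode) pairs are hypothetical stand-ins for ChemComp objects, and Python's default tuple ordering is only assumed to resemble the order that sortedChemComps() produces.

```python
# Hypothetical stand-ins for ChemComp objects: (molType, ccpCode) pairs.
# A frozenset, like the collection described above, has no defined order.
keys = frozenset([
    ('RNA', 'A'),
    ('protein', 'Arg'),
    ('DNA', 'A'),
    ('protein', 'Ala'),
])

# Tuples compare element by element, so this sorts by molType first
# and by ccpCode second. String comparison is case sensitive.
ordered_keys = sorted(keys)
for mol_type, ccp_code in ordered_keys:
    print('%s %s' % (mol_type, ccp_code))

# The ccpCode alone is ambiguous: 'A' appears under both DNA and RNA,
# whereas the full (molType, ccpCode) pair is unique.
codes_only = [ccp_code for _, ccp_code in ordered_keys]
ambiguous_codes = sorted(set(c for c in codes_only if codes_only.count(c) > 1))
print('Ambiguous codes: %s' % ', '.join(ambiguous_codes))
```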
Sequence Templates: Molecules
An empty molecule, which will subsequently be filled with a sequence, may be made directly from the root project:
molecule1 = rootProject.newMolecule(name='MyMol')
A molecular sequence may be specified for this molecule using a list of residue codes (ccpCodes). Note that to achieve this we import a utility module which knows how to link a residue sequence into a linear polymer; otherwise we would have to specify all of the links independently.
from ccp.util.Molecule import addMolResidues
seq = ['Gln','Trp','Glu','Arg','Thr','Tyr']
addMolResidues(molecule1, 'protein', seq)
Note that the residue codes in the sequence must match the available ChemComp compound templates. Now our molecule is populated with residue specifications (MolResidues) which we can loop through from the molecule:
for molRes in molecule1.sortedMolResidues():
    print molRes.seqCode, molRes.ccpCode, molRes.linking
Now that the molecule has residues, all of the attributes that derive from the presence of residues are filled in. For example, to get the molecule's mass do:
print molecule1.molecularMass
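The linking values printed for each MolResidue earlier in this section follow from the residue's position in the linear polymer. The following sketch reproduces that idea in plain Python without the CCPN API. It assumes that a linear sequence is labelled 'start' for the first residue, 'end' for the last and 'middle' otherwise, that a single unlinked residue is labelled 'none', and that sequence numbering begins at 1. The function name linear_linkings is hypothetical and is not part of ccp.util.Molecule.

```python
def linear_linkings(seq):
    """Return one linking label per position of a linear sequence."""
    n = len(seq)
    if n == 0:
        return []
    if n == 1:
        # Assumption: a lone residue is free and unlinked
        return ['none']
    return ['start'] + ['middle'] * (n - 2) + ['end']


seq = ['Gln', 'Trp', 'Glu', 'Arg', 'Thr', 'Tyr']
linkings = linear_linkings(seq)

# Assumption: sequence codes are numbered from 1
for seq_code, (ccp_code, linking) in enumerate(zip(seq, linkings), 1):
    print('%d %s %s' % (seq_code, ccp_code, linking))
```

Each label corresponds to a different ChemCompVar of the same compound, which is why the terminal residues of a chain carry different atoms from the internal ones.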
Assignable Atoms: Molecular Systems
Lastly we can use our molecule specification, with its contained residues, to build a molecular system that represents a complex containing two chains, chainA and chainB. These chains have the same amino acid sequence but contain distinct sets of atoms. It is to the atoms within these two chains that any NMR assignments will be made, i.e. so that you may assign the two chains separately.
molSystem = rootProject.newMolSystem(code='MS1')
chainA = molSystem.newChain(code='A', molecule=molecule1)
chainB = molSystem.newChain(code='B', molecule=molecule1)
With the molecular system constructed we can then query its chains, residues and atoms:
for chain in molSystem.sortedChains():
    chainCode = chain.code
    for residue in chain.sortedResidues():
        resName = '%d%s' % (residue.seqCode, residue.ccpCode)
        for atom in residue.atoms:
            print chainCode, resName, atom.name
The initial molecule description was just a (sequence) template for residue specifications (molResidues), but once we make a chain within a molecular system, the sequence is combined with the chemComp descriptions to give a full description of all the atoms.
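The relationship between one sequence template and several chains of distinct atoms can be illustrated with a small stand-alone sketch. It is a toy model in plain Python, not the CCPN implementation: the Atom class, the build_chain function and the ATOM_TEMPLATES dictionary are all hypothetical, and the atom lists are truncated to backbone and side-chain heavy atoms, whereas real ChemComp templates list every atom.

```python
# Hypothetical, truncated atom templates standing in for ChemComps
ATOM_TEMPLATES = {
    'Gly': ['N', 'CA', 'C', 'O'],
    'Ala': ['N', 'CA', 'CB', 'C', 'O'],
}


class Atom(object):
    """A distinct, assignable atom within one residue of one chain."""

    def __init__(self, name):
        self.name = name


def build_chain(code, sequence):
    """Expand a sequence template into residues holding new Atom objects."""
    residues = []
    for seq_code, ccp_code in enumerate(sequence, 1):
        atoms = [Atom(name) for name in ATOM_TEMPLATES[ccp_code]]
        residues.append((seq_code, ccp_code, atoms))
    return {'code': code, 'residues': residues}


# One sequence template, two chains: a toy homodimer
sequence = ['Gly', 'Ala']
chain_a = build_chain('A', sequence)
chain_b = build_chain('B', sequence)

for chain in (chain_a, chain_b):
    for seq_code, ccp_code, atoms in chain['residues']:
        for atom in atoms:
            print('%s %d%s %s' % (chain['code'], seq_code, ccp_code, atom.name))

# Equivalent atoms in the two chains share a name but are separate objects,
# so each could carry its own assignment.
atom_a = chain_a['residues'][0][2][1]
atom_b = chain_b['residues'][0][2][1]
same_name = atom_a.name == atom_b.name
same_object = atom_a is atom_b
```

Here same_name is true and same_object is false, which mirrors the statement above that the two chains have the same sequence but contain distinct sets of atoms.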