NMR Experiments and Spectra
Introduction
NMR data is naturally at the heart of the CCPN data model. There are many NMR-related classes of object and this section introduces the first few, which deal with the overall organisation and description of NMR experiments. Please note that, although in the examples below we manually create many of the NMR objects to teach you what they mean and how they fit together, in many instances you can rely upon CCPN software to set up most of the objects for you, for example when you load a spectrum.
NmrProject - All NMR data is packaged under a single class of object called NmrProject. A main CCPN project (MemopsRoot) can contain several NmrProjects. The data in each one is self-contained, so there can be no links from one NmrProject to another. An NmrProject is really just a container class, with no additional information (except a name). In general there is no particular reason (at least when using file, rather than database, storage) to have more than one NmrProject per CCPN project (MemopsRoot).
Experiment - This object describes an occurrence of an NMR experiment. This is a single collection of data in an NMR spectrometer at a given time, under a given set of conditions. An Experiment is not the data that was produced by the experiment, but rather a record of what was done. Each NmrProject can have several Experiments.
ExpDim - In general an NMR Experiment has several dimensions. For example a 2-dimensional nitrogen HSQC will have a 15N and a 1H dimension. It is the ExpDim object that contains the information for a given dimension.
ExpDimRef - This class describes the referencing for values that can appear on an axis in an NMR spectrum. There may be several ExpDimRefs for each experimental dimension (ExpDim), either because the experiment is the sum of several experiments with differently referenced axes (e.g. 15N/13C HSQC), or because the actual peak position is a linear combination of different contributions with different referencings (J-coupling, reduced-dimensionality, or projection experiments).
DataSource - This class describes a set of stored data that was created by performing an NMR Experiment. It is a general class that may be used to describe data in the time domain (so "raw" or unprocessed data, as recorded by the spectrometer), frequency domain (so regular processed data), or axes that are neither time nor frequency (for example, for sampled data such as temperature series). Most of the time people think of DataSource as meaning Spectrum (so processed data). A single Experiment can have many DataSources. This can occur not only because there can be raw and processed data, but also, for example, because for efficiency reasons one might have different processed files for different parts of the spectrum.
AbstractDataDim - This contains information about the individual dimensions in the DataSource. Each AbstractDataDim object corresponds to a single ExpDim object. A given AbstractDataDim can be an FidDataDim, a FreqDataDim (the normal processed type for PPM/Hz axes) or a SampledDataDim.
DataDimRef - This class describes referencing information for a dimension of a DataSource (spectrum). A given dimension can have several alternative referencings because the Experiment might have a mixed axis, i.e. multiple ExpDimRefs, as you would find for example in the mixed 15N/13C HSQC-NOESY. In most cases, however, there is only one DataDimRef, and it links to the peak dimensions (PeakDims) for that axis.
The diagram below summarises these classes (and also peak classes).
NmrProject
To create an NmrProject from a main CCPN project (MemopsRoot) one just needs to specify a name.
nmrProject = rootProject.newNmrProject(name="myNmrProject")
This object contains all of the NMR data that can link together to form assignments etc. In essence it is just a container class, so there is nothing else much to think about with this object.
Experiment
The minimum amount of information that needs to be specified in order to create an NMR experiment is a name and the number of dimensions. The name is specified by the user, but of course the number of dimensions should be set according to what experiment was actually run.
experiment = nmrProject.newExperiment(name="myExpt", numDim=2)
Note that this experiment is the record of the experiment being done, and not the data that is produced; that comes later when we look at the DataSource (i.e. spectrum) object. When an Experiment is created, the ExpDims (experimental dimension objects) for the Experiment are automatically created at the same time. Each ExpDim will contain (or link to) information that specifically relates to that one dimension.
print "number of expDims = %d" % len(experiment.expDims)
Although the basic experimental dimension objects are automatically created, the experiment dimension referencing objects (ExpDimRefs) are not. This is because additional information, specifically the spectrometer frequency (in MHz) and the isotope(s), needs to be specified. The ExpDimRef objects contain much of the useful information as far as working with the experiment is concerned. It may seem complicated that the data model uses two separate kinds of dimension objects, rather than just one. However, having separate ExpDim and ExpDimRef objects allows you to have multiple referencing specifications (ExpDimRefs) on a single dimension; this is crucial for describing peak splitting (caused by coupling) and projection spectroscopy, where more than one scale needs to be considered before you can interpret what a peak position means.
For the moment we will keep things simple and have only one ExpDimRef for each dimension. Assuming, for example, that our experiment is an HSQC we can add the referencing information as follows:
expDims = experiment.sortedExpDims()
expDims[0].newExpDimRef(sf=800, isotopeCodes=("1H",))
expDims[1].newExpDimRef(sf=81, isotopeCodes=("15N",))
It may seem annoying to have to specify isotopeCodes as a sequence (here a one-element tuple) when the normal case is that there is only one isotope, but this is because the data model is general purpose, so it has to accommodate the possibility that there is more than one isotope in the given dimension.
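The trailing comma in ("1H",) matters: it is what makes the value a one-element tuple rather than a string in parentheses. The short check below is plain Python and involves no CCPN objects; how the API reacts to a bare string is not shown here.

```python
# Plain Python, no CCPN objects: why the trailing comma matters.
single = ("1H",)      # a tuple containing one isotope code
not_a_tuple = ("1H")  # just the string "1H"; the parentheses do nothing

print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # str

# Iterating over the string yields characters, not isotope codes.
print(list(single))       # ['1H']
print(list(not_a_tuple))  # ['1', 'H']

# A dimension with two isotopes simply uses a longer tuple.
mixed = ("15N", "13C")
print(len(mixed))         # 2
```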
Next we set up a reference experiment (RefExperiment) for the experiment. This will be useful later because it will limit which atoms can occur in the assignment for a given dimension. Each RefExperiment occurs in an NmrExpPrototype, an object that gives details of the NMR magnetisation transfer pathway. In general an NmrExpPrototype may describe several RefExperiments but for an HSQC there is only one, so it is a bit easier to find. To find and link our experiment to the HSQC type information we could do:
nmrExpPrototype = rootProject.findFirstNmrExpPrototype(synonym="15N HSQC/HMQC")
refExperiment = nmrExpPrototype.findFirstRefExperiment()
experiment.refExperiment = refExperiment
However, making all of the expected links is actually more complicated than just doing the above. One also needs to map each dimension (ExpDim) of the Experiment to the corresponding reference experimental dimension (RefExpDim) of the RefExperiment, and each ExpDimRef to the corresponding RefExpDimRef. This kind of linking is important so we know which step in the magnetisation transfer pathway each experimental dimension refers to: for example, in a 3D 15N-HSQC NOESY there are two 1H dimensions and we must know which corresponds to the nitrogen-bound proton. Fortunately, there is some utility code that makes this complicated job much easier.
from ccpnmr.analysis.core.ExperimentBasic import setRefExperiment
nmrExpPrototype = rootProject.findFirstNmrExpPrototype(synonym="15N HSQC/HMQC")
refExperiment = nmrExpPrototype.findFirstRefExperiment()
setRefExperiment(experiment, refExperiment)
print refExperiment.name
Spectrum (DataSource)
If an Experiment contains a record of what was done, a DataSource object contains a record of the data that was produced. We would typically refer to a DataSource as a "spectrum", but it is a general object that could represent an FID or part-processed spectrum. Creating a DataSource is complicated because one also needs to correctly create objects for the data dimensions, i.e. AbstractDataDim objects. Note that this class of object is 'abstract' because there are several sub-classes of data dimension. A data dimension will typically be a frequency type of dimension (FreqDataDim) that specifies PPM or Hz referencing, but it is also possible for the dimension to be sampled (e.g. when one axis relates to time points for a relaxation experiment).
For each FreqDataDim, a data dimension reference (DataDimRef) maps it to the corresponding experimental dimension and experimental referencing (ExpDim and ExpDimRef). Because there may be multiple experimental references on each dimension, there may also be several data references for the same dimension.
For the DataSource object itself one initially needs to specify a name, the number of dimensions, and the dataType ("processed", "FID", "part-processed"):
spectrum = experiment.newDataSource(name="mySpectrum", numDim=2, dataType="processed")
To create a FreqDataDim we need to specify all of the parameters that indicate how the data is stored. Specifically, you need to specify whether the data is complex, the number of points both in the full spectrum (so immediately after Fourier transformation) and after possible truncation, and the valuePerPoint for the referencing. The valuePerPoint is the spectral width divided by the number of points. Then to create a DataDimRef one needs the reference value (refValue) at a specific point (refPoint). So for an HSQC, using the above spectrum and experiment, one can do the following to define the referencing information:
numPoints = numPointsOrig = (2048, 1024)
sw = (8049.0, 1014.0)
valuePerPoint = (sw[0]/numPoints[0], sw[1]/numPoints[1])
refPoint = (1024.5, 512.5)
refValue = (4.72, 117.4)
And then go through each of the experimental dimensions to set up the referenced data dimensions, noting that the dimension objects are numbered with the dim attribute (i.e. the dim is 1 or 2), but the list indices i start from 0:
expDims = experiment.sortedExpDims()
for i, expDim in enumerate(expDims):
  dim = expDim.dim  # dim is numbered from 1; the index i starts at 0
  freqDataDim = spectrum.newFreqDataDim(dim=dim, expDim=expDim,
                                        isComplex=False,
                                        numPoints=numPoints[i],
                                        numPointsOrig=numPointsOrig[i],
                                        valuePerPoint=valuePerPoint[i])
  # Only one ExpDimRef was created per dimension above
  expDimRef = expDim.findFirstExpDimRef()
  freqDataDim.newDataDimRef(expDimRef=expDimRef,
                            refPoint=refPoint[i],
                            refValue=refValue[i])
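To see what refPoint, refValue and valuePerPoint mean together, the following plain-Python sketch converts between point positions and ppm for the 1H dimension of the example. It is not CCPN library code: the function names are hypothetical, and it assumes that valuePerPoint is in Hz, that sf is in MHz, and that ppm decreases as the point number increases (the usual display convention).

```python
def point_to_ppm(point, ref_point, ref_value, value_per_point, sf):
    """Convert a 1-based point position to ppm.

    Assumes value_per_point is in Hz, sf is in MHz, and that ppm
    decreases as the point number increases.
    """
    return ref_value - (point - ref_point) * value_per_point / sf


def ppm_to_point(ppm, ref_point, ref_value, value_per_point, sf):
    """Inverse of point_to_ppm under the same assumptions."""
    return ref_point - (ppm - ref_value) * sf / value_per_point


# Values taken from the HSQC example above (1H dimension).
num_points = 2048
value_per_point = 8049.0 / num_points  # Hz per point
sf = 800.0                             # MHz
ref_point, ref_value = 1024.5, 4.72

centre = point_to_ppm(ref_point, ref_point, ref_value, value_per_point, sf)
first = point_to_ppm(1.0, ref_point, ref_value, value_per_point, sf)
last = point_to_ppm(float(num_points), ref_point, ref_value, value_per_point, sf)
print("centre %.3f ppm, first point %.3f ppm, last point %.3f ppm"
      % (centre, first, last))
```

The reference point maps exactly onto the reference value, and every other position is offset from it by a whole or fractional number of points times the ppm width of one point (valuePerPoint / sf).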
The information we have included so far just describes the DataSource (it is often called "metadata"). Next we look at the actual data which the DataSource is referring to. This is usually binary data, so the CCPN software library does not attempt to read it (although Analysis does). Instead there is a pointer to the data on disk, via a DataStore object. Setting up this DataStore object is non-trivial, and of course if you load spectra via CCPN software much of this will automatically be done for you.
To learn how we should do this manually, firstly there is the issue of the path (the location of the data). The path is split into two parts, the "head" and the "tail". Exactly how this is done is up to the programmer. The idea is that even if you have lots of DataStores one should create as few "heads" (technically known as DataUrls) as possible, so that if the data is moved to a new computer then as few paths as possible need changing. This is an art rather than a science.
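As one possible way to choose the split, the sketch below takes a set of absolute POSIX paths and finds the longest directory they all share, which becomes the single "head", leaving each file with a relative "tail". This is plain Python rather than CCPN code; the function name and the second example path are hypothetical.

```python
import posixpath


def split_head_tail(full_paths):
    """Split absolute POSIX paths into one shared head and per-file tails.

    The head is the longest directory prefix common to all paths,
    compared component by component (not character by character).
    """
    dirs = [posixpath.dirname(p).split("/") for p in full_paths]
    common = []
    for parts in zip(*dirs):
        if len(set(parts)) != 1:
            break
        common.append(parts[0])
    head = "/".join(common) or "/"
    tails = [posixpath.relpath(p, head) for p in full_paths]
    return head, tails


paths = [
    "/usr/myaccount/mydata/115.spc",
    "/usr/myaccount/mydata/hnca/hnca.spc",  # hypothetical second spectrum
]
head, tails = split_head_tail(paths)
print(head)
print(tails)
```

With a split like this, moving the whole data directory to another machine means updating only the one head; the tails stay valid.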
One also has something called a DataLocationStore, which is simply a container object for the DataStores and the DataUrls. There could be more than one of these in an application, but there is often no reason to have more than one.
Spectral data is often "blocked" (to make the disk access more efficient). The particular kind of DataStore for this kind of data is called a BlockedBinaryMatrix. Here is an example of how one could create it, noting that the information we specify details exactly how the spectrum is stored:
from memops.api.Implementation import Url
dataLocationStore = rootProject.findFirstDataLocationStore()
if not dataLocationStore:
  dataLocationStore = rootProject.newDataLocationStore(name='testStore')
dd = {}
dd['path'] = '115.spc'
dd['dataUrl'] = dataLocationStore.newDataUrl(url=Url(path='/usr/myaccount/mydata'))
dd['blockSizes'] = (256, 128)
dd['headerSize'] = 0
dd['isBigEndian'] = True
dd['isComplex'] = (False, False)
dd['nByte'] = 4
dd['numPoints'] = [dataDim.numPoints for dataDim in spectrum.sortedDataDims()]
dd['numberType'] = 'float'
dataStore = dataLocationStore.newBlockedBinaryMatrix(**dd)
spectrum.dataStore = dataStore
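To make the blocking parameters more concrete, here is a plain-Python sketch of the arithmetic they imply for the values above. It is an illustration, not the CCPN or Analysis reading code. It assumes that a partly filled final block is still stored whole, and that the first dimension varies fastest, both for points within a block and for the order of blocks in the file; the function names are hypothetical.

```python
def blocked_layout(num_points, block_sizes, n_byte=4, header_size=0):
    """Return (blocks per dimension, bytes per block, total file size)."""
    # Ceiling division: assume a partly filled last block is stored whole.
    blocks_per_dim = [(n + b - 1) // b
                      for n, b in zip(num_points, block_sizes)]
    points_per_block = 1
    for b in block_sizes:
        points_per_block *= b
    n_blocks = 1
    for k in blocks_per_dim:
        n_blocks *= k
    block_bytes = points_per_block * n_byte
    return blocks_per_dim, block_bytes, header_size + n_blocks * block_bytes


def point_offset(position, num_points, block_sizes, n_byte=4, header_size=0):
    """Byte offset of a 0-based point position.

    Assumes the first dimension varies fastest, both for points within
    a block and for the order of blocks in the file.
    """
    blocks_per_dim, block_bytes, _ = blocked_layout(
        num_points, block_sizes, n_byte, header_size)
    block_index = 0
    within_index = 0
    block_stride = 1
    within_stride = 1
    for pos, b, k in zip(position, block_sizes, blocks_per_dim):
        block_index += (pos // b) * block_stride
        within_index += (pos % b) * within_stride
        block_stride *= k
        within_stride *= b
    return header_size + block_index * block_bytes + within_index * n_byte


num_points = (2048, 1024)
block_sizes = (256, 128)
print(blocked_layout(num_points, block_sizes))
print(point_offset((256, 0), num_points, block_sizes))
```

For this example there are 8 blocks along each dimension, so a reader that wants a small region only has to fetch the few blocks that overlap it rather than scanning whole rows of the matrix.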
As can be seen, this is definitely not trivial. Normally the above 'manual' method is not how one would create DataSources or DataStores. Normally the information for a spectrum comes from parameters in the data file (or associated file or files) that a processing package such as NMRPipe or Azara creates. There is utility code that makes loading this kind of data relatively easy. For example, for an Azara parameter file, "115.spc.par", which contains information about the actual binary data file (115.spc) and the referencing information, one could do as follows:
from ccp.format.spectra.params.AzaraParams import AzaraParams
params = AzaraParams("115.spc.par")
spectrum = params.createDataSource(experiment, name="mySpectrum")
This not only creates the DataSource, FreqDataDim and DataDimRef objects, it also sets up the associated DataStore. For example, after running the above, to find out what the actual data file is one can do:
print "data file = %s" % spectrum.dataStore.fullPath
Note that spectrum.dataStore could be None for spectra created by other means (for example, a DataSource created manually, as in the long way above, but without a DataStore attached), so in general one has to be careful when using it. Using the params.createDataSource method is exactly what Analysis does to load a spectrum. If, after loading a spectrum, you wanted to find out, say, the number of points in each dimension, then you could do:
for dataDim in spectrum.dataDims:
  print "In dim %d, there are %d points" % (dataDim.dim, dataDim.numPoints)
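Returning to the earlier caution that spectrum.dataStore may be None, a small guard function avoids an AttributeError when asking for the file path. The sketch below uses only the dataStore and fullPath attributes shown above; the helper name and the two stand-in classes are hypothetical and exist only so that the example runs without a CCPN project.

```python
def data_file_path(spectrum):
    """Return the data file path, or None if no DataStore is attached."""
    data_store = spectrum.dataStore
    if data_store is None:
        return None
    return data_store.fullPath


# Hypothetical stand-ins so that the sketch runs without a CCPN project.
class FakeDataStore(object):
    fullPath = "/usr/myaccount/mydata/115.spc"


class FakeSpectrum(object):
    def __init__(self, dataStore=None):
        self.dataStore = dataStore


print(data_file_path(FakeSpectrum(FakeDataStore())))
print(data_file_path(FakeSpectrum()))
```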
From a DataSource (spectrum) object you can get hold of the associated experiment in one of two ways:
expt = spectrum.experiment
or
expt = spectrum.parent
Although the specification of the referencing uses valuePerPoint rather than spectralWidth, there is a "derived" function for the latter (so there is some code in the software which automatically calculates the spectralWidth from the valuePerPoint and numPoints):
for dataDim in spectrum.dataDims:
  # below assumes that the dataDim is a FreqDataDim
  # and that dataDim has a unique dataDimRef which has been set
  dataDimRef = dataDim.findFirstDataDimRef()
  print "In dim %d, the spectral width is %f" % (dataDim.dim, dataDimRef.spectralWidth)
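The derivation itself is simple arithmetic, sketched here in plain Python with the numbers from the manual HSQC example. The function names are hypothetical, and the unit in which the derived spectralWidth attribute is reported is not stated above, so both Hz and ppm are shown.

```python
def spectral_width_hz(value_per_point, num_points):
    """Spectral width in Hz from the stored per-point value (in Hz)."""
    return value_per_point * num_points


def spectral_width_ppm(value_per_point, num_points, sf):
    """Spectral width in ppm, given the spectrometer frequency sf in MHz."""
    return spectral_width_hz(value_per_point, num_points) / sf


# (numPoints, valuePerPoint in Hz, sf in MHz) for the 1H and 15N dimensions.
dims = [
    (2048, 8049.0 / 2048, 800.0),
    (1024, 1014.0 / 1024, 81.0),
]
for number, (n, vpp, sf) in enumerate(dims, 1):
    print("In dim %d, the spectral width is %.1f Hz (%.3f ppm)"
          % (number, spectral_width_hz(vpp, n), spectral_width_ppm(vpp, n, sf)))
```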