NMR Experiments and Spectra

Introduction

NMR data is naturally at the heart of the CCPN data model. There are many NMR-related classes of object, and this section introduces the first few, which deal with the overall organisation and description of NMR experiments. Please note that, although in the examples below we create many of the NMR objects manually to show what they mean and how they fit together, in many instances you can rely upon CCPN software to set up most of the objects for you, for example when you load a spectrum.

The diagram below summarises these classes (and also peak classes).

NmrProject

To create an NmrProject from a main CCPN project (MemopsRoot) one just needs to specify a name.

nmrProject = rootProject.newNmrProject(name="myNmrProject")

This object contains all of the NMR data that can link together to form assignments etc. In essence it is just a container class, so there is little else to think about with this object.

Experiment

The minimum amount of information that needs to be specified in order to create an NMR experiment is a name and the number of dimensions.  The name is specified by the user, but of course the number of dimensions should be set according to what experiment was actually run.

experiment = nmrProject.newExperiment(name="myExpt", numDim=2)

Note that this experiment is the record of the experiment being done, and not the data that is produced; that comes later when we look at the DataSource (i.e. spectrum) object. When an Experiment is created the ExpDims (experimental dimension objects) for the Experiment are automatically created at the same time. Each ExpDim will contain (or link to) information that specifically relates to that one dimension.

print "number of expDims = %d" % len(experiment.expDims)

Although the basic experimental dimension objects are automatically created, the experiment dimension referencing objects (ExpDimRefs) are not. This is because additional information, specifically the spectrometer frequency (in MHz) and the isotope(s), needs to be specified. The ExpDimRef objects contain much of the useful information as far as working with the experiment is concerned. It may seem complicated that the data model uses two separate kinds of dimension objects, rather than just one. However, having separate ExpDim and ExpDimRef objects allows you to have multiple referencing specifications (ExpDimRefs) on a single dimension; this is crucial for describing peak splitting (caused by coupling) and projection spectroscopy, where more than one scale needs to be considered before you can interpret what a peak position means.

For the moment we will keep things simple and have only one ExpDimRef for each dimension. Assuming, for example, that our experiment is an HSQC we can add the referencing information as follows:

expDims = experiment.sortedExpDims()

expDims[0].newExpDimRef(sf=800, isotopeCodes=("1H",))

expDims[1].newExpDimRef(sf=81, isotopeCodes=("15N",))

It may seem annoying to have to specify isotopeCodes as a sequence (here a one-element tuple) when the normal case is that there is only one isotope, but the data model is general purpose and so has to accommodate the possibility that there is more than one isotope in a given dimension.
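The sf values used above (800 and 81 MHz) are related: on a given magnet each isotope resonates at a roughly fixed fraction of the 1H frequency. The following is a minimal sketch of that arithmetic, assuming rounded frequency ratios from standard NMR tables. The FREQ_RATIO table and the approx_sf function are illustrative names and are not part of the CCPN API; in practice the exact value should be taken from the spectrometer parameters.

```python
# Approximate resonance frequency of each isotope relative to 1H.
# Rounded values from standard NMR tables; illustrative only.
FREQ_RATIO = {"1H": 1.0, "13C": 0.2515, "15N": 0.1014, "31P": 0.4048}


def approx_sf(proton_mhz, isotope_code):
    """Return the approximate spectrometer frequency (MHz) for an isotope."""
    return proton_mhz * FREQ_RATIO[isotope_code]


# On an "800 MHz" magnet the 15N frequency comes out close to 81 MHz.
for code in ("1H", "15N"):
    print("%s: about %.1f MHz" % (code, approx_sf(800.0, code)))
```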

Next we set up a reference experiment (RefExperiment) for the experiment. This will be useful later because it will limit which atoms can occur in the assignment for a given dimension. Each RefExperiment occurs in an NmrExpPrototype, an object that gives details of the NMR magnetisation transfer pathway. In general an NmrExpPrototype may describe several RefExperiments, but for an HSQC there is only one, so it is a bit easier to find. To find and link our experiment to the HSQC type information we could do:

nmrExpPrototype = rootProject.findFirstNmrExpPrototype(synonym="15N HSQC/HMQC")

refExperiment = nmrExpPrototype.findFirstRefExperiment()

experiment.refExperiment = refExperiment

However, making all of the expected links is actually more complicated than just doing the above. One also needs to map each dimension (ExpDim) of the Experiment to the corresponding reference experimental dimension (RefExpDim) of the RefExperiment, and each ExpDimRef to the corresponding RefExpDimRef. This kind of linking is important so that we know which step in the magnetisation transfer pathway each experimental dimension refers to: for example, in a 3D 15N-HSQC NOESY there are two 1H dimensions and we must know which one corresponds to the nitrogen-bound proton. Fortunately there is some utility code that makes this complicated job much easier.

from ccpnmr.analysis.core.ExperimentBasic import setRefExperiment

nmrExpPrototype = rootProject.findFirstNmrExpPrototype(synonym="15N HSQC/HMQC")

refExperiment = nmrExpPrototype.findFirstRefExperiment()

setRefExperiment(experiment, refExperiment)

print refExperiment.name
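To see why the dimension mapping matters, the toy sketch below enumerates every way of pairing experiment dimensions with reference dimensions when only the isotopes are compared. It uses plain Python tuples rather than CCPN objects, and the possible_mappings function is a hypothetical helper, not the logic that setRefExperiment actually uses. The order of isotopes given for the reference dimensions is an arbitrary choice for illustration.

```python
from itertools import permutations


def possible_mappings(exp_isotopes, ref_isotopes):
    """List every assignment of experiment dims to reference dims
    in which the isotopes agree. Dims are numbered from 1."""
    mappings = []
    for perm in permutations(range(len(ref_isotopes))):
        if all(exp_isotopes[i] == ref_isotopes[j] for i, j in enumerate(perm)):
            mappings.append(tuple(j + 1 for j in perm))
    return mappings


# 2D HSQC: isotopes alone identify each dimension.
hsqc = possible_mappings(("1H", "15N"), ("1H", "15N"))
# 3D NOESY-HSQC: two 1H dimensions, so isotopes alone are ambiguous.
noesy = possible_mappings(("1H", "1H", "15N"), ("1H", "15N", "1H"))

print("HSQC mappings: %s" % (hsqc,))
print("NOESY-HSQC mappings: %s" % (noesy,))
```

For the HSQC there is a single consistent mapping, whereas the 3D case leaves two candidates, which is why the link between each ExpDim and its RefExpDim has to be recorded explicitly.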

Spectrum (DataSource)

If an Experiment contains a record of what was done, a DataSource object contains a record of the data that was produced. We would typically refer to a DataSource as a "spectrum", but it is a general object that could represent an FID or part-processed spectrum. Creating a DataSource is complicated because one also needs to correctly create objects for the data dimensions, i.e. AbstractDataDim objects. Note that this class of object is 'abstract' because there are several sub-classes of data dimension. A data dimension will typically be a frequency type of dimension (FreqDataDim) that specifies PPM or Hz referencing, but it is also possible for the dimension to be sampled (e.g. when one axis relates to time points for a relaxation experiment).

For each FreqDataDim, a data dimension reference (DataDimRef) maps it to the experimental dimension and its experimental referencing (the ExpDim and its ExpDimRefs). Because there may be multiple experimental references on each dimension, there may likewise be several data references for the same dimension.

For the DataSource object itself one initially needs to specify a name, the number of dimensions, and the dataType ("processed", "FID", "part-processed"):

spectrum = experiment.newDataSource(name="mySpectrum", numDim=2, dataType="processed")

To create a FreqDataDim we need to specify all of the parameters that indicate how the data is stored. Specifically, you need to specify whether the data is complex, what the number of points is, both in the full spectrum (i.e. immediately after Fourier transformation) and after possible truncation, and the valuePerPoint for the referencing. The valuePerPoint is the spectral width divided by the number of points. Then to create a DataDimRef one needs the reference value (refValue) at a specific point (refPoint). So for an HSQC, using the above spectrum and experiment, one can do the following to define the referencing information:

numPoints = numPointsOrig = (2048, 1024)

sw = (8049.0, 1014.0)

valuePerPoint = (sw[0]/numPoints[0], sw[1]/numPoints[1])

refPoint = (1024.5, 512.5)

refValue = (4.72, 117.4)

And then go through each of the experimental dimensions to set up the referenced data dimensions, noting that the dimension objects are numbered with the dim attribute (i.e. the dim is 1 or 2), but the list index i starts from 0:

expDims = experiment.sortedExpDims()

for i, expDim in enumerate(expDims):

  dim = expDim.dim

  freqDataDim = spectrum.newFreqDataDim(dim=dim, expDim=expDim,

                                        isComplex=False,

                                        numPoints=numPoints[i],

                                        numPointsOrig=numPointsOrig[i],

                                        valuePerPoint=valuePerPoint[i])

  expDimRef = expDim.findFirstExpDimRef()

  freqDataDim.newDataDimRef(expDimRef=expDimRef,

                           refPoint=refPoint[i],

                           refValue=refValue[i])
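The referencing parameters above are what allow a point position to be turned into a chemical shift. The sketch below shows the conventional arithmetic in plain Python, assuming that valuePerPoint is in Hz, that sf is in MHz, that points are counted from 1, and that ppm decreases as the point number increases. The function names are illustrative and are not CCPN API calls.

```python
def point_to_ppm(point, ref_point, ref_value, value_per_point, sf):
    """Convert a point position to ppm (assumes Hz per point, sf in MHz)."""
    return ref_value - (point - ref_point) * value_per_point / sf


def ppm_to_point(ppm, ref_point, ref_value, value_per_point, sf):
    """Inverse of point_to_ppm."""
    return ref_point - (ppm - ref_value) * sf / value_per_point


# 1H dimension of the HSQC example: 8049 Hz over 2048 points at 800 MHz,
# with 4.72 ppm sitting at point 1024.5.
value_per_point = 8049.0 / 2048
args = (1024.5, 4.72, value_per_point, 800.0)

for point in (1.0, 1024.5, 2048.0):
    print("point %7.1f -> %.3f ppm" % (point, point_to_ppm(point, *args)))
```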

The information we have included so far just describes the DataSource (it is often called "metadata"). Next we look at the actual data that the DataSource refers to. This is usually binary data, so the CCPN software library does not attempt to read it (although Analysis does). Instead there is a pointer to the data on disk, via a DataStore object. Setting up this DataStore object is non-trivial, and of course if you load spectra via CCPN software much of this will automatically be done for you.

To see how we would do this manually, first there is the issue of the path (the location of the data). The path is split into two parts, the "head" and the "tail". Exactly how this is done is up to the programmer. The idea is that even if you have lots of DataStores one should create as few "heads" (technically known as DataUrls) as possible, so that if the data is moved to a new computer then as few paths as possible need changing. This is an art rather than a science.
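One simple way to split paths, sketched below in plain Python, is to take the directory as the head and the file name as the tail, so that all files in one directory share a single head. The file paths and the split_paths helper are hypothetical and only illustrate the idea; they do not create any DataUrl or DataStore objects.

```python
import posixpath


def split_paths(paths):
    """Group file paths by directory, returning {head: [tail, ...]}."""
    grouped = {}
    for path in paths:
        head, tail = posixpath.split(path)
        grouped.setdefault(head, []).append(tail)
    return grouped


# Hypothetical spectrum files stored in the same directory.
paths = ["/usr/myaccount/mydata/115.spc", "/usr/myaccount/mydata/116.spc"]
grouped = split_paths(paths)

for head in sorted(grouped):
    print("head %s -> tails %s" % (head, grouped[head]))
```

With this split, moving the directory to another machine would mean changing only the one head rather than each file path.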

One also has something called a DataLocationStore, which is simply a container object for the DataStores and the DataUrls.  There could be more than one of these in an application, but there is often no reason to have more than one.

Spectral data is often "blocked" (to make the disk access more efficient). The particular kind of DataStore for this kind of data is called a BlockedBinaryMatrix. Here is an example of how one could create it, noting that the information we specify details exactly how the spectrum is stored:

from memops.api.Implementation import Url

dataLocationStore = rootProject.findFirstDataLocationStore()

if not dataLocationStore:

  dataLocationStore = rootProject.newDataLocationStore(name='testStore')

dd = {}

dd['path'] = '115.spc'

dd['dataUrl'] = dataLocationStore.newDataUrl(url=Url(path='/usr/myaccount/mydata'))

dd['blockSizes'] = (256, 128)

dd['headerSize'] = 0

dd['isBigEndian'] = True

dd['isComplex'] = (False, False)

dd['nByte'] = 4

dd['numPoints'] = [dataDim.numPoints for dataDim in spectrum.sortedDataDims()]

dd['numberType'] = 'float'

dataStore = dataLocationStore.newBlockedBinaryMatrix(**dd)

spectrum.dataStore = dataStore
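A quick sanity check on blocking parameters like these is to compute the file size they imply and compare it with the size of the file on disk. The sketch below does this in plain Python, assuming that every block is written at full size (any partial block at the edge of the spectrum being padded) and that the header precedes the blocks. The blocked_file_size function is an illustrative helper, not part of the CCPN API.

```python
def blocked_file_size(num_points, block_sizes, n_byte, header_size=0):
    """Return the expected size in bytes of a blocked binary file,
    assuming all blocks are stored at full size."""
    n_blocks = 1
    points_per_block = 1
    for n, b in zip(num_points, block_sizes):
        n_blocks *= (n + b - 1) // b  # ceiling division: blocks along this dim
        points_per_block *= b
    return header_size + n_blocks * points_per_block * n_byte


# Values matching the example: 2048 x 1024 points, 256 x 128 blocks,
# 4 bytes per point and no header.
size = blocked_file_size((2048, 1024), (256, 128), 4)
print("expected file size = %d bytes" % size)
```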

Setting this up by hand is definitely not trivial, and the 'manual' method above is not how one would normally create DataSources or DataStores. Normally the information for a spectrum comes from parameters in the data file (or an associated file or files) that a processing package such as NMRPipe or Azara creates. There is utility code that makes loading this kind of data relatively easy. For example, for an Azara parameter file, "115.spc.par", which contains information about the actual binary data file (115.spc) and the referencing information, one could do as follows:

from ccp.format.spectra.params.AzaraParams import AzaraParams

params = AzaraParams("115.spc.par")

spectrum = params.createDataSource(experiment, name="mySpectrum")

This not only creates the DataSource, FreqDataDim and DataDimRef objects, it also sets up the associated DataStore.  For example, after running the above, to find out what the actual data file is one can do:

print "data file = %s" % spectrum.dataStore.fullPath

Note that spectrum.dataStore could be None for spectra created by other means (for example, by the manual route described earlier if no DataStore has been attached), so in general one should check it before use. Using the params.createDataSource method is exactly what Analysis does to load a spectrum. If, after loading a spectrum, you wanted to find out, say, the number of points in each dimension, then you could do:

for dataDim in spectrum.sortedDataDims():

  print "In dim %d, there are %d points" % (dataDim.dim, dataDim.numPoints)

From a DataSource (spectrum) object you can get hold of the associated experiment in one of two ways:

expt = spectrum.experiment

or

expt = spectrum.parent

Although the specification of the referencing uses valuePerPoint rather than spectralWidth, there is a "derived" function for the latter (so there is some code in the software which automatically calculates the spectralWidth from the valuePerPoint and numPoints):

for dataDim in spectrum.sortedDataDims():

  # below assumes that the dataDim is a FreqDataDim

  # and that dataDim has a unique dataDimRef which has been set

  dataDimRef = dataDim.findFirstDataDimRef()

  print "In dim %d, the spectral width is %f" % (dataDim.dim, dataDimRef.spectralWidth)
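The derived calculation itself is simple arithmetic, sketched below in plain Python for the 15N dimension of the HSQC example. It assumes valuePerPoint is in Hz and sf is in MHz; which unit the derived spectralWidth attribute reports is not stated here, so both the Hz and ppm forms are shown. The function names are illustrative and are not CCPN API calls.

```python
def spectral_width_hz(value_per_point, num_points):
    """Spectral width in Hz from the per-point spacing."""
    return value_per_point * num_points


def spectral_width_ppm(value_per_point, num_points, sf):
    """Spectral width in ppm (Hz divided by the frequency in MHz)."""
    return spectral_width_hz(value_per_point, num_points) / sf


# 15N dimension: 1014 Hz over 1024 points at 81 MHz.
value_per_point = 1014.0 / 1024
print("width = %.1f Hz" % spectral_width_hz(value_per_point, 1024))
print("width = %.2f ppm" % spectral_width_ppm(value_per_point, 1024, 81.0))
```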