Prodecomp-Integration

This is the Prodecomp tutorial, delivered as a hands-on exercise at the Gothenburg Advanced Course on New Methods of Data Acquisition and Analysis in Biomolecular NMR, 20-24 September 2010. In the interest of speed it has been posted here immediately, even though it has not yet been adapted to the site style guidelines.

The tutorial should work with CcpNmr Analysis versions 2.1.5 and higher. Version 2.1.5 must be up to date with patches as of October 1st 2010.

The exact test data used have not yet been publicly released and cannot be provided here. The procedure works in a similar manner for other data, with names and spectrum numbers changed. Please contact Martin Billeter <martin.billeter@chem.gu.se> if you are interested in the test data.

Introduction:

Projection sets are acquired together, and the individual projections are extracted during processing. A high-dimensionality pulse sequence like a 5D HABCABCONNH is acquired as a series of 2D spectra. In the indirect dimension the frequency varies as a linear combination of the frequencies from the individual dimensions. Examples would be wCO + wC, wN, or wCO - wC + wH. In theory (not used in Prodecomp) you could also have wCO - 2wN or wCO + 1.5wN. Prodecomp calculates a complete 5D spectrum for each HN peak in each projection set (a 'component'), determining the frequencies for each axis. Different projection sets can be combined. Shared frequencies like N and HN are assumed to be identical in all projection sets, and serve to group together peaks from different sets. In order to use the projections we need to know the frequencies and scaling constants for each projection.
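The linear combination described above can be illustrated with a minimal Python sketch. This is not Prodecomp code: the function name `projected_frequency` and the offsets in `shifts` are made up for illustration, and only the three example combinations come from the text.

```python
def projected_frequency(shifts, scalings):
    """Return the indirect-dimension frequency of one projection.

    shifts   -- dict: axis name -> frequency offset in Hz (hypothetical values)
    scalings -- dict: axis name -> scaling constant for this projection;
                an axis that is missing does not contribute (constant 0)
    """
    return sum(scalings.get(axis, 0.0) * value for axis, value in shifts.items())


# Hypothetical offsets for one spin system, in Hz relative to each carrier.
shifts = {"CO": 120.0, "C": -340.0, "N": 80.0, "H": 55.0}

# Scaling constants for the three example projections named in the text.
projections = {
    "wCO + wC": {"CO": 1.0, "C": 1.0},
    "wN": {"N": 1.0},
    "wCO - wC + wH": {"CO": 1.0, "C": -1.0, "H": 1.0},
}

for name, scalings in projections.items():
    print(name, projected_frequency(shifts, scalings))
```

Each projection is fully described by its set of scaling constants, which is why these constants must be known for every projection before decomposition.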

Prodecomp processes all the projections together. This is how you get up to '15D' data, combining axes from different experiments. In practice Prodecomp processes one small interval along the acquisition dimension at a time. For each interval you need to input its limits and the number of components to look for. The calculation time increases dramatically as you have more projections, more components and larger intervals. The program is organised so that intervals are determined automatically. It sets up one calculation, with a suitable number of components, for each peak in the HSQC experiment; one or more of these may overlap where peaks have similar proton frequencies. This leads to some redundant calculation, but has the advantage that you can look for one component in each interval, namely the one that must match the HN peak that gave rise to the interval.
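The idea of one interval per HSQC peak can be sketched as follows. This is an illustration only, not the algorithm the program actually uses: the function name, the fixed `half_width` parameter, and the peak positions are all assumptions.

```python
def generate_intervals(peak_points, half_width):
    """Make one interval per guide peak along the acquisition (HN) axis.

    peak_points -- HN positions of the guide peaks, in points (hypothetical)
    half_width  -- half the interval width, in points (hypothetical parameter)
    Returns a list of (start, end, n_components) tuples.
    """
    intervals = []
    for centre in sorted(peak_points):
        start, end = centre - half_width, centre + half_width
        # Every guide peak inside the interval is expected to give a component.
        n_components = sum(1 for p in peak_points if start <= p <= end)
        intervals.append((start, end, n_components))
    return intervals


intervals = generate_intervals([40, 60, 63, 150], half_width=4)
for interval in intervals:
    print(interval)
```

In this made-up example the peaks at points 60 and 63 produce two overlapping intervals with two components each, which is the redundancy mentioned above.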

We use summary description files as input for loading projection sets - this saves us from having to load each projection individually. The summary files are generated automatically from the scripts that unpack and process the experiments. They are called prodecomp.txt (here, at least) and live in the first directory of each projection set.

Most of the spectrum data are taken from the Bruker files directly. Only the dimension names, the scaling factors, and file locations are taken from prodecomp.txt. File names given in prodecomp.txt are adjusted intelligently when the whole project has moved. If this fails you need to edit the prodecomp.txt file.

Our scripts identify which dimensions from different spectra go together by their names. Before you start you have to check the prodecomp.txt files and make sure that the axis names are different within each experiment, and that all axes with the same name (from different experiments!) really do belong together. For this test set, this is already done. Although it is not necessary for the procedure, it is a good idea to look at the input files first. They are:

1110/prodecomp.txt, 2125/prodecomp.txt, 3132/prodecomp.txt
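The name check can be supported by a small stand-alone Python sketch that lists which axis names occur in more than one projection set, so that you can confirm by eye that they belong together. The axis names below are typed in by hand from the NUCLEI lines of the test data listing in this tutorial; the sketch does not parse prodecomp.txt, and the function name is hypothetical.

```python
from collections import defaultdict


def shared_axis_names(axis_names_by_set):
    """Map each axis name used in more than one set to the sets that use it."""
    users = defaultdict(list)
    for set_name, names in axis_names_by_set.items():
        for name in names:
            users[name].append(set_name)
    return {name: sets for name, sets in users.items() if len(sets) > 1}


# Axis names copied by hand from the NUCLEI lines of the three projection sets.
axis_names_by_set = {
    "1110": ["HN", "N", "CO", "Cab", "Hab"],
    "2125": ["HN", "N", "Ca", "Cb", "Ha", "Hb"],
    "3132": ["HN", "N", "Ctocsy", "Htocsy"],
}

shared = shared_axis_names(axis_names_by_set)
for name, sets in shared.items():
    print(name, "is shared by", ", ".join(sets))
```

For this test set only HN and N are reported as shared, which matches the assumption that these two frequencies are identical in all projection sets.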

Test data:

15N HSQC: Experiment 107/1

13C HSQC: Experiment 109/1 (folded). Not used.

Projection sets:

- NUCLEI= HN, N, CO, Cab, Hab     HCBCACONNH

- SPECS   1110 - 1149

- Projection description in file 1110/prodecomp.txt

- Aliphatic carbon pulses are off-resonance. Hence the centre frequency is not found correctly in the experiment setup.

- NUCLEI= HN, N, Ca, Cb, Ha, Hb   HBHACANNNH

- SPECS   2125 - 2149

- Projection description in file 2125/prodecomp.txt

- HB/CB peaks and HA/CA peaks have opposite sign.

  Prodecomp treats negative peaks as an extra dimension, but the experiment setup has only one axis for CB/CA and one for HB/HA, so the experiment as first loaded has too few dimensions.

- NUCLEI= HN, N, Ctocsy, Htocsy   HCCCONH (HC-TOCSY-CCONNH)

- SPECS   3132 - 3144

- Projection description in file 3132/prodecomp.txt

- Correct according to experiment setup.

Higher numbers (not used) are 4D NOESY projection sets, with projection definitions in directories 4139, 5146, 6153, 7160

Procedure:

- Go to Hist_data_all

- Open Analysis

- Create empty project (Projects->New)

- Go to Experiments->Open Spectra, set format to Bruker.

  Set directory to Hist_data_all (if you are not there already)

  Open 107/1/pdata/1/procs, the 15N HSQC spectrum. (NB it is shown directly, below the directories). Change the Experiment name from Bruker_107 to something sensible, like Nhsqc, and click [Open Spectrum]

- Check the verification dialogs; for this spectrum everything should be OK as it stands.

  Since this is a standard Bruker type pulse sequence, the Experiment Type is determined automatically.

- Pick its peaks for future use, with Ctrl-Shift-Leftmouse, in the window that pops up. You will definitely need the peaks above 8.8 ppm.

The projections are stored as individual spectra, each with location, spectral width, referencing, scaling constants, etc. Each projection set is stored as separate spectra under a single experiment. The loading script reads the overview file to identify the spectra, loads the first one and allows you to edit parameters. It then loads the rest of the projections with the same parameters.

- Run Macro->loadProjDefinitionFile from main Macro menu.

  - If the macro is not there, go into Macro->Organise Macros

    - click Add Macro

    - Select ccpnmr/analysis/macros/OpenProjectionSpectra.py

    - select function loadProjDefinitionFile

    - click Load Macro

    - Find loadProjDefinitionFile at the bottom in the Macros tab, and set 'In main menu?' to Yes

    - Now run Macros->loadProjDefinitionFile

- First select Hist_data_all/1110/prodecomp.txt

- Set Experiment name to HCBCACONNH

- You now get the 'Verify Referencing' popup for the first projection.

  Make sure that 'Use reduced dimensionality options' is On.

  Check that the values make sense and correct them if necessary.

  Most of the data are taken from the Bruker files directly. Only the dimension names, the scaling factors, and file locations are taken from prodecomp.txt.

  File names given in prodecomp.txt are adjusted intelligently when the whole project has moved. If this fails you need to edit the prodecomp.txt file.

  - In this experiment the aliphatic carbon pulses are off-resonance. Hence the centre frequency is not found correctly in the experiment setup.

    To fix it change the Cab reference ppm to 41.318. 

  - When you are ready, click [Commit]. The macro will automatically load the rest of the spectra, set the experiment parameters and referencing identical to the first spectrum, and read the scaling factors. For standard Bruker experiments it will also set the experiment type.

- Your projections should now show up in the same window as the Nhsqc. Display will only be 'on' for the first one.

- To be safe, check the Experiment type in Experiment:Experiments:{Experiment Types} and check the parameters for some individual spectra in Experiment:Spectra: Referencing. Set 'Use reduced Dimensionality' to On.

- In the same way run Macros->loadProjDefinitionFile also for 3132/prodecomp.txt, setting the experiment name to e.g. HC_cCONH.TOCSY. This is the HC-TOCSY-CCONH experiment.

- Finally run Macros->loadProjDefinitionFile also for 2125/prodecomp.txt. Set the experiment name to e.g. HNCACBHAHB. In this case the 'Verify Referencing' tab has too few dimensions. The processed spectrum has two more dimensions than the Bruker data files (see above for why), so we need to add those. The last dimension comes up as 'Cb' but is actually the proton dimension of the experiment.

  - The easiest way to fix things is to select the bottom SubDim (CB - 1H) and click 'Add SubDimension Copy' twice.

  - After that change the 'Cb' dimension so that it is identical to the Ca dimension; changing Isotope, Sw (Hz), and Reference ppm in that order.

  - Next rename the two bottom SubDimensions Ha and Hb.

  - Finally set the Scaling Factors so that they match the first projection in the prodecomp.txt. When it is all done click 'Commit' and check the values.

  There is not currently an Experiment Type that fits this projection, which with the two extra dimensions does not correspond to any commonly used pulse sequence, so leave the Experiment type unset.

Now we have the projection data loaded and can get to Prodecomp proper.

- First open the Prodecomp popup (Other->Prodecomp).

- The first tab shows all available projections, with the scaling constant for each frequency. Check that the data make sense, and select the projections you need. To speed up calculations, inactivate the HC_cCONH.TOCSY projections: use shift-click to select several, then click [Inactivate Selected].

- Now go to the {Exp Parameters} tab, and check if the data make sense. If any fields are red your projection descriptions are inconsistent and must be fixed.

Leave the number of Iterations and the Regularisation factor unchanged - they are control input for Prodecomp.

- Now go to the {Decomposition} tab. This allows you to generate and modify analysis intervals. Prodecomp analyses your projections one small slice of HN frequency at a time, to speed up processing. You may need to adjust the exact interval position and the number of expected components to get the best result for each interval. To start selecting intervals, check that the 'Guide peak list' points to the peaks picked for the 15N HSQC, and click [GenerateIntervals]. This makes a calculation interval for every peak in the peak list. Note that if the peaks overlap so will the intervals.

- For speed we will look only at the peaks at the highest frequency. Go to the bottom of the list and select one of the intervals that start and end below point 100. This corresponds to peaks at 8.8 ppm and above. Start with interval 87, the one with a single component. Now click [Run Selected], and wait until a popup announces that the calculation is finished. The time depends on the number of projections, axes, and components in the calculation. You can follow progress on the command line.

- When the calculation is finished go to the {Output and results} tab and look at the decompositions. If you are unsatisfied with a given decomposition, you can go back, adjust interval widths, component numbers, or parameters, and rerun. When you are happy, output the result to an XML file. The name of the output file is fixed, so move it to another location with a suitable name immediately, before you overwrite it by mistake.
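Moving the output file out of harm's way can be scripted. The following is a minimal stand-alone Python 3 sketch, not part of Analysis or Prodecomp. The function name `archive_result`, the archive directory, and the label are hypothetical, and the actual name of the XML output file is not given in this tutorial, so it has to be filled in by you.

```python
import shutil
import time
from pathlib import Path


def archive_result(output_file, archive_dir, label):
    """Move a result file into archive_dir under a unique, descriptive name."""
    source = Path(output_file)
    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # A time stamp keeps results from successive runs apart.
    stamp = time.strftime("%Y%m%d_%H%M%S")
    target = target_dir / f"{label}_{stamp}{source.suffix}"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(source), str(target))
    return target


# Example call (replace the first argument with the real output file name):
# archive_result("<output file>.xml", "prodecomp_results", "interval87")
```

The function refuses to overwrite an existing archive file, so an earlier result is never lost silently.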

Try a few intervals and identify the correct component and the various frequencies: HN, N, CO, CAi, CBi, CAi-1, CBi-1, HAi, HAi-1, HBi and HBi-1 (there are two betas). The components are displayed in the Output & Results tab, also reachable by selecting a component in the Decomposition tab and pressing 'Plot Graphs'. The 'Plot 1D spectra' gives an alternative view, with an alignment cursor, but it is a demonstration prototype and likely to break. The shapes have no annotation but are shown in the same order as on the Exp Parameters tab.

The correct component should have an HN and N frequency that fits a peak in the interval. There should be the right number of peaks for each type of nucleus, and the frequencies should be reasonable. If none of the components look good, you can try changing the interval limits or the number of components to predict, and then re-run.
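The first criterion, that the HN and N frequencies fit a guide peak, can be written down as a small Python sketch. It is an illustration under stated assumptions, not a Prodecomp function: the dictionary layout, the function name, the tolerances, and all shift values are hypothetical.

```python
def matching_components(components, guide_peak, tol_hn=0.03, tol_n=0.3):
    """Return the components whose HN and N shifts fit the guide peak.

    components    -- list of dicts with at least 'HN' and 'N' shifts in ppm
    guide_peak    -- dict with 'HN' and 'N' shifts in ppm from the 15N HSQC
    tol_hn, tol_n -- matching tolerances in ppm (illustrative values only)
    """
    return [
        c for c in components
        if abs(c["HN"] - guide_peak["HN"]) <= tol_hn
        and abs(c["N"] - guide_peak["N"]) <= tol_n
    ]


# Hypothetical guide peak and decomposition results for one interval.
guide_peak = {"HN": 9.12, "N": 121.4}
components = [
    {"id": 1, "HN": 9.13, "N": 121.5},
    {"id": 2, "HN": 9.05, "N": 118.0},
]

hits = matching_components(components, guide_peak)
print([c["id"] for c in hits])
```

A component that passes this filter still needs the manual checks described above: the right number of peaks per nucleus type and chemically reasonable frequencies.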