Format Converter

Quick tutorial on how to use the Data Model

How to convert?

This tutorial describes how to import an NMRVIEW sequence and peak list file into the Data Model framework, how to create a chemical shift list from the peak list data, and finally how to export the data to XEASY format files. The example NMRVIEW files used in this tutorial can be downloaded as a tar file.

Creating a project

The first thing to do when starting with a new set of data is to create a Project in the Data Model. A Project groups data together, and you need to have such a Project available in the Data Model before you can do anything else.

After starting the FormatConverter, either create a new project from 'Project->New' (and type in a name for your project, e.g. 'test'), or load an existing one from a CCPN XML file via 'Project->Load'.

The data to be imported

The NMRVIEW data files describe a simple hypothetical peptide homodimer. The 'nmrView.seq' file contains the sequence of the molecule, and the 'nmrView.xpk' file contains a peak list for a 15N HSQC-NOESY. In the peak list file, the residues for the first chain in the homodimer are numbered from 1-12, and those for the second chain from 101-112.

Importing a sequence

The molecular information in the Data Model is highly organized, and consists of a layer of reference chemical compounds (the ChemComp), a layer that describes the molecules used in this project (the Molecule), and a final layer that describes the actual situation in the sample(s) that are being used (the Molecular System). For example, a homodimer would consist of only one Molecule that describes the sequence (and links to the correct reference chemical compounds), and one Molecular System, with two Chains, each of which is linked to the Molecule. Each chain then has Residues and Atoms.
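The layering described above can be pictured with a small Python sketch. This is only an illustration built from plain dataclasses: the class and function names below are hypothetical and are not the Data Model API, and the three-residue sequence is a placeholder rather than the full tutorial sequence.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """One sequence; each residue code stands in for a link to a reference ChemComp."""
    name: str
    sequence: list


@dataclass
class Residue:
    seq_id: int
    code: str


@dataclass
class Chain:
    code: str
    molecule: Molecule  # every chain points back to the Molecule it was built from
    residues: list = field(default_factory=list)


@dataclass
class MolSystem:
    name: str
    chains: list = field(default_factory=list)

    def add_chain(self, code, molecule):
        # Residues are numbered from 1 within each chain.
        chain = Chain(code, molecule)
        chain.residues = [
            Residue(seq_id, res_code)
            for seq_id, res_code in enumerate(molecule.sequence, start=1)
        ]
        self.chains.append(chain)
        return chain


def build_homodimer(name, sequence):
    """One Molecule, one MolSystem, two Chains that share the Molecule."""
    molecule = Molecule(name, list(sequence))
    system = MolSystem(name)
    for chain_code in ("A", "B"):
        system.add_chain(chain_code, molecule)
    return molecule, system


if __name__ == "__main__":
    # Placeholder sequence, not the full sequence from nmrView.seq.
    molecule, system = build_homodimer("test", ["GLU", "ASP", "VAL"])
    for chain in system.chains:
        print(chain.code, [(r.seq_id, r.code) for r in chain.residues])
```

The point of the sketch is that the sequence is stored once, in the Molecule, while each Chain carries its own Residues.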

Now go to 'Import->Single files->Sequence->NmrView'. The window that pops up allows you to specify the file location and additional settings (in this type of window, you can click on the 'i' button to get short information on what a specific setting means). First, click on 'Select file'. A file browser window will pop up: select the 'nmrView.seq' file, and press 'Select' at the bottom of this window. The file name should appear where 'Select file' was displayed. To see the additional (non-obligatory) options, press on the blue arrow next to 'Additional options'. Do not change the default settings for now.

To import the file, press the 'IMPORT' button. The (simple) sequence in the 'nmrView.seq' file will now be parsed and converted to the Data Model framework. Note that a text output window should appear that displays the output from the conversion scripts.

The window that appears next allows you to edit the information from the sequence file that was just read in. On the left is the information from the original external file, on the right the information that is Data Model specific. First, click on the 'GLU-ASP-VAL-...-GLY-GLY-LEU' button. In the window that appears now you can change the protonation state of residues and modify the sequence (e.g. split it up into separate molecules). This is not the aim of this tutorial, but click the 'Help' button for more information on how to do this. Click the 'Cancel' button and go back to the previous window. The molecule name can be reset by clicking on the button(s) below 'Molecule name'. Leave the name as is for now. Finally, the number of chains that have to be created for this molecule can be set. Since this is a homodimer, enter '2' in the 'Number of chains' box, and finally press 'OK' to continue.

The next window asks for a name for the molecular system you are about to create. Leave the name as is and press 'OK'. Then, you will be prompted to give chain codes for the two chains that are created. Press 'OK' for both. A window stating 'Successfully imported file: ...' should now appear. Press 'OK' - you have now successfully created the molecular information inside the Data Model!

Saving a project

It is always safest to save a project after a successful import. Go to 'Project->Save'. The name of your project is used as the default name for the directory with all the data; the directory will be written in your current working directory. If you want to modify this, or save another copy somewhere else, use 'Project->Save As'.

Importing a peak list

Go to 'Import->Single files->Peaks->NmrView'. Select the 'nmrView.xpk' file in the same way as described before. Here you can also select the 'Assignment separator' that is used in the NMRVIEW peak list file - it is currently a space. Leave all settings as they are and press 'IMPORT'.

In the Data Model, you have to describe the NMR Experiment from which the peak list is derived before you can create the peak list itself. From 'Pick experiment types', select the one with 'H_H[N].through-space' in the Experiment column (one of the two with '15N HSQC-NOESY' in the 'Common pathway name' column). Press 'Create', leave the experiment name as is, and press 'OK'. You will now be prompted for a name for the DataSource. A DataSource is an 'implementation' of the NMR experiment: for example you need a DataSource for the raw original NMR data, and then a separate DataSource for each differently processed spectrum (e.g. the full 3D version, 2D projections, ...). Just press 'OK' to continue.

The next prompt asks for a name for the peak list. Again press 'OK'. The window that appears now is very important: the order of the spectrum dimensions in the Data Model and inside the external peak file might not be the same, and here you can specify what each dimension means. On the left are the peak dimensions (with chemical shift range) from the external file, on the right the experiment dimensions for this particular Experiment in the Data Model. The 2nd and 3rd dimension have to be switched in this case: for 'Peak dim' number 1 (in the middle), change 'DataDimRef selection' to 'Dim 2, nucl 15N, ...', and for 'Peak dim' number 2 (the last one), change 'DataDimRef selection' to 'Dim 3, nucl 1H, ...'. The mapping is now set correctly, so press 'OK' to continue.

The next window allows you to view and edit the nucleus and resonance frequency ('Reference data') and the number of points, SW, and referencing ('nmrView processing'). Leave everything as it is, and click 'Exit/continue'. A window stating 'Successfully imported file: ...' should now appear. Press 'OK' - you have now successfully created an NMR experiment and peak list inside the Data Model!
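What the dimension mapping window does can be sketched in a few lines of Python. This is an illustration only, not FormatConverter code: the function name is hypothetical, the shift values are made up, and the sketch assumes that peak dimensions are counted from 0 in the file and experiment dimensions from 1 in the Data Model, as the numbering in the window suggests.

```python
def reorder_position(position, peak_dim_to_exp_dim):
    """Return the peak position re-ordered into experiment dimension order.

    position            -- shifts in the order used by the external peak file
    peak_dim_to_exp_dim -- {peak dim (from 0): experiment dim (from 1)}
    """
    n_dims = len(position)
    if sorted(peak_dim_to_exp_dim) != list(range(n_dims)):
        raise ValueError("every peak dimension needs exactly one mapping")
    if sorted(peak_dim_to_exp_dim.values()) != list(range(1, n_dims + 1)):
        raise ValueError("every experiment dimension must be used exactly once")

    ordered = [None] * n_dims
    for peak_dim, exp_dim in peak_dim_to_exp_dim.items():
        ordered[exp_dim - 1] = position[peak_dim]
    return tuple(ordered)


if __name__ == "__main__":
    # Made-up shifts in file order: 1H, 15N, 1H.
    file_position = (8.20, 120.5, 4.30)
    # The mapping chosen in the tutorial: peak dim 1 -> Dim 2, peak dim 2 -> Dim 3.
    print(reorder_position(file_position, {0: 1, 1: 2, 2: 3}))
    # A mapping with the last two dimensions swapped puts the 15N shift
    # into a 1H dimension, which is the mistake the window lets you avoid.
    print(reorder_position(file_position, {0: 1, 1: 3, 2: 2}))
```

A wrong mapping does not raise an error by itself, since the numbers are still valid positions; this is why the shift ranges shown in the window are worth checking against the nucleus of each experiment dimension.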

LinkResonances: Defining what the external atom names mean...

At this stage you have created on the one hand the molecular system with all the chains, residues and atoms, and on the other hand a peak list. However, this information is currently not linked to each other. This is possible because the NMR information is not linked directly to Atoms, but is instead connected to what we call Resonances. These Resonances do not have the traditional NMR meaning, but instead link all information that arises from one atom or a group of atoms together. For example, a Resonance exists that connects all the information from what is called '4.HN' in the external NMRVIEW file. We now have to connect this Resonance to the HN Atom in residue 4, chain A.

The script that does this for you is called linkResonances. The next popup asks you if you want to run this now. Press 'Yes' (LinkResonances can also be run from Process->Run linkResonances). Next you are asked if you want to link with default settings. Again, click 'Yes'.

Now, you will have to specify what the sequence numbering in the external file means in relation to the molecular system inside the Data Model. This window will only pop up if it is not obvious how the information from the external file connects to the information in the Data Model. In this case, the sequence codes 1-12 from the external NMRVIEW file connect to Data Model residues 1-12 for chain A, while external sequence codes 101-112 connect to residues 1-12 for chain B. On the left, in blue, is the molecular system information inside the Data Model, on the right, in black, the information from the external file.

For 'Ccp chain code' row 'A (12 res...)', click on the 'Do not link' selector, and select 'Link to code ' ' (range '2 '-'5 ')'. The numbering here ranges from 2 to 5 because no information was present for 1 and 6-12 in the provided peak list. Under 'Sequence Id (code) start', the '2 (2)' entry should be automatically selected. You have now specified that sequence codes 2-5 in the external file correspond to residues 2-5 for chain A in the Data Model. Do the same thing for 'Ccp chain code' row 'B (12 res...)', but now select 'Link to code ' ' (range '102 '-'105 ')'. You also have to select '2 (2)' on the left hand side in this case. Finally, press 'OK' to accept this mapping.
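The chain mapping just entered amounts to a small lookup from external sequence codes to a chain and a residue in the Data Model. The Python sketch below illustrates that idea under the settings chosen above; the names ChainLink and resolve are hypothetical and do not come from linkResonances.

```python
from typing import NamedTuple


class ChainLink(NamedTuple):
    chain_code: str    # chain in the Data Model
    ext_first: int     # first external sequence code in the linked range
    ext_last: int      # last external sequence code in the linked range
    first_seq_id: int  # Data Model residue that ext_first corresponds to


# The two rows set in the window: codes 2-5 -> chain A, 102-105 -> chain B,
# both starting at Data Model residue 2.
LINKS = [
    ChainLink("A", 2, 5, 2),
    ChainLink("B", 102, 105, 2),
]


def resolve(ext_code, links=LINKS):
    """Map an external sequence code to (chain code, residue number), or None."""
    for link in links:
        if link.ext_first <= ext_code <= link.ext_last:
            return link.chain_code, link.first_seq_id + (ext_code - link.ext_first)
    return None


if __name__ == "__main__":
    for code in (4, 102, 105, 50):
        print(code, "->", resolve(code))
```

With these links, the external '4.HN' resolves to residue 4 of chain A, and '102' resolves to residue 2 of chain B; a code outside both ranges stays unlinked.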

The window 'LinkResonances ran successfully' should now appear. The Resonances are now linked to Atoms, and their assignment is unambiguously described. Now save the project.

Exporting a sequence, chemical shift and peak list

You can only export assignments after you have successfully linked the Resonances to the Atoms. Since this is now done, go to 'Export->XEasy' to write out XEASY files based on the NMRVIEW information that we imported. In real life you would be wise to check your import results first, though (see below).

The window that pops up only lists information that is currently available in the Data Model. In this case, sequence, chemical shift, peak and peak assignment files can be written out.

First write out the sequence file: select both the 'A' and 'B' chain from the 'Select chains to export' selection, click on the 'Select export file' button, type a file name and press 'Select' in the file selection window (or change the output directory if you want to). Click 'export sequence' to write out the XEASY sequence file. A window will now pop up where you can set the mapping to the external file. Since XEASY does not handle multiple chain codes, enter '101' in the box for 'Ccp Chain Code' entry 'B', so that residues for chain B will be written out with sequence codes 101-112. This mapping will also be used for the chemical shift and peak lists! Press 'OK' after the file is exported.
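The reason for entering '101' can be shown with a short Python sketch. It is an illustration only: the function name is hypothetical, and it assumes that chain A keeps its default start of 1 and that each chain is written out with consecutive sequence codes.

```python
def exported_codes(chain_lengths, chain_start):
    """Return {exported sequence code: (chain code, residue number)}.

    chain_lengths -- {chain code: number of residues}
    chain_start   -- {chain code: sequence code written for residue 1}
    Raises ValueError if two residues would get the same exported code.
    """
    codes = {}
    for chain_code, length in chain_lengths.items():
        for seq_id in range(1, length + 1):
            code = chain_start[chain_code] + seq_id - 1
            if code in codes:
                raise ValueError(
                    f"code {code} used by {codes[code]} and {(chain_code, seq_id)}"
                )
            codes[code] = (chain_code, seq_id)
    return codes


if __name__ == "__main__":
    lengths = {"A": 12, "B": 12}
    codes = exported_codes(lengths, {"A": 1, "B": 101})
    print(codes[4], codes[101], codes[112])
    try:
        exported_codes(lengths, {"A": 1, "B": 1})
    except ValueError as err:
        print("without an offset:", err)
```

Without the offset, residue 1 of chain B would be written with the same sequence code as residue 1 of chain A, and a format without chain codes could not tell them apart.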

Now write out a chemical shift file: for XEASY this has to be done before writing the peak list, otherwise assignments cannot be written out. Leave the 'Select shift list to export' selection as is (there is only one chemical shift list), click on the 'Select export file' button and select a file name as before. Click 'export shifts' to write out the XEASY chemical shift file, and press 'OK' when the file is exported.

Finally, export the peak file. Leave the 'Select peak list to export' selection as is (there is only one peak list), click on the 'Select export file' button and select the export filename as before. Click 'export peaks' to write out the XEASY peak list file. You will (similar to the NMRVIEW peak list import) get a window to map the peak dimensions for the external peak list file to the experiment dimensions in the Data Model. Change this at will, or leave as is, and press 'OK'. Finally, press 'OK' again when the file is exported.

Note that the XEASY peak list format does not support ambiguous assignments: you have to write out a 'peak assignments' file to handle this.

Checking and problem fixing

Format conversion is not an exact science, mainly because so many files do not follow any format or naming system exactly. It is always wise to check for potential problems. You can have a look at the text output window to see how the link between the Resonances (on the left) and the atom(s) (on the right) was made, but that is only practical for small data sets like this one. It is better to open the project in CcpNmr Analysis and check there. In this case, make sure you go to Peak->Draw Parameters and set 'Chain Assignment' to ON in the Annotation Style tab, because you need to distinguish between chain A and chain B assignments in this project. Look in the Resonances and Peaks tabs of Assignment->Quality Reports to see if there are any errors (in red). Also cast a quick look at the assignments in Resonance->Resonances and in the Peak Table tab of Peak->Peak Lists, and open the nmrView.xpk file in an editor to compare.

In this case you will notice several red fields in Assignment->Quality Reports. In nmrView.xpk you will see that the second peak has a volume of zero, and that the last peak is assigned to residue 4 HN and residue 5 N. Clearly these are errors in the peak file, which CcpNmr Analysis has made it easy for you to find. You can either edit the peak file and try again, or fix the results inside Analysis.
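The two kinds of error found here are simple enough to describe as checks on a peak record. The Python sketch below is not how the Quality Reports in Analysis work; it only illustrates the two rules, using a hypothetical dictionary layout for a peak and made-up values rather than the contents of nmrView.xpk.

```python
def check_peak(peak):
    """Return a list of problems for one peak record.

    A peak is a dict with hypothetical keys:
      'volume' -- peak volume
      'hn'     -- (residue number, atom name) for the amide 1H dimension, or None
      'n'      -- (residue number, atom name) for the 15N dimension, or None
    """
    problems = []
    if peak["volume"] == 0:
        problems.append("zero volume")
    hn, n = peak.get("hn"), peak.get("n")
    # The amide proton and its nitrogen should belong to the same residue.
    if hn and n and hn[0] != n[0]:
        problems.append(f"HN on residue {hn[0]} but N on residue {n[0]}")
    return problems


def report(peaks):
    """Return {peak id: problems} for the peaks that have at least one problem."""
    found = {}
    for peak in peaks:
        problems = check_peak(peak)
        if problems:
            found[peak["id"]] = problems
    return found


if __name__ == "__main__":
    # Made-up records that mimic the two kinds of error described above.
    peaks = [
        {"id": 1, "volume": 1.5, "hn": (2, "HN"), "n": (2, "N")},
        {"id": 2, "volume": 0.0, "hn": (3, "HN"), "n": (3, "N")},
        {"id": 3, "volume": 2.0, "hn": (4, "HN"), "n": (5, "N")},
    ]
    for peak_id, problems in report(peaks).items():
        print(peak_id, problems)
```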

Looking in Peak->Peak Lists, you will see that peak 4 is assigned to a nameless resonance in F3; the original assignment was to 'HXX', and FormatConverter had no way of knowing that this should have been HA. The last peak is assigned to He* for both possibilities; the original had one as HE# and one as HE1, but FormatConverter correctly assigned both to He*. Finally, peak 1 is now assigned to Hb*, where the original had HB2. This is unlikely to be what you want.

The detailed version of LinkResonances allows you to deal with this kind of problem. To try it out, start following the tutorial again from the beginning. When asked if you want to run LinkResonances, say NO, and save the project. After that run LinkResonances again (Process->Run linkResonances), but say NO when asked if you want to run with default settings. You will now be given the opportunity to reset the various control parameters.

The first popup has parameters about atom names and prochiral handling. We could change the handling of the HB2 assignment, so that it maps to Hba instead of Hb*, by setting 'Status other atom for all single prochiral atoms' to 'Always Ignore'. For a big project we would do that; here we leave things as they are and wait to be asked. Click 'Link resonance to atoms' to continue.

The next popup allows you to select the MolSystem to link to. Leave it as is (we only have one anyway). Next comes the popup that handles the chain mapping, which we have already dealt with above. An extra popup will ask you if you want to reset the sequence codes for chain B (to 101...). Say NO. Next you are asked what naming system to choose; keep the default.

The 'Select resonance atom match' popup now asks you what name to use for the Val 3 HXX assignment. Click 'Show all atoms', then 'Pick atom' HA, and press OK.

The 'Prochiral status selection' popup asks you how to treat the Asp-2 HB3 atom. The default is to assign HB2 as well, treating it as an HB* assignment. Choose instead the 'HB2 is not assigned' option.

The 'Give status possible equivalent resonance' popup asks you if HE1 is to be treated as equivalent (equal to He*). In this case keep the program selection.

Finally, say OK when you are told that linkResonances ran successfully, and save the project again. You can check in Analysis and see that the problems with D2 Hb2 and V3 HA have been fixed.

Final notes

This quick tutorial for the FormatConverter hopefully gave you an idea of how to handle import/export of external files. If you have any comments on this tutorial or would like to see other steps explained, please let us know!