The Analysis Program
The Idea
CcpNmr Analysis (Analysis) is a graphics-based, interactive program for NMR assignment and project management. It was inspired by NMR assignment applications such as ANSIG and Sparky. The program is written entirely from new code and, while it continues some of the strong traditions of other assignment programs, it also provides novel ways to approach the analysis of NMR data. Analysis is built on the CCPN Data Model, and what you see on the screen continuously reflects the state of the underlying data: when the data change, the display is updated to match. The program has its own Data Model package, which means it uses CCPN technology to create a program-specific part of the Python API; this allows program information (contours, window positions, etc.) to be recorded and stored as XML files. Analysis is not just an assignment program; it can be considered a platform for all the NMR data described by the CCPN Data Model. For example, you can use it to measure relaxation rates, calculate distance constraints, coordinate structure generation and edit molecular information, and you can use it as a starting point for developing new NMR software and algorithms.
Development
Analysis was primarily developed by Wayne Boucher and Tim Stevens in the group of Prof. Ernest Laue at the University of Cambridge, UK. Most of the code for Analysis is written in Python. Only speed-critical functions, such as contour calculation and mathematically intensive algorithms, are implemented in the fast, compiled language C. The Python part of the program consists of a series of integrated graphical windows ("Popups") and an underlying layer of Python library functions. The graphical elements allow the user to enter information and to view the status of the data, while the library functions manipulate the CCPN Data Model objects that record the scientific information.
Macros
Analysis has some predefined Python scripts that can be run as in-program commands with an associated shortcut key. Analysis macro scripts are functions written in the Python programming language. They are normal Python functions, except that they must accept one mandatory input argument, “argServer”. This argument links the Python function with the Analysis program, giving access to the CCPN project data and the Analysis graphical interface.
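As a rough illustration of the calling convention just described, the sketch below defines a macro that takes the single argServer argument. Only that signature comes from the text; the method names getProject() and showInfo() are assumptions about the argument server and should be checked against the documentation for your Analysis version. FakeProject and FakeArgServer are hypothetical stand-ins so the function can be exercised outside Analysis.

```python
# Minimal sketch of an Analysis macro. From the text: a plain Python function
# taking one mandatory argument, argServer. ASSUMED (check your version):
# argServer.getProject() and argServer.showInfo().


def reportProjectName(argServer):
    """Hypothetical macro: show the name of the open CCPN project."""
    project = argServer.getProject()        # assumed accessor for the project
    message = 'Current project: %s' % project.name
    argServer.showInfo(message)             # assumed information pop-up
    return message


# Hypothetical stand-ins so the macro can be tried outside Analysis.
class FakeProject:
    def __init__(self, name):
        self.name = name


class FakeArgServer:
    def __init__(self, projectName):
        self._project = FakeProject(projectName)
        self.shown = []                     # messages that would be displayed

    def getProject(self):
        return self._project

    def showInfo(self, message):
        self.shown.append(message)          # record instead of opening a dialog


if __name__ == '__main__':
    print(reportProjectName(FakeArgServer('demo')))  # Current project: demo
```

Because the macro only talks to the program through argServer, swapping in a stand-in object like this is a convenient way to test macro logic before loading it into Analysis.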
All macros available for execution are listed in the main “Macros” table, and new macros can be added to the current Analysis project simply by loading a Python file that contains an appropriate function. Note that the script itself is not stored inside the project: by adding a macro the user merely records where the required Python function is located on disk, and the actual code remains at its original file location.
If a macro’s Python code is moved to a different location on disk, the script can no longer be executed in Analysis. In that case, however, the old macro (and hence its recorded location) can be removed and a fresh one put in its place. If the contents of a macro’s code have changed on disk since it was last executed in the current Analysis session, the user may reload the macro so that it uses the newest version. This mechanism provides a convenient way of developing and debugging CCPN Python scripts without having to restart Analysis each time.
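The sketch below models the behaviour described above under stated assumptions: a hypothetical MacroEntry class stores only a file path and a function name, loads the code from disk on first use, keeps using that version until reload() is called, and fails with a clear error if the file has moved. It is not Analysis's own macro loader.

```python
import os
import tempfile


class MacroEntry:
    """Hypothetical macro record: it keeps only a file path and function name."""

    def __init__(self, path, funcName):
        self.path = path
        self.funcName = funcName
        self._func = None                   # loaded lazily on first run

    def reload(self):
        """Re-read the source file from disk and rebind the function."""
        if not os.path.exists(self.path):
            raise FileNotFoundError(
                '%s has moved; remove this macro and add it again' % self.path)
        with open(self.path, encoding='utf-8') as handle:
            source = handle.read()
        namespace = {'__name__': 'macro_' + self.funcName, '__file__': self.path}
        # Compiling the text directly avoids any stale cached bytecode.
        exec(compile(source, self.path, 'exec'), namespace)
        self._func = namespace[self.funcName]

    def run(self, argServer):
        if self._func is None:
            self.reload()
        return self._func(argServer)


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'myMacro.py')
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write('def greet(argServer):\n    return "v1"\n')
    entry = MacroEntry(path, 'greet')
    print(entry.run(None))                  # v1
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write('def greet(argServer):\n    return "v2"\n')
    print(entry.run(None))                  # still v1 until reloaded
    entry.reload()
    print(entry.run(None))                  # v2
```

Keeping only the path, rather than a copy of the code, is what makes the edit-reload-run cycle possible without restarting the host program.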
Some highlighted features of Analysis
Windows
An Analysis project may contain an almost limitless number of spectrum windows. The windows are inherently N-dimensional, with scrollbars not only for the screen dimensions but also for orthogonal planes, and any plane thickness can be selected. A window can be divided into several strips for easy comparison of different regions of spectra (see Figure 1).
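To picture what a plane thickness means, the hypothetical function below takes a 3D data block indexed [z][y][x], where z is the orthogonal axis controlled by the extra scrollbar, and combines the planes within a chosen half-width of the current plane. Using a maximum projection is an assumption made for simplicity; Analysis may display thick planes differently.

```python
def thickPlane(cube, centre, halfWidth):
    """Max-project the planes cube[centre - halfWidth .. centre + halfWidth].

    cube is indexed cube[z][y][x]; z is the orthogonal (scrolled) axis.
    halfWidth = 0 returns the single plane at centre.
    """
    lo = max(0, centre - halfWidth)
    hi = min(len(cube), centre + halfWidth + 1)   # slice end is exclusive
    slab = cube[lo:hi]
    nY, nX = len(slab[0]), len(slab[0][0])
    return [[max(plane[y][x] for plane in slab) for x in range(nX)]
            for y in range(nY)]


cube = [
    [[1, 0], [0, 0]],   # plane z = 0
    [[0, 5], [0, 0]],   # plane z = 1
    [[0, 0], [7, 0]],   # plane z = 2
]
print(thickPlane(cube, 1, 0))   # [[0, 5], [0, 0]]
print(thickPlane(cube, 1, 1))   # [[1, 5], [7, 0]]
```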
Many spectra can be shown in the same window, where their contours and peaks are readily toggled on or off. Navigation uses the mouse or keyboard, and built-in navigation functions make it easy to find orthogonal planes and to return to peak positions.
Most interaction with the graphics is done with the mouse. In general, the left mouse button is used to select and pick, the middle button to zoom and drag, and the right button to call up the options menu. Combining these with the Shift and Control keys provides further actions.
Many functions may be applied to crosspeaks directly from the window menu; for example, peaks may be assigned, deleted, unaliased and shift-matched. Several of these functions can act on multiple peaks at once to improve efficiency. For example, a column of NOE peaks derived from the same amide resonances may be assigned to that amide in a single operation.
Figure 1. A three dimensional spectrum window with vertical strips.
Molecules
Polymer chains and small molecules are readily added to NMR projects. Sequences may be imported or entered, and from these Analysis builds the molecules with all of their NMR-assignable atoms.
Many molecules of different types can be included and may be connected together into chains. For example, a GPI-anchored glycoprotein may be constructed by joining protein, sugar and lipid components.
By using data provided by the MSD at the EBI, CcpNmr software has access to all the 6000+ small molecule templates that have appeared in PDB structures (see Figure 2).
Figure 2. A molecule selecting popup.
Tables
Virtually all of the information within a project is available to the user via a graphical interface and much of the commonly used information is presented in tabular form (see Figure 3). These tables are used to display peak lists, chemical shifts, constraints, coordinates, spectrum configuration and the like.
To allow the user to change information (peak positions, contour colors and experiment names, to name only a few), tables often have editable columns. Table rows can be sorted by any column, filtered according to a search expression, and selected (often several at once) to apply specific functions.
The data in a table may be exported to a text file and, if numerical, plotted in a graph.
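A minimal sketch of the sort, filter and export operations described above, assuming each table row is a Python dictionary keyed by column name. The function names and the tab-separated export format are illustrative choices, not Analysis's own; the peak rows are made up.

```python
import csv
import io
import re


def sortRows(rows, column, reverse=False):
    """Sort table rows (dicts) by the values in one column."""
    return sorted(rows, key=lambda row: row[column], reverse=reverse)


def filterRows(rows, column, pattern):
    """Keep rows whose column text matches a regular-expression search."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(str(row[column]))]


def exportRows(rows, columns):
    """Return the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter='\t',
                            lineterminator='\n', extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up peak-list rows for illustration.
peaks = [
    {'peak': 1, 'assignment': '12Ala H', 'height': 320000},
    {'peak': 2, 'assignment': '15Gly H', 'height': 870000},
    {'peak': 3, 'assignment': '16Ala H', 'height': 150000},
]
alanines = sortRows(filterRows(peaks, 'assignment', r'Ala'), 'height', reverse=True)
print(exportRows(alanines, ['peak', 'assignment', 'height']))
```

A tab-separated file like this can be opened directly in a spreadsheet or plotting program.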
Figure 3. Tables are frequently used in Analysis to display data, here a peak list.
Assignment
Assignment in Analysis is a two-step process proceeding via an intermediate Resonance object. This allows the user to represent anonymous but connected assignment states, and allows atomic assignment to be made to several peaks at once.
Most crosspeak assignments are made by the user choosing a resonance (which need not be assigned to specific atoms) from a curated, ranked list. The choice is made with a single click (and is readily reversible) from a list of possibilities that are close in chemical shift (see Figure 4). Analysis supports multiple, ambiguous assignments for peaks.
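The ranking step can be sketched as follows, assuming a resonance carries an isotope and a chemical shift in ppm, and that candidates are simply the resonances within a fixed tolerance of the peak position, sorted by closeness. This is a simplified stand-in for Analysis's curated list, which may use additional criteria; the Resonance class and rankCandidates() are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resonance:
    serial: int
    isotope: str
    shift: float                    # ppm
    atomName: Optional[str] = None  # None while the resonance is anonymous


def rankCandidates(position: float, isotope: str,
                   resonances: List[Resonance], tolerance: float) -> List[Resonance]:
    """Resonances of the right isotope within tolerance (ppm), closest first."""
    matches = [r for r in resonances
               if r.isotope == isotope and abs(r.shift - position) <= tolerance]
    return sorted(matches, key=lambda r: abs(r.shift - position))


known = [Resonance(1, '1H', 8.21, '12Ala H'), Resonance(2, '1H', 8.25),
         Resonance(3, '1H', 7.90), Resonance(4, '15N', 120.3)]
print([r.serial for r in rankCandidates(8.24, '1H', known, 0.05)])  # [2, 1]
```

Note that the anonymous resonance 2 is offered alongside the atom-assigned resonance 1, matching the point that candidates need not yet be linked to atoms.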
Structural information can also be used in assignment: for example, hydrogen resonances may be ranked according to their distances in an intermediate structure.
Assignment of resonances to specific atoms is achieved either through scrollable lists or directly from the peak assignment popup. This needs to be done only once for each resonance, because all peaks that correspond to the same resonance automatically share the atom information. The atom for assignment is selected by clicking in the Atom Browser (see Figure 5).
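The benefit of the intermediate Resonance object can be shown with two hypothetical classes. Peaks refer to resonances per dimension, so naming the atom on a resonance once updates every peak that uses it, and a dimension holding several resonances represents an ambiguous assignment. The classes and the "@serial" labels used for anonymous resonances are illustrative choices and do not reproduce the CCPN Data Model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resonance:
    serial: int
    atomName: Optional[str] = None   # set once, then seen by every peak using it


@dataclass
class Peak:
    serial: int
    # One list of resonances per peak dimension; several means ambiguous.
    dimResonances: List[List[Resonance]] = field(default_factory=list)

    def label(self) -> str:
        parts = []
        for resonances in self.dimResonances:
            names = [r.atomName or '@%d' % r.serial for r in resonances]
            parts.append('/'.join(names) if names else '-')
        return ', '.join(parts)


# Step 1: assign peak dimensions to (still anonymous) resonances.
amideH, amideN = Resonance(1), Resonance(2)
hsqc = Peak(1, [[amideH], [amideN]])
noesy = Peak(2, [[amideH], [Resonance(3), Resonance(4)], [amideN]])
print(noesy.label())        # @1, @3/@4, @2

# Step 2: assign each resonance to an atom once; both peaks pick it up.
amideH.atomName, amideN.atomName = '23Leu H', '23Leu N'
print(hsqc.label())         # 23Leu H, 23Leu N
print(noesy.label())        # 23Leu H, @3/@4, 23Leu N
```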
Analysis does not support pseudoatoms, and all assignment is ultimately connected back to individual, stereospecific atoms. Where appropriate, a resonance may however be assigned in a stereospecific or non-stereospecific manner. For example, it is possible to say that two peaks represent two different hydrogen beta atoms in a residue, with different chemical shifts, but without necessarily specifying the stereochemical arrangement of the two atoms.
Figure 4. The popup displaying the resonance possibilities for the dimensions of a cross peak.
Figure 5. The atom browser where atoms and residues are chosen during assignment.
Restraints
Analysis can be used to generate many types of restraints for structure calculation (e.g. distance, dihedral and H-bond restraints). Analysis uses fully configurable lookup tables to derive and analyse such restraints (see Figure 6).
Restraints may be generated from assigned NOESY peaks, or created by performing shift matching on unassigned peaks. The resulting, potentially ambiguous, restraints can then be used by programs such as ARIA or CYANA.
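As a sketch of the lookup-table idea, assume that restraint bounds are chosen by comparing a peak's intensity with a reference intensity. The thresholds and distance bounds below are placeholders, not Analysis defaults, and the dictionary layout of the restraint is hypothetical. A peak that was shift-matched to more than one atom pair yields an ambiguous restraint.

```python
# Hypothetical lookup table: (minimum intensity / reference intensity,
# upper distance bound in Angstrom), strongest class first. The numbers are
# PLACEHOLDERS for illustration, not Analysis defaults.
INTENSITY_CLASSES = [
    (0.5, 2.7),
    (0.1, 3.3),
    (0.02, 5.0),
]
LOWER_BOUND = 1.8      # placeholder lower bound (Angstrom)
WEAKEST_BOUND = 6.0    # placeholder bound for peaks below every threshold


def upperBound(intensity, reference, table=INTENSITY_CLASSES):
    """Look up the upper distance bound for a peak intensity."""
    ratio = intensity / reference
    for threshold, bound in table:
        if ratio >= threshold:
            return bound
    return WEAKEST_BOUND


def makeRestraint(peakSerial, intensity, reference, atomPairs):
    """Build a distance restraint; several atom pairs make it ambiguous."""
    pairs = list(atomPairs)
    return {
        'peak': peakSerial,
        'lower': LOWER_BOUND,
        'upper': upperBound(intensity, reference),
        'atomPairs': pairs,
        'ambiguous': len(pairs) > 1,
    }


# A shift-matched, unassigned NOESY peak with two candidate assignments.
print(makeRestraint(7, 2.0e5, 1.0e6,
                    [('5Val HA', '9Ile H'), ('5Val HA', '30Leu H')]))
```

Keeping the table as data rather than code is what makes such a scheme configurable: changing the classes changes the restraints without touching the function.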
The violations that result from a structure generation cycle may be imported into Analysis, where the user can readily follow a link from each violated restraint to the peaks that were used to generate it.
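The link from a violation back to its peaks can be sketched by storing, on each hypothetical DistanceRestraint, the serial numbers of the peaks it came from. In practice the violations would be imported from the structure-calculation output; here they are computed from made-up distances, and the 0.5 Angstrom cutoff is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DistanceRestraint:
    serial: int
    lower: float               # Angstrom
    upper: float               # Angstrom
    peakSerials: List[int]     # peaks the restraint was generated from


def violation(restraint: DistanceRestraint, distance: float) -> float:
    """Amount (Angstrom) by which a distance lies outside the restraint bounds."""
    return max(0.0, distance - restraint.upper, restraint.lower - distance)


def peaksForViolations(restraints: List[DistanceRestraint],
                       distances: Dict[int, float],
                       cutoff: float = 0.5) -> Dict[int, List[int]]:
    """Map each restraint violated by more than cutoff to its source peaks."""
    return {r.serial: r.peakSerials for r in restraints
            if violation(r, distances[r.serial]) > cutoff}


restraints = [DistanceRestraint(1, 1.8, 3.3, [10, 11]),
              DistanceRestraint(2, 1.8, 5.0, [12])]
structureDistances = {1: 4.1, 2: 4.8}   # made-up distances from a structure
print(peaksForViolations(restraints, structureDistances))   # {1: [10, 11]}
```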
Figure 6. The Analysis popup used to create distance restraints.
Reference information
Analysis has access to a library of reference information (see Figure 7). This includes chemical compound descriptions, chemical shift distributions, isotope information, idealised residue coordinates, isotopomer labeling patterns, NMR pulse sequence descriptions, etc.
Reference information is often used implicitly within Analysis, so that the user does not have to worry about how to obtain it. Some of the data are visualised where this is helpful; for example, chemical shift distributions are shown during assignment.
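One way reference distributions can help during assignment is by ranking atom types for an observed shift. The sketch assumes each distribution is Gaussian with a mean and standard deviation; the values in SHIFT_STATS are rough placeholders for illustration, not RefDB data, and the function names are hypothetical.

```python
import math

# ROUGH, ILLUSTRATIVE (mean, standard deviation) values in ppm; they are
# placeholders, not data taken from RefDB or the CCPN reference library.
SHIFT_STATS = {
    'Gly CA': (45.0, 1.5),
    'Ala CA': (53.0, 2.0),
    'Ser CA': (58.5, 2.0),
}


def gaussianDensity(x, mean, sd):
    """Normal probability density; a Gaussian shape is an assumption here."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def rankAtomTypes(shift, stats=SHIFT_STATS):
    """Return (atom type, relative probability) pairs, most likely first."""
    scores = {name: gaussianDensity(shift, mean, sd)
              for name, (mean, sd) in stats.items()}
    total = sum(scores.values())
    if total == 0.0:              # shift far outside every distribution
        return []
    ranked = [(name, score / total) for name, score in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


for name, probability in rankAtomTypes(52.1):
    print('%-7s %.3f' % (name, probability))
```

Normalising the densities turns them into relative probabilities among the candidates considered, which is convenient for ordering a selection list.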
Figure 7. Chemical shift distributions from the RefDB database are part of the CCPN reference information.