The CCPN Data Model

About

The CCPN Data Model (hereafter the Data Model) was initially set up around some of the main objectives of CCPN, which are:

From this, the Data Model is intended to cover all data needed for macromolecular NMR spectroscopy, from the initial experimental data to the final validation, including areas such as molecular description, structure, and the reference information that supports the NMR data. It is intended to be used as a standard for data exchange between different programs and databases. It is also specifically designed to handle intermediate information and results obtained during the analysis of NMR data.

The Data Model itself is an abstract description of all the data commonly used in NMR (neighbouring areas such as protein production are also included). For example:

To mention another example:

This abstract description of the Data Model is represented and maintained graphically using the Unified Modelling Language (UML).

Application Programming Interfaces

A framework was developed around the Data Model that uses Application Programming Interfaces (APIs) in different programming languages. These APIs provide an in-memory representation of the data as organized by the Data Model, which means that data can be manipulated in the same way regardless of computer platform or programming language. The API also handles consistency checking of the objects (e.g. an Nmr Experiment object has to be linked to at least one ExpDim (experiment dimension)). The API is currently available in Python, Java, and C.

Because the APIs maintain the integrity of the data according to the underlying Data Model, they allow a highly modular development of software libraries. In essence, new software written for an API (in a particular language) can in principle use all the other software that already exists for that API, and data can be exchanged between applications in different languages via the supported storage formats (currently XML and SQL). The APIs handle all of the storage (loading and saving) of data described by the Data Model, which ensures consistency and frees the programmer from this task.
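The kind of consistency check mentioned above (an experiment needing at least one experiment dimension) can be illustrated with a small toy model. This is only a sketch: the class and method names (`Experiment`, `ExpDim`, `new_exp_dim`, `check_valid`, `ModelValidityError`) are hypothetical and are not taken from the actual CCPN API.

```python
class ModelValidityError(Exception):
    """Raised when an object violates a constraint of this toy model."""


class ExpDim:
    """Hypothetical stand-in for an experiment dimension."""

    def __init__(self, dim):
        self.dim = dim  # 1-based dimension number


class Experiment:
    """Hypothetical stand-in for an NMR experiment object."""

    def __init__(self, name):
        self.name = name
        self.exp_dims = []

    def new_exp_dim(self):
        # Child objects are created through the parent, so the link
        # between the two is always set up consistently.
        exp_dim = ExpDim(dim=len(self.exp_dims) + 1)
        self.exp_dims.append(exp_dim)
        return exp_dim

    def check_valid(self):
        # Cardinality constraint: at least one ExpDim per Experiment.
        if not self.exp_dims:
            raise ModelValidityError(
                "Experiment %r needs at least one ExpDim" % self.name
            )


if __name__ == "__main__":
    experiment = Experiment("HSQC")
    try:
        experiment.check_valid()
    except ModelValidityError as error:
        print("invalid:", error)

    experiment.new_exp_dim()
    experiment.new_exp_dim()
    experiment.check_valid()  # passes now that dimensions exist
    print("dimensions:", [d.dim for d in experiment.exp_dims])
```

Putting the check in the data layer rather than in each application is what allows independently written programs to rely on the same guarantees about the data they receive.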

Packages

The Data Model is split into packages, each of which relates to a different conceptual (often scientific) grouping within the larger model. A package describes a 'unit' of information that can be shared by other packages. For example, a template molecule is described in the 'Molecule' package, while a molecular system with 'real' molecules is described in the 'MolSystem' package. The 'Nmr' package uses information from the 'MolSystem' package, which could equally be shared by an 'Xray' package if one were available. For this reason the data of each package is stored in a separate location.
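The idea of storing each package separately while letting one package refer to another can be sketched as follows. The file layout, element names and the `molSystemCode` attribute are assumptions made for illustration only; they do not reproduce the real CCPN XML format.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def save_package(directory, package_name, root):
    """Write one package's data to its own XML file and return the path."""
    path = os.path.join(directory, package_name + ".xml")
    ET.ElementTree(root).write(path)
    return path


def demo(directory):
    # 'MolSystem' package: a molecular system with a single chain.
    mol_system = ET.Element("MolSystem", code="MS1")
    ET.SubElement(mol_system, "Chain", code="A")

    # 'Nmr' package: refers to the molecular system by its code only,
    # so another package could reuse the same MolSystem file.
    nmr = ET.Element("Nmr")
    ET.SubElement(nmr, "Experiment", name="HSQC", molSystemCode="MS1")

    paths = {
        "MolSystem": save_package(directory, "MolSystem", mol_system),
        "Nmr": save_package(directory, "Nmr", nmr),
    }

    # Resolve the cross-package reference after loading both files.
    wanted = ET.parse(paths["Nmr"]).getroot().find("Experiment").get("molSystemCode")
    found = ET.parse(paths["MolSystem"]).getroot().get("code")
    return paths, wanted == found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        paths, resolved = demo(directory)
        print(sorted(os.path.basename(p) for p in paths.values()), resolved)
```

Because the reference is held as a key rather than as a copy of the data, the molecular system is stored once and stays the single source of truth for every package that uses it.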

Figure 1. Overview of the CCPN Data Model with possible interactions between some of its packages.

Advantages of having data inside the Data Model

Programs that use the Data Model

Currently, the CcpNmr applications FormatConverter, ECI and Analysis, together with the dihedral angle prediction program DANGLE, interact directly with data inside the Data Model. The ISD, CING and ARIA software (versions 2.1 onwards) are fully integrated with the Data Model by using in-memory conversion. The QUEEN validation software from the CMBI works with CCPN via the FormatConverter. Several web services are also capable of working with CCPN projects, including HADDOCK, CING and CCPN Grid (running ARIA & ISD). The focus is now on making as many applications as possible compatible with the Data Model: two existing European projects (Extend-NMR and EU-NMR) are already committed to making the software they develop work directly with CCPN.

CcpNmr Suite

The CcpNmr software suite is a series of programs for macromolecular NMR spectroscopy integrated with the CCPN Data Model. The intention is that programs written by CCPN, external contributions, and 'outside' programs integrated with the Data Model together provide one suite of programs to carry out all tasks needed in macromolecular NMR spectroscopy. The nature of the Data Model guarantees that any other program or suite that interfaces with the Data Model can be used alongside or instead of the CcpNmr suite.

The CcpNmr suite currently consists of: