The CCPN Data Model
About
The CCPN Data model (Data Model) was initially set up based on some of the main objectives of CCPN, which are:
To establish a universal Data Model for harvesting and exchange of data between different NMR related software packages, and develop associated subroutine libraries.
To develop software packages using the common Data Model to perform standard functions required during the determination of macromolecular structures by NMR.
To promote the use of the common Data Model and to disseminate NMR-related software made by third parties, especially software that makes use of the Data Model, within the NMR community.
The Data Model is therefore intended to cover all data needed for macromolecular NMR spectroscopy, from the initial experimental data to the final validation, including areas such as molecular description, structure, and reference information that supports the NMR data. It is intended to be used as a standard for data exchange between different programs and databases. It is also specifically designed to handle intermediate information and results obtained during the analysis of NMR data.
The Data Model itself is an abstract description of all the data that is commonly used in NMR (with neighbouring areas like protein production being included). For example:
The NMR part of the Data Model describes an Experiment object - this corresponds to recording an NMR spectrum. This Experiment is linked to ExpDim object(s) which describe(s) the different experimental dimensions that generate the spectrum.
To mention another example:
The data collected from an Experiment is stored in a DataSource object. The Experiment stores “how things have been done”, whereas the DataSource stores “what has been done”. A DataSource thus corresponds to the record of an NMR spectrum.
This abstract description of the Data Model is represented and maintained graphically using the Unified Modelling Language (UML).
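The relationship between the Experiment, its experimental dimensions and its data source can be illustrated with a minimal Python sketch. The classes below are plain illustrative stand-ins written for this example, not the CCPN API; every class, attribute and function name, and the example values, are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExpDim:
    """One experimental dimension of an Experiment (illustrative only)."""
    dim: int


@dataclass
class DataSource:
    """The recorded spectrum: 'what has been done' (illustrative only)."""
    name: str
    num_points: List[int]


@dataclass
class Experiment:
    """How the spectrum was recorded: 'how things have been done'."""
    name: str
    exp_dims: List[ExpDim] = field(default_factory=list)
    data_sources: List[DataSource] = field(default_factory=list)


def new_experiment(name: str, num_dim: int) -> Experiment:
    """Create an Experiment linked to one ExpDim per dimension."""
    return Experiment(name, [ExpDim(dim) for dim in range(1, num_dim + 1)])


# A hypothetical two-dimensional experiment with one recorded spectrum.
hsqc = new_experiment("HSQC", num_dim=2)
hsqc.data_sources.append(DataSource("HSQC-processed", num_points=[1024, 256]))
```

Here the Experiment owns both its dimensions and the spectra recorded from it, which mirrors the links described above in a much simplified form.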
Application Programming Interfaces
A framework was developed around the Data Model that uses Application Programming Interfaces (APIs) in different programming languages. These APIs provide an in-memory representation of the data, organized consistently with the Data Model, which means that data can be manipulated in the same way regardless of computer platform and programming language. The API also handles consistency checking of the objects (e.g. an Nmr Experiment object has to be linked to at least one ExpDim (experiment dimension)). The API is currently available in Python, Java, and C. Because the APIs maintain the integrity of the data according to the underlying Data Model, they allow a highly modular development of software libraries. In essence this means that new software for an API (in a particular language) can in principle use all the other software that already exists for that API, and that data can be exchanged between applications in different languages via supported storage formats (currently XML and SQL). The APIs handle all of the storage (loading and saving) of data described by the Data Model, ensuring consistency and freeing the programmer from this task.
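The two responsibilities described above, consistency checking and storage, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the CCPN API or its XML format: the class, the exception, the functions and the XML element names are all hypothetical.

```python
import xml.etree.ElementTree as ET


class ConsistencyError(Exception):
    """Raised when an object violates a rule of this toy model."""


class Experiment:
    def __init__(self, name, num_dim):
        self.name = name
        self.num_dim = num_dim
        self.check_valid()

    def check_valid(self):
        # Rule from the text: an Experiment needs at least one ExpDim.
        if self.num_dim < 1:
            raise ConsistencyError(
                f"Experiment {self.name!r} needs at least one ExpDim"
            )


def to_xml(experiment):
    """Serialise an Experiment to an XML string (hypothetical layout)."""
    root = ET.Element("Experiment", name=experiment.name)
    for dim in range(1, experiment.num_dim + 1):
        ET.SubElement(root, "ExpDim", dim=str(dim))
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild an Experiment; the constructor re-runs the validity check."""
    root = ET.fromstring(text)
    return Experiment(root.get("name"), len(root.findall("ExpDim")))


# Round trip: save, then load, without the caller touching the file format.
saved = to_xml(Experiment("NOESY", 3))
loaded = from_xml(saved)
```

Because loading goes through the same constructor as normal object creation, stored data that breaks a rule is rejected on load rather than silently accepted, which is the kind of guarantee the APIs are described as providing.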
Packages
The Data Model is split into packages, each of which relates to a different conceptual (often scientific) grouping within the larger model. A package describes a 'unit' of information that can be shared by other packages. For example, the description of a template molecule is done in the 'Molecule' package, while the description of a molecular system with 'real' molecules is done in the 'MolSystem' package. The 'Nmr' package uses information from the 'MolSystem' package, which could equally be shared by an 'Xray' package if one were available. For this reason the data of each package is stored in a separate location.
Figure 1. Overview of the CCPN Data Model with possible interactions between some of its packages.
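The idea of packages that are stored separately but refer to one another can be sketched as follows. This is an illustrative toy, not the CCPN API: the class names, the dictionary stores standing in for per-package storage, and the placeholder molecule are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TemplateMolecule:
    """Would live in the 'Molecule' package."""
    name: str
    sequence: str


@dataclass(frozen=True)
class Chain:
    """Would live in the 'MolSystem' package."""
    code: str
    molecule_name: str  # cross-package reference to a template molecule


@dataclass(frozen=True)
class Resonance:
    """Would live in the 'Nmr' package."""
    serial: int
    chain_code: str  # cross-package reference to a chain


# One separate store per package (placeholder data, not a real molecule).
molecule_pkg: Dict[str, TemplateMolecule] = {
    "demoProtein": TemplateMolecule("demoProtein", "ACDE"),
}
mol_system_pkg: Dict[str, Chain] = {"A": Chain("A", "demoProtein")}
nmr_pkg: Dict[int, Resonance] = {1: Resonance(1, "A")}


def template_for(resonance: Resonance) -> TemplateMolecule:
    """Follow the references Nmr -> MolSystem -> Molecule."""
    chain = mol_system_pkg[resonance.chain_code]
    return molecule_pkg[chain.molecule_name]


template = template_for(nmr_pkg[1])
```

Since the 'Nmr' store only holds references into the 'MolSystem' store, a hypothetical 'Xray' store could point at the same chains without duplicating them.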
Advantages of having data inside the Data Model
All programs that work with the Data Model 'understand' each other. For example, you can read data into the Data Model with the CcpNmr FormatConverter, start using CcpNmr Analysis straight away (provided it understands the raw spectrum data format), and transfer the information to ARIA for a structure calculation.
Scripts that work on the Data Model can be used by every application that uses a Data Model API. For example, if a good automatic assignment script were written, it could be run from any Data Model based application.
Import from and export to foreign formats. This allows you to keep all your data in one place throughout a project while moving back and forth between different programs. Final export to an NmrStar file, ready for database deposition, is also included.
Programs that use the Data Model
Currently, the CcpNmr applications FormatConverter, ECI and Analysis, and the dihedral angle prediction program DANGLE, interact directly with data inside the Data Model. The ISD, CING and ARIA software (versions 2.1 onwards) are fully integrated with the Data Model by using in-memory conversion. The QUEEN validation software from the CMBI works with CCPN via the FormatConverter. Several web services are also capable of working with CCPN projects, including HADDOCK, CING and CCPN Grid (running ARIA & ISD). The focus is now on making as many applications as possible compatible with the Data Model: two existing European projects (Extend-NMR and EU-NMR) are already committed to making the software they develop work directly with CCPN.
CcpNmr Suite
The CcpNmr software suite is a series of programs for macromolecular NMR spectroscopy integrated with the CCPN Data Model. Between programs written by the CCPN, external contributions, and ‘outside’ programs integrated with the data model, it is the intention to provide one suite of programs to carry out all tasks needed in macromolecular NMR spectroscopy. The nature of the Data Model guarantees that any other program or suite that interfaces with the data model can be used alongside or instead of the CcpNmr suite.
The CcpNmr suite currently consists of:
Analysis, for interactive analysis, spectrum display, and assignment. CcpNmr Analysis has its own Data Model package, which means it uses CCPN technology to create a program-specific part of the Python API and thus allows program information (colours, window positions etc.) to be recorded and stored as XML files.
FormatConverter, for reading and writing between the data formats of existing programs.
Entry Completion Interface (ECI) and PDBe deposition system, for submitting CCPN data to the PDB and BMRB.