Quick Guide to the CCPN Data Model for Software Development

The data model itself is an abstract description of all the data that is commonly used with NMR (extending into sample management, protein production, data tracking and pipelines, ...). For example, the NMR part of the data model describes an Experiment object - this corresponds to an NMR spectrum. This Experiment is linked to ExpDim object(s) - these describe the different dimensions in the spectrum. This abstract description of the data model is represented and maintained graphically using the Unified Modelling Language (UML).

The boxes describe the Experiment and ExpDim objects. The entries inside the boxes are attributes that give meaning to the object. For example, you can set the name for an Experiment. Objects are then linked to each other - this is shown by the line between Experiment and ExpDim. The diamond in the link means that ExpDim is a child of Experiment - a dimension in the spectrum cannot exist without having a spectrum first.
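The parent-child link described above can be illustrated with a minimal Python sketch. The two classes below are toy stand-ins written for this guide, not the CCPN API itself: the constructor arguments, the 'expDims' list and the example name are assumptions made purely for illustration.

```python
# Toy classes for illustration only; this is NOT the CCPN API.

class Experiment:
    """Toy stand-in for an NMR experiment, i.e. one spectrum."""

    def __init__(self, name=None):
        self.name = name      # attribute: a value stored on the object
        self.expDims = []     # link: the child ExpDim objects


class ExpDim:
    """Toy stand-in for one dimension of an experiment."""

    def __init__(self, experiment, dim):
        # The diamond in the diagram: a child needs its parent up front.
        if not isinstance(experiment, Experiment):
            raise TypeError("an ExpDim needs a parent Experiment")
        self.experiment = experiment
        self.dim = dim
        experiment.expDims.append(self)   # keep both ends of the link in sync


experiment = Experiment(name="example spectrum")
first_dim = ExpDim(experiment, dim=1)

print(first_dim.experiment.name)    # the child can reach its parent
print(len(experiment.expDims))      # the parent can list its children
```

Because the parent is a required argument, there is no way to construct an ExpDim on its own, which mirrors the rule that a dimension cannot exist without a spectrum.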

What are 'packages'?

The data model is split into packages. Each of these packages describes a 'unit' of information that can be shared by other packages. For example, the description of a template molecule is done in the 'Molecule' package, while the description of a molecular system with 'real' molecules is done in the 'MolSystem' package. The 'Nmr' package uses information from the 'MolSystem' package, which could also be shared by an 'Xray' package if one were available. For this reason the data of each package is stored in a separate location.

What is the 'API'?

API stands for Application Programming Interface. With an API the objects described by the data model can be manipulated in computer memory. Basically this means that the data is organized in a way that is consistent with the 'data model'. The API therefore also handles consistency checking of the objects (e.g. an Nmr Experiment object has to be linked to at least one ExpDim (experiment dimension)). The API is currently available in Python (with XML), C (with XML), and Java (with XML or database storage).

To continue with the example above, the objects that the API maintains in memory for a 3D spectrum are shown below (note that the values for the 'dim' attribute are filled in for the ExpDim objects to distinguish between them).
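As a rough illustration of this set of objects, and of the kind of consistency check mentioned earlier, the following Python sketch builds one Experiment with three ExpDim children. It uses toy classes, not the CCPN API: the names 'new_exp_dim', 'check_valid' and 'make_experiment' are hypothetical, and numbering the dimensions from 1 is an assumption.

```python
# Toy classes for illustration only; this is NOT the CCPN API.

class ExpDim:
    """Toy stand-in for one dimension of an experiment."""

    def __init__(self, experiment, dim):
        self.experiment = experiment
        self.dim = dim


class Experiment:
    """Toy stand-in for an NMR experiment, i.e. one spectrum."""

    def __init__(self, name):
        self.name = name
        self.expDims = []

    def new_exp_dim(self, dim):
        # Hypothetical helper: create a child and register it on the parent.
        exp_dim = ExpDim(self, dim)
        self.expDims.append(exp_dim)
        return exp_dim

    def check_valid(self):
        # Hypothetical check: at least one ExpDim, each with a distinct 'dim'.
        if not self.expDims:
            raise ValueError("an Experiment needs at least one ExpDim")
        dims = [exp_dim.dim for exp_dim in self.expDims]
        if len(set(dims)) != len(dims):
            raise ValueError("ExpDim 'dim' values must be distinct")


def make_experiment(name, num_dim):
    """Build an Experiment with num_dim dimensions, numbered from 1 (assumed)."""
    experiment = Experiment(name)
    for dim in range(1, num_dim + 1):
        experiment.new_exp_dim(dim)
    experiment.check_valid()
    return experiment


spectrum_3d = make_experiment("example 3D spectrum", 3)
for exp_dim in spectrum_3d.expDims:
    print(exp_dim.experiment.name, exp_dim.dim)
```

In this sketch the check is an explicit method call; a real API may instead validate objects at the moment they are created or modified.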

Which programs use this 'API'?

CcpNmr FormatConverter and CcpNmr Analysis are built entirely on top of the data model APIs. CcpNmr programs like ChemBuild and FormatExchange have their own internal data structures but were written to connect directly to the APIs as appropriate. Programs from third parties have their own data structures and file formats, but many have been integrated so that they can be launched from a CCPN project and the results can be read back. The most closely integrated programs include ARIA, CING, and the CcpNmr ECI deposition tool. Much of this work started in EU collaboration projects such as EUNMR and Extend-NMR, and CCPN, as a partner in the WeNMR project, is committed to extending this integration.

How do I get my data into (and out of) the 'data model'?

The CcpNmr FormatConverter and CcpNmr FormatExchange applications allow you to import existing derived data formats (not raw spectra) into the data model. Export functions are also available, so the applications can also be used to convert between existing formats.

What is the advantage of having data inside the 'data model'?