Database Structure

Because the database, and access to the data within it, is integral to the JThermodynamicsCloud application, the structure of the database strongly shapes the structure of JThermodynamicsCloud itself. The design goal of promoting traceability and accountability by keeping a record of data lineage and provenance is closely tied to how persistent data is stored. One consequence of this goal is that large amounts of data must be stored, so the database must handle growth efficiently. To satisfy this prerequisite, JThermodynamicsCloud uses a noSQL document-based database, namely the Firebase database of Google, which scales well as the amount of stored data increases.

Persistent data

Persistent data, i.e. data stored in the database, includes not only the fundamental data needed to calculate temperature-dependent thermodynamics, but also the preliminary data from which the fundamental data is derived and all previous versions of the fundamental data. A primary goal of JThermodynamicsCloud is to keep a complete record of the evolution of the database. If data is changed or replaced, both the previous and the current versions remain available. In short, all data, past and present, is preserved for traceability and accountability. In addition, several versions of a data set can exist simultaneously. For example, each user can have a personal copy of the fundamental data to modify and adapt.
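The idea that a change never discards the earlier version can be illustrated with a small in-memory sketch. This is not the JThermodynamicsCloud implementation; the class name VersionedStore, its methods, and the example field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class VersionedStore:
    """Append-only store: saving never overwrites an earlier version."""

    # Hypothetical layout: data set name -> list of versions, oldest first.
    _versions: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    def save(self, name: str, data: Dict[str, Any]) -> int:
        """Store a new version and return its 1-based version number."""
        history = self._versions.setdefault(name, [])
        history.append(dict(data))  # copy so later edits do not alter the record
        return len(history)

    def current(self, name: str) -> Dict[str, Any]:
        """Return the most recent version of the named data set."""
        return self._versions[name][-1]

    def history(self, name: str) -> List[Dict[str, Any]]:
        """Return every stored version, oldest first."""
        return list(self._versions[name])


store = VersionedStore()
store.save("BensonRules", {"source": "preliminary"})
store.save("BensonRules", {"source": "revised"})
```

After the two calls to save, store.current("BensonRules") returns the revised entry, while store.history("BensonRules") still contains both entries, so the lineage of the data remains available.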

As a result, the database can grow considerably. For this reason, JThermodynamicsCloud stores its persistent data in a noSQL document-based database, namely the Firebase database of Google. A defining property of the noSQL class of databases is scalability: the database can grow significantly while the handling of large amounts of data remains efficient. This allows JThermodynamicsCloud to accommodate the large volume of data that results from preserving all data for traceability and accountability and from supporting personalized data sets.

Document-based noSQL Database

There are several types of noSQL databases, all with scalability properties. The Firestore noSQL database is document-based: each document is a free-form set of property-value pairs. The 'documents' in JThermodynamicsCloud are the catalog objects (see Catalog Objects). Documents are arranged in hierarchically nested collections. In JThermodynamicsCloud, the path through the hierarchy is designated by string labels, which organize the documents. For example, fairly high in the hierarchy, a set of nodes represents the owners of the catalog objects; all objects underneath such a node were created by that user. Another type of node separates the different types of fundamental data by using the name of the fundamental data; underneath this node are all the versions of that data.
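A path built from string labels can be sketched as follows. The order of the labels (owner, then data name, then version) and the example values are assumptions made for illustration; the actual hierarchy used by JThermodynamicsCloud may contain different or additional levels.

```python
def collection_path(owner: str, data_name: str, version: str) -> str:
    """Join string labels into a slash-separated hierarchy path.

    The three levels used here are hypothetical, not the actual
    JThermodynamicsCloud layout.
    """
    labels = [owner, data_name, version]
    for label in labels:
        # A label must be non-empty and must not contain the separator.
        if not label or "/" in label:
            raise ValueError(f"invalid label: {label!r}")
    return "/".join(labels)


path = collection_path("user-a", "BensonRules", "v2")
print(path)  # user-a/BensonRules/v2
```

Everything stored under the prefix "user-a" then belongs to that owner, and sibling paths that differ only in the last label are different versions of the same data.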

An important concept in a noSQL document-based database is that of a collection. A collection is a set of related documents, which in JThermodynamicsCloud are the catalog objects. Each collection of documents is found at the end of a hierarchy of nodes denoted by appropriate labels, as described above. For example, one collection of related catalog objects is the set of Benson rules that should be used to calculate the thermodynamics. A search for specific catalog objects is done in two steps: first the collection of related catalog objects is located, and then, within that collection, documents with specific properties are searched for. The property is identified by its label, and the catalog objects having the desired value of that property are returned.
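The two-step search (locate the collection, then filter by a property value) can be sketched with plain Python dictionaries standing in for the database. The function name find_documents, the path, and the property names "Name" and "Status" are hypothetical; in a real document database the second step would be a query executed by the database rather than a loop in client code.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def find_documents(
    database: Dict[str, List[Document]],
    collection_path: str,
    prop: str,
    value: Any,
) -> List[Document]:
    """Return documents in one collection whose property equals the value."""
    # Step 1: locate the collection by its path in the hierarchy.
    collection = database.get(collection_path, [])
    # Step 2: keep only documents with the desired property value.
    return [doc for doc in collection if doc.get(prop) == value]


# Hypothetical in-memory stand-in for the database.
database = {
    "user-a/BensonRules/v2": [
        {"Name": "rule-1", "Status": "current"},
        {"Name": "rule-2", "Status": "superseded"},
    ],
}

matches = find_documents(database, "user-a/BensonRules/v2", "Status", "current")
```

Because the search is confined to one collection, documents elsewhere in the hierarchy are never examined.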

For example, within a thermodynamic calculation the Benson rules should be applied. The Benson rules to be applied are found together in one collection, and there could be several versions of the Benson rules in the database. The specific collection of Benson rules to use is specified by the dataset collection (see Dataset Collection). The dataset collection to use is itself specified through a label. One of the properties of the desired dataset collection object is the position in the database hierarchy (the address) of the collection of Benson rules to be applied. One of the properties of a Benson rule is a unique name which specifies the center atom and connected atoms; this name is listed under the BensonRuleDatabaseReference property. When the target molecule is analyzed, the needed Benson rules are determined and given as a list of values of the BensonRuleDatabaseReference property. The corresponding rules are then retrieved from the collection specified by the dataset collection.
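The lookup chain described above (dataset collection label, then collection address, then rules selected by BensonRuleDatabaseReference) can be sketched in memory as follows. Only the property name BensonRuleDatabaseReference comes from the text; the property name BensonRuleCollectionAddress, the label "standard", the path, and the rule names (written in conventional Benson group notation) are assumptions for illustration.

```python
from typing import Any, Dict, List

# Hypothetical dataset collection objects, keyed by their label.
DATASET_COLLECTIONS: Dict[str, Dict[str, str]] = {
    "standard": {"BensonRuleCollectionAddress": "user-a/BensonRules/v2"},
}

# Hypothetical stand-in for the database: address -> documents.
DATABASE: Dict[str, List[Dict[str, Any]]] = {
    "user-a/BensonRules/v2": [
        {"BensonRuleDatabaseReference": "C-(C)(H)3"},
        {"BensonRuleDatabaseReference": "C-(C)2(H)2"},
        {"BensonRuleDatabaseReference": "C-(C)3(H)"},
    ],
}


def lookup_benson_rules(dataset_label: str, needed: List[str]) -> List[Dict[str, Any]]:
    """Fetch the needed rules from the collection named by the dataset collection."""
    # The dataset collection object holds the address of the rule collection.
    address = DATASET_COLLECTIONS[dataset_label]["BensonRuleCollectionAddress"]
    # Index the rules of that collection by their unique reference name.
    by_name = {doc["BensonRuleDatabaseReference"]: doc for doc in DATABASE[address]}
    missing = [name for name in needed if name not in by_name]
    if missing:
        raise KeyError(f"rules not found in {address}: {missing}")
    return [by_name[name] for name in needed]


rules = lookup_benson_rules("standard", ["C-(C)(H)3", "C-(C)2(H)2"])
```

Switching to a different version of the Benson rules then only requires a dataset collection object that points to a different address; the list of needed rule names stays the same.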

The search for catalog objects in JThermodynamicsCloud uses the strengths of the structure of the noSQL document-based database for efficient search. In addition, JThermodynamicsCloud uses the scalability of the database to save large amounts of pertinent data related to each catalog object and thereby promote the traceability and accountability of each data object. This traceability and accountability is one of the properties that makes the data management of JThermodynamicsCloud unique, and it represents an important advance in the domain of chemical information. Although the database management principles shown here were demonstrated on the calculation of thermodynamic properties, they are general and could be applied elsewhere.
