Dataset Collection

Motivation

Data is not static. As technology advances, data is continually updated, and each new set of updates can produce a new version of the data. But do the updates actually improve the results? This question can only be answered if both versions are available and each can be tested separately. One of the governing principles of JThermodynamicsCloud is that there are no restrictions on storing data (see Section Database Structure) and that several versions of the data can exist simultaneously. This means that calculations can be made with two separate versions of the database. Which versions are used is specified with the dataset collection data object.

Configurable

Within JThermodynamicsCloud, each fundamental data type (see Section Fundamental Data for details), such as Benson rules, symmetry definitions, steric energy definitions, ring strain energies, and meta-atoms, is collected in sets. Each version of these sets resides in its own place in the database hierarchy and is specified by its collection address. Each individual collection of fundamental data has its own collection address, i.e. each type resides in a different place in the database hierarchy. To perform a thermodynamic calculation, the implementation needs to know which fundamental dataset of each type is current. The Collection Dataset object keeps track of this information: it contains the list of current fundamental datasets to use. To perform a calculation, a Collection Dataset object is specified and the calculation proceeds with that set of data. In this way, the data used in a calculation is explicitly controlled. Several Collection Datasets, each with its own label, can be configured for comparison.
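As a rough illustration of this idea, the Python sketch below models a labeled Collection Dataset that holds one dataset selection per fundamental data type, which a calculation would consult before loading data. It is a minimal sketch under stated assumptions: the class names DatasetSpec and CollectionDataset, the type keys such as "BensonRule", and the labels "Standard" and "v1" are hypothetical and are not taken from the JThermodynamicsCloud code base.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical stand-in for the selection of one fundamental dataset."""
    maintainer: str
    name: str
    version: str


@dataclass
class CollectionDataset:
    """Hypothetical Collection Dataset: a label plus one DatasetSpec per data type."""
    label: str
    current: dict[str, DatasetSpec] = field(default_factory=dict)

    def dataset_for(self, data_type: str) -> DatasetSpec:
        # A calculation asks the collection which dataset of each type to use.
        if data_type not in self.current:
            raise KeyError(f"{self.label!r} specifies no dataset for {data_type!r}")
        return self.current[data_type]


# A hypothetical collection selecting the current Benson rules and symmetry definitions.
standard = CollectionDataset(
    label="Standard",
    current={
        "BensonRule": DatasetSpec("system", "Standard", "v1"),
        "SymmetryDefinition": DatasetSpec("system", "Standard", "v1"),
    },
)
print(standard.dataset_for("BensonRule"))
```

Keeping all selections in one labeled object means that switching the data for a comparison only requires handing a different Collection Dataset to the calculation.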

DatasetSpecificationForCollectionSet

The dataset:DatasetSpecificationForCollectionSet compound object is the key to fundamental data versioning and provides flexibility in deciding how the fundamental data is used in thermodynamic calculations. In the calculation of thermodynamic data, each fundamental data type has its own dataset:DatasetSpecificationForCollectionSet, and the set of these objects determines the fundamental data that is used. This adds a new dimension to versioning. If two fundamental datasets of a given type differ considerably from one another, they would reside in different collections in the database, and the dataset:DatasetSpecificationForCollectionSet would specify which of them should be used. Since both sets remain in the database simultaneously, their effect on the calculation can be compared by specifying one set for a first calculation and the other for a second calculation. For example, consider the set of Benson rules. One set could contain the Benson rules derived directly from Benson's tables. Another set could contain updates to these rules based on, for example, recent quantum mechanical calculations. The effect of the updates could then be assessed by first performing the calculation with the original values and then performing another calculation with the updated values.
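The Benson-rule comparison can be pictured as two collection sets that are identical except for the dataset selected for one type. In this minimal Python sketch, DatasetSpec, differing_types, and the names and versions ("Benson", "original", "quantum-update") are hypothetical illustrations rather than the actual JThermodynamicsCloud objects; the helper simply reports which data types would change between the two calculations.

```python
from typing import NamedTuple


class DatasetSpec(NamedTuple):
    """Hypothetical stand-in for a dataset:DatasetSpecificationForCollectionSet."""
    maintainer: str
    name: str
    version: str


def differing_types(a: dict[str, DatasetSpec], b: dict[str, DatasetSpec]) -> list[str]:
    """Return the data types whose selected dataset differs between two collection sets."""
    return sorted(t for t in a.keys() | b.keys() if a.get(t) != b.get(t))


# Two hypothetical collection sets: identical except for the Benson rules.
original = {
    "BensonRule": DatasetSpec("system", "Benson", "original"),
    "SymmetryDefinition": DatasetSpec("system", "Standard", "v1"),
}
updated = dict(original, BensonRule=DatasetSpec("system", "Benson", "quantum-update"))

print(differing_types(original, updated))  # ['BensonRule']
```

Because only the Benson rules selection differs, any change between the two results can be attributed to the updated rules.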

The information within dataset:DatasetSpecificationForCollectionSet identifies the set of fundamental data within the database hierarchy of documents and collections. The components in this object are used to create the FirestoreID of the fundamental data collection. This compound data object consists of the following components:

dataset:CatalogDataObjectMaintainer: The UID of the creator and/or maintainer of the fundamental data object. This could also indicate that the data is 'system' data maintained by the system administrator (for example, the data used for guest users). The label could also be a consortium name, meaning that any user in the consortium could have created the data.
dataset:DatasetName: As described in the previous section, a label that identifies a group of related data within the fundamental data.
dataset:DatasetVersion: Within a group sharing the same dataset:DatasetName, a further label that distinguishes an update to the fundamental data.
dataset:CatalogDataObjectStatus: This label, at the bottom of the database hierarchy, denotes the status of the collection of database objects. Collections that are actually in use should be labeled dataset:CatalogObjectStatusCurrent.
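How the four components might combine into a collection address can be sketched as follows. This is an assumption-laden illustration: the text does not give the FirestoreID format, so joining a hypothetical base path and the components in the listed order with "/" separators, as well as the function name collection_path and the base "BensonRules", are hypothetical. The status is placed last only because the text describes it as the bottom of the hierarchy.

```python
from typing import NamedTuple

CURRENT = "dataset:CatalogObjectStatusCurrent"


class DatasetSpecificationForCollectionSet(NamedTuple):
    # Field names mirror the four components listed above.
    maintainer: str        # dataset:CatalogDataObjectMaintainer
    dataset_name: str      # dataset:DatasetName
    dataset_version: str   # dataset:DatasetVersion
    status: str = CURRENT  # dataset:CatalogDataObjectStatus


def collection_path(base: str, spec: DatasetSpecificationForCollectionSet) -> str:
    """Assemble a hypothetical hierarchical path; the real FirestoreID layout may differ."""
    components = [spec.maintainer, spec.dataset_name, spec.dataset_version, spec.status]
    for c in components:
        # "/" is the separator in this sketch, so a component must not contain it.
        if not c or "/" in c:
            raise ValueError(f"invalid path component: {c!r}")
    return "/".join([base, *components])


spec = DatasetSpecificationForCollectionSet("system", "Standard", "v1")
print(collection_path("BensonRules", spec))
```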
