JTHERMODYNAMICSCLOUD
Thermodynamics from 2D molecule descriptions
More up to date information (these pages are being used for notes and slowly being phased out)
“It’s not the destination, it’s the journey” is a quote famously attributed to Ralph Waldo Emerson, the American philosopher.
Loosely applied to scientific investigation, scientists never accept just the 'destination', meaning the result, without fully understanding and justifying the journey to that result. A result is never accepted at face value; how the result was obtained is on equal ground. Applied to database management, each individual piece of data has a journey, a history, of how it was obtained. Though the data could be accepted and used at face value, knowing its history adds value to the data. This becomes even more important when the data is the basis for deriving further data, as in the quote from Thomas Reid: “In every chain of reasoning, the evidence of the last conclusion can be no greater than that of the weakest link of the chain, whatever may be the strength of the rest.” To have confidence in the final result, the researcher has to have confidence in all the pieces that were used to create it.
In database management, this 'journey' of data is the key to the concepts of data lineage, provenance, traceability and transparency (Werder et al. 2022, Mayernik et al. 2013 and Viglas 2013). Data lineage and data provenance focus on the history of data through its transformations and sources, lending credibility to its authenticity. Having this information contributes to the traceability and transparency of data management.
A motivating characteristic of JThermodynamicsCloud, which goes beyond its immediate use as a thermodynamic property calculator, is promoting data lineage, provenance, traceability and transparency through its governing philosophy of data management. The underlying data management system on which JThermodynamicsCloud is built is generally applicable beyond the thermodynamic and chemical data needed for thermodynamic calculations.
One of the motivations for using the Firebase Firestore NoSQL document-based database schema (see Section Database Structure for details) is its scalability, meaning that efficiency is not sacrificed when scaling to large amounts of data. This scalability makes it possible to store the large amount of data incurred by keeping not only the primary fundamental data used for the calculations but also all related data, such as sources, modifications and transformations. When the fundamental data is created, all source data and intermediate transformations are stored with links to the final data (see Section Source to Fundamental Data for details). When data is modified, the original is moved and the modified version is stored as its replacement, together with a link to the original. If a piece of data is eliminated, it is just moved and linked as a historical reference. These links between historical and current data are the cornerstones of promoting data lineage and data provenance.
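The move-and-link behaviour described above can be sketched in a few lines. This is a minimal in-memory illustration, not the Firestore implementation: the class LineageStore, the Record fields and the key "rule-1" are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    key: str
    value: dict
    previous: Optional[str] = None  # archive key of the record this one replaced


class LineageStore:
    """Hypothetical in-memory stand-in for a document database."""

    def __init__(self) -> None:
        self.current: Dict[str, Record] = {}
        self.history: Dict[str, Record] = {}
        self._archived = 0

    def _archive(self, record: Record) -> str:
        # Move a record into the history area under a unique archive key.
        self._archived += 1
        archive_key = f"{record.key}@{self._archived}"
        self.history[archive_key] = record
        return archive_key

    def put(self, key: str, value: dict) -> None:
        # A modification moves the old record and links the replacement to it.
        old = self.current.get(key)
        previous = self._archive(old) if old is not None else None
        self.current[key] = Record(key, value, previous)

    def remove(self, key: str) -> str:
        # Elimination only moves the record; nothing is destroyed.
        return self._archive(self.current.pop(key))

    def lineage(self, key: str) -> List[dict]:
        # Walk the links from the current value back to the oldest one.
        chain = []
        record = self.current.get(key)
        while record is not None:
            chain.append(record.value)
            record = self.history.get(record.previous) if record.previous else None
        return chain


store = LineageStore()
store.put("rule-1", {"enthalpy": -10.0})   # illustrative value only
store.put("rule-1", {"enthalpy": -10.2})   # modified version replaces it
print(store.lineage("rule-1"))
```

Calling lineage on a key returns the current value followed by every earlier version, which is the kind of trace the links are meant to support.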
In addition, there are no restrictions on copying and saving different versions of data. Different versions of similar data can be kept for comparison. A user can copy and personalize the system data with the user's own modifications and versions. With the concept of dataset collections (see Section Dataset Collection for details), different versions of fundamental data can be combined to study the effects of modifications.
An integral part of the stored catalog data (see Section Catalog Objects for details) is the set of links to other catalog objects (such as historical objects leading to the current version), web locations and bibliographic information. This promotes accountability and traceability of data.
JThermodynamicsCloud, like its predecessors JThermodynamics, JTherGas and TherGas, uses a dataset of fundamental data, such as Benson rules, the HBI method, symmetry corrections, steric corrections, ring strain and nearest-neighbor interactions, to calculate the temperature-dependent thermodynamics from 2D-graphical representations of the molecules and radical species needed for combustion modelling of (large) reaction mechanisms, both hand- and machine-generated. JThermodynamicsCloud differs from previous versions in that it is cloud-based (SaaS), with improved data management that promotes FAIR data practices. Of particular note is that all database updates can be traced to source data and that different sets of fundamental data can be configured for the final calculation.
The calculational methodology has its roots in the Benson method and, in the case of both JThermodynamics and JThermodynamicsCloud, in calculations using HBI structures from Bozzelli. The final output is generally represented by the values of enthalpy and entropy at the standard temperature of 298 K together with, essentially, a polynomial representation of the temperature dependence of the heat capacity. The temperature dependence of the heat capacity can also be represented, as is standard in Benson's book, as values at several temperatures over the desired range. The temperature-dependent heat capacity values can then be used to calculate the temperature dependence of the enthalpy and entropy.
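As a minimal sketch of that last step, assume the heat capacity is given as a plain polynomial Cp(T) = a0 + a1 T + a2 T^2 + ... with at least one coefficient. The enthalpy then follows from integrating Cp from the reference temperature to T, and the entropy from integrating Cp/T over the same range. The function names, the coefficient layout and the requirement that all quantities share consistent units are assumptions of this example, not the application's actual representation.

```python
import math
from typing import Sequence

T_REF = 298.0  # reference temperature in K, as stated in the text


def heat_capacity(coeffs: Sequence[float], T: float) -> float:
    # Cp(T) = a0 + a1*T + a2*T**2 + ...
    return sum(a * T**i for i, a in enumerate(coeffs))


def enthalpy(h_ref: float, coeffs: Sequence[float], T: float,
             t_ref: float = T_REF) -> float:
    # H(T) = H(t_ref) + integral of Cp dT; each term a_i*T**i
    # integrates to a_i*T**(i+1)/(i+1).
    return h_ref + sum(
        a * (T**(i + 1) - t_ref**(i + 1)) / (i + 1)
        for i, a in enumerate(coeffs)
    )


def entropy(s_ref: float, coeffs: Sequence[float], T: float,
            t_ref: float = T_REF) -> float:
    # S(T) = S(t_ref) + integral of Cp/T dT; the constant term gives a
    # logarithm, each higher term a_i*T**(i-1) integrates to a_i*T**i/i.
    total = s_ref + coeffs[0] * math.log(T / t_ref)
    total += sum(
        a * (T**i - t_ref**i) / i
        for i, a in enumerate(coeffs) if i > 0
    )
    return total
```

If the heat capacity is instead tabulated at several temperatures, the same two integrals can be evaluated numerically or after fitting a polynomial to the table.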
The features, particularly for the user, are the following:
Cloud-based (SaaS) Implementation: Through the cloud (SaaS) interface, accessible from any browser, the user can perform all the temperature-dependent thermodynamic calculations of molecule and radical species using the THERGAS method or the THERM method with the standard database provided. Being in the cloud means that no software needs to be installed.
JSON Data Objects: All data within JThermodynamicsCloud is represented as property-value pairs. Associated with a label, the property name, is a value. The value could either be a simple object like a number or a string, or a compound object consisting of a nested set of property-value pairs. This type of data representation can be considered to be a JSON object.
Ontology Based: JThermodynamicsCloud is an ontology-driven application. Though many of the uses of the ontology are in the background, the user can see the results of its use in the data characterizations, for example in the labels used for data and parameters and in the labels used in the user interface. The ontology is used as a meta description of data.
Fundamental Data: The user can view all the fundamental data that is used to perform all the calculations. The user can also apply individual sets of fundamental data to a species to see the corresponding contribution.
Parameter Units: Each numeric data point has a unit specification. There are no 'standard' units within JThermodynamicsCloud for the numeric quantities. The specific units are specified during input and output. For input, this means that the quantities can be given to the system in their original form (for example, as is often the case, in Joule-based or calorie-based quantities). Keeping the original units facilitates error checking, not only of the input but also in the usage of the data. The units of the output can also be specified. The definitions of the units are described by an ontology.
Dataset Collections: A dataset collection defines the set of fundamental data to be used for a thermodynamic calculation. To calculate, the dataset collection, i.e. the specific combination of fundamental data, is chosen and that data is used to make the calculation. In this way, different databases can be compared. For example, the user can keep two copies of a dataset where one has modifications or new data. The effect of the modifications or new data on the calculation can then be compared.
Versioning through Dataset Collection: JThermodynamicsCloud is a data-driven application where the individual pieces of fundamental data making up the final calculation are constantly being added, modified and updated. A set version of fundamental data can be created and this version is then referenced through the dataset collection. Through the concept of Dataset Collections, JThermodynamicsCloud allows the database manager to keep track of, and even compare, different versions of the fundamental objects making up the thermodynamic calculations.
Traceability and Accountability: An important design feature of JThermodynamicsCloud is tracing every piece of information associated with the thermodynamic calculation to its origins. This includes not only bibliographic references, but also the trace of transactions from the original data source (including the data manager), through its transformations and modifications (including updating), to the final thermodynamic correction.
Configurable: Different configurations of the database can be set up with different combinations of fundamental data. This is useful for comparison if new fundamental data is introduced and the results, hopefully improvements, can be examined.
Authentication: JThermodynamicsCloud can be accessed as a 'Guest', where the user can perform thermodynamic calculations and visualize the fundamental data of the system's databases. Using JThermodynamics authentication, the user can also log in to the system with a validated email (through Google, Facebook, GitHub or with a password). As a logged-in registered user, the user can create and modify the user's own database.
User's own database: If the user is logged into the service, the user can create the user's own database and perform calculations. The user can copy the standard database(s) available to the system and modify them as needed. The user can read in input files that replace, modify or even create new sets of fundamental data. Users can modify and build their own configuration of fundamental data for their own purposes, for example to add new structural information or to test modifications.
FAIR data practices: JThermodynamicsCloud has been built with FAIR data practices in mind, meaning that all the associated data can be Findable, Accessible, Interoperable, and Reusable.
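To illustrate the JSON Data Objects and Parameter Units features, the following sketch shows a numeric parameter stored as property-value pairs together with its unit, and a conversion applied only when a different output unit is requested. The property names ("label", "value", "unit") and the unit strings are hypothetical; in the application the unit definitions come from the ontology, not from a hard-coded table as here.

```python
# Conversion factors to J/mol. The thermochemical calorie is 4.184 J by definition.
TO_JOULE_PER_MOL = {
    "J/mol": 1.0,
    "kJ/mol": 1000.0,
    "cal/mol": 4.184,
    "kcal/mol": 4184.0,
}


def convert(parameter: dict, target_unit: str) -> dict:
    """Return a copy of the parameter expressed in target_unit."""
    factor = TO_JOULE_PER_MOL[parameter["unit"]] / TO_JOULE_PER_MOL[target_unit]
    return {**parameter, "value": parameter["value"] * factor, "unit": target_unit}


# The value is kept in the units of its source (illustrative number only) ...
stored = {"label": "StandardEnthalpy", "value": -10.0, "unit": "kcal/mol"}

# ... and converted only for output; the stored object is left unchanged.
shown = convert(stored, "kJ/mol")
```

Because the stored object keeps its original unit, a value can still be compared directly with the source it was taken from.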
JThermodynamicsCloud is a third-generation application that estimates temperature-dependent thermodynamic information from two-dimensional graphical representations of molecules and radicals, based on the Benson additivity method and Bozzelli's Hydrogen Bond Increment method. JThermodynamicsCloud has several distinguishing features.
There are several technical software features of JThermodynamicsCloud that work 'behind the scenes' to enhance the features available to the user. Some of these state-of-the-art features also define the driving forces of the implementation and show how the same features can be applied to future applications.
Ontological Representation: A unique and important feature of JThermodynamicsCloud is that all concepts and data structures are represented as ontological objects. These definitions give not only documentation of the objects, but also context through the ontological hierarchy. The entire software implementation, from the database structure, software object definitions, function definitions, to the user interface is driven by the ontology.
Database Information: All the information needed for the calculation, such as Benson rules, HBI structures, symmetry corrections, steric corrections and ring corrections, is stored in a NoSQL document store database, where each 'document' is a database structure representing the information needed to perform the thermodynamic calculations. Each document structure, meaning each piece of fundamental data not only has the information needed to perform
JsonObject representation: All data objects are represented as JsonObjects, which are essentially a nested hierarchy of property-value pairs. There is a one-to-one correspondence between JSON data objects and definitions in the ontology. The ontology not only defines the JSON object, but also gives documentation through annotations and context through the object's position in the ontological hierarchy. JSON data objects are the 'documents' stored in the NoSQL database. The information within the ontology definition is important for JSON objects because the JSON structure is fully non-typed.
RESTful Services: All the operations of the application, including the web user interface, are applied through RESTful services. The RESTful services are divided into two types: transactions, which make modifications to the database, and simple services, which access the database. The services represent the 'controller' interface in the Model-View-Controller implementation design.
2D-graphical Lewis structures: A concept introduced with JThermodynamics is the representation of 2D-graphical (Lewis) structures in all representations of the corrections associated with the calculation of the thermodynamics: Benson rules, HBI structures, symmetry corrections and steric/ring corrections. A correction is applied if the correction substructure is matched within the molecule. The details of how the correction is applied are given by the table of information connected with the substructure. This design is the key to being able to store all corrections in the database, as opposed to having the corrections hard-coded in the implementation.
JThermodynamicsCloud is a use case of more general design principles used in its implementation:
Ontological Description of Data Objects: JThermodynamicsCloud is a use-case application of the general principle of CHEMCONNECT (see also presentation at KEOD 2021), where all data objects are described by ontology classes. The ontology gives not only the context of the object, through its position in the hierarchy, but also further documentation through annotations. The data objects are an extension of the DCAT ontology, where the catalog objects are subclasses of the dcat:catalog objects, which are made up of dcat:records and dcat:Components. The ontology can also provide specific instance configurations of more general data objects.
Ontology Driven Implementation: JThermodynamicsCloud's use of ontologies goes beyond data structure definitions. The ontology also describes, for example, the RESTful services, interface menus, classifications and operations on classifications. For example, a typical interface interaction is to query the ontology for the menu objects; the result is the identifier along with a user-friendly label and a longer 'comment'. This design has two major purposes. First, the ontology object has a defining context due to its position in the ontology hierarchy and further documentation through annotations. The second purpose is ontology-software interaction is
Database Hierarchy: In a document-driven NoSQL database such as Firestore, the documents, which are the catalog objects of the application, are defined within a hierarchy. The ontology defines where objects of a specific type are to be found in the hierarchy and how the nodes of the hierarchy are to be labelled. This configuration is application independent.
Dataset Collection: This is a general solution to the problem where many sets of fundamental data must be used to perform a task. The dataset collection is an accounting and versioning tool with which the user can keep track of which sets of fundamental data should be used. Several dataset collections can be defined so that the effect of different combinations of fundamental data can be compared.
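The dataset collection idea can be sketched as a mapping from each type of fundamental data to the dataset version to use. In the example below, the dataset names, entry labels and numeric values are all invented for illustration, and the summation stands in for the real calculation, which involves far more than adding table entries.

```python
from typing import Dict, List, Tuple

# Versioned sets of fundamental data, keyed by (data type, version name).
datasets: Dict[Tuple[str, str], Dict[str, float]] = {
    ("BensonRules", "standard"): {"groupA": -10.0, "groupB": -5.0},
    ("BensonRules", "modified"): {"groupA": -10.5, "groupB": -5.0},
    ("StericCorrections", "standard"): {"gauche": 0.8},
}

# A dataset collection names one version for each type of fundamental data.
collections: Dict[str, Dict[str, str]] = {
    "baseline": {"BensonRules": "standard", "StericCorrections": "standard"},
    "trial": {"BensonRules": "modified", "StericCorrections": "standard"},
}


def estimate(collection: str, contributions: List[Tuple[str, str, int]]) -> float:
    """Sum count * value over (data type, entry label, count) contributions,
    looking each value up in the version chosen by the collection."""
    chosen = collections[collection]
    total = 0.0
    for data_type, label, count in contributions:
        table = datasets[(data_type, chosen[data_type])]
        total += count * table[label]
    return total


# Contributions found for one hypothetical species.
species = [("BensonRules", "groupA", 2),
           ("BensonRules", "groupB", 2),
           ("StericCorrections", "gauche", 1)]

# The same species evaluated under two collections shows the effect of the change.
effect = estimate("trial", species) - estimate("baseline", species)
```

Only the collection name changes between the two calls, so the difference isolates the effect of the modified dataset.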
JThermodynamics, originally based on the Benson rule implementation of THERGAS, makes several additional corrections to the Benson rules based on symmetry, steric influences, cyclic influences and gauche influences. The calculation of radicals is done through the difference between the radical species and its non-radical base form.
Though the calculation made by JThermodynamics is fundamentally the same as THERGAS, it differs in one very important principle. JThermodynamics uses a database of structures and their contribution to the thermodynamics as opposed to hard-coded rules. This gives an added flexibility and ability to expand the database of thermodynamic information and corrections.
The general principle is that each contribution has a structure, usually a 2D-graphical representation of a molecule. Recognizing, usually through subgraph isomorphism, that a structure is found within the molecule means that the corresponding thermodynamic addition/correction can be applied. With each structure, there is a table of symmetry and thermodynamic information.
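The matching step can be illustrated with a small backtracking search. This is a generic textbook approach under simplifying assumptions (atoms carry only an element label, bond orders and hydrogens are ignored), not the algorithm used by JThermodynamics; the function names and the graph encoding are choices made for this example.

```python
from typing import Dict, Hashable, Iterator, List, Set, Tuple

Bond = Tuple[Hashable, Hashable]


def adjacency(bonds: List[Bond]) -> Dict[Hashable, Set[Hashable]]:
    # Build an undirected neighbor table from a list of bonds.
    neighbors: Dict[Hashable, Set[Hashable]] = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def find_matches(pattern_atoms: Dict[Hashable, str], pattern_bonds: List[Bond],
                 target_atoms: Dict[Hashable, str], target_bonds: List[Bond]
                 ) -> Iterator[Dict[Hashable, Hashable]]:
    """Yield every mapping of pattern atoms onto distinct target atoms that
    preserves element labels and keeps every pattern bond as a target bond."""
    order = list(pattern_atoms)
    p_adj = adjacency(pattern_bonds)
    t_adj = adjacency(target_bonds)

    def extend(mapping: Dict[Hashable, Hashable]):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        p = order[len(mapping)]
        for t in target_atoms:
            if t in mapping.values() or pattern_atoms[p] != target_atoms[t]:
                continue
            # Every already-mapped pattern neighbor of p must be bonded to t.
            if all(mapping[q] in t_adj.get(t, set())
                   for q in p_adj.get(p, set()) if q in mapping):
                mapping[p] = t
                yield from extend(mapping)
                del mapping[p]

    yield from extend({})


# Heavy-atom skeleton of ethanol: C0-C1-O2.
ethanol_atoms = {0: "C", 1: "C", 2: "O"}
ethanol_bonds = [(0, 1), (1, 2)]

# A C-O substructure is found once, on atoms 1 and 2.
c_o = list(find_matches({"a": "C", "b": "O"}, [("a", "b")],
                        ethanol_atoms, ethanol_bonds))
```

Each mapping returned identifies where the structure sits in the molecule, which is the information needed to look up and apply the table attached to that structure. A symmetric pattern such as C-C is reported once per orientation, so a real implementation has to decide how to count equivalent matches.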
The types of structures within JThermodynamics are structures representing:
Benson Rule structures
Steric interactions
Cyclic ring strain
Symmetry of the molecule, both internal and external
Optical isomers
Vibrational moments
Dissociation energies
One of the second-order corrections involves the automatic recognition of molecular symmetries: internal and external symmetry and optical isomers. Symmetry is inherently based on the three-dimensional structure. However, the information available to JThermodynamics is a molecular graph. The three-dimensional information has to be inferred and assumptions have to be made. This is a design decision that has to be made when creating the database for symmetry structures.
The generalized procedure can be said to be:
1. Match the symmetry structure within the target
2. For each match, identify the groups attached to each of the unspecified atoms (labeled R in the example).
3. Identify which ligands are the same and group them together.
4. Identify the symmetry of each group or whether the group is linear or not.
5. Condense this information into a table of groups where each group has the number of equivalent ligands and the symmetry or linearity of ligand.
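Steps 3 to 5 amount to grouping identical ligands and recording a count per group. The sketch below assumes that steps 1 and 2 have already reduced each ligand to a comparable label with a known symmetry number and linearity flag; that reduction is the hard part and is not shown. The tuple layout and the example ligands are assumptions of this illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (ligand label, symmetry number of the ligand, whether the ligand is linear)
Ligand = Tuple[str, int, bool]


def condense(ligands: List[Ligand]) -> List[Dict[str, object]]:
    """Group identical ligands and return one table row per group."""
    counts = Counter(ligands)  # identical tuples fall into the same group
    return [
        {"ligand": label, "count": n, "symmetry": symmetry, "linear": linear}
        for (label, symmetry, linear), n in counts.items()
    ]


# Ligands on the central carbon of tert-butanol: three methyl groups and one OH.
table = condense([("CH3", 3, False), ("CH3", 3, False),
                  ("CH3", 3, False), ("OH", 1, False)])
```

The resulting table has one row for the three equivalent methyl groups and one row for the hydroxyl group.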
The method of calculating radical thermodynamics in JTHERGAS is based on the difference between the radical and the parent molecule with a hydrogen added. The full Benson method is applied to the parent molecule (Benson rules with corrections). For the radical only the corrective terms are calculated:
1. Internal Symmetry
2. External Symmetry
3. Optical Isomers
4. Steric Corrections
These are calculated for the radical in the same way as for the parent molecule. The difference between these corrections is added to the final result. In addition, for the radical species the following are estimated:
1. Dissociation Energy
2. Loss of Vibrational modes
3. Loss of Rotational modes
The vibrational and rotational modes are translated to corrections to the temperature dependent heat capacities and to entropies. In this implementation, the loss of rotational modes is neglected. This analysis will appear in later implementations.
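The bookkeeping described above can be sketched for a single property as follows. This shows only how the pieces are combined: how each term is estimated, and which terms apply to which property, is left out. The function name, the dictionary keys and the numbers are hypothetical, and in keeping with the text no rotational term is included.

```python
from typing import Dict

CORRECTION_TYPES = ("internal_symmetry", "external_symmetry",
                    "optical_isomers", "steric")


def radical_estimate(parent_total: float,
                     parent_corrections: Dict[str, float],
                     radical_corrections: Dict[str, float],
                     radical_terms: Dict[str, float]) -> float:
    """Combine the parent result with the radical-specific contributions.

    parent_total is the full result for the parent molecule (Benson rules
    with corrections). The corrective terms are evaluated for both species
    and only their difference is added. radical_terms holds contributions
    estimated for the radical alone, such as dissociation energy or loss
    of vibrational modes.
    """
    difference = sum(
        radical_corrections.get(kind, 0.0) - parent_corrections.get(kind, 0.0)
        for kind in CORRECTION_TYPES
    )
    return parent_total + difference + sum(radical_terms.values())


# Illustrative numbers only, for one property in one consistent unit.
value = radical_estimate(
    parent_total=50.0,
    parent_corrections={"internal_symmetry": -4.0, "external_symmetry": -2.0},
    radical_corrections={"internal_symmetry": -2.0, "external_symmetry": -1.0},
    radical_terms={"vibrational_loss": -0.5},
)
```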