Key to flexibility
The key to the flexibility and the ease of adding new chemical information to the database is the use of substructures to identify and count structures within a molecule.
JThermodynamics uses the 2D-graphical representation to identify molecules and the substructures within the molecule. Several questions can be asked:
Does the molecule have the substructure?
How many times does the molecule have the substructure?
Where is the substructure within the molecule?
The first question, "Does the molecule have the substructure?", is answered yes as soon as any match is found. Counting how many times a substructure occurs is harder, because the number of matches depends on the combinatorics of identical atoms in the substructure and in the molecule. For example, because the three hydrogens of a methyl group are identical, a methyl substructure can be matched to a single primary methyl group of a hydrocarbon in 6 ways. To count structures, the number of matches therefore has to be divided by a factor (explained in the next section). The value of this factor depends on the relationship between the structure itself and what one wants to count. For example, if the substructure is a methyl group and the molecule is a linear alkane with two primary methyl groups (6*2=12 matches), the dividing factor (symmetry factor) depends on what one wants to count:
1: The total number of matches. In this case 12/1=12.
2: The total number of hydrogens. In this case 12/2=6.
6: The total number of primary methyl groups found. In this case 12/6=2.
count > 0: There was a match somewhere in the molecule.
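The arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration: the names `SYMMETRY_FACTORS`, `reduce_count` and `has_substructure` are hypothetical and are not part of JThermodynamics, and the raw count of 12 is the linear-alkane example from the text.

```python
# Illustrative sketch only; these names are hypothetical, not JThermodynamics API.

# Symmetry factor for a methyl substructure, keyed by what is being counted.
SYMMETRY_FACTORS = {
    "matches": 1,                # every raw match counts
    "hydrogens": 2,              # 6 raw matches per methyl / 3 hydrogens
    "primary_methyl_groups": 6,  # 6 raw matches per methyl group
}


def reduce_count(raw_matches, what):
    """Divide the raw number of matches by the symmetry factor for `what`."""
    factor = SYMMETRY_FACTORS[what]
    quotient, remainder = divmod(raw_matches, factor)
    if remainder:
        raise ValueError(
            f"{raw_matches} matches is not a multiple of symmetry factor {factor}"
        )
    return quotient


def has_substructure(raw_matches):
    """Answer the yes/no question: any match at all means the substructure is present."""
    return raw_matches > 0


raw = 6 * 2  # two primary methyl groups, 6 raw matches each
for what in SYMMETRY_FACTORS:
    print(what, reduce_count(raw, what))
print("present:", has_substructure(raw))
```

The divisibility check is a design choice of this sketch: a raw count that is not a multiple of the factor would suggest the factor does not fit the substructure being counted.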
Because atoms can be identical within the 2D-graphical representation of molecules and substructures, the number of ways a substructure can match depends on how many identical atoms are involved: a set of n identical atoms can be assigned to n identical atoms in n! ways.
For example, the GeneralPrimaryCH3C structure:
<molecule id="GeneralPrimaryCH3C" xmlns="http://www.xml-cml.org/schema">
<atomArray>
<atom id="a0" elementType="C" formalCharge="0"/>
<atom id="a1" elementType="C" formalCharge="0"/>
<atom id="a2" elementType="H" formalCharge="0"/>
<atom id="a3" elementType="H" formalCharge="0"/>
<atom id="a4" elementType="H" formalCharge="0"/>
</atomArray>
<bondArray>
<bond id="b1" atomRefs2="a1 a0" order="S"/>
<bond id="b2" atomRefs2="a2 a1" order="S"/>
<bond id="b3" atomRefs2="a3 a1" order="S"/>
<bond id="b4" atomRefs2="a4 a1" order="S"/>
</bondArray>
</molecule>
describes a CCH3 primary carbon whose three hydrogens, a2, a3 and a4, are identical. When the structure is matched to a primary carbon of a hydrocarbon, the three identical hydrogens of GeneralPrimaryCH3C match the three identical hydrogens of that primary carbon. By combinatorics, three identical items, for example a2, a3 and a4, can be matched to another set of three identical items, for example b2, b3 and b4 (here the hydrogens of the target molecule, not the bond ids above), in 3! = 6 ways:
(a2,b2), (a3,b3), (a4,b4)
(a2,b2), (a3,b4), (a4,b3)
(a2,b3), (a3,b2), (a4,b4)
(a2,b3), (a3,b4), (a4,b2)
(a2,b4), (a3,b2), (a4,b3)
(a2,b4), (a3,b3), (a4,b2)
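The six assignments can also be generated mechanically. The following is a minimal sketch using Python's standard `itertools.permutations`; the labels a2..a4 and b2..b4 are the ones used in the text, and the function name `hydrogen_mappings` is a hypothetical helper introduced only for this illustration.

```python
from itertools import permutations

# Labels follow the text: substructure hydrogens a2..a4,
# target-molecule hydrogens b2..b4.
substructure_h = ["a2", "a3", "a4"]
target_h = ["b2", "b3", "b4"]


def hydrogen_mappings(sub, tgt):
    """Return every one-to-one assignment of `sub` atoms to `tgt` atoms."""
    return [list(zip(sub, ordering)) for ordering in permutations(tgt)]


mappings = hydrogen_mappings(substructure_h, target_h)
for mapping in mappings:
    print(mapping)
print(len(mappings), "assignments")  # 3! = 6
```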
Thus, for example, counting the assignments of GeneralPrimaryCH3C in any unbranched (linear) alkane with at least two carbons, which has two primary methyl groups, gives 6*2=12 assignments. The symmetry factor reduces this number to the desired count and reflects what is being counted:
1: The total number of matches. In this case 12/1=12.
2: The total number of hydrogens. In this case 12/2=6.
6: The total number of primary methyl groups found. In this case 12/6=2.
count > 0: There was a match somewhere in the molecule.
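To see where the 12 raw assignments come from, the sketch below counts them by brute force. It is not the matching algorithm used by JThermodynamics. It is a small backtracking search written for this illustration, and the graph encoding, `linear_alkane` and `count_matches` are all hypothetical. The substructure atoms and bonds are copied from the GeneralPrimaryCH3C listing above.

```python
# Illustrative brute-force matcher; NOT the JThermodynamics implementation.

# GeneralPrimaryCH3C, copied from the CML listing: atom id -> element, and bonds.
SUB_ELEMENTS = {"a0": "C", "a1": "C", "a2": "H", "a3": "H", "a4": "H"}
SUB_BONDS = [("a1", "a0"), ("a2", "a1"), ("a3", "a1"), ("a4", "a1")]


def linear_alkane(n_carbons):
    """Build a hydrogen-saturated unbranched alkane as (elements, bonds)."""
    if n_carbons < 2:
        raise ValueError("this sketch needs at least two carbons")
    elements, bonds = {}, set()
    for i in range(n_carbons):
        carbon = f"c{i}"
        elements[carbon] = "C"
        if i > 0:
            bonds.add(frozenset((f"c{i - 1}", carbon)))
        # Terminal carbons carry three hydrogens, inner carbons two.
        n_hydrogens = 3 if i in (0, n_carbons - 1) else 2
        for j in range(n_hydrogens):
            hydrogen = f"h{i}_{j}"
            elements[hydrogen] = "H"
            bonds.add(frozenset((carbon, hydrogen)))
    return elements, bonds


def count_matches(sub_elements, sub_bonds, elements, bonds):
    """Count one-to-one atom assignments that preserve elements and bonds."""
    sub_atoms = list(sub_elements)

    def extend(mapping):
        if len(mapping) == len(sub_atoms):
            return 1  # every substructure atom is assigned
        atom = sub_atoms[len(mapping)]
        total = 0
        for candidate, element in elements.items():
            if element != sub_elements[atom] or candidate in mapping.values():
                continue
            mapping[atom] = candidate
            # Every substructure bond whose two ends are assigned must exist
            # in the target molecule as well.
            consistent = all(
                frozenset((mapping[x], mapping[y])) in bonds
                for x, y in sub_bonds
                if x in mapping and y in mapping
            )
            if consistent:
                total += extend(mapping)
            del mapping[atom]
        return total

    return extend({})


butane_elements, butane_bonds = linear_alkane(4)
raw = count_matches(SUB_ELEMENTS, SUB_BONDS, butane_elements, butane_bonds)
print("raw assignments:", raw)             # 6 per terminal methyl, 2 methyls
print("primary methyl groups:", raw // 6)  # symmetry factor 6
```

Inner CH2 carbons contribute nothing because they have only two hydrogens, so the three substructure hydrogens cannot be assigned to distinct atoms there. The raw count therefore stays the same for any chain length of two or more carbons.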