The Basic Rules of SMILES
SMILES has a set of rules that allow it to describe complex structures. Let's cover the essentials.
1. Atoms
Atoms are represented by their standard element symbols (e.g., C for carbon, N for nitrogen, O for oxygen). Elements in the "organic subset" (B, C, N, O, P, S, F, Cl, Br, I) can be written without brackets, and their attached hydrogens are implied from the element's normal valence, so C on its own is methane. Every other element must be enclosed in square brackets, like [Fe] for iron.
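To make the atom rules concrete, here is a minimal Python sketch that pulls the atom tokens out of a SMILES string. The function name atom_tokens and the regular expression are illustrative assumptions, not part of any SMILES library, and the sketch only recognises the uppercase organic subset and bracketed atoms.

```python
import re

# Two-letter symbols (Cl, Br) are tried before one-letter ones; otherwise
# "Cl" would be read as carbon followed by a stray "l".
# Bracket atoms such as [Fe] are kept as a single token.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]")

def atom_tokens(smiles):
    """Return the atom tokens of a SMILES string in the order written.

    Hypothetical helper for illustration; lowercase (aromatic) atoms
    are not handled here.
    """
    return ATOM_PATTERN.findall(smiles)

print(atom_tokens("CCO"))     # ['C', 'C', 'O']
print(atom_tokens("ClCCBr"))  # ['Cl', 'C', 'C', 'Br']
print(atom_tokens("[Fe]"))    # ['[Fe]']
```

Putting the two-letter symbols first in the pattern is the one design choice that matters here: regular-expression alternation takes the first alternative that matches.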
2. Bonds
Bonds connect the atoms in the string.
Single Bond: Represented by a hyphen (-), but it's usually omitted for simplicity. CC is the same as C-C (ethane).
Double Bond: Represented by an equals sign (=). Ethene is C=C.
Triple Bond: Represented by a hash symbol (#). Acetylene is C#C.
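The bond symbols also determine how many hydrogens are implied on each atom. The sketch below works that out for a simple unbranched chain. It assumes the normal valences C = 4, N = 3 and O = 2, and the name implicit_hydrogens is made up for this example.

```python
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
# Normal valences assumed for the implied-hydrogen count.
VALENCE = {"C": 4, "N": 3, "O": 2}

def implicit_hydrogens(smiles):
    """Implied hydrogen count per atom for an unbranched chain of C, N, O.

    Illustrative sketch: no branches, rings, brackets or aromatic atoms.
    """
    atoms = []    # element symbols in the order written
    used = []     # total bond order already used by each atom
    pending = 1   # order of the bond to the next atom (single if omitted)
    for ch in smiles:
        if ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch in VALENCE:
            atoms.append(ch)
            used.append(0)
            if len(atoms) > 1:
                # The bond joins this atom to the previous one.
                used[-1] += pending
                used[-2] += pending
            pending = 1
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return [VALENCE[a] - u for a, u in zip(atoms, used)]

print(implicit_hydrogens("CC"))   # [3, 3]  ethane, CH3-CH3
print(implicit_hydrogens("C-C"))  # [3, 3]  same molecule, bond written out
print(implicit_hydrogens("C=C"))  # [2, 2]  ethene, CH2=CH2
print(implicit_hydrogens("C#C"))  # [1, 1]  acetylene, HC#CH
```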
3. Branches
Branches on a molecular chain are enclosed in parentheses (). The branch is placed directly after the atom it's attached to.
For example, isopropanol (isopropyl alcohol) has a central carbon atom bonded to two methyl carbons and an oxygen. We write the main chain as CCO and place the second methyl group in parentheses directly after the central carbon: CC(C)O.
Figure 1: Structure of isopropyl alcohol
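One common way to read branches is with a stack: an opening parenthesis remembers the atom to come back to, and a closing parenthesis returns to it. The following is a rough sketch of that idea, assuming single-letter atoms and single bonds only; bonds_with_branches is a hypothetical name chosen for this example.

```python
def bonds_with_branches(smiles):
    """Return (i, j) index pairs of bonded atoms for a branched SMILES.

    Illustrative sketch: single-letter atoms, single bonds, no rings.
    """
    bonds = []
    stack = []       # atoms to return to when a branch closes
    previous = None  # index of the atom the next atom will bond to
    count = 0        # number of atoms seen so far
    for ch in smiles:
        if ch == "(":
            stack.append(previous)   # remember the branch point
        elif ch == ")":
            previous = stack.pop()   # go back to the branch point
        elif ch.isalpha():
            if previous is not None:
                bonds.append((previous, count))
            previous = count
            count += 1
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return bonds

# Isopropanol: atom 1 is the central carbon, bonded to atoms 0, 2 and 3.
print(bonds_with_branches("CC(C)O"))  # [(0, 1), (1, 2), (1, 3)]
```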
4. Rings
Rings are handled by breaking one bond and adding a number to the two atoms that were connected. The same number indicates that those two atoms are bonded together, closing the ring.
For cyclohexane, we break one bond in the ring, creating a six-carbon chain. We then add a 1 to the first and last carbon to show they are connected.
SMILES: C1CCCCC1
You can use different numbers for different rings in the same molecule (e.g., 1 for the first ring, 2 for the second, and so on).
Figure 2: Structure of Cyclohexane
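The ring-closure rule can be sketched in a few lines of Python: the first time a digit appears it opens a ring at the current atom, and the second time it closes the ring back to that atom. This is a simplified illustration that assumes single-letter atoms and single-digit ring labels; ring_closures is a made-up helper name.

```python
def ring_closures(smiles):
    """Return (i, j) atom-index pairs joined by ring-closure digits.

    Illustrative sketch: single-letter atoms and single-digit labels only.
    """
    open_rings = {}   # digit -> index of the atom that opened the ring
    closures = []
    atom_index = -1   # index of the most recent atom
    for ch in smiles:
        if ch.isalpha():
            atom_index += 1
        elif ch.isdigit():
            if ch in open_rings:
                # Second occurrence: bond back to the atom that opened it.
                closures.append((open_rings.pop(ch), atom_index))
            else:
                open_rings[ch] = atom_index
    if open_rings:
        raise ValueError("ring opened but never closed")
    return closures

# Cyclohexane: the first and last carbon are bonded.
print(ring_closures("C1CCCCC1"))        # [(0, 5)]
# Two fused six-membered rings (decalin), using labels 1 and 2.
print(ring_closures("C1CCC2CCCCC2C1"))  # [(3, 8), (0, 9)]
```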
5. Aromaticity
Aromatic compounds, like benzene, are special. In SMILES, aromatic atoms are written in lowercase, so benzene is simply c1ccccc1. The lowercase letters mark the ring bonds as aromatic, so the alternating single and double bonds do not need to be written out; the explicit form C1=CC=CC=C1 describes the same molecule.
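As a small illustration of the lowercase convention, the sketch below counts aromatic and non-aromatic atoms in a SMILES string. It assumes unbracketed organic-subset atoms only, and the name count_aromatic is hypothetical. Cl and Br are matched first so that their second letter is not mistaken for an atom.

```python
import re

# Cl and Br first, then uppercase (aliphatic) atoms, then lowercase
# (aromatic) atoms from the organic subset.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def count_aromatic(smiles):
    """Return (aromatic, aliphatic) atom counts for a SMILES string.

    Illustrative sketch: bracket atoms are not handled.
    """
    tokens = ATOM_PATTERN.findall(smiles)
    aromatic = sum(1 for t in tokens if t.islower())
    return aromatic, len(tokens) - aromatic

print(count_aromatic("c1ccccc1"))    # (6, 0)  benzene
print(count_aromatic("Cc1ccccc1"))   # (6, 1)  toluene
print(count_aromatic("C1CCCCC1"))    # (0, 6)  cyclohexane
```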
6. Stereochemistry
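In SMILES, stereochemistry around double bonds (like E/Z or cis/trans) is indicated using the characters / (forward slash) and \ (backslash), written in place of the single bonds next to the double bond. When the two marks match, the substituents lie on opposite sides of the double bond; when they differ, the substituents lie on the same side.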
F/C=C/F ((E)-1,2-difluoroethene, trans)
F/C=C\F ((Z)-1,2-difluoroethene, cis)
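The slash rule can be checked mechanically for the simplest case. The sketch below handles only strings of the exact form X/C=C/Y or X/C=C\Y and reports cis or trans for the two written substituents. The name cis_or_trans is hypothetical, and the sketch deliberately reports cis/trans rather than E/Z, since E/Z depends on substituent priorities (for 1,2-difluoroethene the two labels coincide).

```python
import re

# Matches only the simple pattern  X/C=C/Y  (with / or \ on either side).
SIMPLE_ALKENE = re.compile(r"^[A-Za-z]+([/\\])C=C([/\\])[A-Za-z]+$")

def cis_or_trans(smiles):
    """Classify a simple X/C=C/Y style SMILES as 'cis' or 'trans'.

    Illustrative sketch: matching marks mean opposite sides (trans),
    differing marks mean the same side (cis).
    """
    match = SIMPLE_ALKENE.match(smiles)
    if match is None:
        raise ValueError("only the simple X/C=C/Y pattern is supported")
    first, second = match.groups()
    return "trans" if first == second else "cis"

# Raw strings keep Python from treating the backslash as an escape.
print(cis_or_trans(r"F/C=C/F"))   # trans
print(cis_or_trans(r"F/C=C\F"))   # cis
```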
The @ and @@ symbols indicate tetrahedral chirality. Looking from the first neighbour written toward the chiral center, @ means the remaining neighbours appear anticlockwise in the order they are written, and @@ means they appear clockwise. The symbol is placed after the element symbol inside square brackets, as in N[C@@H](C)C(=O)O for L-alanine, so it describes the arrangement relative to the order of the neighbours in the string rather than an absolute R/S label.
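Because @ and @@ are mirror images of each other, swapping every mark in a SMILES string gives the mirror-image molecule. Here is a minimal sketch of that swap, assuming the string uses only the plain @ and @@ marks; the name mirror_image is made up for this example.

```python
import re

def mirror_image(smiles):
    """Swap every @ and @@ mark to get the mirror-image SMILES.

    Illustrative sketch: assumes only plain @ and @@ chirality marks.
    """
    # "@@?" matches "@@" when present, otherwise a single "@".
    return re.sub(r"@@?", lambda m: "@" if m.group() == "@@" else "@@", smiles)

# L-alanine is commonly written N[C@@H](C)C(=O)O; swapping gives D-alanine.
print(mirror_image("N[C@@H](C)C(=O)O"))  # N[C@H](C)C(=O)O
print(mirror_image("N[C@H](C)C(=O)O"))   # N[C@@H](C)C(=O)O
```

Applying the function twice returns the original string, and the / and \ marks on double bonds are left alone because cis/trans geometry is unchanged by reflection.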