Archaea are characterized by the unique membrane lipids that form the hydrophobic core of their cell membranes. These lipids are composed of isoprenoid sidechains stereospecifically linked to glycerol phosphate backbones (Figure 1). Three characteristics distinguish them from bacterial and eukaryotic membrane lipids: the sidechains are isoprenoids, not fatty acids; ether bonds, not ester bonds, join the sidechains to the backbone; and the backbone is sn-glycerol-1-phosphate, not sn-glycerol-3-phosphate. The biosynthesis of isoprenoid compounds depends on isoprenyl diphosphate synthases, which catalyze consecutive condensations of isopentenyl diphosphate with allylic primer substrates to form the linear backbones of all isoprenoid compounds. Different isoprenyl diphosphate synthases determine the stereochemistry of the newly formed bonds and the final length of the isoprenoid products. The precursors of the archaeal isoprenoid sidechains, isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP), are synthesized through the mevalonate pathway, as are other isoprenoids in eukaryotes and some bacteria. However, most archaeal species lack the last three enzymes of the mevalonate pathway: phosphomevalonate kinase (PMK), mevalonate diphosphate decarboxylase (MDC), and isopentenyl diphosphate isomerase (IDI1). Their function in converting phosphomevalonate into IPP and DMAPP is replaced by isopentenyl phosphate kinase (IPK) and an alternative isopentenyl diphosphate isomerase (IDI2) [2].
Mutational studies have shown how varied isoprenyl diphosphate synthases can be, even among enzymes that yield isoprenoid products of similar length [4]. A phylogenetic tree of the isoprenyl diphosphate synthases should therefore provide useful information for placing the unique archaeal isoprenyl production in an evolutionary context.
According to the Genetic Core of the Universal Ancestor, the majority of the universal genes are likely to play fundamental roles in cellular processes [5]. Most of them were found to deal with RNA and DNA transcription, with the others mostly dealing with energy generation. The divergence of membrane lipids might suggest that the housing of the cell evolved separately, and much later than would normally be assumed. Thanks to the significant accumulation of genome sequence data for a wide variety of archaeal species, we can analyze the evolutionary relationships of archaeal isoprenyl diphosphate synthases without laboratory work.
Figure 1: Comparison of bacterial and archaeal phospholipids [1]
Figure 2: Biosynthesis pathways of phospholipid components in archaea [3]
Figure 3: Unrooted radial phylogenetic tree of Archaea, Bacteria, and two eukaryotes, where greater relative distance indicates less similarity between the genes.
The initial unrooted phylogeny of these organisms already gives some insight into the use of the isoprenoid backbones. Because archaea have a unique membrane chemistry, the geranylgeranyl diphosphate synthase protein (annotated in prokaryotes as bifunctional short chain isoprenyl diphosphate synthase) is highly similar among the archaea, which branch sharply away from the other organisms in the radial phylogenetic tree (Figure 3). After the tree was rooted (Figure 4), a Maximum Likelihood algorithm was used to generate plausible ancestral sequences. This analysis suggested that the archaea likely shared a common ancestor with the following DNA sequence:
GTGGAACTTATTGATAAATTAAAGGAATATTCTAAAATAGTTGATGAAGAAATAAAAAAATTTATAAAAGAAAAAGAACCTGAAAAACTATATGAAGCATCAAAACATCTAATAATAGCTGGTGGAAAGAGAATAAGGCCATTCTTAGTACTATTAACTTCTGAAGCAGTTGGTGGTGATATAGAAGAAGCTCTTCCTGCAGCAGCTGCTGTAGAACTAATTCACAACTTCACCCTAGTTCATGATGATATAATGGATAACGATGAGATAAGAAGGGGCAAGCCAACAGTTCATGTAGTATGGGGTGAACCAATGGCAATTCTTGCTGGAGATGTGCTATTTGCAAAGGCTTTTGAGGTAATATCAAAAATTGAAGTAGATGCTGAAAGAGTAGTTGAAGTTTTAGAAGTTCTTACAAAGGCTTCTGTCGAGGTTTGTGAAGGACAGGCATTGGATATGGAGTTTGAAAAGAGAGATGAAGTTACAGTGGAAGAGTATCTGGAGATGATTAACAAGAAGACAGCAGCACTCTTAGAAGCTTCTGCTAAGATTGGTGCAATAATTGCAGATGGTAACGAAGAGGAAATTAAAGCTCTATAAGAATATGGAAAAAACATTGGAATAGCATTTCAGATACAGGATGATTTTCTAGATCTTATAGGAGATGAGAAAGAACTAGGAAAACCCGTTGGAAGCGATATAATGGAAGGTAAGAAGACACTAATAGTTATTAACGCATTAAAAAATGCTAATGAAGAAGAAAAGAAAAAATTACTAAAAATTTTAGGAAATAAAGATGCTGACGAAGAAGAAATTAAAGAGGCAATTGAGATATTTAAGAAATCTGATTCCATAGAATATGCTAAAGAAATAGCTAAAGAATATGTTGAGAAGGCTAAAGAACACCTAGAGGTTCTTGAAGATAACGAAGCAAGAGAAGTATTAAAAGATTTAGCTGATTTTATAGTAAAGAGGAAATATTAA
Figure 4: Traditional Phylogenetic tree of Archaea for the analyses and calculation of ancestor gene composition.
Phylogenetic analysis places a species within the collection of evolutionary relationships and sheds new light on how similar the cellular machinery is across all organisms. As discussed in the introduction, this work was motivated by the goal of placing the isoprenyl production that uniquely makes up archaeal membranes in an evolutionary context, to see where it may have come from. The result leads to some surprising conclusions. Because eukaryotes are generally much more closely tied to archaea than to bacteria, it was assumed that the two eukaryotes in the comparison would show greater similarity to archaea in their geranylgeranyl diphosphate synthase, which in archaea is referred to as bifunctional short chain isoprenyl diphosphate synthase. This prediction could hardly have been worse, as can be seen in Figure 3 from the dramatically long branches leading to Homo sapiens and the amoeba Entamoeba histolytica. For this gene, archaeal membranes therefore appear to have a stronger connection to bacterial metabolism than to that of their closer genetic relatives in Eukarya. This does make some sense: even though archaea and eukaryotes share much of their 16S gene, the eukaryotic membrane does not contain the cyclic isoprenyl.
Figure 5: Archaea 16S ribosomal RNA phylogeny[7]
Figure 6: Initial "Gap" in aligned genes which propagated errors
The other area explored in this experiment was the prediction of ancestral gene sequences. Using the Maximum Likelihood algorithm, it was possible to work backwards from the archaea to generate the most likely ancestral sequences. The reconstruction clearly had several issues. The error compounds so dramatically that the likelihood of the sequence being correct was too small to calculate, and the built-in error analyzer reported it as 0. This is not necessarily a concern, since the value is expected to be extremely small, but reporting it properly will require a new plugin coded for arbitrary precision. The largest problem with the dataset was that the aligned nucleotide coding sequences did not line up at the start codon (Figure 6). This introduced errors when the values were propagated and is the reason the theoretical DNA sequence starts with “GTG” instead of “ATG”. This is doubly unfortunate because the software suggests that the probability of the sequence starting with a G is over 90%.
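The likelihood of 0 described above is consistent with floating-point underflow, which is usually avoided by summing log-likelihoods rather than multiplying raw probabilities. The following is a minimal sketch of that idea, not a description of how MEGA computes its values; the per-site likelihood of 1e-4 and the 1,000-site length are hypothetical placeholders, not values from this analysis.

```python
import math

def likelihood_product(site_likelihoods):
    """Naive product of per-site likelihoods (underflows for long alignments)."""
    product = 1.0
    for p in site_likelihoods:
        product *= p
    return product

def log_likelihood(site_likelihoods):
    """Sum of the natural logs of the per-site likelihoods."""
    return math.fsum(math.log(p) for p in site_likelihoods)

# Hypothetical data: 1,000 sites, each with a per-site likelihood of 1e-4.
sites = [1e-4] * 1000

naive = likelihood_product(sites)  # 0.0: the true value is far below float range
log_l = log_likelihood(sites)      # finite and directly comparable between trees

print("naive product:", naive)
print("log-likelihood:", log_l)
```

Working in log space keeps the value finite without arbitrary-precision arithmetic, and two reconstructions can still be compared by the difference of their log-likelihoods.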
This project demonstrates some of the ways that protein homologues can reveal less well-known connections between organisms, and it suggests several areas for further work. The simplest improvement is to collect more nucleotide sequences and build a more robust tree with more eukaryotes and bacteria, to further test the connections between the two. The next most pressing option is to BLAST more proteins from isoprenyl backbone biosynthesis using multi-sequence BLAST searches. The current experiment looked at only one central protein, and peripheral proteins that are moderately conserved may indicate the evolutionary history of the organisms more accurately. As mentioned, the primary way to improve the sequence generation is to reintroduce the error calculation to get a better understanding of the relative significance.
First, an archaeal gene was selected for finding homologous proteins through BLAST. The gene chosen was the bifunctional short chain isoprenyl diphosphate synthase of Methanocaldococcus jannaschii (AAB98865). A protein BLAST was performed, ignoring model proteins and uncultured proteins, to obtain a varied set of species with well-characterized homologous proteins. The coding nucleotide sequence was found for each of these genes and added to MEGA, a phylogenetic tree analysis software package. After a large enough sample of archaeal, bacterial, and eukaryotic homologues had been collected, the sequences were aligned using MUSCLE (Multiple Sequence Comparison by Log-Expectation). The aligned sequences were then analyzed to find the optimal model for generating a phylogenetic tree. In this case it was the General Time Reversible model with a Gamma distribution and 5 Gamma categories. This model was then used with the Maximum Likelihood statistical method to generate a phylogenetic tree.
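Since the coding sequences in this study did not all align at the start codon, a simple check on each sequence before alignment may help catch the problem early. The sketch below is an illustration only and was not part of the original workflow; the function name `check_cds` and the example sequences are hypothetical. It assumes the standard stop codons and treats GTG and TTG as acceptable start codons alongside ATG, since both are used as alternative starts in prokaryotes.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # ATG plus common prokaryotic alternatives
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard stop codons

def check_cds(seq):
    """Report basic reading-frame properties of a coding sequence.

    Alignment gaps ('-') are removed first, so aligned or unaligned
    sequences can be passed in.
    """
    seq = seq.upper().replace("-", "")
    full_length = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, full_length, 3)]
    # Stop codons anywhere before the final codon suggest a frame problem.
    internal_stops = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    return {
        "length_multiple_of_3": len(seq) % 3 == 0,
        "start_codon": codons[0] if codons else "",
        "starts_with_start_codon": bool(codons) and codons[0] in START_CODONS,
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
        "internal_stop_codon_indices": internal_stops,
    }

# Hypothetical toy sequences, not taken from the dataset.
good = check_cds("ATGAAATAA")
suspect = check_cds("GTGTAAGGGTAA")
print(good)
print(suspect)
```

A sequence that reports internal stop codons or a length that is not a multiple of three is a candidate for trimming or re-downloading before it is aligned.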
Once the tree was generated, it was necessary to propose a root, which in this case required choosing an outgroup, because Maximum Likelihood does not find a root on its own. From Figure 3 it was judged that the eukaryotes and archaea were significantly removed from the rest of the tree, so they were classified as the outgroup, and the root was placed between the archaeal branch and the others. From here, the Maximum Likelihood method combined with the phylogenetic tree allowed a genetic sequence to be predicted by propagating the nucleotide sequences of close relatives back toward the ancestor.
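The idea of propagating tip sequences back toward the root can be illustrated for a single alignment column. The sketch below applies Felsenstein's pruning algorithm to obtain the posterior probability of each base at the root. It is a simplification and not the procedure MEGA runs: it swaps in the Jukes-Cantor (JC69) model for the GTR+Gamma model used in this study, and the tree shape, branch lengths, and tip bases are hypothetical. Gaps and other non-ACGT characters are treated as missing data.

```python
import math

BASES = ("A", "C", "G", "T")

def jc69_prob(same, t):
    """JC69 probability of ending in the same or a different base after time t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def partial_likelihoods(node):
    """Return {base: P(tips below node | node has this base)}.

    A leaf is a one-character string; an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        if node not in BASES:  # gap or ambiguity: treat as missing data
            return {b: 1.0 for b in BASES}
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_l = partial_likelihoods(child)
        for b in BASES:
            result[b] *= sum(jc69_prob(b == c, t) * child_l[c] for c in BASES)
    return result

def root_posteriors(tree):
    """Posterior probability of each base at the root (uniform prior of 1/4)."""
    partial = partial_likelihoods(tree)
    total = sum(0.25 * partial[b] for b in BASES)
    return {b: 0.25 * partial[b] / total for b in BASES}

# Hypothetical column: three tips carry G and one carries A.
tree = [
    ([("G", 0.1), ("G", 0.1)], 0.05),
    ([("G", 0.2), ("A", 0.2)], 0.05),
]
post = root_posteriors(tree)
best = max(post, key=post.get)
print(best, post)
```

Repeating this for every column of the alignment and taking the most probable base at each position gives one candidate ancestral sequence, and the per-column posteriors indicate how well supported each position is.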
Supplemental Information:
Species and Gene Sequences used:
https://drive.google.com/file/d/183S2WUEnOZW4tmK7DDHPCOFBJNZ1Ow6E/view?usp=sharing (text file)
https://drive.google.com/file/d/1OtsLz6SvTm1qFBTCjGIlQgv34m4KHGf7/view?usp=sharing (Mega's loadable file type [.meg])
Common Ancestor Proteins:
https://drive.google.com/file/d/13yeN1BKCH189wD7vzsYZYMhsApfK2DMj/view?usp=sharing