To analyze genes that increase an individual's risk for Alzheimer's disease, we ran the clustering software Morpheus on a dataset of their sequences, summarized as amino acid frequencies. Clustering groups the data so that genes and amino acids in the same group are more similar to each other than to those in other groups. Both the genes and the amino acids were clustered hierarchically, using one minus the Pearson correlation as the distance metric and average linkage. The output of the analysis is shown in the image below. In the heat map, a high frequency of an amino acid is colored red and a low frequency is colored blue. As the heat map below shows, the genes did not form significant clusters.
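As a rough illustration of what this clustering step does (not Morpheus's own code), the sketch below implements average-linkage hierarchical clustering with 1 - Pearson correlation as the distance, in plain Python. The frequency matrix is hypothetical, and the function names `pearson`, `one_minus_pearson`, and `average_linkage` are made up for this example.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (neither may be constant)."""
    mu, mv = mean(u), mean(v)
    du = [x - mu for x in u]
    dv = [y - mv for y in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    return num / den

def one_minus_pearson(u, v):
    """Distance: 0 for perfectly correlated rows, up to 2 for anti-correlated ones."""
    return 1.0 - pearson(u, v)

def average_linkage(rows, dist=one_minus_pearson):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of row indices.
    """
    # Distances between all pairs of original rows, keyed by (i, j) with i < j.
    d = {(i, j): dist(rows[i], rows[j])
         for i, j in combinations(range(len(rows)), 2)}
    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster row pairs.
            avg = mean(d[tuple(sorted((i, j)))] for i in a for j in b)
            if best is None or avg < best[2]:
                best = (a, b, avg)
        a, b, _ = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append(best)
    return merges

# Hypothetical amino acid frequencies (rows = genes, columns = amino acids).
rows = [
    [0.10, 0.02, 0.08, 0.05],
    [0.11, 0.01, 0.09, 0.05],
    [0.02, 0.09, 0.03, 0.10],
    [0.03, 0.10, 0.02, 0.09],
]
for a, b, distance in average_linkage(rows):
    print(a, b, round(distance, 4))
```

To cluster the amino acids (columns) as well, the same function can be applied to the transposed matrix, `[list(col) for col in zip(*rows)]`. A row with constant values has an undefined Pearson correlation and would raise a `ZeroDivisionError` in this sketch.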
The heat map also shows that the proteins encoded by most of the genes known to increase the risk of Alzheimer's disease contain little cysteine (indicated by the line of dark blue). This is an important finding, which will be discussed after a closer look at the APOE protein.
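Each cell of such a heat map is the frequency of one amino acid in one protein. The sketch below shows how those frequencies, and a low-cysteine flag like the dark blue line, could be computed from protein sequences. The sequences, the 1% threshold, and the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_frequencies(seq):
    """Relative frequency of each standard amino acid in a protein sequence."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def low_cysteine(proteins, threshold=0.01):
    """Names of proteins whose cysteine (C) frequency is below `threshold`."""
    return sorted(name for name, seq in proteins.items()
                  if aa_frequencies(seq)["C"] < threshold)

# Hypothetical toy sequences, not real Alzheimer's-associated proteins.
proteins = {
    "protein_x": "MKVLAAGLLRESTQW",
    "protein_y": "MCCKACLLCAGESTW",
}
print(low_cysteine(proteins))
```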
Looking more closely at APOE, we found that the gene exists in three isoforms. In general, a set of protein isoforms can arise from a single gene through alternative splicing or post-translational modification; the three APOE isoforms, however, are encoded by different alleles of the gene. The isoforms differ at two residues, 112 and 158. As the image below shows, APOE-e2 has cysteine at both residues, APOE-e3 has cysteine at residue 112 and arginine at residue 158, and APOE-e4 has arginine at both.
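The isoform differences fit in a small lookup table. This sketch (the dictionary and function names are illustrative only) encodes the residues at positions 112 and 158 given in the text and counts the cysteines each isoform carries, which is the property the zinc-binding argument below depends on.

```python
# Residues at positions 112 and 158 of APOE for each isoform, as described in the text.
APOE_ISOFORMS = {
    "APOE-e2": {112: "Cys", 158: "Cys"},
    "APOE-e3": {112: "Cys", 158: "Arg"},
    "APOE-e4": {112: "Arg", 158: "Arg"},
}

def cysteine_count(isoform):
    """Number of cysteines at the two variable positions of an isoform."""
    return sum(residue == "Cys" for residue in APOE_ISOFORMS[isoform].values())

for name in APOE_ISOFORMS:
    print(name, cysteine_count(name))
```

Running it prints 2, 1, and 0 cysteines for APOE-e2, APOE-e3, and APOE-e4 respectively.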
The differences at these positions change the overall structure of the isoforms and affect their ligand-binding abilities. APOE-e4 is an important risk factor for Alzheimer's disease (AD) because it has arginine instead of cysteine at both residue sites. Cysteine is a strong ligand for zinc, a metal that helps proteins fold properly and stabilizes their structure. Lacking cysteine at these sites reduces the protein's affinity for zinc and makes it less stable, which is known to be the case for APOE-e4. This destabilization results in less full-length APOE protein and more APOE fragments in the brain. APOE fragments present in AD brains induce neurofibrillary tangles in neurons. The exact role these fragments play in forming the tangles is still unclear, but because APOE-e4 produces more fragments, AD onset and development are more likely in APOE-e4 carriers.
Pairwise sequence alignment is used to identify regions of similarity that may indicate functional, structural, and/or evolutionary relationships between two protein sequences. Specifically, we used a local alignment tool, EMBOSS Water, to align the human and mouse APOE protein sequences. The aligned residues show which positions are identical and which carry conservative, semi-conservative, or non-conservative substitutions. Overall, the pairwise alignment answers the question "How similar are our two sequences?" by finding the optimal local alignment and scoring the similarity between the two sequences. The header of the Water output is shown below.
#=======================================
#
# Aligned_sequences: 2
# 1: HUMAN
# 2: MOUSE
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 318
# Identity: 226/318 (71.1%)
# Similarity: 263/318 (82.7%)
# Gaps: 10/318 ( 3.1%)
# Score: 1146.0
#
#
#=======================================
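EMBOSS Water implements the Smith-Waterman local alignment algorithm. The sketch below is a simplified, hypothetical version of that idea, not Water's implementation: it scores matches and mismatches with fixed values instead of the EBLOSUM62 matrix, and it uses a single linear gap penalty instead of Water's separate gap-opening (10.0) and extension (0.5) penalties. `alignment_stats` then counts identical and gapped columns over the alignment length, the same ratios the header reports as Identity and Gaps (for example, 226/318 = 71.1%). Similarity is omitted because it depends on the substitution matrix.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b (Smith-Waterman, linear gap penalty).

    Returns (score, aligned_a, aligned_b); "-" marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = H[i][j - 1] + gap  # b[j-1] aligned to a gap
            H[i][j] = max(0, diag, up, left)  # 0 means "start a new local alignment"
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

def alignment_stats(x, y):
    """(length, identical columns, gap columns) of an aligned pair."""
    identity = sum(p == q and p != "-" for p, q in zip(x, y))
    gaps = sum(p == "-" or q == "-" for p, q in zip(x, y))
    return len(x), identity, gaps

score, top, bottom = smith_waterman("ABCDEFGH", "ABCDXEFGH")
print(score)
print(top)
print(bottom)
print(alignment_stats(top, bottom))
```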
An InParanoid cluster is an ortholog group seeded by a pair of reciprocally best-matching orthologs. Orthology groups are constructed between two complete proteomes from pairwise similarity scores.
The group is then extended with further sequences: sequences in either proteome that are closer to their own seed ortholog than to any sequence in the other proteome are in-paralogs (co-orthologs of one or more orthologs in the other species). Clustering in-paralogs together allows one-to-one, one-to-many, and many-to-many orthology cases to be classified properly.
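The two steps described above, seeding a group with a reciprocal best-hit pair and then adding in-paralogs, can be sketched as follows. This is a simplified illustration, not InParanoid's code: the sequence names and similarity scores are hypothetical, and it leaves out the rest of the real tool's procedure. The in-paralog test follows the wording above literally: a sequence joins if it scores higher against its own seed than against every sequence in the other proteome.

```python
# Hypothetical similarity scores between sequences (higher = more similar).
PAIR_SCORES = {
    ("H1", "M1"): 900, ("H2", "M1"): 300, ("H3", "M2"): 500,
    ("H1", "M2"): 100, ("H2", "M2"): 50, ("H3", "M1"): 80,
    ("H1", "H2"): 700, ("H1", "H3"): 60, ("H2", "H3"): 40,
    ("M1", "M2"): 90,
}
SCORES = {frozenset(pair): s for pair, s in PAIR_SCORES.items()}
HUMAN = ["H1", "H2", "H3"]
MOUSE = ["M1", "M2"]

def score(x, y):
    """Symmetric similarity score; unknown pairs score 0."""
    return SCORES.get(frozenset((x, y)), 0)

def best_hit(query, proteome):
    """Highest-scoring sequence for `query` in the given proteome."""
    return max(proteome, key=lambda target: score(query, target))

def seed_pairs(proteome_a, proteome_b):
    """Reciprocal best hits: a's best hit is b, and b's best hit is a."""
    pairs = []
    for a in proteome_a:
        b = best_hit(a, proteome_b)
        if best_hit(b, proteome_a) == a:
            pairs.append((a, b))
    return pairs

def add_inparalogs(seed_a, seed_b, proteome_a, proteome_b):
    """Add sequences closer to their own seed than to any sequence in the other proteome."""
    def closer_to_seed(x, seed, other):
        return score(x, seed) > max(score(x, y) for y in other)
    group_a = [seed_a] + [x for x in proteome_a
                          if x != seed_a and closer_to_seed(x, seed_a, proteome_b)]
    group_b = [seed_b] + [y for y in proteome_b
                          if y != seed_b and closer_to_seed(y, seed_b, proteome_a)]
    return group_a, group_b

for a, b in seed_pairs(HUMAN, MOUSE):
    print(add_inparalogs(a, b, HUMAN, MOUSE))
```

With these made-up scores, H1 and M1 form a seed pair and H2 joins the human side as an in-paralog, giving a one-to-many group (two human co-orthologs of one mouse sequence), while H3 and M2 form a separate one-to-one group.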
Phenologs are orthologous phenotypes between organisms, identified from overlapping sets of orthologous genes associated with each phenotype. Phenologs provide a framework for comparing mutational phenotypes, identifying adaptive reuse of gene systems, and suggesting new disease genes. Phenotypes of the human and mouse orthologs for gene identifier 348 (APOE) were found:
Is there any indication of protein misfolding?
Yes, especially in the e4 isoform of APOE. APOE-e4 has arginine instead of cysteine at two residue sites (112 and 158), an amino acid difference that alters the overall structure of the protein.
Are there similarities and differences between the gene sequences of different species?
Yes. Our pairwise sequence alignment (71.1% identity between human and mouse APOE) and the phenologs let us compare and contrast the APOE sequences of different species and record evidence of functional, structural, and/or evolutionary relationships.