Exploratory data analysis
Figure 1: Hierarchical clustering of the predicted gene sequences using Morpheus. The clustering criterion was amino acid frequency.
Figure 2: A multiple sequence alignment of the predicted Ig gene segments using Clustal Omega.
Figure 3: A predicted phylogenetic tree of the gene segments using Clustal Omega.
The clustering of amino acid frequencies in Figure 1 did not reveal any useful structure. This was expected, because the predicted segments are variable regions, which differ in amino acid sequence. The multiple sequence alignment (Figure 2) was similarly uninformative. The predicted phylogenetic tree (Figure 3) indicated that the same regions (labeled here as isoforms) group together. This is expected, because similar regions in the domain should have similar amino acid sequences. Given this sequence similarity, only one isoform is needed.
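The amino acid frequency criterion used for the clustering can be illustrated with a minimal sketch. It assumes frequencies are computed over the 20 standard amino acids and compared by Euclidean distance; the actual metric and linkage settings used in Morpheus are not stated here, and the toy sequences and function names are invented for illustration only.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aa_frequencies(sequence):
    """Return relative frequencies of the 20 standard amino acids."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(sequences):
    """Pairwise distances between frequency vectors, keyed by (name, name)."""
    vectors = {name: aa_frequencies(seq) for name, seq in sequences.items()}
    return {(a, b): euclidean(vectors[a], vectors[b])
            for a in vectors for b in vectors}

# Toy sequences, invented for illustration (not data from this study)
toy = {
    "seg_A": "QVQLVQSGAEVKKPGA",
    "seg_B": "QVQLVESGGGLVQPGG",
    "seg_C": "DIQMTQSPSSLSASVG",
}
dist = distance_matrix(toy)
```

A distance matrix of this kind is what a hierarchical clustering routine would take as input. Because frequency vectors discard residue order, segments with quite different sequences can still look alike, which may be one reason this criterion was uninformative.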
The table above is an example of how the scraped protein names were organized for the gene regions in which predicted immunoglobulin domains were found. Names of the predicted antigen sequences are also shown. This information can be used to:
1) Check whether the HMM predicted known immunoglobulin domains, and whether these known domains had novel antigen targets.
2) Identify newly predicted immunoglobulin domains and their targets, and potentially compare them to structures to locate the domains in 3D (if a structure exists).
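The two uses listed above amount to splitting the HMM hits into known and novel candidates. The following is a minimal sketch of that split, assuming each row of the table is available as a record with a gene identifier, a scraped protein name, and a predicted antigen. The field names, gene identifiers, and the `known_ig_genes` set are hypothetical placeholders, not values from this study.

```python
def split_known_and_novel(hits, known_ig_genes):
    """Partition HMM hits by whether the gene is already annotated
    as containing an immunoglobulin domain."""
    known, novel = [], []
    for hit in hits:
        if hit["gene"] in known_ig_genes:
            known.append(hit)
        else:
            novel.append(hit)
    return known, novel

# Placeholder records (hypothetical, for illustration only)
hits = [
    {"gene": "GENE_A", "protein_name": "annotated Ig protein",
     "predicted_antigen": "ANTIGEN_1"},
    {"gene": "GENE_B", "protein_name": "uncharacterized protein",
     "predicted_antigen": "ANTIGEN_2"},
]
known_ig_genes = {"GENE_A"}  # assumed to come from existing annotations

known, novel = split_known_and_novel(hits, known_ig_genes)
```

The `known` list supports the first use (checking known domains for novel targets), and the `novel` list supports the second (following up newly predicted domains).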
Antigen prediction results
Figure 4: A predicted phylogenetic tree of the gene segments with potential binding antigens using Clustal Omega.
Figure 5: A predicted phylogenetic tree of the gene segments with potential binding antigens using PHYLIP.
Figure 6: A portion of the predicted phylogenetic tree from Figure 5 (gene segments with potential binding antigens, using PHYLIP).
The best isoform was then screened against a database of potential antigens using a sequence-based protein-protein interaction tool. The gene segments were re-clustered with the predicted binding antigen included in each label. Gene segments predicted to bind the same antigen appear to group together.
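The observation that segments sharing a predicted antigen group together is a visual one; one way to quantify it is sketched below. This is not part of the analysis reported here. The sketch assumes a pairwise distance between segments is available (for example, from the tree or the alignment) and measures the fraction of segments whose nearest neighbor has the same predicted antigen. All names and distances are invented for illustration.

```python
def nearest_neighbor_agreement(antigen_of, distance):
    """Fraction of segments whose closest other segment shares
    the same predicted antigen label."""
    names = sorted(antigen_of)
    agree = 0
    for name in names:
        others = [o for o in names if o != name]
        nearest = min(others, key=lambda o: distance[frozenset((name, o))])
        if antigen_of[nearest] == antigen_of[name]:
            agree += 1
    return agree / len(names)

# Toy labels and distances (hypothetical, not data from this study)
antigen_of = {"s1": "ANTIGEN_X", "s2": "ANTIGEN_X",
              "s3": "ANTIGEN_Y", "s4": "ANTIGEN_Y"}
distance = {
    frozenset(("s1", "s2")): 0.1,
    frozenset(("s3", "s4")): 0.2,
    frozenset(("s1", "s3")): 0.9,
    frozenset(("s1", "s4")): 0.9,
    frozenset(("s2", "s3")): 0.9,
    frozenset(("s2", "s4")): 0.9,
}

score = nearest_neighbor_agreement(antigen_of, distance)
```

A score near 1 would support the grouping seen in the trees, while a score near the value expected from randomly shuffled labels would suggest the apparent grouping is not meaningful.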
Significance:
The ability to predict immunoglobulin domains can be applied to pharmaceutical and molecular biology research. These results suggest that similar sequences bind to similar targets, which could help guide the manipulation of antibodies to increase affinity. Simple manipulation of the sequence may generate more antibodies for one target, which is desirable in molecular biology research. This process could eventually lead to the synthesis of novel antibodies, reducing cost and production time.
Future Work:
Using these data, genes that are already known to have immunoglobulin domains can be filtered out. Uncharacterized genes that have potential immunoglobulin domains could then be investigated further in vitro for binding activity. This information could prove invaluable for understanding the conserved sequence features that are crucial to an immunoglobulin domain.