Background: Immunoglobulins are a class of proteins present in the serum and cells of the immune system that function as antibodies. Every antibody has a conserved region of amino acids and a variable region. The variable region determines the antibody's specificity for antigens and other targets. This ability to adapt and recognize new targets is the fundamental biological principle behind adaptive immunity and is involved in fighting infectious and chronic diseases.
Methods: Datasets were taken from human proteome assemblies aggregated by the UniProt Swiss-Prot database. A hidden Markov model (HMM) was built with the HMMER software package from the immunoglobulin family in the Pfam database, and sequences were predicted across the entire human (hg38) proteome. A phylogeny of the predicted sequences was generated using Clustal Omega and PHYLIP. Clustering analysis was done with Morpheus. Antigen screening was done by reconfiguring Kangaroo, a generalized tool for biological sequence discovery.
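The HMM build and search step could be sketched roughly as follows. This is a minimal illustration, not the exact pipeline used here: the file names are hypothetical, the E-value cutoff of 1e-5 is an assumed threshold, and the parser relies only on the standard column order of HMMER's `--tblout` table (target name, accession, query name, accession, full-sequence E-value, score).

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    target: str    # sequence in the proteome that matched the model
    query: str     # name of the HMM that was searched
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def hmmer_commands(alignment: str, model: str, proteome: str, table: str) -> List[List[str]]:
    """Return the two HMMER command lines as argument lists.

    All four paths are hypothetical placeholders supplied by the caller.
    The lists can be passed to subprocess.run if HMMER is installed.
    """
    return [
        ["hmmbuild", model, alignment],                     # build the profile HMM
        ["hmmsearch", "--tblout", table, model, proteome],  # search the proteome
    ]


def parse_tblout(text: str, max_evalue: float = 1e-5) -> List[Hit]:
    """Parse the per-target table written by `hmmsearch --tblout`.

    max_evalue is an assumed cutoff; the abstract does not state one.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        hit = Hit(fields[0], fields[2], float(fields[4]), float(fields[5]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits
```

The retained target names would then be the input to the alignment and phylogeny steps.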
Results: Initial exploratory analysis showed that the regions predicted by the HMM had very sparse clustering trends (based on amino acid sequence). After exploring the phylogeny constructed from the predicted sequences, clusters were analyzed for similar antigen-binding targets. This analysis showed that certain groups of predicted sequences were predicted to bind similar antigens. These results can be used to evaluate new immunoglobulin targets and domains.
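One simple way to check whether a cluster shares antigen targets is to tally the predicted antigen for each sequence in the cluster and report the most common one. The sketch below assumes a hypothetical input of (sequence ID, cluster ID, predicted antigen) triples; the abstract does not specify how similarity between targets was quantified, so this is only one plausible summary.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def dominant_antigen_per_cluster(
    assignments: Iterable[Tuple[str, str, str]],
) -> Dict[str, Tuple[str, float]]:
    """Map each cluster to its most frequent predicted antigen.

    assignments holds (sequence_id, cluster_id, predicted_antigen) triples;
    this input layout is an assumption made for illustration.
    Returns {cluster_id: (antigen, fraction of the cluster predicted to bind it)}.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for _sequence_id, cluster_id, antigen in assignments:
        counts[cluster_id][antigen] += 1

    summary = {}
    for cluster_id, counter in counts.items():
        antigen, n = counter.most_common(1)[0]
        summary[cluster_id] = (antigen, n / sum(counter.values()))
    return summary
```

A cluster whose fraction is close to 1 would be a candidate group of sequences predicted to bind the same antigen.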