Computational Work
Lead Contributor for Webpage: Hongji Zhu
Computationally, a pipeline has been developed that takes a query lectin sequence and runs it through different alignment algorithms to determine if its N-glycosylation sites are conserved.
The lectin database was built from all available protein sequences in UniProt that are identified as carbohydrate-binding.
From UniProt, for each lectin, we extracted its glycosylation identity, the locations of its glycosylation sites, and its sequence.
Only some of the lectins have known binding partners, and only some of those have available sequence information. We therefore compiled a list of lectins that have both available sequence information and known binding partners for the next steps of the analysis: A0A0R4I963, A0A1D5B395, A0A1D5B396, O22415, O24313, P02866, P02867, P02871, P02873, P05087, P0DKL3, P11218, P14894, P15231, P16270, P16300, P18670, P19329, P19330, P19664, P22972, P22973, P24146, P33183, P56625, P93248, P93543, Q12558, Q2F1K8, Q2U5P7, Q2UDJ8, Q2UQV7, Q2UT06, Q40987, Q41114, Q41159, Q41160, Q41162, Q41358, Q42372, Q42460, Q43629, Q8W1R6, Q9AVR2, Q9S8M0, Q9SM56, U3KRF8
For each lectin in the list, we extracted the mini-sequence spanning ±10 amino acids around each of its glycosylation sites.
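The window extraction can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the function name `extract_windows`, the 1-based site numbering, and the choice to truncate windows at the sequence ends are all assumptions.

```python
def extract_windows(sequence, sites, flank=10):
    """Return {site: (window, offset)} for 1-based glycosylation positions.

    `window` is the subsequence covering up to `flank` residues on each side
    of the site; `offset` is the 0-based index of the site inside the window.
    Windows are truncated (an assumption) when the site is near either end.
    """
    windows = {}
    for site in sites:
        if not 1 <= site <= len(sequence):
            raise ValueError(f"site {site} is outside the sequence")
        start = max(0, site - 1 - flank)        # 0-based, inclusive
        end = min(len(sequence), site + flank)  # 0-based, exclusive
        windows[site] = (sequence[start:end], site - 1 - start)
    return windows


if __name__ == "__main__":
    # Hypothetical toy sequence with an N-G-T sequon at position 16.
    toy = "A" * 15 + "NGT" + "A" * 15
    window, offset = extract_windows(toy, [16])[16]
    print(window, offset, window[offset])
```

Returning the offset alongside the window keeps track of where the Asn sits in each query, which is needed later when an alignment is checked against that position.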
We ran BLAST on each of these mini-sequences against the lectin database with an e-value threshold of 0.05. For each alignment in the BLAST results, we checked whether the aligned database sequence has Asn at the position of the query's glycosylation site and whether it also follows the glycosylation pattern Asn-Xxx-Ser/Thr/Cys. If both conditions were satisfied, we counted the alignment as a “Match”; otherwise, we counted it as a “Mismatch”. However, the total number of matches and mismatches depends on how many glycosylation sites a lectin has, so on its own it does not reveal how conserved the lectin's glycosylation sites are across the lectin database. Therefore, we computed the average number of matches and mismatches per glycosylation site.
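The Match/Mismatch rule might look like the sketch below. It assumes each alignment is available as a pair of gapped strings (query and database sequence) together with the 1-based query start position, as in BLAST's pairwise or tabular output. The function name `is_match` and its arguments are hypothetical, and an alignment that does not cover the site or the two residues after it is treated here as a Mismatch, which is an assumption.

```python
def is_match(qseq, sseq, qstart, site_in_query):
    """Classify one alignment as Match (True) or Mismatch (False).

    qseq, sseq     : aligned query/database strings, '-' marks a gap
    qstart         : 1-based position in the query where the alignment starts
    site_in_query  : 1-based position of the glycosylated Asn in the query
    """
    pos = qstart - 1  # query position of the last residue consumed
    for col, (q, s) in enumerate(zip(qseq, sseq)):
        if q == "-":
            continue  # gap in the query: no query residue consumed
        pos += 1
        if pos == site_in_query:
            if s != "N":
                return False
            # The next two database residues (ignoring gaps) must complete
            # the Asn-Xxx-Ser/Thr/Cys pattern.
            downstream = [c for c in sseq[col + 1:] if c != "-"]
            return len(downstream) >= 2 and downstream[1] in "STC"
    return False  # the alignment does not cover the glycosylation site


if __name__ == "__main__":
    print(is_match("AANGTAA", "AANKSAA", 1, 3))  # Asn aligned, N-K-S follows
    print(is_match("AANGTAA", "AADKSAA", 1, 3))  # Asp at the site
```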
Our goal for the computational analysis is to identify lectins that contain conserved or non-conserved glycosylation sites, so that we can test whether removing those sites affects the binding function of the lectin. We therefore looked more closely at individual lectins and counted the matches and mismatches for each of their glycosylation sites.
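A minimal sketch of the per-site tally and the per-site average for one lectin is given below. The function name `summarize_sites` and the input layout (a list of site positions plus a list of `(site, matched)` records, one per alignment) are assumptions made for illustration.

```python
def summarize_sites(sites, hits):
    """Tally matches/mismatches per glycosylation site for one lectin.

    sites : all glycosylation site positions of the lectin
    hits  : iterable of (site, matched) pairs, one per BLAST alignment
    Returns (counts, averages), where averages are per glycosylation site.
    """
    # Start every site at zero so sites without any alignment still count
    # toward the per-site average.
    counts = {site: {"match": 0, "mismatch": 0} for site in sites}
    for site, matched in hits:
        counts[site]["match" if matched else "mismatch"] += 1

    n_sites = len(counts)
    averages = {
        label: (sum(c[label] for c in counts.values()) / n_sites
                if n_sites else 0.0)
        for label in ("match", "mismatch")
    }
    return counts, averages


if __name__ == "__main__":
    # Hypothetical lectin with sites at 12 and 45 and four alignments.
    counts, averages = summarize_sites(
        [12, 45], [(12, True), (12, True), (12, False), (45, False)]
    )
    print(counts)
    print(averages)
```

Dividing by the number of sites rather than the number of alignments keeps lectins with many glycosylation sites comparable to lectins with only one.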
To gain orthogonal information on the physical location of glycosylation sites, we used PyMOL, a molecular visualization program with a Python interface that allows us to visualize protein structures and annotate structural motifs. To do this, we provide a given lectin as a Protein Data Bank (PDB) file and then tell the program to annotate motifs of interest in the 3D structure. In our case, we annotated the glycosylation sites.
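The annotation step could be scripted roughly as follows. This is a sketch under assumptions: the helper names, the object name `lectin`, and the stick/red styling are illustrative choices, and the residue numbers are assumed to match the numbering used in the PDB file, which may differ from UniProt positions. The `pymol` import is kept inside the function so that the selection helper can be used without PyMOL installed.

```python
def glyco_selection(object_name, positions):
    """Build a PyMOL selection string such as 'lectin and resi 12+45'."""
    unique = sorted(set(positions))
    if not unique:
        raise ValueError("at least one residue position is required")
    return f"{object_name} and resi {'+'.join(str(p) for p in unique)}"


def annotate_sites(pdb_path, positions, object_name="lectin"):
    """Load a PDB file in PyMOL and highlight the given residue positions."""
    from pymol import cmd  # requires a PyMOL installation

    cmd.load(pdb_path, object_name)
    cmd.select("glyco_sites", glyco_selection(object_name, positions))
    cmd.show("sticks", "glyco_sites")  # draw the selected residues as sticks
    cmd.color("red", "glyco_sites")    # colour them so they stand out


if __name__ == "__main__":
    # Only the selection string is built here; annotate_sites needs PyMOL
    # and a real PDB file path.
    print(glyco_selection("lectin", [45, 12, 45]))
```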