In this project, we developed a structural bioinformatics pipeline to analyze the active site environments of three serine protease families. From a curated dataset of serine protease structures, we identified the residues within an 8 Å radius of the catalytic serine. The major findings are as follows:
Conserved GLY-rich scaffold around the catalytic serine, consistent with the oxyanion hole
Positional conservation
GLY is conserved in multiple positions within [-3, 3] in almost every structure
THR precedes the catalytic serine in each of the analyzed SB structures
TRP at position -1 appears in 5 of 11 analyzed SC structures
Conserved core, variable periphery
Hierarchical clustering and PCA show that the families share a conserved catalytic core but diverge at the periphery
Distinct 3D active site organization
PA shows many polar residues and tight coordination near the serine, with charged residues toward the exterior
SB shows clustered MET, PRO, and GLY near the active site
SC has an asymmetric, rigid environment enriched in PRO and TYR, with TRP at the exterior
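The positional conservation findings above come down to counting which residue sits at each offset from the catalytic serine across structures. The following is a minimal sketch of that tally, not the project's actual code: the function name positional_counts, the input layout (a list of three-letter residue names plus the serine's index), and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def positional_counts(structures, window=3):
    """Tally residue names at each offset from the catalytic serine.

    structures: iterable of (residue_names, serine_index) pairs, where
    residue_names is a list of three-letter codes in sequence order.
    Returns {offset: Counter({residue_name: number_of_structures})}.
    Offset 0 (the serine itself) is skipped.
    """
    counts = {off: Counter() for off in range(-window, window + 1) if off != 0}
    for residues, ser_idx in structures:
        if residues[ser_idx] != "SER":
            raise ValueError("serine_index does not point at a SER residue")
        for off in counts:
            pos = ser_idx + off
            # Ignore offsets that fall off either end of the chain.
            if 0 <= pos < len(residues):
                counts[off][residues[pos]] += 1
    return counts


# Made-up toy sequences for illustration only (not taken from the dataset).
toy = [
    (["GLY", "ASP", "SER", "GLY", "GLY", "PRO"], 2),
    (["GLY", "THR", "SER", "MET", "ALA"], 2),
    (["ALA", "GLY", "THR", "SER", "MET", "GLY"], 3),
]
counts = positional_counts(toy)
print(counts[-1].most_common())  # residues seen immediately before the serine
```

Dividing each count by the number of structures in a family would give the kind of fraction reported above, such as 5 of 11.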
Overall, this analysis indicates that serine protease evolution combines convergence at the catalytic core with divergence at the periphery. The families share a mechanism built on the catalytic triad, refined by peripheral residues that determine substrate specificity.
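For reference, the 8 Å neighborhood that defines the active site environment in this analysis can be sketched as a simple distance filter. This is an illustrative sketch under stated assumptions, not the pipeline's implementation: it assumes distances are measured from a single reference atom on the catalytic serine to every atom of each residue, and the function name residues_within, the residue labels, and the coordinates are all hypothetical.

```python
import math


def residues_within(atoms, center, radius=8.0):
    """Return sorted ids of residues with at least one atom within `radius` Å.

    atoms: iterable of (residue_id, (x, y, z)) tuples, one per atom.
    center: (x, y, z) of the reference atom on the catalytic serine
            (assumption: a single reference atom is used).
    """
    hits = set()
    for res_id, xyz in atoms:
        if math.dist(xyz, center) <= radius:
            hits.add(res_id)
    return sorted(hits)


# Made-up coordinates and labels for illustration only.
serine_atom = (0.0, 0.0, 0.0)
toy_atoms = [
    ("res_A", (3.0, 4.0, 0.0)),   # 5.0 Å away: inside the radius
    ("res_B", (0.0, 6.0, 5.0)),   # about 7.8 Å away: inside the radius
    ("res_C", (8.0, 3.0, 0.0)),   # about 8.5 Å away: outside the radius
    ("res_C", (10.0, 0.0, 0.0)),  # 10.0 Å away: outside the radius
]
neighbors = residues_within(toy_atoms, serine_atom)
print(neighbors)
```

A residue counts as a neighbor as soon as any one of its atoms falls inside the radius; a stricter criterion (for example, alpha carbons only) would change which peripheral residues are included.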