In this analysis, we used AlphaFold2 to predict the structure of the first ~1650 residues of scospondin with high confidence. To our knowledge, this is the first structural model of scospondin to be reported. However, the last two-thirds of the protein were not predicted with high confidence. This may reflect intrinsically disordered regions in the sequence, which AlphaFold tends to predict with low confidence (Ruff et al., 2021). Disorder prediction tools estimate that, on average, 20.3% of the human scospondin sequence is disordered, and this proportion appears to be conserved across species (Alowolodu et al., 2016). Furthermore, the propensity for disorder appears to be higher in the last two-thirds of the protein, consistent with our results.
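The contrast in confidence between the N-terminal region and the remainder of the protein can be summarized per region. Below is a minimal Python sketch of one way to do this. It is not the pipeline used in this analysis. It assumes a single-chain AlphaFold2 model in PDB format, where AlphaFold stores per-residue pLDDT in the B-factor column. The function names `plddt_per_residue` and `region_summary` are hypothetical, and the cutoff of pLDDT 70 follows AlphaFold's commonly used "confident" threshold.

```python
from statistics import mean


def plddt_per_residue(pdb_text):
    """Return {residue_number: pLDDT} from an AlphaFold PDB model.

    Assumes a single chain. AlphaFold writes pLDDT into the B-factor
    column (PDB columns 61-66); one value is read per residue, taken
    from its CA atom.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])        # residue sequence number, columns 23-26
            scores[resnum] = float(line[60:66])
    return scores


def region_summary(scores, start, end, cutoff=70.0):
    """Return (mean pLDDT, fraction of residues below cutoff) for residues start..end inclusive."""
    values = [v for r, v in scores.items() if start <= r <= end]
    if not values:
        raise ValueError(f"no residues between {start} and {end}")
    low = sum(1 for v in values if v < cutoff)
    return mean(values), low / len(values)
```

For example, comparing `region_summary(scores, 1, 1650)` with `region_summary(scores, 1651, max(scores))` would contrast the confidently predicted N-terminal region with the rest of the model.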
We also provide evidence that the stl300 C-878-S mutation likely disrupts an intramolecular disulfide bond with C-900. This is supported by (1) the close proximity of C-878 and C-900 in the AlphaFold2 prediction, and (2) structural homology to proteins with conserved disulfide bonds. Because disulfide bonds help stabilize protein structure, loss of this cysteine would likely affect the overall folding of scospondin and, in turn, formation of the RF. This may explain why homozygous stl300 mutants progressively lose their RF with age. However, why loss of the RF leads to scoliosis remains to be determined. Fibronectin chains were the most frequent structural homologs of the region surrounding the stl300 mutation site. Interestingly, fibronectins are also large, multidomain extracellular matrix proteins that can polymerize into insoluble fibers (Singh et al., 2010). Fibronectin biology may therefore offer insight into scospondin structure and polymerization dynamics.
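The proximity argument can be made quantitative by measuring the distance between the sulfur (SG) atoms of the two cysteines in the predicted model. The Python sketch below is a hypothetical illustration, not the method used in this analysis. It assumes three things:

- The PDB-format model uses residue numbering that matches C-878 and C-900.
- The model has a single chain, `A`.
- An SG-SG distance of at most 2.5 Å indicates a possible disulfide bond. A typical S-S bond length is about 2.05 Å, so the 2.5 Å cutoff is an assumed tolerance.

The function names are also hypothetical.

```python
import math


def sg_coords(pdb_text, resnum, chain="A"):
    """Return (x, y, z) of the SG (sulfur) atom of residue `resnum` in `chain`."""
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM")
                and line[21] == chain                 # chain ID, column 22
                and int(line[22:26]) == resnum        # residue number, columns 23-26
                and line[12:16].strip() == "SG"):     # atom name, columns 13-16
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise KeyError(f"no SG atom for residue {resnum} in chain {chain}")


def disulfide_plausible(pdb_text, res_a, res_b, chain="A", max_dist=2.5):
    """Return (SG-SG distance in angstroms, whether it is within max_dist).

    max_dist = 2.5 is an assumed tolerance around the ~2.05 angstrom S-S bond length.
    """
    d = math.dist(sg_coords(pdb_text, res_a, chain), sg_coords(pdb_text, res_b, chain))
    return d, d <= max_dist
```

For example, `disulfide_plausible(model_text, 878, 900)` would check the wild-type model. In the C-878-S mutant, residue 878 is a serine with no SG atom, so the function would raise `KeyError` instead of returning a distance.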
The current investigation has some limitations. First, it is unclear how well AlphaFold2 predicts the structure of proteins that polymerize. Because scospondin polymerizes into the RF, this property may confound the predicted monomeric structure. In particular, vWF domains are involved in the polymerization of other proteins (Sepulveda et al., 2020). The tight grouping of the three vWF-C8-TIL modules may therefore reflect intermolecular interactions between separate scospondin monomers rather than intramolecular contacts. Second, scospondin undergoes extensive post-translational modification (PTM), including glycosylation, disulfide bond formation, and cleavage. The full-length primary amino acid sequence used for the AlphaFold2 prediction does not capture these PTMs, which could therefore introduce errors into the predicted structure. Despite these limitations, the present analysis provides an important first look at scospondin structure and offers insight into the phenotypic consequences of the stl300 mutation.