Clustering protein language model representations to sample alternative conformations with AlphaFold2
Investigating structural variability is essential for deciphering the biological functions of proteins. While AlphaFold2 (AF) excels at predicting highly accurate static structures, it often fails to capture the full spectrum of functional states. Although recent approaches have made progress in using AF to generate heterogeneous ensembles of diverse protein structures, these methods often lack interpretability and offer limited insight into the evolutionary patterns that correlate with the underlying structural changes. In this study, we show that co-evolutionary signals within multiple sequence alignments (MSAs) drive AF's predictions, and that clustering homologous sequences is an effective strategy for isolating couplings that correspond to distinct conformations. By leveraging protein language model (PLM) representations, we substantially expand the pool of sequences reflecting diverse functional states while reducing false positives compared with existing methods. This enables reliable Direct Coupling Analysis (DCA) to detect residues that are strongly correlated within a specific cluster, which in turn allows us to identify mutations that stabilize the corresponding structure. To assess the impact of these mutations, we use molecular dynamics (MD) simulations with alchemical free-energy calculations and confirm their effectiveness in stabilizing the alternative conformations. Furthermore, we demonstrate the broader applicability of this approach by extending it beyond fold-switching to general conformational changes.
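The core step described above, grouping homologous sequences by their language-model representations so that each group can be analysed separately, can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify: each MSA sequence is taken to be already mapped to one fixed-length embedding vector (toy 2-D vectors stand in for real PLM embeddings here), and plain k-means with farthest-first initialisation is swapped in as the clustering algorithm. The function names `kmeans` and `split_msa_by_cluster` are hypothetical. Each resulting sub-MSA would then be the input to per-cluster coupling analysis or an AF run, which is not shown.

```python
import numpy as np


def kmeans(x, k, n_iter=100):
    """Lloyd's k-means on the rows of x (n_sequences x embedding_dim).

    Assumption: k-means stands in for whatever clustering the method uses.
    Centroids are initialised deterministically by farthest-first traversal.
    """
    centroids = [x[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen centroid.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from each point to each centroid: (n, k).
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels


def split_msa_by_cluster(sequences, embeddings, k):
    """Hypothetical helper: partition MSA sequences into k sub-MSAs."""
    labels = kmeans(np.asarray(embeddings, dtype=float), k)
    clusters = {}
    for seq, lab in zip(sequences, labels):
        clusters.setdefault(int(lab), []).append(seq)
    return clusters


# Toy usage: two well-separated groups of "embeddings".
seqs = ["AAA", "AAC", "GGG", "GGC"]
embs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(split_msa_by_cluster(seqs, embs, k=2))
# prints {0: ['AAA', 'AAC'], 1: ['GGG', 'GGC']}
```

Clustering in embedding space rather than on raw sequence identity is the point of the sketch: sequences that share a functional state may be close in representation space even when their alignment-level similarity is modest.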