Integrating Physical Principles with AI for Predicting Protein Dynamics
Understanding the dynamic conformational changes of proteins is crucial for unravelling their functional roles in biological systems. Recent advancements in artificial intelligence (AI) have enabled accurate predictions of static protein structures, but predicting their dynamic movements remains a significant challenge due to limited experimental data on protein motions.
Our research introduces a novel approach that combines physical principles with AI to predict protein conformational dynamics. By leveraging protein energy landscape information, the method enhances the capabilities of AlphaFold2 (AF2), enabling it to predict structural changes and transition pathways of proteins accurately. The key innovation lies in identifying and utilizing "energetic frustration"—regions where proteins experience unresolved energetic conflicts during folding, which are essential for functional movements.
This interdisciplinary approach not only addresses the challenging problem of predicting protein dynamics using AI but also exemplifies the successful integration of biophysical knowledge with artificial intelligence in molecular biology research. The method holds significant potential for practical applications, particularly in drug design, enzyme engineering, and the analysis of disease mechanisms. By bridging physics and AI, this research opens new avenues for precise biomolecular design and a deeper understanding of complex life processes.
Xingyue Guan, Qian-Yuan Tang, Weitong Ren, Mingchen Chen, Wei Wang, Peter G. Wolynes, Wenfei Li, Predicting protein conformational motions using energetic frustration analysis and AlphaFold2. Proceedings of the National Academy of Sciences. 121 (35) e2410662121 (2024). https://doi.org/10.1073/pnas.2410662121
Press release (English): https://phys.org/news/2024-08-artificial-intelligence-frustration-protein.html
The Statistical Trends of Protein Evolution: A Lesson from AlphaFold Database
The recent development of artificial intelligence provides us with new and powerful tools for studying the mysterious relationship between organism evolution and protein evolution. In this work, based on the AlphaFold Protein Structure Database (AlphaFold DB), we perform comparative analyses of the proteins of different organisms. The statistics of AlphaFold-predicted structures show that, statistically, the constituent proteins of organisms with higher complexity have larger radii of gyration, higher coil fractions, and slower vibrations. By conducting normal mode analysis and scaling analyses, we demonstrate that higher organismal complexity correlates with lower fractal dimensions in both the structure and dynamics of the constituent proteins, suggesting that higher functional specialization is associated with higher organismal complexity. We also uncover the topological and sequence bases of these correlations. As organismal complexity increases, the residue contact networks of the constituent proteins become more assortative, and these proteins show a higher degree of hydrophilic-hydrophobic segregation in their sequences. Furthermore, by comparing the statistical structural proximity across the proteomes with the phylogenetic tree of homologous proteins, we show that statistical structural proximity across the proteomes may indirectly reflect phylogenetic proximity, indicating a statistical trend of protein evolution in parallel with organism evolution. This study provides new insights into how the diversity in protein functionality increases and how the dimensionality of the manifold of protein dynamics decreases during evolution, contributing to the understanding of the origin and evolution of life.
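As a small illustration of one of the structural statistics named above, the radius of gyration can be computed directly from atomic coordinates. The following is a minimal sketch, not the code used in the study: the function name `radius_of_gyration` is hypothetical, all atoms are assumed to have equal mass, and the coordinates are toy points rather than an AlphaFold-predicted structure.

```python
import numpy as np

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid.

    Assumption: every atom (e.g. each C-alpha) carries the same mass.
    `coords` is an (N, 3) array-like of Cartesian coordinates.
    """
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)  # shift centroid to the origin
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

# Toy data: the eight corners of a cube with edge length 2, centered at the origin.
cube = np.array(
    [[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
    dtype=float,
)
rg_cube = radius_of_gyration(cube)  # every corner lies sqrt(3) from the centroid
print(rg_cube)
```

For a real protein, the same function could be applied to the C-alpha coordinates parsed from a structure file; comparing values across proteomes would additionally require controlling for chain length.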
Qian-Yuan Tang, Weitong Ren, Jun Wang, Kunihiko Kaneko, The Statistical Trends of Protein Evolution: A Lesson from AlphaFold Database. Molecular Biology and Evolution, 39(10), msac197 (2022). https://doi.org/10.1093/molbev/msac197
Data available at: https://github.com/qianyuantang/stat-trend-protein-evo
Dynamics-evolution correspondence in protein structures
The search for a relationship between the dynamics and the evolution of biological systems is a crucial topic in bioscience. The former concerns processes at short timescales, such as biochemical reactions or molecular dynamics for adaptation to environmental changes, whereas the latter involves the accumulation of random mutations and natural selection over generations. Similarities between the two have long been suggested, but their origin has remained unexplained. The dynamics and evolution of proteins are essential here because proteins lie at the basis of biological systems. In this work, using a database of hundreds of thousands of proteins from hundreds of protein families, we analyze the deformations of proteins subjected to both thermal noise and mutations. The data analysis quantitatively demonstrates the correlation between flexibility (noise-induced dynamics) and evolvability (mutation-induced variations in structure), indicating a "dynamics-evolution correspondence" in proteins, which works as a universal link in the genotype-phenotype mapping. We then provide a theoretical explanation of this correspondence. We also demonstrate that, although both protein dynamics and evolution involve vast degrees of freedom (thermal fluctuations and mutations of amino acid residues), they are restricted to a common low-dimensional manifold. These results also provide a quantitative framework for function-driven protein design.
Qian-Yuan Tang, Kunihiko Kaneko, Dynamics-evolution correspondence in protein structures. Physical Review Letters, 127, 098103 (2021). https://doi.org/10.1103/PhysRevLett.127.098103
Press release (English): https://www.u-tokyo.ac.jp/focus/en/press/z0508_00194.html
Press release (Japanese): https://research-er.jp/articles/view/102460
Unveiling the Power-Law Structure in Deep Learning: Hessian Spectra and Stochastic Gradients
Recent research has revealed that the Hessian spectrum in deep learning exhibits a two-component structure, with a few large eigenvalues and many near-zero eigenvalues. We demonstrate that these Hessian spectra follow a power-law distribution, leading to a low-dimensional, robust learning space and low-complexity solutions. This framework also connects deep learning with protein science, highlighting the broad implications of this discovery. Stochastic gradients, vital to optimizing and generalizing deep neural networks, have sparked debate over their heavy-tail properties. Our formal statistical tests reveal that dimension-wise gradients often exhibit power-law heavy tails, while iteration-wise gradients and noise from minibatch training typically do not. Moreover, we find that the covariance spectra of stochastic gradients follow a power-law structure, challenging existing beliefs and offering new insights into deep learning's mathematical foundations. These findings underscore the importance of power-law distributions in shaping robust and efficient deep-learning training processes.
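To make the notion of a power-law spectrum concrete, the sketch below estimates the exponent s in a rank-ordered spectrum of the form lambda_k proportional to k^(-s) by a least-squares line fit in log-log coordinates. This is a simplified stand-in for illustration only: it is not the formal statistical testing described above, the function name `power_law_exponent` and its `top_k` parameter are hypothetical, and the eigenvalues are synthetic rather than taken from a trained network.

```python
import numpy as np

def power_law_exponent(eigenvalues, top_k=None):
    """Estimate s in lambda_k ~ k**(-s) from a log-log least-squares fit.

    Assumption: a plain line fit on (log rank, log eigenvalue); this gives a
    rough estimate and is not a goodness-of-fit test.
    """
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending order
    vals = vals[vals > 0]            # logarithms need positive values
    if top_k is not None:
        vals = vals[:top_k]          # optionally restrict to the leading modes
    ranks = np.arange(1, len(vals) + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(vals), 1)
    return float(-slope)

# Synthetic spectrum that follows an exact power law with exponent 1.2.
synthetic = 5.0 * np.arange(1, 201, dtype=float) ** -1.2
estimated = power_law_exponent(synthetic)
print(estimated)
```

On real spectra the fit is sensitive to which ranks are included, so the `top_k` cutoff (or a more careful estimator) would need to be chosen and justified for each case.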
Zeke Xie*, Qian-Yuan Tang*, Mingming Sun, Ping Li, On the Overlooked Structure of Stochastic Gradients, Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS 2023), pp. 66257-66276 (2023). https://dl.acm.org/doi/10.5555/3666122.3669014
Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, Ping Li. On the Power-Law Spectrum in Deep Learning. (2022) https://arxiv.org/abs/2201.13011
The critical fluctuations of proteins
Proteins in cellular environments are highly susceptible to perturbations. A local perturbation to any residue can be sensed by spatially distal residues in the same protein molecule, showing long-range correlations in the native dynamics of proteins. These long-range correlations contribute to many biological processes such as allostery, catalysis, and transport. Revealing their structural and spectral origin is of great significance in understanding the design principles of biologically functional proteins.
A general framework is established to analyze the dynamics of protein systems under both external and internal perturbations. The compatibility of sensitivity and robustness is analyzed by the optimization of two entropies, which leads to the power-law vibration spectrum of proteins. These power-law behaviors are confirmed extensively by protein data, as a hallmark of criticality. This framework also establishes a general link between criticality and robustness-plasticity compatibility, both of which are ubiquitous features in biological systems.
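One common way to obtain both a vibration spectrum and residue-residue correlations from a single structure is an elastic network model. The sketch below uses a Gaussian network model (GNM) as an illustrative choice; it is an assumption that this resembles the analysis in the papers listed below, and the function names, the 7.0 Å contact cutoff, and the toy straight-chain coordinates are all hypothetical.

```python
import numpy as np

def gnm_modes(coords, cutoff=7.0):
    """Gaussian network model: mode eigenvalues and residue covariance.

    Assumptions: one node per residue, identical springs between all pairs
    closer than `cutoff` (in the units of `coords`), and the covariance is
    returned up to the constant factor kT / spring constant.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dists < cutoff) & ~np.eye(n, dtype=bool)
    kirchhoff = -contact.astype(float)                  # off-diagonal: -1 per contact
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))    # diagonal: contact number
    evals, evecs = np.linalg.eigh(kirchhoff)            # ascending eigenvalues
    keep = evals > 1e-8                                 # drop zero (rigid-body) modes
    covariance = (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T
    return evals[keep], covariance

def normalized_correlation(covariance):
    """Scale the covariance so that every diagonal entry equals 1."""
    scale = np.sqrt(np.diag(covariance))
    return covariance / np.outer(scale, scale)

# Toy structure: ten residues on a straight line, 3.8 units apart.
chain = np.column_stack([3.8 * np.arange(10), np.zeros(10), np.zeros(10)])
mode_eigenvalues, cov = gnm_modes(chain)
corr = normalized_correlation(cov)
```

The diagonal of `cov` gives relative residue fluctuations, the off-diagonal entries of `corr` describe how the motion of one residue correlates with another, and the sorted `mode_eigenvalues` are the quantity whose distribution one would inspect for power-law behavior.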
Qian-Yuan Tang, Tetsuhiro S. Hatakeyama, Kunihiko Kaneko, Functional Sensitivity and Mutational Robustness of Proteins, Physical Review Research, 2(3), 033452 (2020).
Introduction in Chinese: https://mp.weixin.qq.com/s/qH_6-4smLQdHkh8YpfEvMw
Qian-Yuan Tang, Kunihiko Kaneko. Long-range correlation in protein dynamics: Confirmation by structural data and normal mode analysis. PLoS Computational Biology, 16(2), e1007670 (2020).
Qian-Yuan Tang, Yang-Yang Zhang, Jun Wang, Wei Wang, and Dante R. Chialvo, Critical fluctuations in the native state of proteins, Physical Review Letters, 118, 088102 (2017).
Media Report (in English): http://physicsbuzz.physicscentral.com/2017/02/proteins-at-edge.html