DPAM: A domain parser for AlphaFold models
The recent breakthroughs in structure prediction, where methods such as AlphaFold demonstrated near-atomic accuracy, herald a paradigm shift in structural biology. The 200 million high-accuracy models released in the AlphaFold Database are expected to guide protein science in the coming decades. Partitioning these AlphaFold models into domains and assigning them to an evolutionary hierarchy provide an efficient way to gain functional insights into proteins. However, classifying such a large number of predicted structures challenges the infrastructure of current structure classifications, including our Evolutionary Classification of protein Domains (ECOD). We developed the Domain Parser for AlphaFold Models (DPAM), which automatically recognizes globular domains in these models based on inter-residue distances in 3D structures, predicted aligned errors, and ECOD domains found by sequence (HHsuite) and structural (Dali) similarity searches. DPAM significantly outperformed both structure-based domain parsers and homology-based domain assignment using ECOD domains found by HHsuite or Dali. Applying DPAM to the massive set of AlphaFold models will enable efficient classification of domains, providing evolutionary context and facilitating functional studies.
The DPAM domain parser is available at: https://github.com/CongLabCode/DPAM
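As a rough illustration of how two of the signals named above (inter-residue distances and predicted aligned errors) could be combined, here is a minimal sketch. It is not the DPAM algorithm: DPAM also uses ECOD hits from HHsuite and Dali, and its actual scoring is not described on this page. The function name `candidate_domains`, the cutoffs `max_dist` and `max_pae`, and the connected-components grouping are all assumptions made for this example.

```python
def candidate_domains(dist, pae, max_dist=8.0, max_pae=5.0, min_size=2):
    """Group residues into candidate domains (toy sketch, NOT the DPAM method).

    dist: square matrix of inter-residue distances in angstroms.
    pae:  square matrix of predicted aligned errors (may be asymmetric).
    Cutoffs are hypothetical values chosen only for illustration.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest over residue indices

    def find(i):
        # Find the root of residue i, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Use the larger of the two PAE values to be conservative.
            sym_pae = max(pae[i][j], pae[j][i])
            # Link residues that are spatially close AND confidently placed
            # relative to each other.
            if dist[i][j] <= max_dist and sym_pae <= max_pae:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Keep only groups large enough to count as a candidate domain.
    return sorted(g for g in groups.values() if len(g) >= min_size)


# Toy input: residues 1 and 2 are close in space, but their high PAE
# keeps them in separate groups.
toy_dist = [[0, 4, 9, 20], [4, 0, 5, 15], [9, 5, 0, 4], [20, 15, 4, 0]]
toy_pae = [[0, 2, 12, 20], [2, 0, 15, 18], [12, 15, 0, 3], [20, 18, 3, 0]]
toy_domains = candidate_domains(toy_dist, toy_pae)
```

The point of the toy input is that spatial proximity alone would merge all four residues, while requiring a low predicted aligned error splits them into two groups.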
AFTM - human transmembrane protein database
Transmembrane proteins (TMPs), with diverse cellular functions, are difficult targets for structural determination. Computational predictions of TMPs and of the locations of their transmembrane segments can be unreliable: they are prone to false positives and false negatives, and they are often inconsistent across programs. Recent advances in protein structure prediction methods have made it possible to identify TMPs and their membrane-spanning regions using high-quality structural models. We developed the AlphaFold Transmembrane proteins (AFTM) database of candidate human TMPs by identifying transmembrane regions in AlphaFold structural models of human proteins and their domains.
The AFTM database is available at: https://conglab.swmed.edu/AFTM
Cancer interactome
We applied AlphaFold to investigate the cancer protein-protein interactome. We predicted 1,798 protein-protein interactions (PPIs) for cancer driver proteins involved in diverse cellular processes such as transcription regulation, signal transduction, DNA repair, and the cell cycle. Our predictions offer novel structural insight into many cancer-related processes such as the MAP kinase cascade and the Fanconi anemia pathway. We further investigated the cancer mutation landscape by mapping somatic missense mutations (SMMs) in cancer to the predicted PPI interfaces and performing enrichment and depletion analyses. Interfaces enriched or depleted in SMMs exhibit different preferences for functional categories. Interfaces enriched in mutations tend to function in pathways that are deregulated in cancers, and they may help explain the molecular mechanisms of cancers in patients; interfaces lacking mutations appear to be essential for the survival of cancer cells and thus may be future targets for PPI-modulating drugs.
The article about the cancer interactome is here. The data for the cancer interactome are available at: https://modelarchive.org/doi/10.5452/ma-t3vr3
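The enrichment and depletion analyses mentioned above can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not the statistical procedure used in the study (which is not described on this page). It assumes that, under the null hypothesis, each mutation lands on any residue of the protein with equal probability. The function names and the example counts are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def interface_mutation_pvalues(n_mut_total, n_mut_interface,
                               n_res_total, n_res_interface):
    """One-sided binomial p-values for mutations at an interface.

    Assumed null model (an illustration only): every mutation falls on a
    uniformly random residue, so the chance of hitting the interface is
    n_res_interface / n_res_total.
    Returns (p_enriched, p_depleted).
    """
    p = n_res_interface / n_res_total
    n, m = n_mut_total, n_mut_interface
    # P(X >= m): probability of at least this many interface mutations.
    p_enriched = sum(binom_pmf(k, n, p) for k in range(m, n + 1))
    # P(X <= m): probability of at most this many interface mutations.
    p_depleted = sum(binom_pmf(k, n, p) for k in range(0, m + 1))
    return p_enriched, p_depleted


# Hypothetical counts: 10 mutations, all at an interface covering half
# of a 100-residue protein.
p_enr, p_dep = interface_mutation_pvalues(10, 10, 100, 50)
```

A real analysis would also need to account for uneven background mutation rates and correct for multiple testing across interfaces; both are omitted here.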
Mitochondrial protein-protein interactions
We applied RoseTTAFold and AlphaFold, two of the latest deep-learning methods for structure prediction, to analyze coevolution of human proteins residing in mitochondria, an organelle of vital importance in many cellular processes including energy production, metabolism, cell death, and antiviral response. For high-scoring pairs without experimental complex structures, our coevolution analyses and structural models shed light on the details of their interfaces, including CHCHD4–AIFM1, MTERF3–TRUB2, FMC1–ATPAF2, and ECSIT–NDUFAF1. We also identified novel PPIs (PYURF–NDUFAF5, LYRM1–MTRF1L, and COA8–COX10) for several proteins without experimentally characterized interaction partners, leading to predictions of their molecular functions and the biological processes they are involved in.
The data on mitochondrial proteins and their predicted interactions are available at: http://conglab.swmed.edu/mitochondria
Regulatory regions in human protein kinases
Protein kinases are a diverse group of enzymes that play crucial roles in various cellular processes such as signal transduction, cell proliferation, differentiation, and apoptosis. Protein kinases catalyze the transfer of phosphate groups from ATP to target proteins, leading to changes in their activity, localization, and interactions. Many kinases feature intramolecular regulatory regions, including globular domains and nonglobular regions. Studying the interactions between kinase domains and their regulatory regions can be challenging due to the flexibility of nonglobular regions, the long insertions separating interacting modules, and the transient nature of some interactions. High-quality structural models generated by AlphaFold offer a unique opportunity to study intramolecular interactions. We systematically explored intramolecular interactions between human protein kinase domains (KDs) and potential regulatory regions, including globular domains, N- and C-terminal tails, long insertions, and distal nonglobular regions. Our analysis identified intramolecular interactions between human KDs and 35 different types of globular domains, exhibiting a variety of interaction modes that could contribute to orthosteric or allosteric regulation of kinase activity. We also identified prevalent interactions between human KDs and their flanking regions (N- and C-terminal tails). These interactions exhibit group-specific characteristics and can vary within each specific kinase group. Although long-range interactions between KDs and nonglobular regions are relatively rare, structural details of these interactions offer new insights into the regulatory mechanisms of several kinases, such as HASPIN, MAPK7, MAPK15, and SIK1B.
The data on human kinases and their intramolecular interactions are available at: https://conglab.swmed.edu/kinreg
Impact of Asp/Glu-ADP-ribosylation on protein-protein interaction and protein function
PARylation plays a critical role in regulating multiple cellular processes such as DNA damage response and repair, transcription, RNA processing, and stress response. More than 300 human proteins have been found to be modified by PARylation on acidic residues, that is, Asp (D) and Glu (E). In collaboration with the Yonghao Yu lab at Columbia University, we used AlphaFold to predict protein-protein interactions (PPIs) and their interfaces for these proteins. AlphaFold predicted 260 confident PPIs involving PARylated proteins, and about one quarter of these PPIs have D/E-PARylation sites in their predicted PPI interfaces. By providing structural details of the PPI interfaces, the AlphaFold predictions offer novel insights into the mechanisms of regulation by PARylation. D/E-PARylation sites preferentially occur in coil regions and disordered regions, and PPI interfaces containing D/E-PARylation sites tend to occur between short linear sequence motifs in disordered regions and globular domains. The hub protein PCNA is predicted to interact with more than 20 proteins via the common PIP box motif and the structurally variable flanking regions. D/E-PARylation sites were found in the interfaces of key components of the RNA transcription and export complex, the SF3a spliceosome complex, and H/ACA and C/D small nucleolar ribonucleoprotein complexes, suggesting that systematic PARylation has a profound effect in regulating multiple RNA-related processes such as RNA nuclear export, splicing, and modification. Finally, PARylation of SUMO2 could modulate its interaction with CHAF1A, thereby representing a potential mechanism for the cross-talk between PARylation and SUMOylation in the regulation of chromatin remodeling.
The data on D/E-PARylated proteins and their interactions are available at: https://conglab.swmed.edu/DE_PARylation