Our group develops and applies computational and experimental research methods geared toward human well-being. We combine expertise in biochemistry with software development skills, and we are eager to use this training to overcome obstacles and bottlenecks in understanding how biomolecules function through interaction with their microenvironment. We envision contributing indispensable high-throughput tools for analyzing biochemical and biophysical data, and making technological breakthroughs toward a better understanding of protein 3D structures in terms of their roles and behaviors in their biological context.
One of the most important and time-consuming steps in NMR-based biomolecular research is correlating the resonance signals that appear in multi-dimensional spectra with the atoms of a protein. This procedure is called “chemical shift assignment.” A protein studied by NMR spectroscopy typically consists of fifty to three hundred amino acids, with each amino acid containing between six and fifty atoms. Thousands of signals from different multi-dimensional spectra need to be detected and labeled in order to complete chemical shift assignments and enable NMR-based protein structure studies at the molecular and atomic level. We develop automated algorithms for assigning solution and solid-state NMR data. The I-PINE (Integrative Probabilistic Interaction Network of Evidence) algorithm performs automated probabilistic chemical shift assignment using the PACSY database and Bayesian inference. The web version of the I-PINE program is the most popular automated assignment program in the field of protein NMR, having served 3,012 submissions in 2018 alone (Lee et al. JBNMR 2019). We are currently developing automated algorithms for assigning aromatic atoms, 13C-detection data, and solid-state NMR data: ARSIGN (Automated aRomatic Side chain assiGNment), MOCCA-PINE, and ssPINE (Solid-State NMR version of Probabilistic Interaction Network of Evidence). We expect them to be important game-changers in structural biology because aromatic atoms are strong structural stabilizers, 13C-detection spectra are a key to intrinsically disordered proteins (IDPs), and solid-state NMR (ssNMR) enables studies of larger proteins, membrane proteins, and fibrous proteins that are not amenable to solution methods.
Fig. 1. I-PINE’s Bayesian framework automates backbone and aliphatic side chain assignments using solution NMR data. ARSIGN, MOCCA-PINE, and ssPINE are being developed to fill gaps such as aromatic side chains, IDPs, and ssNMR data.
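To make the idea of probabilistic assignment concrete, here is a minimal sketch of Bayesian residue typing from observed CA and CB chemical shifts. It is not the I-PINE network itself: the Gaussian likelihood model, the `SHIFT_STATS` table (rounded, illustrative means and standard deviations in ppm), and the function names are all assumptions made for this example.

```python
import math

# Illustrative (approximate) mean and standard deviation of CA/CB shifts in ppm.
# Assumed values for this sketch only; not the statistics used by I-PINE.
SHIFT_STATS = {
    "ALA": {"CA": (53.1, 2.0), "CB": (19.0, 1.8)},
    "SER": {"CA": (58.7, 2.1), "CB": (63.8, 1.5)},
    "THR": {"CA": (62.2, 2.6), "CB": (69.7, 1.7)},
    "LEU": {"CA": (55.6, 2.1), "CB": (42.3, 1.9)},
}


def gaussian_pdf(x, mean, sd):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def residue_type_posterior(observed, priors=None):
    """Return P(residue type | observed shifts) via Bayes' rule.

    observed: dict mapping atom name ("CA", "CB") to a shift in ppm.
    priors:   optional dict of prior probabilities per residue type
              (uniform if omitted).
    """
    types = list(SHIFT_STATS)
    if priors is None:
        priors = {t: 1.0 / len(types) for t in types}

    unnormalized = {}
    for t in types:
        # Atoms are treated as independent, so likelihoods multiply.
        likelihood = 1.0
        for atom, shift in observed.items():
            mean, sd = SHIFT_STATS[t][atom]
            likelihood *= gaussian_pdf(shift, mean, sd)
        unnormalized[t] = priors[t] * likelihood

    total = sum(unnormalized.values())
    return {t: p / total for t, p in unnormalized.items()}


# Hypothetical spin system with CA = 52.8 ppm and CB = 19.4 ppm.
posterior = residue_type_posterior({"CA": 52.8, "CB": 19.4})
best = max(posterior, key=posterior.get)
```

In a real assignment problem the prior would typically reflect the amino acid composition of the protein sequence, and many more evidence sources (sequential connectivity, secondary structure) would be combined.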
Fig. 2. CDHR3 plays a critical role in HRV-C endocytosis by binding the VP1-4 capsid with its EC1-2 components. Infection by mature HRV-C is triggered by the self-cleavage of its 2Apro. I determined the solution 3D structure and dynamics of HRV-C2 2Apro by NMR spectroscopy, and my collaborator, the Palmenberg group, determined the HRV-C15 capsid. Currently, I am analyzing NMR spectra of EC1 of CDHR3 to characterize its structure and function. NMR studies of EC2 and the EC1-2 complex are also planned.
The recently discovered HRV-C serotypes comprise more than 50 new isolates. They have been associated with various conditions such as upper respiratory tract infections (RTIs), asthma exacerbation, and pneumonia, and they are of particular interest because of their link to asthma in children. HRVs vary significantly in their virulence due to differing specificities for the 2A protease (2Apro) substrate. The enterovirus 2Apro enzymes are chymotrypsin-like proteases that play an important role in virus replication by cleaving a spectrum of cellular proteins. Although these enzymes are attractive drug targets, no vaccines or commercial drugs are currently available because of protease diversity. Sequencing has revealed low sequence homology between serotypes, and only three structures have been determined. My structure of HRV-C2 is the only one determined in the solution state. A recent study by Palmenberg’s group revealed that the HRV-C capsid is structurally distinct from HRV-A and HRV-B and does not bind ICAM-1 or LDLR but instead binds the human cadherin-related family member 3 (CDHR3), which comprises extracellular repeat domains 1-6 (EC1-6). Our group investigates the structural and functional properties of the HRV-C components using NMR spectroscopy and applies the findings to future translational studies. In the long term, our group will try to adapt synthetic protein design technologies (e.g. RosettaDesign) to create a new drug entry method using the binding specificities of VP1-4 and CDHR3 and the viral trigger mechanism of 2Apro.
Allosteric control is a common mechanism of protein regulation. The effector binding site is distinct from the active site, and effector binding often results in changes in conformation and dynamics that either enhance or decrease the activity of the protein. Long-range allostery is known to be a key regulatory mechanism in cell signaling. However, identifying and predicting allosteric residues is very time-consuming and often remains elusive to traditional structure determination methods, because many mutated constructs and their three-dimensional structures must be produced to determine residue-specific conformational changes and binding affinities. My collaborator, Giuseppe Melacini, and his group members proposed a set of NMR-based methods called CHESCA to overcome this limitation. By analyzing multi-dimensional NMR chemical shift titration data, they successfully detected long-range allosteric networks in the regulatory subunit of Protein Kinase A (PKA R) and in Exchange Protein directly Activated by cAMP 1 (EPAC1). This concept has been extended to machine-learning approaches such as hierarchical agglomerative single-linkage and complete-linkage clustering (CHESCA-SL/CHESCA-CL), singular value decomposition (SVD) analysis, and separate 1H and 15N chemical shift methods (CHESCA-I). Despite their robustness and reliability, these methods are not easily adopted for other systems and by other laboratories because they require not only chemical shift assignments but also multiple computational and statistical analyses that involve computer programming. To address this, we are developing AI-based high-throughput technologies and automation to simplify these complex CHESCA methods, which we expect to be a milestone in protein allosteric regulation studies.
Fig. 3. Automated and visualized allosteric network mapping by the CHESCA-SPARKY program. An international collaboration with the Melacini group at McMaster University is ongoing using the new Linux server and virtual network computing (VNC) technologies.
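The single-linkage clustering step mentioned above can be sketched in a few lines. This is a simplified illustration, not the CHESCA-SPARKY implementation: it assumes one combined chemical shift per residue per perturbation state, uses the absolute Pearson correlation as the similarity measure, and applies an illustrative cutoff of 0.98. Cutting a single-linkage dendrogram at a fixed similarity is equivalent to finding connected components of the graph whose edges join residue pairs above the cutoff, which is what the union-find below does. The residue numbers and shift values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def single_linkage_clusters(shifts, cutoff=0.98):
    """Group residues whose shift changes are linearly correlated.

    shifts: dict mapping residue number to a list of chemical shifts,
            one value per perturbation state (same order for all residues).
    cutoff: minimum |r| for two residues to be linked (assumed value).
    Returns a list of clusters (sorted residue lists) with 2+ members.
    """
    residues = sorted(shifts)
    parent = {r: r for r in residues}

    def find(r):
        # Union-find root lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            if abs(pearson(shifts[a], shifts[b])) >= cutoff:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for r in residues:
        clusters.setdefault(find(r), []).append(r)
    return sorted((c for c in clusters.values() if len(c) > 1),
                  key=lambda c: c[0])


# Hypothetical shifts (ppm) for four residues across five states.
titration = {
    10: [8.10, 8.14, 8.20, 8.26, 8.30],
    25: [7.50, 7.52, 7.55, 7.58, 7.60],  # moves in step with residue 10
    47: [9.00, 8.92, 8.80, 8.68, 8.60],  # moves opposite to residue 10
    63: [8.40, 8.47, 8.39, 8.45, 8.41],  # no concerted change
}
networks = single_linkage_clusters(titration)
```

Here residues 10, 25, and 47 form one candidate network, while residue 63 is left out because its shifts do not track the others.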
Fig. 4. The Integrative NMR platform for biomolecular research.
Three major bottlenecks in characterizing the structures and functions of biomolecules by NMR are: a. biological limitations such as low yield and stability of the sample, b. instrumental limitations such as cost, time, spectral resolution, and pulse sequences, and c. lack of expertise in handling and interpreting data. Among these three bottlenecks, the biological limitations have been tackled by many groups testing a variety of plasmids and developing cell-based and cell-free technologies, because producing a reliable protein sample is fundamental and ubiquitous in biochemical studies. However, technologies addressing the other two aspects have been driven by only a few groups. To address these issues, I have developed a computational platform called Integrative NMR, which provides a seamless and interactive environment for biomolecular research. This system makes biomolecular NMR spectroscopy much more accessible by integrating software tools so that they interact efficiently in ways that support automation, validation, and visualization (Fig. 4). The integrated technologies include non-uniform sampling (NUS), reduced dimensionality (RD), automated signal detection and chemical shift assignment, and automated structure determination. My group will continue developing a more advanced platform, IHT-NMR (Integrative High-Throughput NMR), that can complete structural and functional analyses of proteins routinely within a week by connecting Integrative NMR with fast data collection technologies that reduce actual cost. IHT-NMR will provide an interactive bio-NMR platform for scientists to carry out NMR experiments, data analyses, and structure calculations. As the first step, the "POKY" suite has been developed. We develop and use the POKY suite as our advanced research and technology development platform for realizing IHT-NMR and challenging current limitations in the field of NMR-based structural biology.
Protein structure determination is based on calculating the potential energies of conformations and finding the global minimum energy state. A purely theoretical calculation of all protein conformations in the vast energy landscape would take hundreds of years of computation, and the resulting structure would still be inaccurate because many of the constants needed for the computation are only approximations. It is therefore common to supplement structure calculations with additional terms derived from experimental data. Increasingly, researchers are combining data from different sources because individual methods are often insufficient to determine structures when used alone. I designed AUDANA (Automated Database-Assisted NOE Assignment) as an algorithm for determining 3D protein structures from NMR data; it automates the assignment of 2D/3D-NOE (Nuclear Overhauser Effect) spectra, generates distance constraints, and conducts iterative high-temperature molecular dynamics and simulated annealing. Distance constraints generated automatically from ambiguously assigned NOE peaks are validated during the structure calculation against information from an enlarged version of PACSY that incorporates information on protein structures deposited in the PDB. Building on this experience, our group will develop integrated, automated analysis of hybrid data to simplify this complex hybrid approach. Our recent NMR/SAXS hybrid structure of the mitochondrial-targeted GTPase-activating protein (GAP) VopE, which used this automation (in collaboration with Dr. Kyle Smith), has been published in Protein Science.
Fig. 5. ISBP (Integrative Structural Biology by PONDEROSA-C/S) will automate the interpretation of hybrid data from different experiments to determine the structures of difficult targets that cannot be solved by a single method.
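As a small illustration of how NOE peaks become distance constraints, the sketch below applies the textbook isolated-spin-pair approximation, in which NOE intensity scales as r^-6, so r = r_ref * (I_ref / I)^(1/6). This is a generic calibration, not AUDANA's actual procedure; the reference intensity and distance, the tolerance, and the 6.0 Å cap are assumptions chosen for the example.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance (in Å) from an NOE peak intensity.

    Uses the isolated-spin-pair approximation I ~ r**-6, calibrated
    against a reference peak of known distance.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def upper_bound(distance, tolerance=0.25, max_distance=6.0):
    """Loosen the estimate into an upper-bound restraint.

    tolerance:    fractional padding added to the estimate (assumed value).
    max_distance: cap reflecting the practical NOE detection range
                  (assumed value).
    """
    return min(distance * (1.0 + tolerance), max_distance)


# Hypothetical calibration: a reference peak of intensity 1000 at 2.5 Å.
REF_INTENSITY = 1000.0
REF_DISTANCE = 2.5

# A peak 64 times weaker corresponds to twice the reference distance.
weak_peak = noe_distance(REF_INTENSITY / 64.0, REF_INTENSITY, REF_DISTANCE)
weak_bound = upper_bound(weak_peak)

# A peak as strong as the reference maps back to the reference distance.
strong_peak = noe_distance(REF_INTENSITY, REF_INTENSITY, REF_DISTANCE)
strong_bound = upper_bound(strong_peak)
```

Because of the sixth-root dependence, even large errors in peak intensity translate into modest errors in distance, which is one reason restraints are normally applied as upper bounds rather than exact distances.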
Visit the "Publications" section at the top to read related publications.
Watch YouTube videos from Woonghee's channel to learn about the software tools that we develop. Subscribe and turn notifications on!