RNA Secondary Structure Modeling

Background

An RNA chain folds back on itself to form higher-dimensional structures, such as patterns of base pairings that form helices and single-stranded loops. This two-dimensional arrangement is called the secondary structure of the RNA, and it plays an important role in determining the function of an RNA sequence. A major goal in RNA biology is to predict RNA secondary structures consistently and efficiently. One of the most common tools for computational prediction relies on the nearest neighbor thermodynamic model (NNTM). Although very useful, the NNTM approach is prone to errors in its predictions. In recent years, methods based on machine learning, especially deep learning, have been proposed for RNA structure prediction. An important challenge is to understand how auxiliary data generated with deep learning methods constrains the minimum free energy structure and to assess the extent of the accuracy improvements.

Research Projects

In [1], we studied the problem of determining which nucleotides of an RNA sequence are paired or unpaired in its secondary structure, a problem we called RNA state inference. Successful state inference of RNA sequences can be used to generate auxiliary information for data-directed RNA secondary structure prediction. We presented a deep learning state inference tool, trained and tested on 16S ribosomal RNA. Converting these state predictions into synthetic SHAPE data, which is then used to direct NNTM, results in significant improvements in secondary structure prediction accuracy on a test set of 16S rRNA.
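As a rough illustration of how per-nucleotide state predictions could be turned into auxiliary data for NNTM, the sketch below maps a predicted probability of being unpaired onto a pseudo-reactivity and then applies the widely used SHAPE pseudo-energy form m * ln(reactivity + 1) + b. The linear mapping, the reactivity range, the function names, and the default values m = 2.6 and b = -0.8 kcal/mol are assumptions made for this example; they are not taken from [1], which may convert predictions to synthetic SHAPE data differently.

```python
import math

# Commonly cited default slope and intercept (kcal/mol) for the SHAPE
# pseudo-energy term; assumed here, not taken from [1].
SLOPE_M = 2.6
INTERCEPT_B = -0.8


def synthetic_shape(p_unpaired, low=0.0, high=2.0):
    """Map P(unpaired) in [0, 1] linearly onto [low, high].

    The linear map and the range [0, 2] are hypothetical choices made
    only for illustration.
    """
    if not 0.0 <= p_unpaired <= 1.0:
        raise ValueError("p_unpaired must lie in [0, 1]")
    return low + (high - low) * p_unpaired


def shape_pseudo_energy(reactivity, m=SLOPE_M, b=INTERCEPT_B):
    """Pseudo-energy m * ln(reactivity + 1) + b added to paired nucleotides."""
    return m * math.log(reactivity + 1.0) + b


def shape_profile(p_unpaired_per_nt):
    """Return (pseudo-reactivity, pseudo-energy) for each nucleotide."""
    profile = []
    for p in p_unpaired_per_nt:
        reactivity = synthetic_shape(p)
        profile.append((reactivity, shape_pseudo_energy(reactivity)))
    return profile


if __name__ == "__main__":
    # Hypothetical state predictions for a five-nucleotide fragment.
    for reactivity, energy in shape_profile([0.05, 0.10, 0.85, 0.95, 0.50]):
        print(f"reactivity={reactivity:.2f}  pseudo-energy={energy:+.2f} kcal/mol")
```

Under these assumptions, nucleotides predicted to be paired receive a negative (favorable) pairing bonus and nucleotides predicted to be unpaired receive a positive penalty, which is how such data steers the minimum free energy structure.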

In [2], we studied the conditioning and robustness of Boltzmann sampling of RNA secondary structures under thermodynamic parameter perturbations. Specifically, we presented a mathematically rigorous model for conditioning inspired by numerical analysis, and a biologically inspired definition of robustness under thermodynamic perturbation. We demonstrated a strong correlation between conditioning and robustness and used this tight relationship to define quantitative thresholds for well versus ill conditioning.
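To make the idea of sensitivity to perturbation concrete, the toy sketch below computes Boltzmann probabilities for a small set of hypothetical structure energies and measures how far the distribution moves, in total variation distance, when the energies are perturbed by Gaussian noise. This is not the conditioning or robustness definition from [2]: the energies, the noise model, the distance measure, and all function names are assumptions, and perturbing structure energies directly is a simplification of perturbing the underlying thermodynamic parameters.

```python
import math
import random

# Gas constant in kcal/(mol K) times 310.15 K (37 degrees C).
RT = 0.0019872 * 310.15


def boltzmann_probs(energies, rt=RT):
    """Boltzmann probabilities exp(-E / RT) / Z for a list of energies."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def perturbation_shift(energies, scale, trials=200, seed=0):
    """Average distribution shift under Gaussian energy noise of given scale."""
    rng = random.Random(seed)
    base = boltzmann_probs(energies)
    total = 0.0
    for _ in range(trials):
        perturbed = [e + rng.gauss(0.0, scale) for e in energies]
        total += total_variation(base, boltzmann_probs(perturbed))
    return total / trials


if __name__ == "__main__":
    # Hypothetical free energies (kcal/mol) of four candidate structures.
    well_separated = [-12.0, -8.0, -7.5, -6.0]
    nearly_tied = [-12.0, -11.9, -11.8, -6.0]
    for name, energies in [("well separated", well_separated),
                           ("nearly tied", nearly_tied)]:
        shift = perturbation_shift(energies, scale=0.3)
        print(f"{name}: mean total variation shift = {shift:.3f}")
```

In this toy setting, an ensemble dominated by one low-energy structure would be expected to shift less than one with several nearly tied structures, which is the intuition behind separating well-conditioned from ill-conditioned sequences.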

References

  1. Improving RNA secondary structure prediction via state inference with deep recurrent neural networks. Devin Willmott, David Murrugarra, and Qiang Ye. Computational and Mathematical Biophysics, 8(1), 36-50, 2020. https://doi.org/10.1515/cmb-2020-0002.

  2. Conditioning and robustness of Boltzmann sampling of RNA secondary structures under thermodynamic parameter perturbations. Emily Rogers, David Murrugarra, and Christine Heitsch. Biophysical Journal, 113(2), 321-329, 2017.