mRNA

Wayne's RNA World

For those who have wondered "What is life?"

Well, this is not necessarily what you were asking, but at some level it is an answer. Maybe we really need to ask "How should we live?"

General objectives: To understand the physics of biopolymers (thermodynamics, folding, and design) in general and specifically with a focus on RNA. Further, to develop useful tools and techniques in NMR research on RNA structure determination.

  • vsfold4: RNA secondary structure prediction server

  • vsfold5: RNA pseudoknot prediction server

    • Vsfold is my own secondary structure prediction program, based on thermodynamics and on a physical model I discovered during my research. The concepts of this work were originally published as part of a program for filtering the suboptimal structures predicted by mfold in the GCG package (see PubMed 11735286, PubMed 11735287, PLoS ONE and JCSB). All such programs have their limitations and your mileage (kilometers) may vary. All the same, I care about the physics, and vsfold4/5 often do better on the standard benchmark RNA structures.

  • Some useful links and information related to the RNA World

Calculation of the 3D structure of the pre-mRNA exon/intron junction region: intron 3 of human cytochrome P450 2D6

Synopsis: I was interested in finding out whether the splicing machinery might use some kind of structure to select targets for the exon/intron splicing junction. Here I tried intron 3 of human cytochrome P450 2D6 because it is relatively short. I used my filtering software on mfold suboptimal secondary structure predictions to obtain the best secondary structure of the folded RNA exon/intron junction. I then converted this secondary structure into a preliminary 3D structure using various tools and strategies I developed. Finally, I used molecular dynamics simulations to find a possible 3D structure based on the secondary structure prediction.

Figure 1: 3D structure of intron 3 based on molecular dynamics simulations and secondary structure fitting (Figure 2).

The structure includes part of the 3' end of exon 3 and the 5' end of exon 4 (see secondary structure: Figure 2). The 5' end of the intron is located at the top left part of the figure where the first Guanine of the intronic sequence can be seen (Figure 1: white). The branch point (bp) Adenine is shown in brown and located near the top right side of Figure 1. The 3' end is located near the bottom of the figure with the purple Guanine. The white lines mark the distances between the different segments of the intron. The 5' end and the bp are separated by about 40 angstroms, and the 3' end is separated from both the 5' and bp regions by about 70 angstroms. However, after recruitment of the spliceosome and recognition of the 5' and bp regions, this structure is quite likely to fold and therefore link these distant regions. See also the secondary structure (Figure 2).

Figure 2: secondary structure of intron 3 of pre-mRNA for human Cytochrome P450 IID6. The intron is colored blue in the secondary structure. The light red segments correspond to the exonic subsequences. The 5' end of intron 3 is labeled "GU", the branch point is colored purple with the adenine labeled and colored red, and the 3' end is labeled "AG".

The secondary structure that this smaller subsequence (Figs 1 and 2) is based on was determined from a much longer sequence window (in this case 500 nt) that included all of exon 3, intron 3 and exon 4. The windows were scanned using a variable windowing length (ranging from 500 to 2000 nt) with the intention to locate stable secondary structures that did not change with the size of the windowing or the boundaries of the sequence used. The secondary structures were selected using a new method of entropy calculation in suboptimal secondary structure calculation strategies (cross-linking entropy) that we have recently published [1]. (Eventually I hope to produce a page describing this calculation technique.)
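The window-scanning idea above can be illustrated with a short Python sketch. It is not the cross-linking entropy method and it does not do any folding; it assumes the predicted structures for each window are already available as pseudoknot-free dot-bracket strings (for example, exported from a folding program), and it only checks which base pairs survive across all windows that cover them. The function names and the input format are made up for illustration.

```python
def parse_pairs(dot_bracket, offset=0):
    """Return the set of base pairs (i, j) of a pseudoknot-free dot-bracket
    string, shifted by `offset` into full-sequence coordinates."""
    stack, pairs = [], set()
    for k, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            pairs.add((stack.pop() + offset, k + offset))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def stable_pairs(windows):
    """windows: list of (offset, dot_bracket) tuples, one per sequence window.

    A pair counts as stable if it appears in every window whose range
    covers both of its positions (a hypothetical, simplified criterion)."""
    parsed = [(off, off + len(db), parse_pairs(db, off)) for off, db in windows]
    all_pairs = set().union(*(p for _, _, p in parsed))
    stable = set()
    for i, j in all_pairs:
        covering = [p for lo, hi, p in parsed if lo <= i and j < hi]
        if all((i, j) in p for p in covering):
            stable.add((i, j))
    return stable
```

For example, `stable_pairs([(0, "((...))..."), (0, ".(...)......")])` keeps only the inner pair, because the outer pair disappears when the window is enlarged.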

It is clear that there is mixing between the exonic regions and the intronic regions. This should remind one of genefind's heuristic that tends to look at the first 30 and last 30 nt in the sequence to estimate which sequences form introns.

We also note that another Adenine (brown) can be found halfway down on the right hand side of the above figure (same helix as the first: the position is not shown in Figure 2 because there is overlap between these two alternative regions). A green line joins this alternative branch point with the 5' splice site G and the separation distance is about 60 angstroms. Neither of these potential bp positions exactly fits the standard consensus sequence (YNYYRAY). The consensus sequence for the alternative branch point is perhaps less in agreement than the site neighboring the 5' end, but that is a little hard to say strictly based on the sequence data alone.
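To make the comparison against the consensus concrete, here is a minimal Python sketch that slides the YNYYRAY pattern along a sequence and counts mismatches around each candidate adenine. It uses the standard IUPAC meanings (Y = C or U, R = A or G, N = any base); the function names and the `max_mismatches` threshold are my own illustrative choices, not part of any splicing tool.

```python
# IUPAC codes in the consensus: Y = pyrimidine (C/U), R = purine (A/G), N = any base.
IUPAC = {"Y": "CU", "R": "AG", "N": "ACGU", "A": "A"}
CONSENSUS = "YNYYRAY"


def count_mismatches(window, consensus=CONSENSUS):
    """Number of positions where `window` disagrees with the consensus."""
    return sum(base not in IUPAC[code] for base, code in zip(window, consensus))


def branch_point_candidates(rna, max_mismatches=1, consensus=CONSENSUS):
    """Return (position of A, mismatches) for each near-consensus window.

    Positions are 0-based. DNA input is accepted (T is read as U).
    The branch-point adenine itself is always required."""
    rna = rna.upper().replace("T", "U")
    a_offset = consensus.index("A")
    hits = []
    for start in range(len(rna) - len(consensus) + 1):
        window = rna[start:start + len(consensus)]
        if window[a_offset] != "A":
            continue  # no adenine at the branch-point position
        mismatches = count_mismatches(window, consensus)
        if mismatches <= max_mismatches:
            hits.append((start + a_offset, mismatches))
    return hits
```

Allowing one or two mismatches is a simple way to list sites like the two discussed here, which come close to the consensus without fitting it exactly.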

We currently assign the branch point to the topmost segment based on the assumption that U1 and U2 must interact with each other. Since recognition of the 5' splice site and coupling to the branch point in the intron is perhaps the most critical step in the continued recruitment of the spliceosome, recognition can be greatly enhanced for the SR proteins (which are clearly electrophilic) at extremities of the RNA sequence where single strand RNA can be found. This greatly reduces the regions where the spliceosome must search! Moreover, if the splicing factors are too far apart (a likely case for an arbitrary location of the critical segments in a very long intron), then the ability of U1 and U2 to link is greatly reduced.

What is clear is that the splice sites appear to be located at the extremities of the structure, and when the 3D structure is taken into account, they may even turn out to be in reasonable proximity. What the figure suggests, therefore, is that the spliceosome looks for the extremities first and then searches for particular sequences that satisfy the additional criteria. The fact that the spliceosome is so tolerant of variations from the consensus sequence already suggests strongly that we should consider much simpler initial conditions in the recognition process.

Part of the complexity of the spliceosome may be covering alternative positions of this branch point relative to the 5' site, but from this figure we cannot say for sure. Certainly, we only assert that this is one of the allowed combinations that leads to splicing. The overall time and direction dependent folding of the pre-mRNA is also critical to the particular structures that result since the spliceosome is a dynamic system whose "subunits" are in the proximity of the nascent pre-mRNA.

As of the time I am writing this paragraph (Oct 7th 2005), I have not tried to redo any of this stuff using vsfold4 or my new pseudoknot version (vsfold5). I expect that I will actually end up with much different results because my filtering software depends strongly on what suboptimal structures are available. When good choices are available, it does do better. So I wouldn't take the particular figure that was generated all that seriously anymore, but it does show what I am basically doing on these problems.

[The sequence information for Cytochrome P450 IID6 was based on a sequence from a Human DNA (clone lambda2D-18/2)].

last update: Dec 29th, 2010. (just get it to fit in here)