To illustrate sequence conservation and variation in the 16S rRNA tail and adjacent regions, we made three consensus alignments (“Weblogos”), shown below. In the first, Fig. A, we consider our 12,426 re-annotations. We take the last 24 bases of helix 45, the 13-base tail, and the next 10 bases of the genome (47 bases in total), and align by helix 45. The alignment shows striking conservation through the whole 13-base tail, with a small amount of sequence heterogeneity at the last (13th) base and increasing heterogeneity thereafter. In the second consensus alignment, Fig. B, we repeat the analysis for the 8,096 existing annotations, again aligning by helix 45, and get essentially identical results. In the third consensus alignment, Fig. C, we align the 3’-terminal 47 bases from the 8,096 existing annotations, but in this case we align by the annotated 3’ end. In this alignment, heterogeneity is seen at every position. Although a rigorous conclusion cannot be reached without experiments, this result suggests that the existing annotations of the 3’ ends are incorrect in many cases. That is, in many of these existing annotations, the true 3’ end likely lies upstream of the annotated one, so that the actual 16S rRNA is shorter than annotated.
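The effect of the anchor choice can be illustrated with a minimal Python sketch. It is not the Weblogo software and not the pipeline used for the figures; the function names, the `helix45_end` and `annotated_end` coordinates, and the simple 2-bit maximum without small-sample correction are all assumptions made for illustration. The sketch cuts a 47-base window either around the end of helix 45 (24 + 13 + 10 bases) or back from an annotated 3’ end, and scores each alignment column by its information content.

```python
import math
from collections import Counter


def window_by_helix45(seq, helix45_end, helix=24, tail=13, downstream=10):
    """Cut the window anchored on helix 45.

    helix45_end is a hypothetical 0-based index of the first base after
    helix 45, so the window is 24 helix bases + 13 tail bases + 10
    downstream bases.
    """
    return seq[helix45_end - helix: helix45_end + tail + downstream]


def window_by_annotated_end(seq, annotated_end, width=47):
    """Cut the last `width` bases up to a hypothetical annotated 3' end
    (0-based, exclusive)."""
    return seq[annotated_end - width: annotated_end]


def column_information(seqs):
    """Information content (bits) per column of equal-length DNA sequences.

    Uses 2 - H per column, where H is the Shannon entropy of the observed
    bases; no small-sample correction is applied (an assumption of this
    sketch).
    """
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    info = []
    for column in zip(*seqs):
        total = len(column)
        entropy = -sum(
            (n / total) * math.log2(n / total)
            for n in Counter(column).values()
        )
        info.append(2.0 - entropy)
    return info
```

Under these assumptions, a fully conserved column scores 2 bits and a column split evenly between two bases scores 1 bit. Windows cut with `window_by_helix45` keep the tail in the same columns for every gene, whereas windows cut with `window_by_annotated_end` shift whenever the annotated end is misplaced, which lowers the score across all columns.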
The consensus alignments in the figure above have limited resolution; minor sequence variants cannot be seen. For our 12,426 re-annotations plus the 8,096 existing annotations, we therefore examined the exact sequences of the 13-base tails. Again, it was clear that the 13-base tail is very highly conserved, with only very limited sequence variation. The major variants (> 40 occurrences) and their frequencies are shown in the following table. Of the 20,495 total sequences, over 19,600 have a tail with sequence GATCACCTCCTTT (14,088) or GATCACCTCCTTA (5,527).
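A tally of exact tail sequences of this kind can be sketched in a few lines of Python. This is an illustrative sketch only, not the script used to build the table; the function name `major_tail_variants` and the `min_count` parameter are hypothetical, and the input is assumed to be a list of already extracted 13-base tail strings.

```python
from collections import Counter


def major_tail_variants(tails, min_count=40):
    """Return (sequence, count, fraction) for variants seen more than
    min_count times, most frequent first.

    `tails` is assumed to be an iterable of 13-base tail strings; the
    threshold mirrors the "> 40 occurrences" cut-off in the text.
    """
    counts = Counter(t.upper() for t in tails)
    total = sum(counts.values())
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n > min_count
    ]
```

Applied to the full set of tails, the two leading rows would be expected to correspond to GATCACCTCCTTT and GATCACCTCCTTA, with rarer variants falling below the threshold and being omitted.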