We analyzed sequence conservation in three different ways. Each method individually produced a list of potential areas of interest (as a BED file), and we later combined these lists to get our results.
1. Conserved Segments
We made a motif logo to identify regions of the genome that showed increased conservation.
2. Codon Conservation
We tracked conserved codons to see whether the nucleotide changes that did occur were protein-conserving.
3. Nucleotide Variation
We tracked individual nucleotide variation over many samples of each segment.
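One common way to quantify per-position nucleotide variation is Shannon entropy over the columns of an alignment. The sketch below is illustrative only. It assumes the samples of a segment are already aligned to equal length, and the function names (`column_entropy`, `per_position_entropy`) and the toy sequences are hypothetical, not taken from our pipeline.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Gaps and ambiguous bases (anything other than A, C, G, T, U) are ignored.
    """
    counts = Counter(base for base in column.upper() if base in "ACGTU")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over the observed bases
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def per_position_entropy(aligned_seqs):
    """Entropy at every position of a set of equal-length aligned sequences."""
    lengths = {len(seq) for seq in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(col)) for col in zip(*aligned_seqs)]


# Toy data (assumed for illustration): four samples of one short segment.
samples = ["ATGCA", "ATGCA", "ATGTA", "ATGGA"]
entropy = per_position_entropy(samples)
```

Positions with entropy 0 are fully conserved across the samples, while higher values mark more variable positions. In the toy data only the fourth position varies.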
By analyzing the IntaRNA outputs, we were able to generate heatmaps showing how well each nucleotide paired with every other nucleotide on every segment. Here are two examples, each comparing two different segments of Rotavirus (DS1-Like):
In these graphs, the positions of each segment run down the left-hand side and across the bottom. Blue represents areas of low minimum free energy (MFE), indicating a favorable interaction. In contrast, red represents areas of high MFE, indicating a poor interaction.
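As a rough illustration of how predicted interactions can be turned into a position-by-position grid for a heatmap, the sketch below fills a matrix with the lowest (most favorable) energy covering each pair of positions. The record layout `(start_a, end_a, start_b, end_b, energy)` with 1-based inclusive coordinates is an assumption made for this example, not IntaRNA's actual output format, and the energy values are invented.

```python
def energy_matrix(len_a, len_b, interactions):
    """Build a len_a x len_b grid of interaction energies.

    `interactions` is an iterable of (start_a, end_a, start_b, end_b, energy)
    tuples with 1-based inclusive coordinates (an assumed layout).
    Each cell keeps the lowest energy of any interaction covering it;
    cells with no predicted interaction stay at 0.0.
    """
    matrix = [[0.0] * len_b for _ in range(len_a)]
    for start_a, end_a, start_b, end_b, energy in interactions:
        for i in range(start_a - 1, end_a):
            for j in range(start_b - 1, end_b):
                matrix[i][j] = min(matrix[i][j], energy)
    return matrix


# Hypothetical interactions between a 4-nt and a 6-nt toy segment.
hits = [(1, 2, 3, 4, -5.0), (2, 3, 4, 5, -8.5)]
grid = energy_matrix(4, 6, hits)
```

A grid like this could then be passed to a plotting function such as matplotlib's `imshow` with a diverging colormap, so that low energies appear blue and high energies appear red.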
Each of these methods produced BED files with the start and stop positions of the potential packaging signals labeled. These files were combined to make the following graphs, which show where we think potential packaging signals lie.
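One simple way to combine several BED files is to count, for every position, how many methods flagged it, and then report the stretches where that count reaches a threshold. The sketch below is a minimal illustration under stated assumptions: the helper names, the segment name `seg1`, the coordinates, and the `min_support` threshold are all hypothetical, and this is not necessarily how our graphs were produced. It relies on the standard BED convention of 0-based, half-open intervals.

```python
def parse_bed_lines(lines):
    """Parse the first three BED columns (chrom, start, end) from text lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments, and header lines
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def support_counts(bed_sets, chrom, length):
    """For each position of `chrom`, count how many interval sets cover it."""
    counts = [0] * length
    for intervals in bed_sets:
        covered = set()  # a method counts at most once per position
        for c, start, end in intervals:
            if c == chrom:
                covered.update(range(max(start, 0), min(end, length)))
        for pos in covered:
            counts[pos] += 1
    return counts


def regions_with_support(counts, min_support):
    """Return half-open (start, end) runs where counts >= min_support."""
    regions, start = [], None
    for pos, n in enumerate(counts):
        if n >= min_support and start is None:
            start = pos
        elif n < min_support and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(counts)))
    return regions


# Hypothetical output of the three methods for a 12-nt toy segment.
segments = parse_bed_lines(["seg1\t2\t6"])
codons = parse_bed_lines(["seg1\t4\t8"])
variation = parse_bed_lines(["seg1\t5\t10"])

counts = support_counts([segments, codons, variation], "seg1", 12)
candidates = regions_with_support(counts, min_support=2)
```

Plotting `counts` along the segment gives a profile whose highest peaks are the positions flagged by the most methods.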
This is an alternate representation of the graphs for two specific segments of influenza. The high peaks show where a packaging signal is more likely to be found.