GUI

Step 1: Load the host tree file, parasite/symbiont file, and tip mapping file.

Please refer to the main Documentation page for information on file formats.

Step 2: View the tanglegram (optional)

Selecting the "View Tanglegram" option opens a window showing the tanglegram: the two trees and the associations between their tips. An option at the bottom of the window changes the font size of the taxa names to make the figure more readable. All eMPRess windows that display graphics have controls at the bottom left for panning, zooming, saving the image to a file, and more. More information about these buttons can be found here. The last two buttons (sliders and save) pop up dialog boxes. These dialog boxes may be hidden behind other eMPRess windows that you've opened, so you may have to move existing windows to see them.

Step 3: View event cost regions (optional)

The event cost regions partition the space of event costs. Note that event costs are "unit-less"; the values don't have any intrinsic meaning other than their relative magnitude. In other words, DTL costs of 2, 3, 1 will result in the same solutions as values of 200, 300, 100.
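
The scale invariance can be checked with a small sketch. It assumes that the total cost of a reconciliation is the sum of its event counts weighted by the event costs (with cospeciation at cost 0). The function names and the candidate event counts are made up for illustration and are not part of eMPRess.

```python
# Hypothetical illustration; not eMPRess code.
def total_cost(counts, costs):
    """Weighted sum of (duplication, transfer, loss) counts."""
    return sum(n * c for n, c in zip(counts, costs))


def cheapest(candidates, costs):
    """Return the candidate event-count vectors with minimum total cost."""
    best = min(total_cost(c, costs) for c in candidates)
    return [c for c in candidates if total_cost(c, costs) == best]


# Made-up (duplications, transfers, losses) counts for three candidates.
candidates = [(0, 12, 5), (3, 6, 14), (7, 2, 20)]

small = cheapest(candidates, (2, 3, 1))        # D=2, T=3, L=1
large = cheapest(candidates, (200, 300, 100))  # same ratios, scaled by 100
print(small == large)  # True
```

Multiplying every cost by the same positive factor multiplies every total by that factor, so the ordering of candidates, and therefore the set of cheapest ones, does not change.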

Cospeciation cost is fixed to 0. Loss cost is set to 1.0 by default and Duplication and Transfer are generally chosen relative to the unit cost of Loss.

Each event cost region corresponds to a set of event costs that result in the same solutions. For example, in the figure at right, all costs in the light blue band will give the same solutions. The legend shows <8, 0, 12, 5>, Count = 20 for that region, meaning that costs in that region will result in MPRs with 8 cospeciations, 0 duplications, 12 transfers, and 5 losses, and that there are 20 distinct MPRs with those event counts.

Clicking on a point in that plot will import those costs into eMPRess. Alternatively, event costs can be entered by hand.

Step 4: Compute reconciliations

The reconciliation algorithm is activated and the number of MPRs and the number of events of each type are displayed. In this example, there are 20 MPRs.

Step 5: View solution space

The pull-down menu gives options for "Entire Space" or "Clusters". Start with "Entire Space". The histogram that is displayed gives an overview of the space of MPRs. The x-axis is the distance between a pair of MPRs and the y-axis is the number of pairs of MPRs at that distance from one another. Note that the scale of the y-axis is shown in the upper-left corner. In this case, it's 1e1, which is 10 in scientific notation. Thus, 4.0 on the y-axis corresponds to 4.0 x 10 = 40.

The distance between two MPRs is the number of events that are unique to one MPR or the other. If two MPRs are identical (that is, an MPR is compared to itself), the distance is 0. In this case, there are 20 MPRs so there are 20 pairs (MPR and itself) whose distance is 0. That's shown in the leftmost bar in the histogram. The tallest bar in this example at x=10, y = 40 indicates that there are 40 pairs of MPRs that differ in exactly 10 events. Note that the rightmost bar in this example shows that there are more than 15 pairs of MPRs that differ in 20 events.
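
As a rough sketch of how such a histogram could be built, the snippet below treats each MPR as a set of events and measures distance as the size of the symmetric difference. The event labels are placeholders, and counting each unordered pair once (plus every MPR paired with itself) is an assumption; eMPRess may count pairs differently.

```python
from collections import Counter
from itertools import combinations_with_replacement


def distance(mpr_a, mpr_b):
    """Number of events found in exactly one of the two MPRs."""
    return len(mpr_a ^ mpr_b)


def distance_histogram(mprs):
    """Map each distance to the number of pairs (self-pairs included) at it."""
    pairs = combinations_with_replacement(mprs, 2)
    return Counter(distance(a, b) for a, b in pairs)


# Three made-up MPRs; "e1", "e2", ... are placeholder event labels.
mprs = [
    frozenset({"e1", "e2", "e3"}),
    frozenset({"e1", "e2", "e4"}),
    frozenset({"e5", "e6", "e7"}),
]

hist = distance_histogram(mprs)
print(sorted(hist.items()))  # [(0, 3), (2, 1), (6, 2)]
```

The bar at distance 0 always equals the number of MPRs, which matches the 20 self-pairs described above.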

The fact that this distribution is bimodal suggests that there may be two or more clusters of MPRs. By selecting the Clusters option in the View Solution Space pull-down menu, we can choose the number of clusters that we'd like to use to partition the space of MPRs.

Step 5 Continued: Clustering!

A pop-up menu allows us to enter the desired number of clusters. In this example, we chose 3 clusters. The top row in the histogram is the same as the "Entire Space" histogram above; the data are all in one cluster. The second row shows the pairwise distances for two clusters. The third row shows the pairwise distances for three clusters.

In this example, note that before clustering, there were some pairs of MPRs at distance 20. After clustering into two clusters (row 2), the first cluster has 15 MPRs (recall that, in this plot, the scale on the y-axis is 1e1 or 10) and the second cluster has 4 MPRs (no scale is indicated in that plot, so 4 means 4). In the first cluster, the most distant pair of MPRs is 10 apart, whereas in the second cluster the most distant pair is 8 apart. That's a substantial reduction from 20 before clustering.

Clustering into three clusters resulted in two clusters where the maximum distance was 4 and one that remains at 10.
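
The "maximum distance" within a cluster can be sketched as follows. This only measures a partition that is already given; it does not reproduce the clustering algorithm that eMPRess uses. The MPRs, event labels, and function name are hypothetical.

```python
from itertools import combinations


def max_distance(cluster):
    """Largest pairwise distance in a cluster (0 for a single MPR)."""
    return max((len(a ^ b) for a, b in combinations(cluster, 2)), default=0)


# Four made-up MPRs, each a set of placeholder event labels.
a = frozenset({"e1", "e2", "e3"})
b = frozenset({"e1", "e2", "e4"})
c = frozenset({"e5", "e6", "e7"})
d = frozenset({"e5", "e6", "e8"})

print(max_distance([a, b, c, d]))              # 6
print(max_distance([a, b]), max_distance([c, d]))  # 2 2
```

Splitting the four MPRs into two groups of similar MPRs reduces the maximum distance from 6 to 2, the same kind of reduction described in the example above.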

Step 6: View reconciliations

Next, we can view either one representative reconciliation from the entire space (in this case a space of 20 MPRs) or we can choose to view one MPR from each cluster. Since we chose three clusters in the previous step, asking for one MPR from each cluster will result in displaying three MPRs.

At right, one MPR is displayed. Remember that the zoom controller at the bottom right allows you to zoom in on any part of this graphic. In addition, there are buttons to turn on or off display of internal node names and support values (aka "event frequencies").

Support values: The support value (or "event frequency") adjacent to each event is a number between 0 and 100 that indicates the percentage of MPRs - for the given set of event costs - that contain this event. These values are computed exactly and not by sampling methods. Note that there is an option at the bottom of the window to show/hide these event frequencies.
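
To make the definition concrete, here is a minimal sketch that computes support values by listing every MPR explicitly. This is practical only for small solution spaces and is not claimed to be how eMPRess computes them. The event labels and the function name are placeholders.

```python
from collections import Counter


def event_support(mprs):
    """Percentage of MPRs that contain each event."""
    counts = Counter(event for mpr in mprs for event in mpr)
    return {event: 100.0 * n / len(mprs) for event, n in counts.items()}


# Four made-up MPRs, each a set of placeholder event labels.
mprs = [
    {"e1", "e2"},
    {"e1", "e2"},
    {"e1", "e3"},
    {"e1"},
]

support = event_support(mprs)
print(support["e1"], support["e2"], support["e3"])  # 100.0 50.0 25.0
```

An event with support 100 appears in every MPR for the chosen event costs, while a low value marks an event that only a few MPRs contain.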

Time-consistency: The legend indicates that this solution has strong time-consistency. This means that there exists a timing of the events in the host and parasite trees that is consistent with respect to time. In some cases, the reconciliation may only be weakly time-consistent, meaning that it is possible to find a consistent timing, but doing so would require introducing additional loss events. In those cases, one or more transfers will be drawn diagonally rather than vertically. In some cases, there may not even exist a weakly time-consistent reconciliation, and the algorithm will not render a reconciliation at all. More information about the subtleties of time-consistency can be found in the Supplementary materials of the eMPRess paper.

Finally, note that there is an option to show/hide the display legend and also to change the font size.

Step 7: View p-value histogram

eMPRess performs a randomization test that compares the maximum parsimony cost of the original dataset to that of the same trees with random tip associations. In this example, the cost of an MPR under the event costs selected above, indicated by the red line, is slightly under 36. 100 random trials are generated, each using the input trees but with random tip associations, and each is solved using the same reconciliation algorithm. In this case, all 100 samples had greater cost than the original input data. This results in a p-value of 1/101 = 0.009901. In general, the numerator is the number of samples whose score was as good as or better than that of the original dataset, counting the original dataset itself, and the denominator is the number of random samples plus 1. In this case, we can reject the null hypothesis that the host and parasite phylogenies are concordant due to chance at the 0.01 level.
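
The arithmetic can be sketched in a few lines. The "+1" in the numerator is inferred from the 1/101 result above (the original dataset is counted alongside the random samples). The function name and the random costs are made up for illustration.

```python
def empirical_p_value(original_cost, random_costs):
    """(random samples at least as good as the original + 1) / (samples + 1)."""
    as_good = sum(1 for cost in random_costs if cost <= original_cost)
    return (as_good + 1) / (len(random_costs) + 1)


# 100 made-up random-trial costs, all greater than the original cost of 36.
random_costs = [40.0 + i for i in range(100)]

p = empirical_p_value(36.0, random_costs)
print(round(p, 6))  # 0.009901
```

With 100 random trials, 1/101 is the smallest p-value this procedure can report.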
