My name is Michael Charleston.  I'm an academic at the University of Sydney, where I do research and teach.  If you're remotely interested in such things, my Uni web page is here.  I occasionally tweet as @mikecharleston.

Here is my home for cophylogenetic analysis, also known as cophylogeny or cophylogenetics.  It's also the new home for TreeMap because it seems like the thing needs a better home.

Cophylogeny Outline
Suppose you have information about the evolutionary history of two taxonomic groups, e.g., species or genera, or possibly languages or geographical regions.  These are assumed to have some kind of current relationship with each other, such as where one is a group of mammal species and the other is a group of parasites of those mammals.  Given these two histories it is natural to ask what the relationship between them is.  Have they co-evolved for millions of years, or are the parasites switching and establishing themselves haphazardly among different host species?

Your information so far has three main components: the two evolutionary trees (trees being the most common and most useful way of representing the evolutionary history of a group of organisms), and a set of associations that state which taxonomic unit in one tree ("taxon" from here on; plural "taxa") is associated with which taxon or taxa in the other tree.  The coevolutionary questions are therefore about what the associations between the trees were in the past, e.g., which host species did the ancestral parasite species infect?
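As a rough illustration, the three components can be written down as plain data.  The sketch below is in Python with invented taxon names (h1, p1, and so on) and hypothetical helpers `tips` and `check_inputs`; it is not the TreeMap input format, and the rule that every parasite must have at least one known host is an assumption made for the example.

```python
# Trees as nested tuples: a leaf is a string, an internal node is a tuple of subtrees.
host_tree = (("h1", "h2"), ("h3", "h4"))
parasite_tree = (("p1", "p2"), ("p3", "p4"))

# Associations: each parasite taxon maps to the set of host taxa it is found on.
associations = {
    "p1": {"h1"},
    "p2": {"h2"},
    "p3": {"h3"},
    "p4": {"h3", "h4"},  # one parasite may be associated with several hosts
}


def tips(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= tips(child)
    return result


def check_inputs(host_tree, parasite_tree, associations):
    """Return a list of problems found; an empty list means the inputs are consistent."""
    problems = []
    host_tips = tips(host_tree)
    for p in sorted(tips(parasite_tree)):
        hosts = associations.get(p, set())
        if not hosts:
            # Assumption for this sketch: every parasite tip needs a host.
            problems.append(f"parasite {p} has no host")
        for h in sorted(hosts - host_tips):
            problems.append(f"parasite {p} is associated with unknown host {h}")
    return problems
```

Calling `check_inputs(host_tree, parasite_tree, associations)` on the data above returns an empty list.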

Cophylogenetic analysis is known to be ridiculously hard, in the sense that it's computationally complex to get optimal solutions to the problem [Ovadia et al., 2011].  "The problem" here is of finding a way of associating the ancestral species in the parasite (or equivalent) evolutionary tree with places in the host (or equivalent) tree, in some way that makes the most sense.  Currently we judge what makes sense in terms of how many non-codivergence events we have to hypothesize in order to account for the differences between the two trees.
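To make "how many non-codivergence events" a little more concrete, here is a small sketch of scoring one hypothesised reconstruction.  The event names (duplication, loss, host switch) are the ones commonly used in the cophylogeny literature rather than anything defined on this page, and the unit costs and the function name `reconstruction_cost` are assumptions for illustration only.

```python
from collections import Counter

# Assumed unit costs. Codivergence costs nothing here because it is the
# event we expect when the two groups have simply evolved together.
EVENT_COSTS = {"codivergence": 0, "duplication": 1, "loss": 1, "host_switch": 1}


def reconstruction_cost(events, costs=EVENT_COSTS):
    """Total cost of a hypothesised reconstruction, given as a list of event names."""
    counts = Counter(events)
    unknown = set(counts) - set(costs)
    if unknown:
        raise ValueError(f"unknown event types: {sorted(unknown)}")
    return sum(costs[name] * n for name, n in counts.items())


# Two made-up reconstructions: one with only codivergences, one needing extra events.
perfect = ["codivergence", "codivergence", "codivergence"]
messy = ["codivergence", "host_switch", "duplication", "loss", "loss"]
```

Scoring a given reconstruction like this is the easy part; the hard part referred to above is searching over all the possible reconstructions for the cheapest one.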

We might have a perfect match (figure, left) or it might look completely unrelated (figure, right).

(The input files for these two scenarios are here.)

So there are two basic goals in most analyses:
  • to determine whether there's a significant match between host and parasite trees;
  • to determine what is the best explanation for the differences between the two.
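The first goal can be sketched as a generic randomisation test: compute some measure of agreement between the trees, then see how often shuffled associations do at least as well.  This is a minimal Python sketch under stated assumptions, not necessarily what TreeMap does: the agreement measure (the number of parasite clades whose combined hosts form a clade in the host tree), the taxon names and the helper names are all invented for the example.

```python
import random


def clade_sets(tree, hosts_of):
    """Return (host set below this node, list of host sets for each internal node)."""
    if isinstance(tree, str):
        return frozenset(hosts_of(tree)), []
    below, found = frozenset(), []
    for child in tree:
        child_set, child_found = clade_sets(child, hosts_of)
        below |= child_set
        found += child_found
    found.append(below)
    return below, found


def matching_clades(host_tree, parasite_tree, assoc):
    """Count parasite clades whose combined hosts form a clade of the host tree."""
    host_clades = set(clade_sets(host_tree, lambda h: {h})[1])
    parasite_clades = clade_sets(parasite_tree, lambda p: assoc[p])[1]
    return sum(1 for c in parasite_clades if c in host_clades)


def permutation_p_value(host_tree, parasite_tree, assoc, n=999, seed=0):
    """Fraction of shuffled associations that match at least as well as the observed one."""
    rng = random.Random(seed)
    observed = matching_clades(host_tree, parasite_tree, assoc)
    names = sorted(assoc)
    host_sets = [assoc[p] for p in names]
    at_least = 0
    for _ in range(n):
        shuffled = host_sets[:]
        rng.shuffle(shuffled)  # reassign host sets to parasite tips at random
        if matching_clades(host_tree, parasite_tree, dict(zip(names, shuffled))) >= observed:
            at_least += 1
    # Add one to numerator and denominator so the observed data counts as one sample.
    return (at_least + 1) / (n + 1)


# Made-up example: two trees of the same shape with one-to-one associations.
host_tree = ((("h1", "h2"), "h3"), ("h4", ("h5", "h6")))
parasite_tree = ((("p1", "p2"), "p3"), ("p4", ("p5", "p6")))
assoc = {f"p{i}": {f"h{i}"} for i in range(1, 7)}
p_value = permutation_p_value(host_tree, parasite_tree, assoc)
```

A small `p_value` suggests the observed match is better than you would expect by chance.  The choice of agreement measure matters a great deal, and the one used here is only the simplest thing that shows the idea.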
Naively I first thought that users of TreeMap2 (which was written in C++ and is still available here) would mainly be considering cases where the trees match nicely, as on the left, or as in the classic Gopher/Louse study.  Sadly this is not the case: I often get e-mail from users complaining that TreeMap2 couldn't handle their data, which, when I persuade them to send it to me in some kind of anonymous form, looks more like a bird's nest than anything I could analyse.  This makes a huge difference to the performance of TreeMap2, which was optimised to cope with situations more like the one on the left above than the one on the right.
