Evolution at the species level is not necessarily tree-like. However, the evolution of an individual gene is typically assumed to follow a tree-like pattern, and so a phylogenetic network is often viewed as an amalgamation of gene trees. With this viewpoint, a natural question is as follows. Given a phylogenetic network N, how many distinct gene trees are embedded in N? If N has no reticulations, then N is itself a tree and the answer is simply one. But what if N has many reticulations? The question then becomes potentially much more challenging. In this talk, we explore this question and discuss a recent result. This is joint work with Kristina Wicke.
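For small networks, the question can be explored by brute force. The sketch below assumes that "gene trees embedded in N" means the trees displayed by N. That reading is our assumption; the talk may use a different notion. The code enumerates every way of keeping exactly one parent edge per reticulation, which gives at most 2^r switchings for a binary network with r reticulations. It identifies each displayed tree by its set of leaf clusters and counts distinct cluster sets. Dead-end vertices give empty clusters and are skipped, and suppressed degree-2 vertices repeat a cluster, so the cluster set is exactly that of the cleaned-up displayed tree. The function name `displayed_tree_count` and the dictionary encoding are hypothetical.

```python
from itertools import product

def displayed_tree_count(children, root):
    """Count distinct trees displayed by a rooted network (brute force over switchings).

    children[v] lists the children of vertex v; leaves have no entry (or an empty list).
    """
    parents = {}
    for u, kids in children.items():
        for v in kids:
            parents.setdefault(v, []).append(u)
    retics = sorted(v for v, ps in parents.items() if len(ps) > 1)
    leaves = {v for v in parents if not children.get(v)}
    trees = set()
    # One switching = one kept parent edge for every reticulation.
    for choice in product(*(parents[r] for r in retics)):
        kept_parent = dict(zip(retics, choice))
        clusters = set()

        def cluster(v):
            # Leaf set below v, following only edges kept in this switching.
            if v in leaves:
                c = frozenset([v])
            else:
                c = frozenset().union(*(cluster(w) for w in children[v]
                                        if kept_parent.get(w, v) == v))
            if c:  # empty clusters come from dead ends and are ignored
                clusters.add(c)
            return c

        cluster(root)
        trees.add(frozenset(clusters))  # a rooted tree is determined by its clusters
    return len(trees)

# One reticulation h with parents x and y: displays ((a,b),c) and (a,(b,c)).
net = {"r": ["x", "y"], "x": ["a", "h"], "y": ["h", "c"], "h": ["b"]}
print(displayed_tree_count(net, "r"))  # 2
```

Different switchings can display the same tree, so the count can fall below 2^r. The cluster-set key handles that deduplication automatically.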
A tanglegram consists of two rooted binary trees with the same number of leaves together with a perfect matching between their leaf sets. In phylogenetics, tanglegrams arise naturally in the study of cospeciation and coevolution.
In this talk, we investigate a reconstruction problem for tanglegrams. Given a tanglegram of size n, that is, a tanglegram whose trees have n leaves, one obtains a multiset of induced size-(n − 1) tanglegrams by deleting a matched pair of leaves in all possible ways. We ask whether the original size-n tanglegram is uniquely determined (up to isomorphism) by this multiset.
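The deletion step can be made concrete with a small brute-force sketch. It makes three assumptions of ours. Both trees use the same leaf labels, so that left leaf i is matched to right leaf i. The left and right trees are treated as distinguishable. Isomorphism is tested by trying all n! relabellings, so the sketch is only usable for small n. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

# A tree is an int (leaf label) or a 2-tuple of subtrees. Leaf i of the left
# tree is matched to leaf i of the right tree.

def delete_leaf(tree, x):
    """Remove leaf x and suppress its parent; return None if nothing remains."""
    if isinstance(tree, int):
        return None if tree == x else tree
    left, right = (delete_leaf(child, x) for child in tree)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def relabel(tree, mapping):
    if isinstance(tree, int):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)

def tree_string(tree):
    """Canonical string of a leaf-labelled rooted tree (children sorted)."""
    if isinstance(tree, int):
        return str(tree)
    return "(" + ",".join(sorted(tree_string(c) for c in tree)) + ")"

def leaves(tree):
    if isinstance(tree, int):
        return [tree]
    return [x for c in tree for x in leaves(c)]

def tanglegram_canon(left, right):
    """Canonical form up to isomorphism: minimum over all relabellings of matched pairs."""
    labels = sorted(leaves(left))
    best = None
    for perm in permutations(range(len(labels))):
        m = dict(zip(labels, perm))
        key = (tree_string(relabel(left, m)), tree_string(relabel(right, m)))
        if best is None or key < best:
            best = key
    return best

def deck(left, right):
    """Multiset of induced size-(n-1) tanglegrams (n >= 2), up to isomorphism."""
    return Counter(tanglegram_canon(delete_leaf(left, x), delete_leaf(right, x))
                   for x in leaves(left))
```

Under these assumptions, two size-n tanglegrams with equal `deck` values are the kind of pair the reconstruction question asks about. The question is whether equal decks force equal `tanglegram_canon` values.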
We answer this question affirmatively when at least one of the two trees is a caterpillar. We also report on work in progress establishing a second affirmative case, namely for planar tanglegrams. We end with some open questions and directions for future research.
This is based on joint work with Ann Clifton, Éva Czabarka, Kevin Liu, Sarah Loeb, Utku Okur, and László Székely.
10:20-10:50 | ☕ Coffee break
Tree shape statistics derived from peripheral structures have been widely investigated in the study of random phylogenetic tree models, motivated in part by the need for effective inference tools to understand the influences of various evolutionary forces. In this talk, I will present recent progress on the analysis of subtree patterns, including a Strong Law of Large Numbers and a Central Limit Theorem for their joint distributions. I will also discuss applications of these results, as well as the challenges that arise when using subtree-based statistics to study evolutionary biology. This talk is based on joint work with Gursharn Kaur, Ariadne Thompson, and Kwok Pui Choi.
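The simplest subtree pattern is the cherry, a pair of sibling leaves. Under the Yule-Harding model, the expected number of cherries in a tree with n ≥ 3 leaves is n/3, so the cherry fraction settles near 1/3 as n grows. The simulation below is our own illustration of that law-of-large-numbers behaviour. It is not the method or the joint-distribution result from the talk, and the function name is hypothetical.

```python
import random

def yule_cherry_fraction(n, rng):
    """Grow a Yule-Harding tree to n >= 2 leaves; return (#cherries) / n."""
    children = {0: [1, 2], 1: [], 2: []}  # root 0 with two leaves
    leaves = [1, 2]
    next_id = 3
    while len(leaves) < n:
        # Split a uniformly random leaf into two new leaves.
        i = rng.randrange(len(leaves))
        v = leaves[i]
        a, b = next_id, next_id + 1
        next_id += 2
        children[v] = [a, b]
        children[a], children[b] = [], []
        leaves[i] = a  # v is no longer a leaf; a takes its slot
        leaves.append(b)
    # A cherry is an internal vertex whose two children are both leaves.
    cherries = sum(1 for kids in children.values()
                   if kids and not children[kids[0]] and not children[kids[1]])
    return cherries / n

rng = random.Random(1)
print(yule_cherry_fraction(5000, rng))  # close to 1/3 for large n
```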
We establish stochastic limits of random level-k phylogenetic networks which describe their asymptotic shape on a global and local scale. We discuss applications to convergence in distribution and moments of continuous functionals of these networks.
12:10-13:40 | 🍽️ Lunch
Rooted binary perfect phylogenies provide a generalization of rooted binary unlabeled trees. In a rooted binary perfect phylogeny, each leaf is assigned a positive integer value that corresponds in a biological setting to the number of indistinguishable lineages associated with the leaf. For the rooted binary unlabeled trees, these integers equal 1. We enumerate rooted binary perfect phylogenies with n ≥ 1 leaves and sample size s, s ≥ n: the rooted binary unlabeled trees with n leaves in which a sample of s ≥ n lineages is distributed across the n leaves. (1) First, we recursively enumerate rooted binary perfect phylogenies with sample size s, summing over all possible n, 1 ≤ n ≤ s. We obtain an equation for the generating function, showing that asymptotically, the number of rooted binary perfect phylogenies with sample size s grows as $0.3519 \cdot 3.2599^s \cdot s^{-3/2}$. (2) Next, we recursively enumerate rooted binary perfect phylogenies with a specific number of leaves n and sample size s ≥ n. We provide a recurrence for the generating function describing, for each number of leaves n, the number of rooted binary perfect phylogenies with n leaves as the sample size s increases. We also obtain an equation satisfied by the bivariate generating function counting rooted binary perfect phylogenies with n leaves and sample size s, as well as an asymptotic normal distribution for the number of leaves in a randomly chosen perfect phylogeny with sample size s. The enumerations further characterize the rooted binary perfect phylogenies, which include the rooted binary unlabeled trees, and which can provide a set of structures useful for various biological contexts.
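A direct recurrence follows from the definition above. A perfect phylogeny is either a single leaf carrying all s lineages, or a root joining an unordered pair of smaller perfect phylogenies. The sketch below is our own derivation from that description and not the authors' code. Equivalently, the bivariate generating function satisfies F(x, y) = y·x/(1 − x) + (F(x, y)² + F(x², y²))/2, where x marks sample size and y marks leaves. The function names are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def perfect_phylogenies(n, s):
    """Number of rooted binary perfect phylogenies with n leaves and sample size s."""
    if n < 1 or s < n:
        return 0
    if n == 1:
        return 1  # a single leaf carrying all s lineages
    # Ordered splits of (n, s) between the two subtrees; each subtree needs s_i >= n_i.
    ordered = 0
    for n1 in range(1, n):
        for s1 in range(n1, s - (n - n1) + 1):
            ordered += perfect_phylogenies(n1, s1) * perfect_phylogenies(n - n1, s - s1)
    # Subtrees are unordered: add the identical-pair term, then halve.
    diagonal = perfect_phylogenies(n // 2, s // 2) if n % 2 == 0 and s % 2 == 0 else 0
    return (ordered + diagonal) // 2

def total_perfect_phylogenies(s):
    """Sum over all leaf numbers 1 <= n <= s."""
    return sum(perfect_phylogenies(n, s) for n in range(1, s + 1))

print([total_perfect_phylogenies(s) for s in range(1, 7)])  # [1, 2, 3, 7, 14, 35]
```

When s = n, every leaf carries one lineage and the counts reduce to the Wedderburn-Etherington numbers of rooted binary unlabeled trees (1, 1, 1, 2, 3, 6, 11, 23, …). This matches the remark that those trees are a special case.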
The k-Robinson-Foulds (k-RF) distance compares two labeled Cayley trees by counting the number of edge-based k-local splits (the vertex sets within graph distance k on each side of an edge) that appear in exactly one of the trees. Using the symbolic method, one can establish a combinatorial specification for these patterns and convert them into exponential generating functions that encode how often two random trees share k-local splits. Asymptotic analysis of these generating functions will yield the distribution of the k-RF distance for fixed parameter k. We expect that assembling these results will make the phase shift in k precise and clarify the currently unknown transition between regimes. Joint work with Michael Fuchs and Bernhard Gittenberger.
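The sketch below computes the distance for two small vertex-labeled trees given as edge lists. It assumes that the k-local split of an edge {u, v} pairs two sets: the vertices at distance at most k from u on u's side, and the vertices at distance at most k from v on v's side. This is our reading of the description; the formal definition used in the talk may differ in details. For large enough k, the distance reduces to an ordinary split comparison. The function names are hypothetical.

```python
from collections import deque

def k_ball(adj, start, blocked, k):
    """Vertices within distance k of `start` on its side of the edge {start, blocked}."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        x, dist = queue.popleft()
        if dist == k:
            continue
        for y in adj[x]:
            if y != blocked and y not in seen:  # never cross the chosen edge
                seen.add(y)
                queue.append((y, dist + 1))
    return frozenset(seen)

def k_local_splits(edges, k):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {frozenset({k_ball(adj, u, v, k), k_ball(adj, v, u, k)}) for u, v in edges}

def k_rf(edges1, edges2, k):
    """Number of k-local splits present in exactly one of the two trees."""
    return len(k_local_splits(edges1, k) ^ k_local_splits(edges2, k))

print(k_rf([(1, 2), (2, 3)], [(1, 3), (3, 2)], 1))  # 2
```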
We investigate the enumeration and structure of phylogenetic networks under planar, upward planar, terminal planar, and outerplanar constraints. For level-3 networks, we derive asymptotic formulas of the form $N_n \sim c \cdot n^{n-1} \cdot \gamma^n$. Our results demonstrate that planarity constraints lead to a richer hierarchy of asymptotic behaviors at level 3, where the partial collapses observed at level 2 no longer occur.
Motivated by this, we briefly discuss structural results for galled networks under these planarity conditions, highlighting the combinatorial characterizations of these classes and how this perspective helps explain their separation.
This talk is based on joint work with Taoyang Wu (the University of East Anglia) and Guan-Ru Yu (National Sun Yat-sen University).
Phylogenetic diversity (PD) is a well-established measure for quantifying the evolutionary diversity of a set of species. It plays a central role in conservation planning and underlies initiatives such as the EDGE of Existence program of the Zoological Society of London and the phylogenetic diversity task force of the IUCN. Traditionally, PD is defined on phylogenetic trees: given a set of species A, its PD score is the total weight of all edges on the minimal subtree connecting the root with the leaves in A. However, modern phylogenetic analyses increasingly rely on phylogenetic networks to model reticulate evolutionary events such as hybridization and horizontal gene transfer. This raises the fundamental question of how PD should be defined in the presence of multiple evolutionary paths.
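The tree definition of PD translates into a short procedure. Walk from each chosen leaf towards the root and add every edge length the first time the edge is seen. The parent-pointer encoding and the function name below are our own illustrative choices.

```python
def phylogenetic_diversity(parent, length, taxa):
    """PD of `taxa`: total length of the edges on the union of their root paths.

    parent[v] is v's parent (absent for the root); length[v] is the length
    of the edge from parent[v] to v.
    """
    counted = set()  # edges already included, identified by their child vertex
    total = 0.0
    for v in taxa:
        # Walk towards the root; once an edge is counted, everything above it is too.
        while v in parent and v not in counted:
            counted.add(v)
            total += length[v]
            v = parent[v]
    return total

# Root r with children x (length 2) and c (length 3); x has leaves a (1) and b (4).
parent = {"x": "r", "c": "r", "a": "x", "b": "x"}
length = {"x": 2, "c": 3, "a": 1, "b": 4}
print(phylogenetic_diversity(parent, length, {"a", "b"}))  # 7.0
```

Stopping at the first counted edge makes the cost linear in the size of the minimal subtree rather than in |A| times the tree depth.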
In this talk, we survey several prominent generalizations of phylogenetic diversity to phylogenetic networks. We discuss their underlying modeling assumptions, how well they capture biological intuition, and the algorithmic challenges they pose. In particular, we compare these notions with respect to their computational tractability and practical applicability, highlighting trade-offs between biological expressiveness and efficient computation.
15:00-15:30 | ☕ Coffee break
In this talk, I will discuss the problem of inferring a species tree from sequence and gene tree data in the setting where mutation rates vary independently across genes. The main focus will be on the sample complexity of this problem, i.e., the question of how much data is required to achieve a high probability of correct inference. I will present an impossibility result in the form of a lower bound on the amount of data needed for accurate inference. Based on joint work with Sébastien Roch.
The development of powerful strategies to help protect the planet's biodiversity will undoubtedly be informed by an understanding of the evolutionary history of plants and animals, amongst others. This history is usually expressed in the form of a phylogenetic tree, but such a structure might not always be appropriate to capture the complex evolutionary picture of organisms, in that their past could have been influenced by non-tree-like evolutionary events such as introgression, which is known to affect butterfly evolution. In such cases, more general structures called (phylogenetic) networks have proven useful. Originally introduced as single-rooted directed acyclic graphs, these have recently been studied in multi-rooted form as, for example, arboreal networks and forest-based networks. In this talk, we first introduce and review some of the key results for arboreal networks and then present some novel results concerning them. This is joint work with Katharina Huber.
Phylogenetic trees and networks are graphs used to model evolutionary relationships, with trees representing strictly branching histories and networks allowing for events in which lineages merge, called reticulation events. While the question of data sufficiency has been studied extensively in the context of trees, it remains largely unexplored for networks. In this work we take a first step in this direction by establishing bounds on the amount of genomic data required to reconstruct binary level-1 semi-directed phylogenetic networks, which are binary networks in which reticulation events are indicated by directed edges, all other edges are undirected, and cycles are vertex-disjoint. For this class, methods have been developed recently that are statistically consistent. Roughly speaking, such methods are guaranteed to reconstruct the correct network assuming infinitely long genomic sequences. Here we consider the question whether networks from this class can be uniquely and correctly reconstructed from finite sequences. Specifically, we present an inference algorithm that takes as input genetic sequence data, and demonstrate that the sequence length sufficient to reconstruct the correct network with high probability, under the Cavender-Farris-Neyman model of evolution, scales logarithmically, polynomially, or polylogarithmically with the number of taxa, depending on the parameter regime. As part of our contribution, we also present novel inference rules for quartet data in the semi-directed phylogenetic network setting.
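As background for the finite-sequence question, the sketch below covers only the simplest tree case. It simulates binary (±1) characters under the Cavender-Farris-Neyman model on the quartet tree ab|cd. It then recovers the quartet with the classical four-point method applied to log-correlation distances, which are additive along tree paths under CFN. It does not implement the authors' network algorithm or their new quartet inference rules. The equal flip probability on all edges and the function names are our own simplifying assumptions.

```python
import math
import random

def simulate_cfn_quartet(p, n_sites, rng):
    """Simulate +/-1 sites at leaves a, b, c, d of the quartet tree ab|cd.

    Under CFN each edge flips the state independently with probability p
    (here the same p on all five edges, for simplicity).
    """
    def evolve(state):
        return -state if rng.random() < p else state

    seqs = {leaf: [] for leaf in "abcd"}
    for _ in range(n_sites):
        u = rng.choice((-1, 1))  # uniform state at the internal vertex next to a, b
        v = evolve(u)            # state across the internal edge, next to c, d
        seqs["a"].append(evolve(u))
        seqs["b"].append(evolve(u))
        seqs["c"].append(evolve(v))
        seqs["d"].append(evolve(v))
    return seqs

def log_corr_distance(x, y):
    """-log of the empirical site correlation; additive along tree paths under CFN."""
    corr = sum(s * t for s, t in zip(x, y)) / len(x)
    return -math.log(corr) if corr > 0 else math.inf

def infer_quartet(seqs):
    """Four-point method: choose the split with the smallest sum of within-pair distances."""
    def d(i, j):
        return log_corr_distance(seqs[i], seqs[j])

    scores = {
        "ab|cd": d("a", "b") + d("c", "d"),
        "ac|bd": d("a", "c") + d("b", "d"),
        "ad|bc": d("a", "d") + d("b", "c"),
    }
    return min(scores, key=scores.get)

seqs = simulate_cfn_quartet(0.1, 5000, random.Random(0))
print(infer_quartet(seqs))  # ab|cd with high probability at this length
```

How many sites are needed depends on how close the flip probabilities are to 1/2 and how short the internal edge is. That dependence is what sequence-length bounds of the kind described above quantify.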
PhyloZoo is a Python package that provides a unified framework for working with phylogenetic networks. It supports robust (semi-)directed network objects and offers conversion utilities between common phylogenetic datatypes (such as splits, quartets, distances, sequences, and generators), allowing rapid prototyping and validation of theoretical and algorithmic ideas. Key functions include testing network class membership, exploring encodings via subnetworks or displayed trees, working with split systems and distances, and performing isomorphism checks. Beyond research prototyping, PhyloZoo is designed to transform these workflows into user-friendly, reproducible tools. Integrated submodules for phylogenetic diversity (PaNDA) and multi-locus network inference (Squirrel), combined with visualization tools, extensive documentation and examples, and robust I/O, allow the same code and analyses to be packaged for end users. In this talk, I will demonstrate how PhyloZoo enables seamless transitions from theory and algorithm development to empirical analysis, within a framework that is easily extensible for future methods.
* postdoc; ** PhD students.