# Courses on Phylogenetics and Combinatorics

The first week of the research program will feature courses by Prof. Mike Steel and Prof. Arnd von Haeseler on Phylogenetics and combinatorics. We will also have each day several contributed talks from the participants. The main webpage of the program is here.

## SCHEDULE

## VENUE

The courses will take place in Room S04 in the Facultat de Matemàtiques i Estadística FME (Universitat Politècnica de Catalunya) located at:

Campus Diagonal Sud, Edifici U

C/ Pau Gargallo 5

08028 Barcelona

## LECTURE SLIDES

# CONTRIBUTED TALKS

### Krzysztof Bartoszek (Uppsala University)

**Limit behaviour of the cophenetic index**

We will consider the limit distribution of an appropriately normalized cophenetic index of the pure birth tree conditioned on n contemporary tips. This normalized phylogenetic balance index is a submartingale that converges almost surely and in L2. When one drops the branch lengths, the limit distribution is a contraction type distribution, similar to the Quicksort limit distribution. In the continuous branch case some heuristic approximations to the limit distribution are proposed and the work is illustrated by simulations. Simulating the cophenetic index using these heuristics is rapid compared to the exact simulation of the pure birth tree.

[1] K. Bartoszek (2017) Exact and approximate limit behaviour of the Yule tree's cophenetic index. ArXiv e-prints 1703.08954

### Sangeeta Bhatia (Western Sydney University)

**A Flexible Framework for Determining Weighted Genome Rearrangement Distance**

Inversion of chromosomal fragments is an important genome rearrangement event and inversion distance is a popular metric for constructing phylogenies. A standard approach to this problem is to place no constraint on the set of inversions that can be used to “sort” an arrangement of genes. That is, any inversion that can happen, will happen and will be used by algorithms. The

biological evidence on the other hand suggests that the frequency of the inversion of a chromosomal segment may depend on its length. Shorter inversions are observed to happen more frequently than longer ones. In this context, a lot of work is being done on finding weighted inversion distances. In algebraic terms, this problem translates to finding a path on weighted Cayley graph. In my talk, I will share our work on developing a flexible framework for determining weighted rearrangement distances using the concept of "rewriting systems" under a general set of rearrangement models. The work presented here uses results from computational group theory.

### Momoko Hayamizu (The Institute of Statistical Mathematics)

**Bounding the size of universal tree-based networks**

A tree-based network on a set X of leaves is called "universal" if any rooted binary phylogenetic tree on X can be its base tree. In my earlier work, it was shown that there exist infinitely many universal tree-based networks for all |X|. In this talk, I will discuss a lower bound for the size of universal tree-based networks. (Joint work with Satoru Fujishige and Shizuo Kaji)

[1] M. Hayamizu, On the existence of infinitely many universal tree-based networks, Journal of Theoretical Biology, Volume 396, 2016, pp. 204-206, http://dx.doi.org/10.1016/j.jtbi.2016.02.023.

### Kaie Kubjas (Aalto University)

**Geometry of time-reversible group-based models**

A phylogenetic model describes the evolution of genes. A group-based phylogenetic model assumes extra structure on the parameters of the model that is related to an abelian group. The joint probabilities of a group-based phylogenetic model are functions in the parameters of the model. Based on the work of Matsen, we study polynomial inequalities that cut out the set of all joint probabilities for a time-reversible group-based model. We apply this knowledge to the study of boundaries and maximum likelihood estimation on group-based models. In particular, we use the degree of an algebraic variety and numerical algebraic geometry to obtain the maximum likelihood estimate exactly for small group-based models. This talk is based on joint work with Dimitra Kosta.

### Jordi Roca-Lacostena (Universitat Politècnica de Catalunya)

**The embedding problem for Markov DNA matrices**

DNA substitution models describe the evolutionary process through nucleotide substitution matrices. Assuming that nucleotide mutations always happen at the same rate through time leads to continuous-time models, which only consider matrices that are the exponential of rate matrices. A different approach appears when one regards the evolutionary process as a whole and considers matrices whose entries are given by the substitution probabilities between nucleotides. The understanding of the connection between these two approaches is fundamental for modeling evolution as it has practical and theoretical consequences, such as the identifiability of rates from experimental biological data.

In this talk, we will give a description of the embedding problem, and see some new results about embeddability when restricted to Kimura 3-substitution-types model and its submodels.

This is a joint work with Marta Casanellas and Jesús Fernández-Sánchez.

### Jeremy Sumner (University of Tasmania)

**Lie-Markov models: some recent work**

In recent years, our research group has been advocating and analysing ``Lie-Markov'' models. Each of these describes a continuous-time Markov chain where, under appropriate definitions, the associated rate-matrices form a Lie algebra. As a consequence, the Lie-Markov models produced DNA substitution matrices which are closed under matrix multiplication. Until recently, however, it was not clear whether it is possible for a Markov model to exhibit multiplicative closure without lying within the Lie-Markov class. I will present a new argument which, under mild assumptions, rules out this possibility; thus a multiplicatively closed Markov model must indeed form a Lie algebra. If time permits, I will also present a novel Lie-Markov model that demonstrates why constructing examples of multiplicatively closed models is extremely difficult unless one exploits the above connection to Lie algebras.

### Kristina Wicke (Ernst-Moritz-Arndt-University Greifswald)

**Phylogenetic diversity and biodiversity indices on networks**

In biodiversity conservation, it is often necessary to prioritize the species to conserve. Existing approaches to prioritization, e.g. the Fair Proportion Index and the Shapley Value, are based on phylogenetic trees and rank species according to their contribution to overall phylogenetic diversity. However, in many cases evolution is not treelike and thus, phylogenetic networks have come to the fore as a generalization of phylogenetic trees, allowing for the representation of non-treelike evolutionary events, such as horizontal gene transfer or hybridization.

Here, we extend the concepts of phylogenetic diversity and phylogenetic biodiversity indices from phylogenetic trees to phylogenetic networks. On the one hand, we consider the treelike content of a phylogenetic network, e.g. the (multi)set of phylogenetic trees displayed by a network and the LSA tree associated with it. On the other hand, we derive the phylogenetic diversity of subsets of taxa and biodiversity indices directly from the internal structure of the network. We use both approaches that are independent of so-called inheritance probabilities and approaches that explicitly incorporate these probabilities.

This is a joint work with Mareike Fischer.

### Xu Zhang (University of Kentucky)

**Dimensionality Reduction via Tropical Geometry**

A dimensionality reduction is applied to high-dimensional data sets in order to solve the problem called the “curse of dimensionality”. This term refers to the problems associated with multivariate data analysis as the dimensionality increases. This problem of multidimensionality is acute in the rapidly growing area of phylogenomics, which can provide insight into relationships and evolutionary patterns of a diversity of organisms, from humans, plants and animals, to microbes and viruses. In this project, we are interested in applying the tropical metric in the max-plus algebra to a dimen- sionality reduction over the space of rooted equidistant phylogenetic trees on m leaves, that is realized as the set of all ultrametrics. In this project, the proposed process of reducing the dimension of the multidimensional data sets on the “treespace” is to take data points in the space into a lower dimensional plane which minimizes the sum of distance between each point in the data set and their orthogonal projection onto the plane, that is, an optimization problem such that minimizing projection residuals between data points and their projections on the plane via the tropical metric in the max-plus algebra.

## Dinner on Thursday:

Pomarada restaurant, 8pm, Passeig de gràcia 78

## Touristic information

Things you MUST do in Barcelona:

- Visit Sagrada Família (even from inside, specially interesting for geometers --full of hyperboloids, paraboloids ...)
- Visit Casa Batlló and/or La Pedrera
- Visit Parc Güell

(for all these visits you need to buy tickets in advance on internet)

- Take a walk on the Gothic Quarter (behind the cathedral): Plaça del Rei, Carrer del Bisbe, Carrer Banys Nous, Plaça del Pi, Plaça Sant Felip Neri, Plaça St Iu (and take an iced tea at the "patio" of the Frederic Marès Museum--free entrance).
- Take a walk on Passeig the Gracia and Rambla Catalunya

Things it would be worth doing:

- Get tickets for a concert at Palau de la Musica (modernist building)
- Go on top of Tibidabo: nice views and an amusement park for children (also fun for parents)
- Go to the beach: Barceloneta is the closest (we'll have dinner there next Wednesday). Metro: L2 Vila Olímpica
- Go to Sitges by train (either from Sants or Plaça Catalunya) take a walk on the old streets and go to the beach.
- Take a walk around Gracia neighborhood (Plaça de la Vila and streets around), or around Les Corts (Plaça de la Concòrdia and streets around, this is close to the UPC)