Week 2: Workshop

Download the picture to see it in better quality.

The main webpage of the workshop is here. Below we provide all relevant local information.

Important: Note that the venue changes for the Wednesday talks. Also the Thursday talk of Bernd Sturmfels will be in a different location. For more details see below.

SCHEDULE

Preliminary schedule for the workshop 26-30 June 2017

VENUE (excluding Wednesday)

The workshop will take place at room 001 in the Facultat de Matemàtiques i Estadística FME (Universitat Politècnica de Catalunya) located at:

Campus Diagonal Sud, Edifici U

C/ Pau Gargallo 5

08028 Barcelona


VENUE FOR WEDNESDAY

On Wednesday, we meet in Parc de Recerca Biomèdica de Barcelona (PRBB)

C/ Doctor Aiguader 88, 08003 Barcelona

The session will be in Sala Xipre (173.06).

We would like to thank Robert Castelo for helping us with organisation of the Wednesday session.

The poster session will take place nearby, at Universitat Pompeu Fabra, in Mercè Rodoreda building (room 24.S18)

Ramon Trias Fargas, 25-27

08005 Barcelona

Parc de Recerca Biomèdica de Barcelona (above) and Universitat Pompeu Fabra (below)

There are numerous lunch options in the neighborhood. You are not advised to use the cafeteria at PRBB as it gets quite packed around the lunch time. The main three options are:

  • Lunch in one of the restaurants on the beach just next to the PRBB building. The lunch menú us around 16 EUR and it is a fancier version of what you usually get in the city (typically including wine as a drink).
  • There are two cafeterias on the UPF campus, where we are having our poster session. This is within a 5-10 minute walking distance from PRBB (see the map above).
  • Around this UPF's campus there are also numerous lunch places with lunch menús for around 10 EUR. For example on Carrer de la Marina (in the block between Carrer de Ramon Turró and Carrer de Llull) you have a vegen place All you eat is love, a Basque place Urretxu, a sushi place, a Chinese and a Turkish restaurants. On Carrer de la Sardenya (also between Carrer de Ramon Turró and Carrer de Llull) there are two nice light lunch options: Cafè Menssana and a Brazilian restaurant Rio D.O.C.

CONFERENCE DINNER

The conference dinner will take place on Wednesday at the restaurant Ca la Nuri situated on the beach of Barceloneta.

Passeig Marítim de la Barceloneta, 55, 08005 Barcelona

BGSMath Colloquium of Bernd Sturmfels

On Thursday, June 29, Bernd Sturmfels will give a BGSMath Colloquium. For that reason no talks are scheduled for Thursday afternoon. The talk will place in a picturesque Institut d’Estudis Catalans (IEC). The institut is located on Carrer del Carme 47 next to Las Ramblas and Plaça de Catalunya.

Alternatively, consider attending the PRBB Computational Genomics Seminar by Olivier Gascuel, see below.

PRBB Computational Genomics Seminar by Olivier Gascuel

On Thursday, June 29, Olivier Gascuel will give a PRBB Computational Genomics Seminar . The talk will take place in the PRBB building, where we had our Wednesday talks.

TALKS

Elizabeth Allman (University of Alaska Fairbanks)

An Introduction to Algebraic Methods in Phylogenetics

This talk, directed toward both biologists and mathematicians, introduces a few of the uses of algebraic methods that have contributed to the advancement of our understanding of phylogenetic models, and to methods of phylogenetic inference.

The focus is not technical mathematics; instead the talk will highlight three specific topics where algebraic perspectives have lead to advances. In particular, we examine closure properties of base substitution models, identifiability of phylogenetic parameters, and rank conditions on matrices for applications to tree inference.

Cristiano Bocci (University of Siena)

Complex identifiability and Real identifiability

In this talk, I will introduce recent results on complex and/or real identifiability for tensors. All these results are based on a geometric approach mainly using the concepts of weakly-defective varieties and elliptic normal curves

Ruth Davidson (University of Illinois Urbana-Champaign)

Applications of Mining Public Genome Data to Recover Statistical Trends using Geometry

Websites such as TreeBASE.org and datadryad.org provide public access to a wealth of genomic data released with peer-reviewed biological publications. Phylogenomics-the recovery of the common evolutionary history of a group of taxa from short gene samples recovered from long genomes-is a basic area of research that gives rise to many quantitative methods for mining data for evolutionary signals. In turn, myriad fields such as ecology, medicine, and linguistics consume these methods; thus improved methods have very broad scientific impact. We present a publication (joint work with Joseph Rusinko, Zoe Vernon, and Jing Xi) that provides a baseline framework, built on geometric combinatorics, for studying statistical trends in genomic data. Further, we will outline future research directions that will (1) build on this framework to inform the development of new theory and methods for model-testing, and (2) improve the understanding of trends in phylogenomic data in the systematic biology, computer science, statistics, and mathematics communities.

Mareike Fischer (Ernst-Moritz-Arndt-University Greifswald, Germany)

On the reliability of Maximum Parsimony

There are two main aims in phylogenetics: the reconstruction of the evolutionary relationships between different species, which are often represented by a tree, as well as the reconstruction of ancestral sequences (as normally only data for present-day species are given). For both aims, one of the most frequently discussed method is the Fitch algorithm, which is based on the optimization criterion of Maximum Parsimony (MP). But this criterion has some inherent problems. Concerning tree reconstruction, for instance, the method is known to suffer from statistical inconsistency in the so-called Felsenstein zone. This means that given more data, the reconstruction can converge to the wrong tree. Moreover, concerning ancestral sequence reconstruction, it is known that MP under certain circumstances gives more reliable results when some of the data are ignored.

I re-investigate both aspects in my talk – first, we look at the statistical inconsistency in the Felsenstein zone and show that this problem can unfortunately not be overcome by considering multiple columns e.g. of a DNA alignment at once. We also provide some new lower bounds for the reconstruction accuracy of ancestral sequence reconstruction with parsimony.

The research presented in my talk is joint work with Michelle Galla, Lina Herbst and Kristina Wicke.

Andrew Francis (Western Sydney University)

Genome arrangement phylogeny in the Cayley graph

Modelling bacterial genome rearrangement operations as group actions on the space of all possible genomes provides a one-to-one correspondence between genome space and the group that acts. This means that a subset of genomes defines a set of points on the Cayley graph of the group, and a phylogeny on those genomes is represented by a Steiner tree on those points. In this talk I will describe this viewpoint and several related results. For instance, I will show how group theory can be used to calculate the rearrangement distance between genomes, as well as a more nuanced view of the distance between genomes through a maximum likelihood estimate.

Toni Gabaldón (Barcelona Institute of Science and Technology)

Phylogenomics to uncover past events of reticulated evolution in eukaryotes

Modern evolutionary thinking recognizes processes of reticulated evolution, such as horizontal gene transfer or hybridization, which involve total or partial merging of genetic material from two diverged species. Today it is widely recognized that such events are rampant in prokaryotes, but a relevant role in eukaryotes has only recently been acknowledged. Unprecedented genomic and phylogenetic information, and recent work have shown that reticulate evolution in eukaryotes is more common and have more complex outcomes than previously thought. In this talk I will discuss recent evidence of reticulated evolution in different eukaryotic groups, such as fungi, and how phylogenomics can help us to uncover such processes.

Olivier Gascuel (Institut Pasteur & CNRS, Paris)

Boosting Felsenstein’s Bootstrap

Felsenstein’s article, describing the application of the bootstrap principle to evolutionary trees, is one of the most cited papers of all time. This statistical method, based on data resampling and replications, is used extensively to assess the robustness of phylogenetic inferences. However, today we have more and more sequences for a wide variety of species, and phylogenies with hundreds or thousands taxa are becoming routine. In these conditions the usefulness of Felsenstein’s bootstrap is questionable, as it tends to yield very low support values, especially on deep branches. I’ll describe a revised version of Felsenstein’s bootstrap, resulting in higher supports, while not inducing falsely supported branches. Applications to large mammalian, HIV, and simulated datasets will be discussed.

Joint work with F. Lemoine, J.B. Domelevo Entfellner, E. Wilkinson, T. De Oliveira

Gacuel.pdf

Elizabeth Gross (San Jose State University)

Dimensions of Tree Secants and Joins

Identifiability results for 2-tree mixture models have relied on knowing the dimension of their corresponding join varieties, which raises the following question: When do the secants and joins of toric varieties corresponding to group-based phylogenetic tree models have the expected dimension? In 2011, Allman, Petrović, Sullivant, and Rhodes showed that for the Jukes-Cantor model, the associated secant varieties of 2-tree mixtures have the expected dimension. In this talk, we extend this result and show that for the Kimura 2-parameter and Kimura 3-parameter models and any pair or trees (not necessarily binary), the corresponding join variety has the expected dimension. Additionally, we will discuss generalizations of this result to arbitrary group-based models and mixtures of an arbitrary number of trees. This is joint work with Hector Baños, Nathaniel Bushek, Ruth Davidson, Pamela E. Harris, Robert Krone, Colby Long, Allen Stewart, and Robert Walker and is the product of the 2016 Mathematical Research Community on Algebraic Statistics.

Katharina Huber (University of East Anglia)

Triplet covers

A classical result, fundamental to evolutionary biology, states that an edge-weighted tree T with leaf set X, positive edge weights, and no vertices of degree 2 can be uniquely reconstructed from the set of leaf-to-leaf distances between any two elements of X. In biology, X corresponds to a set of taxa (e.g. extant species) and the tree T describes their phylogenetic relationships. Provided one has access to all distances, and these are known to be sufficiently close to the distances induced by some (as yet unknown) tree, then that tree, together with its edge weighting, can be computed with some degree of confidence from those distances in polynomial time using, for example, Neighbor-Joining. However, much of the data being generated even by modern genomic methods have patchy taxon coverage whereby only certain pairs of taxa have a known (or, at least, sufficiently reliable) distance. This raises interesting mathematical questions (besides the obvious statistical and algorithmic ones) concerning tree reconstruction from such incomplete data some of which we will address in this talk.

Daniel Huson (University of Tübingen)

SplitsTree5 - a reimplementation of the SplitsTree program

The Java SplitsTree is a widely used for computing unrooted phylogenetic networks. Written in 2005, the look-and-feel of the program is now quite dated and the algorithms and data structures used were designed for small datasets (by todays standards). In this presentation we report on progress toward implementing SplitsTree5, in Java using the JavaFX library. The program is based on a flexible “document update graph”, a directed graph in which nodes represent data or algorithms. One goal is to produce more efficient code. We have implemented a new split network viewer that operates both in 2D and 3D. We are working on a haplotype graph viewer and other viewers. A survey of papers that use the SplitsTree program shows that the Methods description in such papers is often uninformative and states something like “A network was computed using SplitsTree”. To address this, we are working on a function that will automatically generate a one paragraph Methods description that explicitly mentions all involved algorithms and their citations.

Leo van Iersel (Delft University of Technology)

Rearrangement Moves on Rooted Phylogenetic Networks

Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic networks --- that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose ``horizontal'' moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and ``vertical'' moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves --- named rNNI and rSPR --- reduce to the best-known moves on rooted phylogenetic trees, nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results --- separating the contributions of horizontal and vertical moves --- we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. Our goal is to provide a solid basis for practical phylogenetic network reconstruction. This is joint work with Philippe Gambette, Mark Jones, Manuel Lafond, Fabio Pardi and Celine Scornavacca.

Mateusz Michałek (MPI MiS/Polish Academy of Sciences)

Phylogenetic invariants - algebraic and geometric approach

Probability distributions for Markov processes on trees give rise to beautiful algebraic varieties. Two examples are:

- General Markov Model belonging to a large class of equivariant models and corresponding to secant varieties of the Segre variety;

- 3-Kimura Model belonging to a more restrictive class of group-based models, corresponding to a family of toric varieties.

Such algebraic varieties have simple parametrizations, however it is notoriously hard to find their descriptions in terms of defining equations - so called phylogenetic invariants. We will report on two recent results. In a joint work with Casanellas and Fernández-Sánchez we provide a local description, as a complete intersection, for a large class of equivariant models. Second, together with Ventura we proved a conjecture of Sturmfels and Sullivant that predicted the degree of (generating) phylogenetic invariants for the 3-Kimura model and provided finiteness results for arbitrary group-based models.


Michalek.pdf

Vincent Moulton (University of East Anglia)

Assembling the network of life

Biologists commonly represent the evolution of organisms using a phylogenetic tree, that is, a leaf-labeled graph-theoretical tree, the tree-of-life providing a well-known example of such a tree. In recent years, however, it has become increasingly recognised that the evolutionary history of certain organisms (e.g. plants, viruses and bacteria) is not always best represented by a tree. This is due to evolutionary processes that take place on the molecular level, such as recombination, lateral gene transfer and recombination. In such cases phylogenetic networks - more complex leaf-labeled graphs — can provide a useful alternative to trees. We discuss the question of constructing phylogenetic networks from simpler networks, which leads to some interesting combinatorial and computational problems.

John Rhodes (University of Alaska Fairbanks)

Topological metrizations of trees, and applications

The edges of a topological tree can be assigned lengths in at least four different natural ways, that reflect nothing more than the topological structure of the tree. These ways are associated with the standard encodings of phylogenetic tree structure, by displayed splits, clades, quartets, and rooted triples. Although these metrizations provide no new information about the tree, perhaps surprisingly, the split and clade metrizations play key roles in certain statistically-consistent, fast, and well-performing methods of species tree inference from gene trees under the Multispecies Coalescent model of incomplete lineage sorting.

After a brief description of the split and clade metrizations and their use in species tree inference, the two new metrizations will be explained. These lead to new fast methods of inferring large trees from rooted triples or quartets, and to new consensus methods that provide statistically consistent species tree inference under the coalescent model.

Rhodes.pdf

Juan Rodriguez-Rivas (Barcelona Supercomputing Center)

A new generation of coevolution-based methods

Coevolution, defined as the interdependence between the evolutionary changes of two entities, plays a fundamental role at all biological levels, from ecological to molecular systems. I will review some of the remarkable advances achieved in the field of molecular coevolution over the last 20 years. I will cover three main research lines: i) The similarity of phylogenetic trees as an indicator of physical or functional interactions between pairs of proteins, ii) the detection of differentially conserved positions in multiple sequence alignments, related with the functional divergence between subfamilies, which point to ligand and/or protein binding sites, and iii) protein contact prediction from residue covariation in large sequence alignments. In particular, the detection of covariant positions in large multiple sequence alignments has recently provided a whole new array of insights and applications. In the talk I will outline these advances, including our contributions, and introduce the main current challenge in the field, i.e. the correction of the phylogenetic bias.

Joe Rusinko (Hobart and William Smith College)

From Quartet Trees to Neighbor Joining

As building blocks of phylogenetic inference, quartet trees can play a key role in supertree and species tree reconstruction. However for large datasets the number of quartet trees can quickly become overwhelming. We introduce Efficient Quartet Systems which enable quartet-based species tree reconstruction without the inefficiency of computing all displayed quartets. As a bonus, we describe how an effort to translate these findings to the context of SVDQuartets led to the discovery that Neighbor Joining on concatenated DNA sequences is a remarkably effective technique for species tree reconstruction.

Heiko Schmidt (University of Vienna)

Phylogenetic Inference in a Small N Large P Situation

The easy access of sequencing technologies in the field of virology and epidemics has led to huge collections of publicly available data. It is common practice that samples are taken from patients and parts of the pathogen or viral genome are sequenced to identify the infecting strain or subtype. Unfortunately, on the one hand the sizes of the genomic regions (N) sequenced are typically small. On the other hand the number of taxa (P) desired to be included in epidemiological studies is rather huge. This leads to what is often described as a ‘small N large P’ problem when reconstructing the phylogenetic trees from these kinds of data.

Aspects of ML-based phylogenetic inference in small N large P situations will be discussed. We use large HIV datasets as used to study viral epidemics. We will highlight effects of such datasets on the reconstruction of the underlying phylogeny. We will focus on the application of stochastic nearest neighbor interchange (NNI) search strategies implemented in IQ-TREE and compare them to results from other ML approaches.

This is joint work with Clement Bader and Dimitrios Paraskevis.


Céline Scornavacca (ISE-M, CNRS)

Parental parsimony: a new definition of parsimony for phylogenetic networks

In this talk, we present a new definition of maximum parsimony for phylogenetic networks that permits to model biological scenarios that cannot be modelled by the definitions currently present in the literature (namely, the "hardwired" and "softwired" parsimony). Building on this new definition, we provide several algorithmic results that lay the foundations for new parsimony-based methods for phylogenetic network reconstruction.

Joint work with Leo van Iersel and Mark Jones.

Mike Steel (University of Canterbury)

Phylogenetic questions inspired by the theorems of Arrow and Dilworth

Biologists frequently need to reconcile conflicting estimates of the evolutionary relationships between species by taking a ‘consensus’ of a set of phylogenetic trees. This is because different data and/or different methods can produce different trees. If we think of each tree as providing a ‘vote’ for the unknown true phylogeny, then we can view consensus methods as a type of voting procedure. Kenneth Arrow’s celebrated ‘impossibility theorem’ (1950) shows that no voting procedure can simultaneously satisfy seemingly innocent and desirable properties. We consider a similar axiomatic approach to consensus and asks what properties can be jointly achieved. In the second part of the talk, we consider phylogenetic networks (which are more general than trees as they allow for reticulate evolution). The question ‘when is a phylogenetic network merely a tree with additional links between its edges?’ is relevant to biology and interesting mathematically. It has recently been shown that such ‘tree-based’ networks can be efficiently characterized. We describe some further results related to Dilworth’s theorem for posets (1950), and matching theory in bipartite graphs. In this way, one can obtain fast algorithms for determining when a network is tree-based and, if not, to calculate how ‘close’ to tree-based it is.

Seth Sullivant (North Carolina State University)

Bounds on the Expected Size of the Maximum Agreement Subtree

We prove polynomial upper and lower bounds on the expected size of the maximum agreement subtree of two random binary phylogenetic trees under both the uniform distribution and Yule-Harding distribution. This positively answers a question posed in earlier work. Determining tight upper and lower bounds remains an open problem. Joint work with Daniel Bernstein, Lam Si Tung Ho, Colby Long, Mike Steel and Katherine St. John.

Jeremy Sumner (University of Tasmania)

Dimensional reduction for phylogenetic tree models

I will present a general method of dimensional reduction for phylogenetic tree models. The method reduces the dimension of the model space on a phylogenetic tree from exponential in the number of extant taxa, to quadratic in the number of taxa. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters; in contrast to the usual multi-linear dependence in the full model space. The talk will concentrate on the algebraic foundations of the dimensional reduction, particularly discussing the identification of a novel representation of the ``Markov'' group embedded within the usual array of tensor products. I will also present the results of a study where the method was used to compute split (edge) weights from simulated DNA sequences to correctly identify the underlying phylogenetic tree.

Caroline Uhler (Massachusetts Institute of Technology)

Estimating covariance matrices with linear structures

Brownian motion tree models, standard phylogenetic models for the evolution of continuous characters, are defined by linear constraints on the covariance matrix and its inverse. First we study maximum likelihood estimation for Gaussian models with linear constraints on the covariance matrix. This leads to a non-convex optimization problem that typically has many local maxima. Current methods for parameter estimation in such models are based on heuristics with no guarantees. I will present efficient algorithms and explain how to initiate the algorithms in a data-informed way to obtain provable guarantees for parameter estimation in this model class. Next, we study Gaussian models that are multivariate totally positive of order two (MTP2), a model class that is given by linear constraints on the inverse covariance matrix and also includes Brownian motion tree models. We show that maximum likelihood estimation under MTP2 implies sparsity without the need of a tuning parameter. Moreover, we show that the MLE always exists even in the high-dimensional setting. These properties make MTP2 constraints an intriguing alternative to methods for learning sparse graphical models such as the graphical lasso.

Gabriel Valiente (Technical University of Catalonia)

Balance Indices for Phylogenetic Trees

Balance is one of the best studied properties of phylogenetic trees, as the shape of a phylogenetic tree reflects the evolutionary processes that have produced it. Balance indices are thus a good tool to test stochastic models of evolution. In this talk, we present a new balance index for phylogenetic trees, the quartets index. This index makes sense for rooted bifurcating and also multifurcating phylogenetic trees, and it has a larger range of values and a greater resolution power than other balance indices for phylogenetic trees such as the Colless index, the Sackin index or the Mir-Rosselló-Rotger total cophenetic index. Also, we present exact formulas for its expected value and its variance under the Chen-Ford-Winkel alpha-gamma model of evolution.

Valiente.pdf

Carsten Wiuf (University of Copenhagen)

Admixture graphs as a framework for studying human demographic histories

Human demographic histories are entangled webs of splits and merges. Whole genome data have been be used to gain insight into these webs using different statistical tools such as the D-statistic and the F-statistics. In this talk, I’ll propose to use labelled directed graphs and graphical models as a framework to study the F-statistics. This framework or model will be referred to as an admixture graph.

Basic properties of admixtures graph will be explored. Some differences to traditional phylogenetic analysis will be explained and some possible extensions to other areas of genetics will be suggested.

The talk is based on the work of my PhD student Samuele Soraggi.



Ruriko Yoshida (Naval Postgraduate School)

Principal Component Analysis and the Locus of the Fréchet Mean in the Space of Phylogenetic Trees

Most biological data are multidimensional, posing a major challenge to human comprehension and computational analysis. Principal component analysis is the most popular approach to rendering two- or three-dimensional representations of the major trends in such multidimensional data. The problem of multidimensionality is acute in the rapidly growing area of phylogenomics. Evolutionary relationships are represented by phylogenetic trees, and very typically a phylogenomic analysis results in a collection of such trees, one for each gene in the analysis. Principal component analysis offers a means of quantifying variation and summarizing a collection of phylogenies by dimensional reduction. However, the space of all possible phylogenies on a fixed set of species does not form a Euclidean vector space, so principal component analysis must be reformulated in the geometry of tree-space, which is a CAT(0) geodesic metric space. Previous work has focused on construction of the first principal component, or principal geodesic. Here we propose a geometric object which represents a k-th order principal component: the locus of the weighted Fréchet mean of k+1 points in tree-space, where the weights vary over the standard k-dimensional simplex. We establish basic properties of these objects, in particular that locally they generically have dimension k, and we propose an efficient algorithm for projection onto these surfaces. Combined with a stochastic optimization algorithm, this projection algorithm gives a procedure for constructing a principal component of arbitrary order in tree-space. Simulation studies confirm these algorithms perform well, and they are applied to data sets of Apicomplexa gene trees and the African coelacanth genome. The results enable visualizations of slices of tree-space, revealing structure within these complex data sets.

Yoshida.pdf

Louxin Zhang (National University of Singapore)

Combinatorial and Algorithmic Issues of Reticulation-Visible Phylogenetic Networks

A network is reticulation-visible if for every reticulation node, there exists a leaf such that every path from the network root to the leaf must contain the reticulation node. Recently, it has been shown that reticulation-visible networks have nice combinatorial properties. It has also been shown that the tree containment problem is polynomial time solvable on such networks. I this talk, I will discuss these results and the techniques used in the study of such networks.

POSTERS

Ernesto Álvarez González (Complutense University of Madrid)

Different methods of constructing phylogenetics trees

There are different methods of constructing phylogenetics trees: some of them are built by distance methods and some other, by optimality criterion. In this poster, I wish to stablish these two different approaches, attending advantages and disadvantages. Particular attention I will pay to the Maximum Likelihood constucting method, the same that is related to the problem I work on as a part of my PHD project.

Hector Daniel Baños Cervantes (University of Alaska Fairbanks)

Identifying species hybridization network features from gene tree quartets

A phylogenetic tree is not always enough to describe the relation between species, in particular, when hybridization, horizontal gene transfer or gene flow occurs. Phylogenetic networks are the objects used to represent the relation between species that admit such events. We focus on phylogenetic networks whose cycles do not share edges, known as level-1 networks. Under the coalescent model on a level-1 network, the probabilities of gene tree quartets can be computed in terms of the probabilities arising on simplified networks. Using only the natural descending order in the real line of such probabilities for each 4-taxon set, we can generically identify all cycles of size greater than 3, and hybrid nodes in the cycles of size greater than 4, in the unrooted species network. We also show that we cannot identify the hybrid node of a 4-cycle by this approach.

Chris Durden (North Carolina State University)

A coalescent-based k-mer method for phylogenetic reconstruction

We develop a coalescent-based method for reconstructing a tree on n taxa from k-mer data (k-mers are subsequences of length k). For the Jukes-Cantor mutation model, we obtain a rational map for the expected distances between k-mer count vectors, which depends on the branch lengths of the species tree and a population size parameter. By studying invariants of this model, we give conditions on n and k for which the (unrooted) species tree parameter can be reconstructed.

Miraine Davila Felipe (Institut Pasteur (Paris))

Mathematical properties of the transfer distance between bipartitions and tree topologies

In this poster I will describe the main properties of the transfer distance which was introduced in the classification framework by Day (Math. Soc. Sc., 1981) and studied by Lin et al. (Trans. Comp. Biol. Bioinf., 2012) in the context of tree matching. It allows to compare a branch b in a reference tree topology to a second branch in another tree topology T, both on the same set of n taxa. Roughly speaking, the transfer distance between these branches is the number of taxa that must be transferred in T, in order to make both branches identical. Then, if we take the minimum of the transfer distance over all branches in T, we get a "transfer index", measuring the presence of b in T. We provide an upper-bound for the transfer index, compare the transfer index to the parsimony score, and describe some results making possible to assess the statistical significance of a given value of the transfer index as a function of n and b, when T is randomly drawn according to a null model.

*Joint work with J-B. Domelevo Entfellner, F. Lemoine and O. Gascuel

Michelle Galla (Ernst-Moritz-Arndt-University Greifswald)

Alternative methods of alignment analysis for different kinds of tree reconstruction in phylogenetics

One of the main aims of biology and phylogenetics is to reconstruct the “Tree of Life”. In this respect, different methods are used to analyze DNA sequences of different species and to compare them in order to derive the relationships of different species. However, it is well-known that these methods can lead to wrong relationship estimates. One typical problem of tree reconstruction methods, like for instance Maximum Parsimony, is statistical inconsistency. Therefore, new methods of alignment analysis are needed, and it should be tested if they lead to better results. In my talk, I will present a blockwise approach to alignment analysis, namely so-called k-tuple analyses. For four taxa is has already been shown that k-tuple-based analyses are statistically inconsistent if and only if the standard character-based (site-based) analyses are statistically inconsistent. So in the four-taxon case, going from individual sites to k-tuples unfortunately does not lead to any improvement. However, real biological analyses often consider more than only four taxa. Therefore, we analyze the case of at least five taxa and show that the equivalence of the single-site data and the k-tuple data then no longer holds. We also show that the blockwise approach can be used to test the robustness/relibability of the resulting trees.

(Joint work with Mareike Fischer)

Marina Garrote-López (Universitat Politècnica de Catalunya)

The rank of positive-semidefinite approximation of low rank matrices and its application in Phylogenetics

In many areas of applied linear algebra, it is necessary to work with matrix approximations. A usual situation occurs when a matrix obtained from experimental or simulated data is needed to be approximated by a matrix that lies in a statistical model or satisfies some specific properties. We study symmetric and positive-semidefinite approximations and we show that the positive and negative indices of inertia of the symmetric approximation and the rank of the positive-semidefinite approximation are always bounded from above by the rank of the original matrix. We show how this results can be applied while we are using algebraic tools in phylogenetic reconstruction. One main goal of phylogenetic reconstruction is recovering the ancestral relationships among a group of current species. In order to reconstruct phylogenetic trees it is usual to model evolution adopting a parametric statistical model which allows us to define a joint probability distribution at the leaves of the trees. When these models are algebraic one is able to deduce polynomial relationships between these probabilities, known as phylogenetic invariants. There is a special situation when these theoretical probabilities can be placed into a matrix that has to be positive-semidefinite of low rank, which gives us minors that are phylogenetic invariants and their vanishing provide interesting information about the tree topology. But unfortunately these conditions are not always satisfied when working with real or simulated data. The aim of our research is to use phylogenetic invariants and the stochasticity of the parameters of the general Markov model (e.g. the rank constraints aforementioned) to provide insight into phylogenetic inference and to design new methods for phylogenetic reconstruction.

Lina Herbst (Ernst-Moritz-Arndt-University Greifswald)

The accuracy of ancestral sequence reconstruction with parsimony

We investigate a mathematical question concerning the reconstruction accuracy of Fitch’s maximum parsimony algorithm for reconstructing the ancestral sequence data of the most recent common ancestor given a phylogenetic tree and sequence data (character) for all taxa under consideration. In particular, in the case of the symmetric four state substitution model, we answer affirmatively a conjecture of Li, Steel and Zhang which states that for any ultrametric phylogenetic tree and a symmetric model, the Fitch parsimony method using all terminal taxa is more accurate, or at least as accurate, for ancestral state reconstruction than using any particular terminal taxon. This conjecture has already been answered for two-state characters by Fischer and Thatte. Here, we focus on answering the biologically relevant case with four states (DNA-alphabet).

This project is joint work with Mareike Fischer.

Linard Hoessly (Universität Fribourg)

Fundamental polytopes of metric trees via hyperplane arrangements

We tackle the problem of a combinatorial classification of finite metric spaces via their fundamental polytopes, suggested by Vershik in 2010. We define a hyperplane arrangement associated to every split pseudometric and we use the combinatorics of its underlying matroid in order to compute the face numbers of fundamental polytopes and Lipschitz polytopes of tree-like metrics. We then develop some tools allowing us to apply our formulas to specific cases, and we briefly discuss the potential of our model beyond tree-like metrics.

Colby Long (Mathematical Biosciences Institute)

Distinguishing Phylogenetic Networks

Phylogenetic networks are becoming increasingly popular in phylogenetics since they have the ability to describe a wider range of evolutionary events than their tree counterparts. In this poster, we present results for Markov models on phylogenetic networks, i.e. directed acyclic graphs, and their associated geometry. We restrict our attention to large cycle networks, networks with a single undirected cycle that has length at least four. Using tools from computational algebraic geometry, we show that the semi-directed network topology for the Jukes-Cantor model on large cycle networks is generically identifiable.

Yangjing Long (University of Greifswald)

Reconstructing unrooted phylogenetic trees from symbolic ternary metrics

In 1998, Böcker and Dress gave a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex colouring of its interior vertices, we can map every triple of three different leaves to the colour of its median vertex. We characterise all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its colouring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterises whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.

Jonathan Mitchell (University of Alaska Fairbanks)

Quartet Test Statistics Under the Multispecies Coalescent

The multispecies coalescent model describes the process of incomplete lineage sorting, which can lead to gene trees having differing topologies. Given a sample of gene trees, how can we test whether they might have arisen from the multispecies coalescent model? Our approach will be to focus on displayed quartets. For 4-taxon trees there is a singularity when the frequencies of the three possible gene tree topologies are equal. Near the singularity there is uncertainty over which of the three possible gene tree topologies is the species tree topology. We build upon Drton's work on distributions of test statistics at singularities. We find the distribution of the log-likelihood ratio test statistic at the singularity. We are then able to test whether a set of gene tree frequencies could have arisen under the multispecies coalescent model, regardless of whether the gene tree frequencies are close to the singularity or not.

Joan Carles Pons (Universitat de les Illes Balears)

Embedded support trees for tree-based networks

A recently studied general class of phylogenetic networks, called tree-based, are obtained by starting with a backbone phylogenetic tree and adding afterwards some additional arcs on it. We introduce a bipartite graph associated with the phylogenetic network which allows to distinguish between different support trees of a tree-based network and we compute the exact amount of such trees in the binary case.

Venta Terauds (University of Tasmania)

Efficient computation of maximum likelihood distances for circular genomes

For circular genomes with even a small number of regions, estimating evolutionary distances can be computationally intensive. We show how techniques from representation theory can be applied to simplify distance calculations, and present some results.

Kristina Wicke (University of Greifswald)

Comparing the rankings obtained from two biodiversity indices: the Fair Proportion Index and the Shapley Value

The Shapley Value and the Fair Proportion Index of phylogenetic trees have been frequently discussed as prioritization tools in conservation biology. Both indices rank species according to their contribution to total phylogenetic diversity, allowing for a simple conservation criterion. While both indices have their specific advantages and drawbacks, it has recently been shown that both values are closely related. However, as different authors use different versions of the Shapley Value, the specific degree of relatedness depends on the specific version of the Shapley Value - it ranges from a high correlation index to equality of the indices. We provide an overview of the different indices and compare them by looking at the mere ranking order provided by either of the indices. We show that even though the chance of two rankings being exactly identical (when obtained from different versions of the Shapley Value) decreases with an increasing number of taxa, the distance between the two rankings converges to zero, i.e. the rankings are becoming more and more alike.

All presented calculations have been performed using FairShapley, our new software package which is publicly available.

(Kristina Wicke & Mareike Fischer)


PARTICIPANTS

Elizabeth Allman (University of Alaska Fairbanks)

Ernesto Álvarez González (Complutense University of Madrid)

Hector Daniel Baños Cervantes (University of Alaska Fairbanks)

Krzysztof Bartoszek (Uppsala University)

Daniel Bernstein (North Carolina State University)

Sangeeta Bhatia (Western Sydney University)

Cristiano Bocci (University of Siena)

Rosa Camps (Univ. Autònoma de Barcelona)

Gabriel Cardona (Universitat de les Illes Balears)

Marta Casanellas (Universitat Politècnica de Catalunya)

Jesus Cerquides (IIIA-CSIC)

Jane Coons (North Carolina State University)

Ruth Davidson (University of Illinois Urbana-Champaign)

Miraine DAVILA FELIPE (Institut Pasteur)

Pablo Duchen (University of Lausanne)

Chris Durden (North Carolina State University)

Jesús Fernández Sánchez (Univesitat Politècnica de Catalunya)

Mareike Fischer (Greifswald University)

Andrew Francis (Western Sydney University)

Toni Gabaldón (Centre for Genomic Regulation)

Michelle Galla (Ernst-Moritz-Arndt-University Greifswald, Institute for mathematics and informatics)

Marina Garrote-López (Universitat Politècnica de Catalunya)

Olivier Gascuel (Institut Pasteur)

Alexandros Grosdos Koutsoumpelias( University of Osnabrück)

Elizabeth Gross (San José State University)

Momoko Hayamizu (The Institute of Statistical Mathematics, JST PRESTO)

Lina Herbst (Ernst-Moritz-Arndt-University Greifswald)

Daniel Hernández Serrano (Universidad de Salamanca)

Linard Hoessly (Universität Fribourg)

Katharina Huber (University of East Anglia)

Remie Janssen (TU Delft)

Mark Jones (TU Delft)

Anastasiia Kim (University of New Mexico)

Dimitra Kosta (University of Glasgow)

Kaie Kubjas (Aalto University)

Manuel Lafond (University of Ottawa)

Colby Long (Mathematical Biosciences Institute)

Yangjing Long (University of Greifswald)

Mateusz Michałek (MPI MiS Leipzig/Polish Academy of Sciences)

Jonathan Mitchell (University of Alaska Fairbanks)

Fatemeh Mohammadi (University of Bristol)

Vincent Moulton (University of East Anglia)

Ferran Muiños (IRB Barcelona)

Joan Carles Pons (Universitat de les Illes Balears)

Xavier Richard (University of Fribourg)

Marta Riutort (Dpt. Genètica, Microbiologia i Estadística (UB)

Jordi Roca Lacostena (Universitat Politècnica de Catalunya)

Juan Rodriguez (Barcelona Supercomputing Center)

Joseph Rusinko (Hobart and William Smith Colleges)

Heiko Schmidt (CIBIV, University of Vienna)

Celine Scornavacca (CNRS)

Jack Simpson (University of Canterbury New Zealand)

Mike Steel (University of Canterbury)

Seth Sullivant (North Carolina State University)

Jeremy Sumner (University of Tasmania)

Venta Terauds (University of Tasmania)

Caroline Uhler (MIT)

Gabriel Valiente (Technical University of Catalonia)

Leo van Iersel (TU Delft)

Vasiliki Velona (Universitat Pompeu Fabra)

Arndt von Haeseler (CIBIV, University of Vienna)

Kristina Wicke (University of Greifswald)

Carsten Wiuf (University of Copenhagen)

Ruriko Yoshida (Naval Postgraduate School)

Magdalena Zajączkowska (University of Warwick)

Louxin Zhang (National University of Singapore)

Xu Zhang (University of Kentucky)

Piotr Zwiernik (Universitat Pompeu Fabra)