# Open positions

### I have been awarded an ANR JCJC grant starting in 2020 and running for 4 years. I am looking for a Ph.D. student with a good mathematical background [**theoretical computer science** (knowledge in algorithmics is a big plus) and/or probability theory] who is interested in working on projects related to evolutionary biology.

**theoretical computer science**(knowledge in algorithmics is a big plus) and/or probability theory] who is interested in working on projects related to evolutionary biology.

# Internship proposal 2020-21

### Axiomatic in phylogeny-reconstruction distance-based methods and flat clustering with overlap

**Main laboratory**: ISEM, Montpellier, France. Celine Scornavacca (Molecular Phylogeny and Evolution team) is a specialist in phylogenetic networks, combinatorial algorithms, modelling in phylogenomics.

**Partner laboratories**: LIRMM, Montpellier, France. Dimitrios Thilikos and Christophe Paul (ALgorithmes, Graphes et COmbinatoire team) are specialists in combinatorial algorithms, graph theory and more precisely parameterized complexity and graph decompositions.

**Skills required**: strong background in algorithms, analytical skills; an interest in evolutionary models and python programming will be a plus.

**How to apply**: Cover letter and CV (with academic transcript) to be sent to celine.scornavacca@umontpellier.fr and christophe.paul@lirmm.fr.

**Proposal description**:

Phylogenetics aims at clarifying the evolutionary relationships that exist among different species, represented through rooted and leaf-labelled directed trees called phylogenetic trees or phylogenies. One possible way to reconstruct such phylogenies is to apply phylogeny-reconstruction distance-based methods to dissimilarity matrices, which are usually obtained from DNA or protein sequences [1]. These distance-based methods can be seen as hierarchical clustering heuristics that, if the dissimilarity is correctly defined, should correctly infer phylogenetic relationships [2].

Clustering aims at identifying in a data set, groups/clusters in a way that elements of the same group are more similar than elements from different groups. One can distinguish at clustering, where the output is a partition of the input data set, from hierarchical clustering, where the output is a set of nested partitions if the input data set. In his seminal paper, Kleinberg [7] defined an axiomatic composed of three natural properties that any ( at) clustering heuristic should satisfy and proved an impossibility theorem: no clustering function can simultaneously fulfil this axiomatic. Carlson and Memoli [8] pushed Kleinberg's work in the context of hiearchical clustering. If the validity of Kleinberg's axiomatic has been questioned in some recent work (see for example [6]), Kleinberg opened an important research line dealing with the explicability of clustering methods.

In the same way, in phylogenetics, some problems such as the consensus tree problem beneted from an axiomatic study (see, for instance, Section 6.3 in [3] and [4]). Here, we are interested in proposing an axiomatic study of phylogeny-reconstruction distance-based methods in the light of the axioms proposed by Kleinberg and subsequent works. In a second step, we would like to extend this study, adding axioms specific to evolutionary distances. In a third step, we would like to work on at clustering with overlap, aiming at conceiving better distance-based methods for phylogenetic networks, i.e. rooted and leaf-labelled directed acyclic graphs (DAGs) used to depict evolution in the presence of reticulate events such as hybridizations [5]. In this case, since some internal nodes may have more than one parent, this results in a hierarchical clustering with overlap. Flat clustering with overlap will be the first step to study the clustering properties of the underlying phylogenetic networks.

**Practical information**: The internship is a full-time position for 4-6 months, during which the intern is expected to be present in Montpellier, France. Monthly salary is about 600e.

**Note**: If the internship will come to fruition, the intern will have the possibility to continue her/his work by way of a PhD scholarship (funding already acquired).

**References**

[1] Pardi F and Gascuel O. Distance- based methods in phylogenetics. Richard M. Kliman. Encyclopedia of Evolutionary Biology, Elsevier, pp.458-465, 2016, 1st Edition.

[2] Retzlaff N, Stadler PF. Phylogenetics beyond biology. Theory Biosci. 2018;137(2):133-143.

[3] Bryant, D. Building trees, hunting for trees and comparing trees. Ph.D. thesis, University of Canterbury, Department of Math. 1997.

[4] Mike Steel, Joel D. Velasco, Axiomatic Opportunities and Obstacles for Inferring a Species Tree from Gene Trees, Systematic Biology, Volume 63, Issue 5, September 2014, Pages 772-778.

[5] Huson DH, Rupp R, Scornavacca C: Phylogenetic Networks - Concepts, Algorithms and Applications. Cambridge University Press 2010, ISBN 978-0-521-75596-2, pp. I-XII, 1-362.

[6] V. Cohen-Addad, V. Kanade and F. Mallmann-Trenn. Clustering redemption { Beyond the impossibility of Kleingerb's axioms. International Conference on Neural Information Processing Systems (NIPS), 2018.

[7] J. Kleinberg. An impossibility theorem for clustering. Advances in Neural Information Processing Systems, pp. 463-470, 2002.

[8] G. Carlsson and F. MĂ©moli. Characterization, stability and convergences of hierarchical clustering methods. Journal of Machine Learning Research, 11:1425-1470, 2010.