Open positions

I have been awarded an ANR JCJC grant starting in 2020 and running for 4 years. I am looking for a Ph.D. student and a 2-year postdoc, both with a good mathematical background (statistics and combinatorics) and an interest in projects related to evolutionary biology. I will also open a 1-year (possibly extendable) position for a software engineer in 2021.

6-month internship proposal (first half of 2020)

Computing the scanwidth of a DAG efficiently

Main laboratory: ISEM, Montpellier, France. Céline Scornavacca (Molecular Phylogeny and Evolution team) is a specialist in phylogenetic networks, combinatorial algorithms, and modelling in phylogenomics.

Partner laboratory: LIGM, Paris, France. Mathias Weller (Algorithmics for Bioinformatics team) is a specialist in parameterized algorithms, structural parameterization, preprocessing, and graph theory.

Skills required: strong background in algorithms and analytical skills. C++/Java programming and an interest in evolutionary models will be a plus.

How to apply: Cover letter and CV (with academic transcript) to be sent to celine.scornavacca@umontpellier.fr and mathias.weller@u-pem.fr.

Background: Phylogenetic networks are rooted, leaf-labelled directed acyclic graphs (DAGs) used to depict the evolution of a set of species in the presence of reticulate events such as hybridizations, where two species combine their genetic material to create a new species. Reconstructing these networks from molecular data is challenging, and current algorithms fail to scale up to genome-wide data. While aiming to design faster parameterized algorithms for this task, we recently came across [1] a new width parameter for DAGs, which we call "scanwidth". To get an intuition, imagine a scanner line traversing a network from the leaves to the root; at any moment, its width is the number of arcs it cuts. As the line moves up, it passes nodes, changing the set of arcs it cuts and hence its width. The cutwidth of the network is the largest width achieved by such a traversing line. Now, consider multiple independent scanner lines, each one scanning an arc incoming to a different leaf of the network. Whenever a node could be passed by two different lines, they are merged into a single one. This naturally generalizes the cutwidth to a smaller width measure that we call scanwidth. As with the cutwidth, different orders in which the nodes are passed yield different values of the final width, and the goal is to minimize it. The scanwidth broadens the arsenal of width measures that can be used to attack hard problems in phylogenetics. It enabled us to design a faster parameterized algorithm for network reconstruction, which handles several real-world datasets within minutes instead of weeks [2]. Still, since deciding the scanwidth is NP-complete even for very restricted classes of networks [1], our implementation actually uses a simple heuristic to compute it, leaving considerable room for improvement.
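
To make the single-line intuition concrete, here is a minimal Python sketch of the cutwidth case only (it does not compute the scanwidth, which lets several lines run independently and merge). It assumes the network is given as a list of (parent, child) arcs and that the scanner line passes nodes one at a time, children before parents. The function names and the toy network are hypothetical, chosen for illustration; the brute-force search is exponential and is not the heuristic used in our implementation.

```python
from itertools import permutations


def width_profile(arcs, order):
    """Width of the scanner line after it passes each node of `order`.

    `arcs` holds (parent, child) pairs; `order` lists every node, leaves first.
    """
    position = {node: i for i, node in enumerate(order)}
    profile = []
    for i in range(len(order)):
        # An arc is cut once its child has been passed but its parent has not.
        cut = sum(1 for parent, child in arcs
                  if position[child] <= i < position[parent])
        profile.append(cut)
    return profile


def is_bottom_up(arcs, order):
    """True if every node appears after all of its children."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[child] < position[parent] for parent, child in arcs)


def min_cutwidth_bruteforce(arcs):
    """Smallest maximum width over all bottom-up orders (exponential time)."""
    nodes = sorted({node for arc in arcs for node in arc})
    best_width, best_order = None, None
    for order in permutations(nodes):
        if not is_bottom_up(arcs, order):
            continue
        width = max(width_profile(arcs, order))
        if best_width is None or width < best_width:
            best_width, best_order = width, list(order)
    return best_width, best_order


# Hypothetical toy network: root r, internal nodes a and b,
# hybrid node h (parents a and b), leaves x, y, z.
ARCS = [("r", "a"), ("r", "b"), ("a", "x"), ("a", "h"),
        ("b", "h"), ("b", "z"), ("h", "y")]

if __name__ == "__main__":
    print(width_profile(ARCS, ["x", "y", "z", "h", "a", "b", "r"]))
    print(min_cutwidth_bruteforce(ARCS))
```

On this toy network, passing all three leaves first (x, y, z, h, a, b, r) makes the line cut 4 arcs at its widest, whereas the order y, h, x, a, z, b, r never cuts more than 3, which illustrates why the order in which nodes are passed matters.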

Task: In this internship, the candidate will study how the scanwidth relates to other width measures and graph-theoretic problems, and will design algorithms to compute and approximate this parameter efficiently. An implementation of the resulting algorithms is also desirable.

Practical information: The internship is a full-time position for 4-6 months, during which the intern is expected to be present in Montpellier, France. The monthly salary is about 600 euros.

Note: If the internship is successful, the intern will have the possibility to continue this work with a PhD scholarship (funding already acquired).

References

[1] Berry V, Scornavacca C, Weller M, Proceedings of the 46th International Conference on Current Trends in Theory and Practice of Computer Science, 2020.

[2] Rabier C-E, Berry V, Pardi F, Scornavacca C. On the inference of complicated phylogenetic networks by Markov chain Monte Carlo. Submitted.

6-month M2 internship proposal (first half of 2020)

Characterization of the mosaic structure of rice genomes

Main laboratory: ISEM, Céline Scornavacca (Molecular Phylogeny and Evolution team), specialist in phylogenetic networks, combinatorial algorithms, and modelling in phylogenomics.

Partner laboratories:

- LIRMM, Vincent Berry (Methods and Algorithms for Bioinformatics team), specialist in phylogenetic trees and combinatorial algorithms.

- AGAP, Jean-Christophe Glaszmann (Diversity Dynamics, Societies, Environments team), specialist in rice evolution.

Skills required: interest in evolutionary models, analytical skills, proficiency in algorithms, Python programming, knowledge of the main Linux commands.

How to apply: Cover letter and CV (with transcript) to be sent to celine.scornavacca@umontpellier.fr and vberry@lirmm.fr

Subject:

The fine characterization of the mosaic structure of genomes is a topic located at an evolutionary scale intermediate between population genetics and species phylogeny. This problem arises in many areas of the life sciences, in particular in the domestication of rice, which we are studying in a collaboration between the partner laboratories of this internship proposal (Santos et al., 2019; Berry et al., 2020). In this context, nearly 3,500 cultivated and wild genomes of 400 million base pairs each must be analyzed to characterize the mosaic structure of genome fragments resulting from hybridizations between a small number of ancestral founders. We have recently proposed a method that estimates the evolutionary history of cultivated rice subspecies (large groups of varieties), represented by a phylogenetic tree or network (Rabier et al., in preparation). The intern will use this method to focus on Japonica rice and the repeated introgressions it may have undergone from several founders. He/she will then contribute to designing a probabilistic method to infer the mosaic structure of genomes. Current mosaic inference methods are essentially clustering methods that do not take a specific evolutionary model into account, and they deviate from phylogenetic reality when recent founder or hybridization events have had a strong impact. The method proposed here will instead rely on the already inferred evolutionary history of the rice subspecies and on an explicit model of the evolution of individual genomes within this collective history. Finally, the intern will participate in experiments to answer several specific scientific questions about cultivated rice. In particular, will comparing the mosaics obtained from the clustering and phylogenetic approaches shed light on the local impact of selection, and thus on genomic motifs with adaptive value?

Santos J, Chebotarov D, McNally K, Bartholome J, Droc G, Billot C, Glaszmann J-C, Genome Biology and Evolution, 11(5): 1358-1373, 2019.

Berry V, Scornavacca C, Weller M, Proc. of 46th International Conference on Current Trends in Theory and Practice of Computer Science, 2020.

Rabier C-E, Berry V, Pardi F, Scornavacca C, manuscript.