We propose replacing the conventional branching property in random population models with the self-similarity property. By leveraging self-similarity techniques, we improve upon key results in the literature, such as the renowned work of Birkner et al. (2005), paving the way for re-examining this field from a fresh perspective. Unlike branching models, which assume independence among individuals, self-similar models allow for the study of complex reproductive dynamics, particularly in scenarios where populations face limited resources, which makes them a more realistic approach.
Furthermore, the study of our newly introduced self-similar measure-valued processes offers an ideal framework for advancing the theory of self-similar Markov processes in infinite dimensions. This is achieved through tools such as duality and particle representations, which are commonly used in mathematical population genetics. Finally, self-similar measure-valued Markov processes hold significant potential in non-parametric Bayesian statistics, where they can contribute to the development of time-dependent models of prior distributions on the space of probability measures.
In this project, we propose a theoretical framework for the convergence of random metric spaces. These spaces involve random distance functions, the so-called sample Fermat distances (e.g., Groisman et al., 2022; Hwang et al., 2016), which are constructed from collections of random points, here with additional noise. Our approach draws upon two key components: (1) the general theory of metric measure spaces and the Gromov-Hausdorff-weak topology, and (2) large deviation results for sample Fermat distances. This framework has applications in machine learning, particularly in clustering and topological data analysis, where the ability to incorporate measurement noise in the data is critical.
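As a rough illustration of the objects involved, the sketch below computes sample Fermat distances as shortest-path distances on the complete graph over the sample, with edge weights equal to Euclidean distance raised to a power alpha. The function names, the choice alpha = 2, and the modelling of noise as an independent Gaussian perturbation of each point are assumptions made for this example only, not the construction studied in the project.

```python
import math
import random


def sample_fermat_distances(points, alpha=2.0):
    """All-pairs sample Fermat distances over `points` (hypothetical helper).

    The distance between two sample points is the minimum, over paths through
    the sample, of the sum of |q_{i+1} - q_i|**alpha along the path.
    """
    n = len(points)
    # Direct edge weights: Euclidean distance raised to the power alpha.
    dist = [[math.dist(p, q) ** alpha for q in points] for p in points]
    # Floyd-Warshall relaxation over intermediate sample points.
    for k in range(n):
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


def add_gaussian_noise(points, sigma, rng):
    """Perturb every coordinate independently (an assumed noise model)."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


rng = random.Random(0)
clean = [(rng.random(), rng.random()) for _ in range(30)]
noisy = add_gaussian_noise(clean, sigma=0.01, rng=rng)
d_clean = sample_fermat_distances(clean, alpha=2.0)
d_noisy = sample_fermat_distances(noisy, alpha=2.0)
```

For alpha greater than 1, long direct jumps are penalised, so shortest paths tend to pass through regions where sample points are dense.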
In this project, we extend the scope of sample Fermat distances (e.g., Groisman et al., 2022; Hwang et al., 2016) in two significant directions. First, we move beyond the traditional setting of manifolds to accommodate more general metric spaces, with the ultimate goal of including discrete spaces. Second, we introduce a time component into the definition of sample Fermat distances, establishing a connection with dynamical first passage percolation. This contrasts with the time-homogeneous first passage percolation commonly discussed in the current literature. This extended framework has applications in machine learning, particularly in clustering and topological data analysis, enabling more robust analysis of diverse and dynamic datasets.
In this project, we leveraged the simplicity and effectiveness of phase-type distributions to investigate the time to the most recent common ancestor (TMRCA) in population models. In particular, we characterized the density of the TMRCA for populations whose total size evolves deterministically over time. We also demonstrated that the TMRCA has significant potential for distinguishing between competing evolutionary models in practical applications, and that our explicit formula for the density can be used effectively in inference schemes based on maximum likelihood estimation.
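To make the phase-type viewpoint concrete, here is a minimal sketch for the simplest textbook case only: the Kingman coalescent with constant population size, where the TMRCA of n sampled lineages is a sum of independent exponential times with rates k(k-1)/2 for k = n, ..., 2 (time measured in coalescent units). This is a hypoexponential distribution, a special case of a phase-type distribution. The function name `tmrca_density` is hypothetical, and the formula below is not the project's result for deterministically varying population sizes.

```python
import math


def tmrca_density(t, n):
    """Density at time t of the TMRCA of n lineages (constant-size Kingman case).

    Uses the closed form for a sum of independent exponentials with
    pairwise distinct rates lambda_k = k * (k - 1) / 2, k = 2, ..., n.
    """
    if n < 2:
        raise ValueError("need at least two lineages")
    rates = [k * (k - 1) / 2 for k in range(2, n + 1)]
    total = 0.0
    for i, lam in enumerate(rates):
        # Partial-fraction weight attached to the i-th exponential term.
        weight = 1.0
        for j, mu in enumerate(rates):
            if j != i:
                weight *= mu / (mu - lam)
        total += weight * lam * math.exp(-lam * t)
    return total


def log_likelihood(tmrca_samples, n):
    """Log-likelihood of independent TMRCA observations under this toy model."""
    return sum(math.log(tmrca_density(t, n)) for t in tmrca_samples)
```

An explicit density of this kind is what makes likelihood-based comparison of models straightforward: each candidate model contributes its own density, and the log-likelihoods of the observed TMRCA values can be compared directly.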
We introduce a new class of measure-valued self-similar Markov processes. By extending the well-known Lamperti transformation for self-similar Markov processes to the infinite-dimensional setting, we generalize the celebrated work of Birkner et al. (2005) in mathematical population genetics, which characterizes the frequency-of-types process of stable branching populations in terms of Beta Fleming-Viot processes. We construct a larger class of self-similar populations whose frequency of types is described by a general Lambda Fleming-Viot process.
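For orientation, the classical one-dimensional Lamperti transformation can be stated as follows. This is the standard textbook formulation for a positive self-similar Markov process $X$ of index $\alpha > 0$ started at $x > 0$, with $\xi$ the associated Lévy process; the notation is chosen here for illustration and is not taken from the paper, and the measure-valued extension is not reproduced.

```latex
X_t = x \exp\bigl( \xi_{\tau(t x^{-\alpha})} \bigr),
\qquad
\tau(s) = \inf\Bigl\{ u \ge 0 : \int_0^u e^{\alpha \xi_r} \, dr > s \Bigr\},
\qquad
0 \le t < x^{\alpha} \int_0^{\infty} e^{\alpha \xi_r} \, dr .
```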
Our results only scratch the surface of the potential power of the interplay between population genetics and the theory of self-similar Markov processes. This project constitutes a first yet important step in the development of this new research program.
https://arxiv.org/abs/2301.07762
In this project, we investigated a stochastic model of a biological population under the influence of natural selection. Our findings revealed two intriguing phenomena in this model: (1) contrary to common understanding, increasing the strength of selection can increase the genetic variability within the population, and (2) we identified a novel phase transition in traveling waves, which we term the shift from pulled to semi-pulled waves.
https://link.springer.com/article/10.1007/s00285-024-02173-x
In this project, we examined the genealogies of a broad class of neutral population models that includes the well-known Cannings models. Departing from the standard assumption of symmetry in offspring distributions in Cannings models, we introduced a less restrictive condition of non-heritability of reproductive success. This adjustment provides a more precise mathematical framework for studying neutral biological populations. Additionally, our framework enabled us to analyze the genealogy of a new exponential model. Despite its built-in fitness inheritance mechanism, this model fits within our neutrality setting.
https://alea.impa.br/articles/v18/18-53.pdf
In this project, we characterized the Site Frequency Spectrum (SFS) of the Bolthausen-Sznitman coalescent. The SFS is a statistic based on the genetic diversity present in a biological population and is frequently used in population genetics to infer the population's evolutionary past. The Bolthausen-Sznitman coalescent is a stochastic model for the genealogy of a population that has recently been proposed as a new null model for populations under selective pressure.
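As a small illustration of the statistic itself, the sketch below computes the (unfolded) SFS from a binary haplotype matrix, where rows are sampled individuals, columns are sites, and 1 marks the derived allele. The function name and the toy data are made up for this example; the project concerns the distribution of the SFS under the Bolthausen-Sznitman coalescent, which is not simulated here.

```python
def site_frequency_spectrum(haplotypes):
    """Return [xi_1, ..., xi_{n-1}] for n sampled haplotypes.

    xi_i is the number of segregating sites at which exactly i of the
    n samples carry the derived allele (coded as 1).
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    # zip(*haplotypes) iterates over sites (columns) of the matrix.
    for site in zip(*haplotypes):
        derived_count = sum(site)
        # Sites where all or none of the samples are derived are not segregating.
        if 0 < derived_count < n:
            sfs[derived_count - 1] += 1
    return sfs


# Toy data: 4 samples (rows) observed at 4 sites (columns).
sample = [
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
print(site_frequency_spectrum(sample))  # [2, 1, 1]
```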
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0764-4
https://sourceforge.net/projects/metassembler/
In this project, we designed and implemented a software package for the "de novo" assembly of genomes. The main heuristic of this software is to combine the assemblies produced using multiple (possibly different) algorithms into a single superior sequence by identifying and merging the best sequence stretches from each of them.
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0506-z
The aim of this project was to characterize the genome of novel strains of rice. I contributed to the bioinformatics component by using computational tools to identify and compare the coding regions of the newly assembled sequences.
https://www.pnas.org/doi/full/10.1073/pnas.1112567108
The aim of this project was to barcode each nucleotide in the human genome based on its genomic context. Our findings revealed that a genomic context of 50 nucleotides is sufficient to uniquely identify 92% of the nucleotides in the human genome. My role in this project involved contributing to discussions that determined the direction of our research.
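The idea of identifying a position by its surrounding sequence can be sketched on a toy string. In the example below, the "context" of a position is assumed to be the k-mer starting at that position on the forward strand only; this definition, the function name, and the toy sequence are assumptions for illustration and may differ from the definition used in the paper.

```python
from collections import Counter


def unique_context_fraction(sequence, k):
    """Fraction of positions whose length-k context occurs exactly once."""
    if k <= 0 or k > len(sequence):
        raise ValueError("k must satisfy 1 <= k <= len(sequence)")
    # One context per position that has a full k-mer to its right.
    contexts = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(contexts)
    unique = sum(1 for c in contexts if counts[c] == 1)
    return unique / len(contexts)


toy = "ACGTACGA"
for k in (1, 3, 5):
    print(k, unique_context_fraction(toy, k))
```

On this toy string the fraction grows with k, which mirrors the qualitative point that a longer context identifies more positions uniquely.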