My current research interests lie in the application of stochastic processes to modeling the evolution of species. In particular, I am interested in understanding the various factors that drive speciation and extinction events. Recently, I have become deeply fascinated by how environmental and geographical changes affect species evolution.
During my undergraduate years, I studied various subjects in Mathematics and Statistics, ranging from mathematical analysis and applied mathematics to statistical methods. My Bachelor's thesis, supervised by Ariyanto, M.Si. and the late Dr. Jafaruddin Hamid, was on point-set topology. We studied various characteristics of, and relationships among, the separation axioms of topological spaces. Both my studies and research were supported by the Academic Achievement scholarships from the Indonesian Ministry of Research, Technology, and Higher Education, and the Van Deventer-Maas scholarship from Van-Deventer-Maas Indonesia.
During my postgraduate studies as a Master's student at the Australian National University, I took several courses in Bioinformatics and Mathematical Population Genetics. I wrote a Master's thesis under the guidance of Assoc. Prof. Conrad Burden. For the thesis, we investigated the elapsed time since the most recent common ancestor of a finite random sample drawn from a present-day population which has evolved according to a Bienaymé-Galton-Watson branching process. Both research and studies for the degree were supported by the Indonesian Ministry of Finance under the LPDP scholarship.
I began working with mathematical models for the evolution of species in 2019 as part of my PhD degree under Prof. Barbara Holland and Assoc. Prof. Małgorzata O'Reilly. In my PhD, I developed mathematical models for species evolution. Specifically, we developed a model where events on species trees depend on species age. We also developed a trait-dependent model where speciation and extinction events are driven by species' inherited traits. Last but not least, we developed an environment-dependent model where geography and environmental change drive species evolution. In summary, my PhD research uses stochastic modeling to study evolutionary processes.
RECORDED TALKS
Evolution Meetings 2025 (Virtual) "Inferring epidemiological parameters under infectious phylogeography model with visitor dynamics" (May 29th, 2025)
3rd Joint Congress on Evolutionary Biology (Virtual) "A diffusion-based approach for simulating forward-in-time state-dependent speciation and extinction dynamics" (June 27th, 2024)
Living Earth Collaborative "Mathematical perspectives on phylogenetic models of lineage diversification" (March 7th, 2024)
11th International Conference on Matrix-Analytic Methods "Stochastic niche-based models for the evolution of species" (June 29 - July 1, 2022)
PRESENTATION SLIDES
"From phylogenies to processes: mathematical and statistical models of macroevolution"
AUTECology Exchange Talk - Indonesia's National Research and Innovation Agency (BRIN)
"How shifts in regional species diversity shape evolutionary rates through time"
Ecology & Evolutionary Biology Seminar Series - Washington University in St. Louis
"Mathematics of evolution and infectious disease dynamics"
Public Seminar - Math Department, Nusa Cendana University - Indonesia
"Phylogenetic estimation of diversity-dependent biogeographic rates using deep learning"
Phylomania 2025 - Hobart, Australia
INFERRING EPIDEMIOLOGICAL PARAMETERS USING PHYLOGEOGRAPHY MODEL OF INFECTIOUS DISEASE TRANSMISSION WITH HOST MOVEMENTS IN SPATIALLY STRUCTURED POPULATIONS
During an outbreak, an infectious disease can spread among populations through host movement, potentially fueling local outbreaks with their own epidemiological dynamics. However, it is difficult to know how often infections between populations are transmitted by diseased travelers infecting healthy residents when abroad, rather than by diseased residents infecting healthy travelers, who later return home with the new pathogen. In this paper, we introduce a phylogeographic model where pathogens spread through visitor dynamics, whereby hosts visit other populations through short trips before returning home. To do so, we use the stationary properties of an epidemiological compartment model with visitor dynamics to construct an approximation that is statistically accurate and computationally tractable for phylogenetic modeling. In addition, we derive novel mathematical properties under the approximating model that provide a sufficient condition under which the approximation remains accurate. We apply our model to empirical infection data and travel statistics from the European SARS-CoV-2 pandemic. Inference under our model suggests that, in the early stages of the outbreak, SARS-CoV-2 was more often "pulled" into the home countries of returning travelers than "pushed" into foreign countries by visitors from abroad. Estimates of host movement-related parameter values under our visitor model suggest that competing migration models, with trips of indefinite length, may underestimate the magnitude of outbreaks caused by visitors. This study emphasizes the importance of carefully incorporating host movement dynamics into such models.
LINKING DATA GENERATING PROCESS AND EVOLUTIONARY PATTERNS UNDER STATE-DEPENDENT SPECIATION AND EXTINCTION MODELS
State-dependent speciation and extinction (SSE) models are stochastic branching processes with state-dependent birth (speciation) and death (extinction) rates. The states can be either discrete or continuous, and can represent anything from phenotypic traits to geographical ranges. By modelling anagenesis and cladogenesis explicitly, these models help researchers understand their data by asking whether traits play a role in shaping species diversity through time. In recent years, however, these models have entered a phase of intense but overdue scrutiny aimed at better understanding what they can and cannot estimate reliably when fitted to real biological datasets. This work helps meet the demand for a more rigorous study of the mathematical properties underlying these complex stochastic processes. Here, we establish a general diffusion-based framework (of the kind often used in population genetics) to study this wide and highly influential class of phylogenetic diversification models. We derive explicit relationships between the data-generating process and its expected evolutionary patterns through direct comparisons (via simulation and theory) with a tree-based approach. Our work helps to formalize relationships between evolutionary state patterns, process rates, and mixing times for ClaSSE-type models (SMB article highlight).
DIFFERENT EVOLUTIONARY MECHANISMS COULD LEAD TO EQUIVALENT LONG-TERM PATTERN OF SPECIES DIVERSITY
In phylogenetics, we are interested in the diversification process underlying the evolutionary history of present-day species diversity. Using a phylogeny of sampled present-day species, along with character states (e.g., phenotypic traits) and branch length information, and by assuming our group of interest evolves following a certain diversification model (e.g., state-dependent speciation and extinction), we can use statistical tools, such as Bayesian and maximum-likelihood inference, to infer diversification rate parameters, such as speciation, extinction, and transition rates, under the model. The values of these rate parameters tell us whether a certain character state leads to faster or slower species diversification than the other states. Here we are thinking backward in time (Fig. 1): we use information from the present to learn about the past. Equivalently, we can think of diversification as a forward-in-time process. Many of these diversification models are generative, meaning they can be used to simulate phylogenies given a set of rate parameter values. As a result, different evolutionary scenarios could result in different tree topologies and different proportions of species in each character state. Empirical trees are conditioned on a particular time (i.e., the present). If we allow such a tree to evolve past the present, we can derive the long-term distribution of species proportions in each state (Fig. 2). Here, I argue that there are multiple alternative evolutionary scenarios under the SSE framework that lead to the same long-term distribution of species frequencies across character states. For example,
Speciation_rate_A (& B) = 0.65 (0.05)
Extinction_rate_A (& B) = 0.2 (0.1)
Transition_rate_into_A (& B) = 0.05 (0.9)
or
Speciation_rate_A (& B) = 0.1 (0.3)
Extinction_rate_A (& B) = 1.0 (0.2)
Transition_rate_into_A (& B) = 0.2 (0.9)
Both options lead to 10% of species having state A and 90% of species having state B. In theory, there exist multiple independent and distinct evolutionary scenarios for each stationary class (e.g., \Pi_A < \Pi_B, \Pi_A > \Pi_B and \Pi_A = \Pi_B) (see Fig. 3).
Moreover, I show that not all of these scenarios are equally represented in phylogenies when trees are simulated forward in time. Lastly, some of these scenarios are easier than others to detect using only information from present-day species.
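The two parameter sets above can be checked numerically. The sketch below is not the analysis code behind this project. It assumes a two-state model with anagenetic transitions only, reads "Transition_rate_into_A" as the B-to-A rate (and "into_B" as the A-to-B rate), and takes the long-term state proportions from the dominant eigenvector of the matrix that governs the expected number of species in each state. The function name `stationary_frequencies` is hypothetical.

```python
import numpy as np

def stationary_frequencies(lam_a, lam_b, mu_a, mu_b, q_into_a, q_into_b):
    """Long-run proportions of species in states A and B (illustrative sketch)."""
    # Mean matrix M for d/dt E[N(t)] = M E[N(t)], with N = (N_A, N_B).
    # Assumption: q_into_b is the A -> B rate, q_into_a is the B -> A rate.
    mean_matrix = np.array([
        [lam_a - mu_a - q_into_b, q_into_a],
        [q_into_b, lam_b - mu_b - q_into_a],
    ])
    eigenvalues, eigenvectors = np.linalg.eig(mean_matrix)
    # The eigenvector of the largest eigenvalue gives the long-term proportions.
    dominant = np.argmax(eigenvalues.real)
    v = np.abs(eigenvectors[:, dominant].real)
    return v / v.sum()

# Arguments: speciation A, speciation B, extinction A, extinction B,
# transition into A, transition into B (values taken from the text).
scenario_1 = stationary_frequencies(0.65, 0.05, 0.2, 0.1, 0.05, 0.9)
scenario_2 = stationary_frequencies(0.1, 0.3, 1.0, 0.2, 0.2, 0.9)
print(scenario_1, scenario_2)
```

Under these assumptions, both calls return proportions of about 0.1 for state A and 0.9 for state B, even though the underlying rates differ substantially.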
Fig. 1: Diversification analysis as a backward-in-time process
Fig. 2: Birth-death diversification model as a forward-in-time process
Fig. 3: Existence of alternative scenarios in each stationary class from an SSE model
PHYLOGENETIC DEEP LEARNING FOR STUDYING BIOGEOGRAPHIC DIVERSIFICATION WITH SPECIES INTERACTIONS
In recent years, as models have grown more complex to accommodate biological realism, traditional inference methods based on maximum-likelihood or Bayesian frameworks have become impractical. These methods require an explicit likelihood function for observing a tree. However, deriving a likelihood function is not always trivial; this is particularly true for more complex models. In fact, current SSE models that do have explicit likelihood functions require numerical approximations to compute their likelihood. Supervised learning offers a way out of this problem by learning directly from patterns in data generated under a simulation model. In phylogenetics, recent studies have shown that phylogenetic inference under supervised learning agrees with inference under the traditional methods (paper).
In this work, we introduce a fully generative, event-based phylogenetic diversification model, called DDGeoSSE, that allows diversity-dependent effects of local species richness to modulate biogeographic rates of diversification and range evolution. DDGeoSSE can accommodate and test a variety of alternative diversification scenarios that involve positive, negative, and neutral interactions among sympatric species for speciation, extinction, and dispersal. We derive mathematical and statistical properties of biogeographic outcomes generated by this model, such as the carrying capacity for a clade at equilibrium, which we validate through simulation. Because diversity-dependent phylogenetic models typically do not have tractable likelihood functions, we use deep learning with phyddle to perform parameter inference and model selection. Separately applying DDGeoSSE to Caribbean Anolis lizards and cloud forest-dwelling Viburnum plants, we find evidence that local species richness plays a significant role in shaping diversification dynamics for both clades.
MATRIX-ANALYTIC METHODS FOR MODELLING LINEAGE DIVERSIFICATION WITH STATE-DEPENDENT RATES
This work provides a bridge between phylogenetic models for lineage diversification and an area of stochastic modelling known as matrix-analytic methods (MAMs). We use an important class of models from MAMs, known in the literature as Markovian binary trees (MBTs), to derive various mathematical properties of trees that follow state-dependent evolutionary processes. A key result of our work is that most SSE models are special cases of an MBT. We also develop an explicit mathematical expression, in MBT notation, for the likelihood of a reconstructed tree evolving under an SSE model. This likelihood function allows researchers to perform parameter inference under the model given biological (or simulated) data. Using MBTs, we can compute tree balance metrics of a phylogenetic tree directly from the model parameters, without the statistical analyses that are normally required (preprint).
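As a rough illustration of the correspondence, the sketch below writes a simple state-dependent birth-death model in MBT-style matrices and computes per-state extinction probabilities with the standard fixed-point equation from the MBT literature, e = (-D0)^{-1} d + (-D0)^{-1} B (e ⊗ e). It is not taken from the preprint. The function names and the specific mapping (both daughters inherit the parent state, transitions only along branches) are assumptions made for this example.

```python
import numpy as np

def sse_as_mbt(lam, mu, q):
    """Map speciation rates lam, extinction rates mu, and a transition-rate
    matrix q (zero diagonal) to MBT-style matrices (D0, B, d). Illustrative only."""
    lam = np.asarray(lam, dtype=float)
    mu = np.asarray(mu, dtype=float)
    q = np.asarray(q, dtype=float)
    n = lam.size
    # D0: state changes without branching; diagonal holds minus the total event rate.
    D0 = q - np.diag(lam + mu + q.sum(axis=1))
    # B: branching rates, column index i * n + j means daughters in states (i, j).
    B = np.zeros((n, n * n))
    for i in range(n):
        B[i, i * n + i] = lam[i]  # assumption: both daughters inherit state i
    d = mu  # death (extinction) rates
    return D0, B, d

def extinction_probabilities(D0, B, d, tol=1e-12, max_iter=100_000):
    """Minimal nonnegative solution of e = theta + psi (e kron e), by iteration from 0."""
    theta = np.linalg.solve(-D0, d)
    psi = np.linalg.solve(-D0, B)
    e = np.zeros_like(theta)
    for _ in range(max_iter):
        new_e = theta + psi @ np.kron(e, e)
        if np.max(np.abs(new_e - e)) < tol:
            return new_e
        e = new_e
    return e

# Symmetric two-state example: each state behaves like a birth-death process
# with speciation 1.0 and extinction 0.5, so extinction probability is 0.5.
D0, B, d = sse_as_mbt([1.0, 1.0], [0.5, 0.5], [[0.0, 0.3], [0.3, 0.0]])
ext = extinction_probabilities(D0, B, d)
print(ext)
```

The symmetric example is chosen because its answer is known in closed form (extinction rate divided by speciation rate), which makes the sketch easy to sanity-check.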
AGE-DEPENDENT PHYLOGENETIC MODELS OF LINEAGE DIVERSIFICATION
Phylogenetic models of lineage diversification follow a birth-and-death process in which a birth event is associated with a speciation event (a lineage splits into two distinct lineages) and a death event is associated with an extinction event (a lineage terminates). We model these birth and death events by drawing waiting times from probability distributions (e.g., the exponential distribution).
Phase-type (PH) distributions are a class of distributions in the theory of MAMs that generalize both Erlang and hyperexponential distributions. Under a PH distribution, we can think of an individual species as progressing through different phases during its lifetime, at certain rates, until it undergoes either a speciation or an extinction event. Thus, the distribution provides a natural way of linking speciation and extinction events with species age. By adjusting its underlying parameters, we can test the biological hypothesis of whether species age plays a role in increasing or decreasing diversification rates through time. Given empirical data, we found that our model under a pure-birth process (assuming no extinction) fits better than models that use standard distributions such as the exponential and Weibull (journal article).
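To make the phase picture concrete, here is a minimal sketch of a PH distribution specified by an initial phase distribution `alpha` and a sub-generator matrix `T`. It computes the mean waiting time as alpha (-T)^{-1} 1 and draws waiting times by simulating the phase process until absorption. The function names and the example parameter values are assumptions for illustration, not those used in the article.

```python
import numpy as np

def ph_mean(alpha, T):
    """Mean of a PH(alpha, T) waiting time: alpha (-T)^{-1} 1."""
    alpha = np.asarray(alpha, dtype=float)
    T = np.asarray(T, dtype=float)
    return float(alpha @ np.linalg.solve(-T, np.ones(len(alpha))))

def ph_sample(alpha, T, rng):
    """Draw one waiting time by moving through phases until absorption."""
    alpha = np.asarray(alpha, dtype=float)
    T = np.asarray(T, dtype=float)
    n = len(alpha)
    exit_rates = -T.sum(axis=1)  # rate of absorption (the event) from each phase
    phase = rng.choice(n, p=alpha)
    t = 0.0
    while True:
        total_rate = -T[phase, phase]
        t += rng.exponential(1.0 / total_rate)
        # Next step: another phase (indices 0..n-1) or absorption (index n).
        probs = np.append(T[phase], exit_rates[phase]) / total_rate
        probs[phase] = 0.0
        nxt = rng.choice(n + 1, p=probs)
        if nxt == n:
            return t
        phase = nxt

# Erlang with 3 phases and rate 2: phases are visited in sequence.
erlang_alpha = [1.0, 0.0, 0.0]
erlang_T = [[-2.0, 2.0, 0.0], [0.0, -2.0, 2.0], [0.0, 0.0, -2.0]]
# Hyperexponential: start in one of two phases, each with its own exit rate.
hyper_alpha = [0.5, 0.5]
hyper_T = [[-1.0, 0.0], [0.0, -4.0]]

rng = np.random.default_rng(1)
draws = [ph_sample(erlang_alpha, erlang_T, rng) for _ in range(5000)]
print(ph_mean(erlang_alpha, erlang_T), ph_mean(hyper_alpha, hyper_T), np.mean(draws))
```

The Erlang and hyperexponential cases differ only in the structure of `T`, which is what lets a single PH model cover both as special cases.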
STUDY ON MOST RECENT COMMON ANCESTOR OF A RANDOMLY SAMPLED POPULATION UNDER A BIENAYMÉ-GALTON-WATSON BRANCHING PROCESS
We consider the problem of estimating the elapsed time since the most recent common ancestor of a finite random sample drawn from a population which has evolved through a Bienaymé–Galton–Watson branching process. More specifically, we are interested in the diffusion limit appropriate to a supercritical process in the near-critical limit evolving over a large number of time steps. Our approach differs from earlier analyses in that we assume the only known information is the mean and variance of the number of offspring per parent, the observed total population size at the time of sampling, and the size of the sample. We obtain a formula for the probability that a finite random sample of the population is descended from a single ancestor in the initial population, and derive a confidence interval for the initial population size in terms of the final population size and the time since initiating the process. We also determine a joint likelihood surface from which confidence regions can be determined for simultaneously estimating two parameters, (1) the population size at the time of the most recent common ancestor, and (2) the time elapsed since the existence of the most recent common ancestor (paper).
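For readers unfamiliar with the process, the sketch below simulates population sizes under a Bienaymé–Galton–Watson process in the slightly supercritical regime. The paper only assumes a known offspring mean and variance. The Poisson offspring distribution, the function name `simulate_bgw`, and the parameter values here are assumptions chosen to keep the example short.

```python
import numpy as np

def simulate_bgw(initial_size, mean_offspring, generations, rng):
    """Population sizes per generation, assuming Poisson-distributed offspring."""
    sizes = [int(initial_size)]
    for _ in range(generations):
        # The sum of N independent Poisson(m) offspring counts is Poisson(N * m),
        # so each generation can be drawn in a single step.
        sizes.append(int(rng.poisson(sizes[-1] * mean_offspring)))
    return sizes

rng = np.random.default_rng(0)
# Near-critical supercritical regime: mean offspring slightly above 1.
trajectory = simulate_bgw(initial_size=50, mean_offspring=1.02, generations=200, rng=rng)
print(trajectory[0], trajectory[-1], len(trajectory))
```

Once the population size reaches zero it stays there, which is why results in this setting are usually stated conditional on the population surviving to the time of sampling.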