Linear compartmental models describe the flow of material between interacting compartments and are widely used in biological and biomedical modeling. A central question is structural identifiability: can model parameters be uniquely recovered from input–output data? Using tools from algebraic geometry, graph theory, and computer algebra, identifiability is analyzed through the model’s input–output equations. Recent results for tree-structured models show that combinatorial properties of the underlying graph govern identifiability and yield explicit descriptions of identifiable parameter combinations. Extensions to multi-input/multi-output systems and to models that are not strongly connected reveal new algebraic and graphical phenomena. These developments highlight how graph structure controls both identifiability and algebraic complexity, and they contribute to ongoing efforts to classify identifiable linear compartmental models arising in biological applications.
Hybridization and introgression occur most commonly between closely related sister taxa. Yet this situation, which corresponds to the presence of a 3-cycle in a phylogenetic network, is precisely the setting that poses the greatest challenge to current inference methodologies. The challenge is twofold: (1) detecting the presence of the 3-cycle, and (2) determining its direction of gene flow, with the latter challenge especially problematic for methods based on phylogenetic invariants. In this talk, I will present recent work using polynomial inequalities to fully characterize how and when problems (1) and (2) can be solved under the Jukes-Cantor substitution model. We demonstrate the existence of a critical threshold such that if a hybridization is sufficiently ancient, the direction of gene flow cannot be distinguished even with infinite data.
This talk is based on joint work with Bryan Currie, Aviva Englander, Jose A. Esparza-Lozano, Elizabeth Gross, Colby Long, Devon Olds, Kawika O'Connor, Udani Ranasinghe, and Christin Sum.
This talk focuses on a recent application of formal verification methods to the field of systems biology. In some recent work, Bahrami, Zucchini, De Maria and Felty (arXiv:2505.05362v1) developed a formal model of neuronal circuits within the Rocq Proof Assistant. Using formal methods to study such biological systems allows for a deeper understanding of their behaviour and makes it possible to prove results that would be extremely difficult to establish experimentally. Moreover, using a proof assistant (such as Rocq) to formalize the model allowed the researchers to state properties in a precise way and to guarantee the correctness of the proofs of these properties.
In this talk, we will start by summarizing previous work on this model, which involved creating formal models of neurons and circuits, defining biologically motivated circuit archetypes, and proving results about these archetypes. Then, we’ll discuss our more recent work, which involves the composition of circuits. Studying circuit composition lets us understand how larger circuits can be built out of smaller ones, and how properties of the larger circuits can be inferred from their component parts. We’ll explore various types of circuit composition and see how these can be applied to various circuit archetypes.
Phylogenomics attempts to infer evolutionary relationships between taxa, depicted as a tree or a network, from sequence data for many genes. The Multispecies Coalescent Model, which describes how gene trees arise from the species network, provides a framework for studying what network features can be inferred from summaries of unrooted topological gene trees.
Identifiability of level-1 network features from quartet concordance factors (CFs) has been well studied, giving a theoretical basis for practical inference methods, despite the non-identifiability of network roots and 2-cycles from them. Since quintet concordance factors identify the root location of a species tree (Allman2011), we began a preliminary investigation of quintet CFs in a network setting. We give several distinguishability results, indicating that quintet CFs indeed contain information on 2-cycles and network rooting.
A parameter of a mathematical model is structurally identifiable if it can be determined from noiseless experimental data. We examine the identifiability properties of two important classes of linear compartmental models: directed-cycle models and catenary models (models for which the underlying graph is a directed cycle or a bidirected path, respectively). Our main result is a complete characterization of the directed-cycle models for which every parameter is (generically locally) identifiable. Additionally, for catenary models, we give a formula for their input-output equations. Such equations are used to analyze identifiability, so we expect our formula to support future analyses into the identifiability of catenary models. Our proofs rely on prior results on input-output equations, and we also use techniques from linear algebra and graph theory. This is joint work with Ahmed, Crepeau, Dessauer Jr., Edozie, Garcia-Lopez, Grimsley, Neri and Shiu.
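As a small illustration of how input-output equations are used to analyze identifiability, the following sketch considers the smallest catenary model: two compartments, with input and output in compartment 1 and a leak from each compartment. This setup, the function name io_coefficients, and the rate convention (kij is the rate from compartment j to compartment i, k0j the leak from compartment j) are assumptions made for this example and are not taken from the talk. Eliminating x2 from x1' = -(k01 + k21) x1 + k12 x2 + u, x2' = k21 x1 - (k02 + k12) x2, y = x1 gives the input-output equation y'' + c1 y' + c0 y = u' + d0 u, and only the coefficients c1, c0, d0 can be read off from data.

```python
def io_coefficients(k01, k02, k12, k21):
    """Coefficients (c1, c0, d0) of y'' + c1*y' + c0*y = u' + d0*u.

    Assumed convention: kij is the rate from compartment j to i,
    k0j is the leak from compartment j, input and output are in compartment 1.
    """
    a11 = k01 + k21  # total outflow rate of compartment 1
    a22 = k02 + k12  # total outflow rate of compartment 2
    c1 = a11 + a22
    c0 = a11 * a22 - k12 * k21
    d0 = a22
    return (c1, c0, d0)


# Two different parameter vectors (hypothetical values chosen for illustration).
params_a = dict(k01=2.0, k02=1.0, k12=1.0, k21=1.0)
params_b = dict(k01=1.0, k02=1.5, k12=0.5, k21=2.0)

coeffs_a = io_coefficients(**params_a)
coeffs_b = io_coefficients(**params_b)
print(coeffs_a, coeffs_b)
```

Both parameter vectors give the same coefficients, so noiseless input-output data cannot tell them apart: with four parameters and three coefficients, only the combinations k01 + k21, k02 + k12, and k12 * k21 are determined in this example.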
Graphical discrete Lyapunov models with non-Gaussian errors, as introduced in the recent work of Recke, Lumpp, Kushnerchuk, Oldekop, Li, Coons, and Robeva, arise as algebraic models of cumulants, up to a fixed order, of the stationary distributions of stable vector autoregressive processes associated with an underlying causal graph. They provide a framework for studying equilibrium causal structure from snapshot data, a task that frequently arises in biological applications. In this talk, I will discuss ongoing work towards causal discovery for these models, such as using polynomial constraints from the models' vanishing ideals. I will first describe progress on characterizing the equivalence classes of graphs that define the same model ideal, which determines the limits of any constraint-based causal discovery method. I will then outline possible algorithms for learning the underlying graph's equivalence class from observed cumulants for certain classes of graphs.
We revisit a classic causal inference problem, the computation of symbolic bounds on causal effects that are only partially identifiable, and gently introduce it to an audience in computational algebra. The problem is typically introduced as a linear program and subsequently solved by vertex enumeration. Subject to certain assumptions on the underlying causal model, this leads to provably valid and sharp bounds; however, this approach quickly becomes computationally intractable as the complexity of the underlying causal model increases. We ask the community: what computational and algebraic tools could lead to methods that are more efficient or that require less restrictive assumptions?
This is joint work with Erin Gabriel and Michael Sachs.
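To make the linear-programming formulation in the abstract above concrete, here is a minimal sketch for the simplest case: a binary treatment X and a binary outcome Y with no assumptions on confounding. This toy setting, the function name ate_bounds, and the numbers are assumptions made for this illustration; it is not the method of the talk. The unknowns are the probabilities of the latent cells (X, Y(0), Y(1)). Each observed probability P(X=x, Y=y) is split between two latent cells that differ only in the unobserved potential outcome, so the feasible polytope is a product of four segments and its 16 vertices can be enumerated directly.

```python
from itertools import product


def ate_bounds(p):
    """Sharp bounds on E[Y(1)] - E[Y(0)] by enumerating polytope vertices.

    p maps (x, y) to the observed probability P(X=x, Y=y), binary X and Y.
    """
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    values = []
    # A vertex assigns each observed cell a single value h of the
    # potential outcome that was not observed in that cell.
    for hidden in product((0, 1), repeat=len(cells)):
        ate = 0.0
        for (x, y), h in zip(cells, hidden):
            # Treated units reveal Y(1); untreated units reveal Y(0).
            y1, y0 = (y, h) if x == 1 else (h, y)
            ate += p[(x, y)] * (y1 - y0)
        values.append(ate)
    return min(values), max(values)


# Hypothetical observed distribution, chosen only for illustration.
observed = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
lower, upper = ate_bounds(observed)
print(lower, upper)
```

In this toy case the enumeration reproduces the familiar closed-form no-assumption bounds, P(X=1, Y=1) - P(X=0, Y=1) - P(X=1) and P(X=1, Y=1) - P(X=0, Y=1) + P(X=0). The number of vertices doubles with every additional observed cell, which hints at why the approach becomes expensive for richer causal models.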
Phylogenetic trees are graphs that describe the evolutionary history between a group of species. With the growing awareness of biological phenomena such as gene transfer and introgression, in many cases a tree structure does not describe evolutionary history accurately, and a network is used instead. We can study evolution at the molecular level (e.g. DNA) by placing a Markov model on a phylogenetic tree or network. By viewing the observable part of the model as an algebraic variety, we can use computational algebraic geometry to better understand these models.
In this talk, I will introduce phylogenetic network models and give recent identifiability results on a class of phylogenetic networks called level-1. I will describe a novel method of inferring phylogenetic network topologies from DNA sequence data using functions called algebraic invariants (elements of the ideal corresponding to the model) and present results of this method on simulated data and real data.
Independent Component Analysis (ICA) is a classical method for recovering latent variables with useful identifiability properties. However, full independence is a strong assumption that may not hold in many real-world settings. In this talk, I will discuss how much we can relax the independence assumption without losing identifiability of the model. We show that the weakest such assumption is pairwise mean independence. Our identifiability result is based on a generalization of the spectral theorem from matrices to higher-order tensors, which implies a unique tensor decomposition of the cumulant tensors arising in the model. This is joint work with Anna Seigal and Piotr Zwiernik.
Caspases are a family of enzymes that play a central role in the regulation of apoptosis (programmed cell death). We analyze a model of caspase activation introduced by Eissing et al. in 2004. This model, which is a mass-action model with 18 reactions, has the capacity for two stable steady states: a “life” steady state and an “apoptotic” (death) steady state. We show that the stability of the life steady state is characterized by the vanishing of a single determinantal polynomial. We also present results on the uniqueness and stability of the apoptotic steady state. Finally, we find that this model exhibits unexpected linear relations that hold at steady state, and we show how to predict such linear relations for general mass-action models. Along the way, we underscore the key role that computer algebra plays in this project and highlight some unresolved computational challenges.
This is joint work with Alicia Dickenstein and Mercedes Pérez Millán.
High-dimensional transcriptomic data, such as bulk or single-cell RNA-sequencing, can be modelled as samples from a multivariate distribution. In this setting, the covariance matrix Σ encodes a finite collection of biological signals, including cell-type signatures, developmental programs, and stress responses. When the number of variables p and observations n are of the same order (p/n → γ > 0), recovering the population spectrum by classical techniques is untenable due to the different behaviour of so-called large-dimensional asymptotics and classical statistics. This presentation demonstrates how Free Probability Theory, the non-commutative analog of probability, provides a rigorous algebraic framework for recovering discrete population spectra from empirical observations. We frame the recovery of spiked eigenvalues as a problem of free additive and multiplicative deconvolution. By utilizing tools like the Cauchy and R-transforms, we transform complex spectral convolutions into linear algebraic operations. We present a certifiable computational procedure that uses these transforms to "peel away" the Marchenko-Pastur bulk, revealing the underlying discrete spectrum of Σ.
This algebraic reformulation is not merely a computational convenience, but rather it reflects the genuine algebraic nature of the limiting theory, and it opens the door to tools from computer algebra for the analysis and solution of the recovery of Σ.
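A simple special case shows what recovering a population spike from the sample spectrum looks like. The sketch below uses the standard spiked-covariance result with identity noise: a population eigenvalue ell above 1 + sqrt(gamma) produces a limiting sample eigenvalue ell * (1 + gamma / (ell - 1)) outside the Marchenko-Pastur bulk, and this map can be inverted by solving a quadratic. It is offered as an illustration under those assumptions, not as the deconvolution procedure of the talk; the function names are hypothetical.

```python
import math


def sample_spike(ell, gamma):
    """Limiting sample eigenvalue for a population spike ell (unit noise)."""
    if ell <= 1 + math.sqrt(gamma):
        raise ValueError("spike is below the threshold 1 + sqrt(gamma)")
    return ell * (1 + gamma / (ell - 1))


def population_spike(lam, gamma):
    """Invert sample_spike for a sample eigenvalue lam outside the bulk."""
    if lam <= (1 + math.sqrt(gamma)) ** 2:
        raise ValueError("eigenvalue lies inside the Marchenko-Pastur bulk")
    # ell solves ell**2 - (lam + 1 - gamma) * ell + lam = 0; take the larger root.
    b = lam + 1 - gamma
    return (b + math.sqrt(b * b - 4 * lam)) / 2


# Hypothetical values: aspect ratio gamma = p/n = 0.5 and a population spike of 3.
gamma = 0.5
lam = sample_spike(3.0, gamma)
recovered = population_spike(lam, gamma)
print(lam, recovered)
```

The sample eigenvalue overestimates the population spike (3.75 versus 3 in this example), and the inversion removes that bias. Sample eigenvalues at or below the bulk edge (1 + sqrt(gamma))^2 carry no recoverable spike in this simple model, which is why the second function rejects them.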
We present the recent developments of Oscar.jl in the area of algebraic statistics.
More specifically, we discuss the design of phylogenetic models, highlighting their extensibility and integration with other types of statistical models.
Finally, we present how to readily access data from algebraicphylogenetics.org and work with it in Oscar.
This is joint work with Tobias Boege, Marina Garrote-López and Benjamin Hollering.
We extend reconstruction methods for phylogenetic trees to ultrametrics of arbitrary matroids and study the stability of these data analysis methods in the combinatorial spirit of Andreas Dress.
In particular, we generalize Atteson's work on the safety radius of phylogenetic reconstruction methods, as well as Gascuel and Steel's work on the stochastic safety radius, to arbitrary matroids.
We also show that although the tropical Fermat-Weber points of an M-ultrametric sample are generally not contained in the space of M-ultrametrics, the intersection between the Fermat-Weber set and the space of M-ultrametrics is non-empty.
This is joint work with S. Box, J. Sabol and R. Talbut.