Linear compartmental models describe the flow of material between interacting compartments and are widely used in biological and biomedical modeling. A central question is structural identifiability: can model parameters be uniquely recovered from input–output data? Using tools from algebraic geometry, graph theory, and computer algebra, identifiability is analyzed through the model’s input–output equations. Recent results for tree-structured models show that combinatorial properties of the underlying graph govern identifiability and yield explicit descriptions of identifiable parameter combinations. Extensions to multi-input/multi-output systems and to models that are not strongly connected reveal new algebraic and graphical phenomena. These developments highlight how graph structure controls both identifiability and algebraic complexity, and they contribute to ongoing efforts to classify identifiable linear compartmental models arising in biological applications.
We present recent developments in Oscar.jl in the area of algebraic statistics.
More specifically, we discuss the design of phylogenetic models, highlighting their extensibility and integration with other types of statistical models.
Finally, we show how to readily access data from algebraicphylogenetics.org and work with it in Oscar.
This is joint work with Tobias Boege, Marina Garrote-López and Benjamin Hollering.
This talk focuses on a recent application of formal verification methods to the field of systems biology. In some recent work, Bahrami, Zucchini, De Maria and Felty (arXiv:2505.05362v1) developed a formal model of neuronal circuits within the Rocq Proof Assistant. Using formal methods to study such biological systems allows for a deeper understanding of their behaviour and makes it possible to prove results that would be extremely difficult to establish experimentally. Moreover, using a proof assistant (such as Rocq) to formalize the model allowed the researchers to state properties in a precise way and to guarantee the correctness of the proofs of these properties.
In this talk, we will start by summarizing previous work on this model, which involved creating formal models of neurons and circuits, defining biologically motivated circuit archetypes, and proving results about these archetypes. Then, we’ll discuss our more recent work, which involves the composition of circuits. Studying circuit composition lets us understand how larger circuits can be built out of smaller ones, and how properties of the larger circuits can be inferred from their component parts. We’ll explore various types of circuit composition and see how these can be applied to various circuit archetypes.
In chemical reaction network theory, a common area of study is the analysis of steady-state solutions. Steady states can be described by a variety over a high-dimensional parameter space, and a discriminant partitions this parameter space into connected components that can give different numbers of steady-state solutions. Computing this discriminant becomes computationally difficult as chemical reaction networks increase in complexity, but numerical methods allow us to analyze the connected components of the complement of the discriminant without computing its defining equation.
In population dynamics, the Allee effect refers to the phenomenon where a population has a higher growth rate at higher densities. Although it has been thoroughly studied, the number of steady states was not well understood until recently. I will present a case study of how numerical methods can be used to describe the number of positive steady states of the Allee effect.
[1] Breiding, Paul, John Cobb, Aviva K. Englander, Nayda Farnsworth, Jonathan D. Hauenstein, Oskar Henriksson, David K. Johnson, Jordy Lopez Garcia, and Deepak Mundayur. "Elimination Without Eliminating: Computing Complements of Real Hypersurfaces Using Pseudo-Witness Sets." arXiv preprint arXiv:2601.04383 (2026).
[2] Englander, Aviva K. and Jose Israel Rodriguez. "Towards Learning the Positive Real Discriminant of the Wnt Signaling Pathway Shuttle Model." ACM Communications in Computer Algebra 58, no. 3 (2025): 85-88.
[3] Song, Kuo, and Xiaoxian Tang. "Steady State Classification of Allee Effect System." arXiv preprint arXiv:2501.19062 (2025).
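As a toy illustration of how a discriminant partitions parameter space by the number of steady states, the sketch below uses a textbook single-species model with a strong Allee effect and linear harvesting, dN/dt = r N (N/A - 1)(1 - N/K) - h N. The model, the parameter names r, A, K, h and the function name are assumptions chosen for illustration; they are not the systems studied in [1]-[3]. For positive parameters, the positive steady states are the roots of a quadratic whose discriminant decides the count.

```python
def count_positive_steady_states(r, A, K, h):
    """Count positive steady states of dN/dt = r*N*(N/A - 1)*(1 - N/K) - h*N.

    Hypothetical textbook model (strong Allee effect plus linear harvesting),
    assumed for illustration only. All parameters are taken to be positive.
    Dividing the right-hand side by N > 0 and clearing denominators gives
        N**2 - (A + K)*N + A*K*(1 + h/r) = 0.
    Its roots have positive sum and positive product, so the sign of the
    discriminant alone decides how many positive roots exist.
    """
    if min(r, A, K, h) <= 0:
        raise ValueError("all parameters must be positive")
    disc = (K - A) ** 2 - 4 * A * K * h / r
    if disc > 0:
        return 2  # two distinct positive steady states
    if disc == 0:
        return 1  # one double root, on the discriminant itself
    return 0      # no positive steady state


if __name__ == "__main__":
    # Sweep the harvesting rate h across the discriminant locus h = 9/16.
    r, A, K = 1.0, 1.0, 4.0
    for h in (0.1, 0.5625, 1.0):
        print(h, count_positive_steady_states(r, A, K, h))
```

In this one-parameter sweep the discriminant locus is the single value h = r (K - A)^2 / (4 A K); the two sides of it are the connected components on which the count is constant.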
Hybridization and introgression occur most commonly between closely related sister taxa. Yet this situation, which corresponds to the presence of a 3-cycle in a phylogenetic network, is precisely the setting that poses the greatest challenge to current inference methodologies. The challenge is twofold: (1) detecting the presence of the 3-cycle, and (2) determining its direction of gene flow, with the latter challenge especially problematic for methods based on phylogenetic invariants. In this talk, I will present recent work using polynomial inequalities to fully characterize how and when problems (1) and (2) can be solved under the Jukes-Cantor substitution model. We demonstrate the existence of a critical threshold such that if a hybridization is sufficiently ancient, the direction of gene flow cannot be distinguished even with infinite data.
This talk is based on joint work with Bryan Currie, Aviva Englander, Jose A. Esparza-Lozano, Elizabeth Gross, Colby Long, Devon Olds, Kawika O'Connor, Udani Ranasinghe, and Christin Sum.
Phylogenomics attempts to infer evolutionary relationships between taxa, depicted as a tree or a network, from sequence data for many genes. The Multispecies Coalescent Model, which describes how gene trees arise from the species network, provides a framework for studying what network features can be inferred from summaries of unrooted topological gene trees.
Identifiability of level-1 network features from quartet concordance factors (CFs) has been well studied, giving a theoretical basis for practical inference methods, despite the non-identifiability of network roots and 2-cycles from them. Since quintet concordance factors identify the root location of a species tree (Allman et al., 2011), we began a preliminary investigation of quintet CFs in a network setting. We give several distinguishability results, indicating that quintet CFs indeed contain information on 2-cycles and network rooting.
Classic results on the computational power of neural networks modeled neurons as perceptrons, ‘point neurons’ that sum inputs linearly. However, real neurons can exploit nonlinearities at synapses or dendrites. In recent work, Song and Benna (2025) explored a model with parallel synapses, that is, multiple contacts between a pre- and postsynaptic neuron, each applying a sigmoidal transformation with learnable parameters. Under these assumptions, they observed that the effective transmission function from pre- to postsynaptic neuron is constrained to be monotonic, but otherwise flexible. They found in simulations that such neurons achieve a higher classification capacity than perceptrons: the critical capacity (the threshold ratio of the number of input patterns to the input dimension at which a random labeling is at least 50% likely to be separable) appears to grow with the logarithm of the input dimension. For a perceptron, the critical capacity is constant. We extend this work with a geometric characterization of separable and inseparable patterns, giving an analogue to the “XOR problem” for this model. Using linear programming duality together with some combinatorial results, we reduce the classification capacity problem for this neuron model to a standard perceptron capacity problem with correlated input. We then find a logarithmic lower bound on capacity by solving a reduced version of this problem with techniques from statistical mechanics.
A parameter of a mathematical model is structurally identifiable if it can be determined from noiseless experimental data. We examine the identifiability properties of two important classes of linear compartmental models: directed-cycle models and catenary models (models for which the underlying graph is a directed cycle or a bidirected path, respectively). Our main result is a complete characterization of the directed-cycle models for which every parameter is (generically locally) identifiable. Additionally, for catenary models, we give a formula for their input-output equations. Such equations are used to analyze identifiability, so we expect our formula to support future analyses into the identifiability of catenary models. Our proofs rely on prior results on input-output equations, and we also use techniques from linear algebra and graph theory. This is joint work with Ahmed, Crepeau, Dessauer Jr., Edozie, Garcia-Lopez, Grimsley, Neri and Shiu.
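To make the role of input-output equations concrete, here is a minimal sketch for the smallest catenary model: two compartments joined by a bidirected edge, with the input, the output and a single leak all in compartment 1. The parameter names k01 (leak), k21 and k12 (transfers) and both function names are assumptions for illustration, not notation from the talk. For this model the input-output equation is y'' + (k01 + k21 + k12) y' + k01 k12 y = u' + k12 u, and the code checks that its three non-trivial coefficients determine the three parameters.

```python
from fractions import Fraction


def io_coefficients(k01, k21, k12):
    """Coefficients of the input-output equation of the assumed 2-compartment model.

    Model (illustrative assumption):
        x1' = -(k01 + k21) * x1 + k12 * x2 + u
        x2' =  k21 * x1 - k12 * x2
        y   =  x1
    Eliminating x2 gives  y'' + c1*y' + c0*y = u' + d0*u.
    """
    c1 = k01 + k21 + k12
    c0 = k01 * k12
    d0 = k12
    return c1, c0, d0


def recover_parameters(c1, c0, d0):
    """Invert the coefficient map; valid whenever d0 = k12 is nonzero."""
    if d0 == 0:
        raise ValueError("k12 = 0 is a non-generic parameter value")
    k12 = d0
    k01 = c0 / d0
    k21 = c1 - k01 - k12
    return k01, k21, k12


if __name__ == "__main__":
    true_params = (Fraction(1, 2), Fraction(3), Fraction(2, 5))
    coeffs = io_coefficients(*true_params)
    print(recover_parameters(*coeffs) == true_params)
```

Because the coefficient map can be inverted explicitly here, every parameter of this small model is identifiable; for larger models the same question is usually settled by studying the coefficient map symbolically rather than by hand.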
Graphical discrete Lyapunov models with non-Gaussian errors, as introduced in the recent work of Recke, Lumpp, Kushnerchuk, Oldekop, Li, Coons, and Robeva, arise as algebraic models of cumulants, up to a fixed order, of the stationary distributions of stable vector autoregressive processes associated with an underlying causal graph. They provide a framework for studying equilibrium causal structure from snapshot data, a task that frequently arises in biological applications. In this talk, I will discuss ongoing work towards causal discovery for these models, such as using polynomial constraints from the models' vanishing ideals. I will first describe progress on characterizing the equivalence classes of graphs that define the same model ideal, which determines the limits of any constraint-based causal discovery method. I will then outline possible algorithms for learning the underlying graph's equivalence class from observed cumulants for certain classes of graphs.
We revisit a classic causal inference problem, computing symbolic bounds on causal effects that are only partially identifiable, and gently introduce it to an audience in computational algebra. The problem is typically formulated as a linear program and subsequently solved by vertex enumeration. Subject to certain assumptions on the underlying causal model, this leads to provably valid and sharp bounds; however, this approach quickly becomes computationally intractable as the complexity of the underlying causal model increases. We ask the community: what computational and algebraic tools could lead to methods that are more efficient or that require less restrictive assumptions?
This is joint work with Erin Gabriel and Michael Sachs.
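For orientation, the simplest instance of such bounds has a closed form. The sketch below assumes a binary treatment X and a binary outcome Y with no further assumptions on the causal model, and evaluates the classical assumption-free bounds on the average causal effect P(Y(1) = 1) - P(Y(0) = 1). The function name and the dictionary layout are illustrative assumptions. This is not the vertex-enumeration procedure discussed in the talk, only an example of the kind of expression in observed probabilities that such methods produce.

```python
def ate_bounds(joint):
    """Assumption-free bounds on P(Y(1)=1) - P(Y(0)=1) for binary X and Y.

    joint[(x, y)] is the observed probability P(X = x, Y = y).
    The unobserved potential outcomes are replaced by their worst and best
    cases (0 or 1), which gives
        lower = -(P(X=1, Y=0) + P(X=0, Y=1))
        upper =   P(X=1, Y=1) + P(X=0, Y=0)
    """
    total = sum(joint[(x, y)] for x in (0, 1) for y in (0, 1))
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    lower = -(joint[(1, 0)] + joint[(0, 1)])
    upper = joint[(1, 1)] + joint[(0, 0)]
    return lower, upper


if __name__ == "__main__":
    observed = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
    print(ate_bounds(observed))
```

The interval always has width 1 in this setting, which is one reason additional structural assumptions, and hence larger linear programs, are of interest.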
Phylogenetic trees are graphs that describe the evolutionary history of a group of species. With the growing awareness of biological phenomena such as gene transfer and introgression, in many cases a tree structure does not describe evolutionary history accurately, and a network is used instead. We can study evolution at the molecular level (e.g. DNA) by placing a Markov model on a phylogenetic tree or network. By viewing the observable part of the model as an algebraic variety, we can use computational algebraic geometry to better understand these models.
In this talk I will introduce phylogenetic network models and give recent identifiability results on a class of phylogenetic networks called level-1. I will describe a novel method of inferring phylogenetic network topologies from DNA sequence data using functions called algebraic invariants (elements of the ideal corresponding to the model) and present results of this method on simulated data and real data.
Chemical reaction networks (CRNs) provide a natural model for analog computation in which inputs and outputs are encoded by molecular abundances. Classical studies of CRN-based computation have mainly focused on deterministic mass-action systems. Yet in biomolecular implementations using DNA and protein-based circuits, low copy-number effects and intrinsic fluctuations can be non-negligible, so stochastic models are often more faithful than concentration-based deterministic models. In this research, we develop a framework for computation in stochastic CRNs at the level of the mean of stationary distributions. Specifically, we construct elementary arithmetic modules such as identification, addition, multiplication, subtraction, and division, and analyze their ergodicity and mixing times. We then study how these modules can be interconnected to form composite circuits and examine how their computational behavior interacts. Our results provide a systematic framework for computation in stochastic chemical reaction networks.
Parameter estimation is a key problem in analyzing biological models. These estimates are based on measurements of the model's output. However, measurements themselves can be prohibitively expensive, creating a need to minimize the number of measurements while maintaining high estimation accuracy. Determining the best measurement time points under the user’s constraints is a challenging problem. In the talk, we will discuss how this problem can be approached.
Independent Component Analysis (ICA) is a classical method for recovering latent variables with useful identifiability properties. However, full independence is a strong assumption that may not hold in many real-world settings. In this talk, I will discuss how much we can relax the independence assumption without losing identifiability of the model. We show that the weakest such assumption is pairwise mean independence. Our identifiability result is based on a generalization of the spectral theorem from matrices to higher-order tensors, which implies a unique tensor decomposition of the cumulant tensors arising in the model. This is joint work with Anna Seigal and Piotr Zwiernik.
Caspases are a family of enzymes that play a central role in the regulation of apoptosis (programmed cell death). We analyze a model of caspase activation introduced by Eissing et al. in 2004. This model, which is a mass-action model with 18 reactions, has the capacity for two stable steady states: a “life” steady state and an “apoptotic” (death) steady state. We show that the stability of the life steady state is characterized by the vanishing of a single determinantal polynomial. We also present results on the uniqueness and stability of the apoptotic steady state. Finally, we find that this model exhibits unexpected linear relations that hold at steady state, and we show how to predict such linear relations for general mass-action models. Along the way, we underscore the key role that computer algebra plays in this project – and also highlight some unresolved computational challenges.
This is joint work with Alicia Dickenstein and Mercedes Pérez Millán.
High-dimensional transcriptomic data, such as bulk or single-cell RNA-sequencing, can be modelled as samples from a multivariate distribution. In this setting, the covariance matrix Σ encodes a finite collection of biological signals, including cell-type signatures, developmental programs, and stress responses. When the number of variables p and observations n are of the same order (p/n → γ > 0), recovering the population spectrum by classical techniques is untenable due to the different behaviour of so-called large-dimensional asymptotics and classical statistics. This presentation demonstrates how Free Probability Theory, the non-commutative analog of probability, provides a rigorous algebraic framework for recovering discrete population spectra from empirical observations. We frame the recovery of spiked eigenvalues as a problem of free additive and multiplicative deconvolution. By utilizing tools like the Cauchy and R-transforms, we transform complex spectral convolutions into linear algebraic operations. We present a certifiable computational procedure that uses these transforms to "peel away" the Marchenko-Pastur bulk, revealing the underlying discrete spectrum of Σ.
This algebraic reformulation is not merely a computational convenience, but rather it reflects the genuine algebraic nature of the limiting theory, and it opens the door to tools from computer algebra for the analysis and solution of the recovery of Σ.
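As a point of reference for "peeling away" the bulk, the sketch below uses the classical spiked-covariance formulas in the simplest case, where Σ is the identity plus finitely many spikes and the noise variance is 1. This is a swapped-in textbook special case, not the free-deconvolution procedure of the talk, and the function names are illustrative assumptions. In this case the Marchenko-Pastur bulk has edges (1 ± √γ)², a population spike ℓ > 1 + √γ produces a sample eigenvalue near ℓ (1 + γ / (ℓ - 1)), and that map can be inverted in closed form.

```python
import math


def mp_bulk_edges(gamma):
    """Edges of the Marchenko-Pastur bulk for aspect ratio gamma = p/n, unit variance."""
    s = math.sqrt(gamma)
    return (1 - s) ** 2, (1 + s) ** 2


def sample_spike(ell, gamma):
    """Limiting sample eigenvalue produced by a population spike ell (identity bulk)."""
    if ell <= 1 + math.sqrt(gamma):
        raise ValueError("spike is below the detection threshold 1 + sqrt(gamma)")
    return ell * (1 + gamma / (ell - 1))


def population_spike(lam, gamma):
    """Invert sample_spike: recover the population spike from a sample eigenvalue.

    Solves ell**2 - (1 + lam - gamma)*ell + lam = 0 and keeps the larger root,
    which is the one above the detection threshold.
    """
    _, upper = mp_bulk_edges(gamma)
    if lam <= upper:
        raise ValueError("eigenvalue lies inside the bulk; no spike to recover")
    b = 1 + lam - gamma
    return (b + math.sqrt(b * b - 4 * lam)) / 2


if __name__ == "__main__":
    gamma = 0.5
    lam = sample_spike(3.0, gamma)
    print(mp_bulk_edges(gamma), lam, population_spike(lam, gamma))
```

The closed-form inverse exists only because the bulk is the identity here; for a general discrete population spectrum the corresponding step is the deconvolution problem described above.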
The field of tropical statistics - motivated by the identification of the tropical Grassmannian and the space of phylogenetic trees - has produced a range of unconstrained optimisation problems over the tropical projective torus. We will review the types of convexity exhibited by tropical loss functions in statistics, and we propose a new gradient descent method for solving tropical optimisation problems. Theoretical results establish global solvability for tropically star-quasi-convex problems, and numerical experiments demonstrate the method's superior performance over classical descent for tropical optimisation problems which exhibit tropical quasi-convexity but not classical convexity. Notably, tropical gradient descent seamlessly integrates into advanced optimisation methods, such as Adam, offering improved overall performance.
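For readers new to the setting, the sketch below implements the tropical metric on the tropical projective torus, d_tr(x, y) = max_i (x_i - y_i) - min_i (x_i - y_i), together with the Fermat-Weber objective (the sum of tropical distances to a sample) as one example of a tropical loss function. The function names are illustrative assumptions, and the proposed tropical gradient descent method itself is not reproduced here.

```python
def tropical_distance(x, y):
    """Tropical metric: max_i (x_i - y_i) - min_i (x_i - y_i).

    Adding the same constant to every coordinate of x (or of y) leaves the
    value unchanged, so this is well defined on the tropical projective torus.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)


def fermat_weber_cost(point, sample):
    """Sum of tropical distances from `point` to each point in `sample`."""
    return sum(tropical_distance(point, s) for s in sample)


if __name__ == "__main__":
    sample = [[0, 1, 3], [0, 2, 2], [0, 0, 4]]
    print(tropical_distance([0, 1, 3], [0, 0, 0]))
    print(fermat_weber_cost([0, 1, 3], sample))
```

The objective is piecewise linear in the coordinates of `point`, which is why plain gradient-based methods need care near the breakpoints.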
High-throughput omics technologies generate complex, multi-dimensional datasets that capture genes, experimental conditions, and biological contexts across different samples. Traditional matrix-based analysis methods often flatten this structure, obscuring higher-order patterns such as gene–gene interactions across conditions or multi-omics correlations. Thus, novel computational approaches are overdue. Tensor representations provide a natural framework for preserving these multi-way relationships while enabling interpretable analysis.
We will discuss tensor decomposition methods to analyze high-dimensional omics datasets, extracting low-rank latent factors that capture key patterns across samples, genes, conditions, and omics platforms. Tensor-based modeling efficiently captures multi-way patterns while maintaining interpretability, revealing biologically meaningful higher-order dependencies. Applications include multi-sample gene expression analysis, multi-omics integration, and modeling of higher-order cell–cell interactions.
In our talk, we will also cover key methodological challenges, including rank selection, model identifiability, robustness to noise, and scalability; we will discuss how addressing these challenges opens new directions for rigorous analysis of complex biological systems, ultimately highlighting the potential of multi-dimensional modeling to advance computational biology.
We extend reconstruction methods for phylogenetic trees to ultrametrics of arbitrary matroids and study the stability of these data analysis methods in the combinatorial spirit of Andreas Dress.
In particular, we generalize Atteson's work on the safety radius of phylogenetic reconstruction methods, as well as Gascuel and Steel's work on the stochastic safety radius, to arbitrary matroids.
We also show that although the tropical Fermat-Weber points of an M-ultrametric sample are generally not contained in the space of M-ultrametrics, the intersection between the Fermat-Weber set and the space of M-ultrametrics is non-empty.
This is joint work with S. Cox, J. Sabol and R. Talbut.