A phylogenetic tree is a visual representation of the evolutionary relationships among species, with each leaf denoting a current species and internal nodes representing ancestral ones. Recent strides in genetics and genomics have revolutionized the landscape, enabling rapid and cost-effective generation of genome data. This progress has catalyzed the emergence of phylogenomics, a field that leverages phylogenetic tools to conduct comparative analyses on genomic data. Our focus is on developing statistical methodologies within the space of phylogenetic trees, which can be identified with the space of ultrametrics; our approach draws inspiration from {\em tropical geometry}. We introduce the tropical logistic regression model and its extension, the tropical neural network, designed to classify sampled gene trees into classes defined by a species tree. These methodologies serve as a convergence criterion for tree-MCMC within MrBayes, facilitating robust Bayesian inference of phylogeny. Moreover, we prove that the principal component of the METAL and STEAC phylogenomic estimates is the invariant vector on the tropical projective torus.
A parametrized statistical model is globally rationally identifiable if there is a birational isomorphism between its coordinate ring and that of its parameter space. In statistics, it often happens that this isomorphism is induced by a birational isomorphism of the ambient affine spaces whose denominators are positive polynomials on the parameter space. Given such a map, a semialgebraic description of the parameter space can easily be converted to a semialgebraic description of the model. We develop this idea into a general framework. Our method automatically rediscovers well-known constraints such as Markov properties for directed and undirected graphical models, the model invariants of staged trees, and the Verma constraint. It transparently accommodates variants and extensions to these models, such as interventions and colored graphical models, and yields semialgebraic descriptions of the recently introduced Lyapunov models. Finally, we show how to derive a generating set of the model's vanishing ideal up to saturation and how this can speed up model equivalence testing. Since our framework is fairly general, this unifies results obtained independently in the discrete and the Gaussian setting, and proves, among other similar results, a generalization of a conjecture of Sullivant.
In a discrete graphical model, an underlying graph encodes conditional independences among a set of discrete random variables that label the vertices of the graph. Mixtures of these models allow us to incorporate the assumption that the population is split into subpopulations, each of which may follow a different distribution in the graphical model. From the perspective of algebraic statistics, a discrete graphical model is the real positive piece of a toric variety, and the mixture model is part of one of its secant varieties. In this talk, we use discrete geometry to investigate the dimensions of the second mixtures of decomposable discrete graphical models. We show that when the underlying graph is not a "clique star", the mixture model has the maximal dimension. This in turn allows us to prove results on the local identifiability of the parameters.
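For orientation, the dimension count implicit in "maximal dimension" is the standard expected dimension of a secant variety (a general fact, not specific to this work): for a projective variety $V \subseteq \mathbb{P}^N$,
\[
\dim \operatorname{Sec}_2(V) \leq \min(2\dim V + 1, \, N),
\]
and the result above says that equality holds for the second mixture whenever the underlying decomposable graph is not a clique star.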
We consider toric maximum likelihood estimation over the field of Puiseux series and study critical points of the likelihood function using tropical methods. This problem translates to finding the intersection points of a tropical affine space with a classical linear subspace. We derive new structural results on tropical affine spaces and use these to give a complete and explicit description of the tropical critical points in certain cases. In these cases, we associate tropical critical points to the simplices in a regular triangulation of the polytope giving rise to the toric model.
Algebro-geometric methods have proven to be very successful in the study of graphical models in statistics. In this talk I will lay out the first steps to carry out a similar study of their quantum counterparts. These quantum graphical models are families of quantum states satisfying certain locality or correlation conditions encoded by a graph. I will present several ways to associate an algebraic variety to a quantum graphical model. The classical graphical models can be recovered from most of these varieties by restricting to quantum states represented by diagonal matrices. We study fundamental properties of these varieties and provide algorithms to compute their defining equations. Moreover, we study quantum information projections to quantum exponential families defined by graphs and prove a quantum analogue of Birch’s Theorem.
Determinantal point processes (DPPs) on a finite ground set are a family of statistical models whose state space is the power set of the ground set. The key feature of DPPs is negative correlation, meaning that DPPs select for diverse subsets of the ground set. We give an overview of the work that has been done on the likelihood geometry of DPPs, with a focus on projection DPPs and their connection to the squared Grassmannian ${\rm sGr}(2,n)$, which is the image of the Grassmannian ${\rm Gr}(2,n)$ in its Pl\"{u}cker embedding under the squaring map. Our main result is that the log-likelihood function of this statistical model has $(n-1)!/2$ critical points, all of which are real and positive.
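To see the connection concretely (a standard Cauchy--Binet computation, not specific to this talk): if the projection DPP has rank-two kernel determined by a matrix $A \in \mathbb{R}^{2 \times n}$ with Pl\"{u}cker coordinates $p_{ij}$, then
\[
\Pr(\{i,j\}) \;=\; \det\!\big((A^\top (AA^\top)^{-1} A)_{\{i,j\}}\big) \;=\; \frac{p_{ij}^2}{\sum_{k<l} p_{kl}^2},
\]
since $\det(AA^\top) = \sum_{k<l} p_{kl}^2$. Squaring the Pl\"{u}cker coordinates is precisely the map whose image is ${\rm sGr}(2,n)$.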
A primary goal of phylogenetics is to understand the evolutionary history of a set of species. These histories are typically represented by directed graphs, where the leaves correspond to living species and the interior nodes represent extinct ones. While evolutionary histories are often assumed to take the form of trees, networks provide a more realistic representation in the presence of events such as hybridization. However, allowing the generality of networks significantly complicates the inference process. In this talk, we will explore the role that computational algebraic geometry and symbolic computation play in addressing statistical challenges related to network inference, with a particular focus on identifiability and the challenges in extending results about level-1 networks to level-2 networks.
A fundamental question in the field of molecular computation and synthetic biology is what computational tasks biochemical systems are capable of carrying out. In this talk, we will show that chemical reaction networks can do maximum likelihood estimation of log-affine models in the following sense: Given a basis for the kernel of the design matrix of a given model, we construct a network such that the MLE can be read off from the unique equilibrium when the initial concentrations are set to the observed distribution. We furthermore show that the choice of basis for the kernel of the design matrix has a large influence on the dynamical properties of the network, which leads to several interesting questions at the crossroads between statistics, dynamics and nonnegative toric geometry. In particular, we will discuss the special role that Markov bases play in ensuring well-behaved convergence properties of the network, with emphasis on the ML degree 1 case. This is joint work with Carlos Améndola, Jose Rodriguez, and Polly Yu.
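To convey the flavor of the construction on the smallest example (a minimal sketch under simplifying assumptions, not the general network of the paper): for the $2 \times 2$ independence model, the kernel of the design matrix is spanned by $(1,-1,-1,1)$, which yields the single reversible reaction $X_{11} + X_{22} \rightleftharpoons X_{12} + X_{21}$. Mass-action dynamics started at the observed distribution conserve the table margins and equilibrate at the MLE, the product of the margins.
\begin{verbatim}
import numpy as np
from scipy.integrate import solve_ivp

# Observed 2x2 distribution, flattened as [x11, x12, x21, x22].
x0 = np.array([0.30, 0.20, 0.10, 0.40])

def mass_action(_, x):
    # Reversible reaction X11 + X22 <-> X12 + X21 (both rates 1),
    # encoding the kernel vector (1, -1, -1, 1) of the design matrix.
    flux = x[0] * x[3] - x[1] * x[2]
    return np.array([-flux, flux, flux, -flux])

sol = solve_ivp(mass_action, (0.0, 50.0), x0, rtol=1e-10, atol=1e-12)
equilibrium = sol.y[:, -1].reshape(2, 2)

# The MLE of the independence model is the outer product of the margins.
p = x0.reshape(2, 2)
mle = np.outer(p.sum(axis=1), p.sum(axis=0))
print(equilibrium)  # agrees with mle up to integration tolerance
print(mle)
\end{verbatim}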
We study uniqueness of size-2 positive semidefinite (psd) factorizations using tools from rigidity theory. In a size-k psd factorization, the rows or columns of the factors are vectorizations of size-k psd matrices. Psd factorizations play an important role in applications, for example in the computational complexity of semidefinite programs and in quantum information theory. The main goal of rigidity theory is to determine whether a configuration of n points in R^d is unique, up to rigid transformations, given a fixed subset of the pairwise distances between the points. We transfer ideas from rigidity theory to study uniqueness of psd factorizations and give a complete characterization of unique size-2 psd factorizations of positive matrices of rank three. This talk is based on joint work with Kristen Dawson, Serkan Hosten, and Lilja Metsälampi.
Stationary distributions of multivariate diffusion processes have recently been proposed as probabilistic models of causal systems in statistics and machine learning. Taking up this theme, I will present a characterization of the conditional independence relations that hold in a stationary distribution of a diffusion process with a sparsely structured drift. The result draws on a graphical representation of the drift structure and clarifies that marginal independencies are the only source of independence relations. Central to the proof is an algebraic analysis of Gaussian stationary distributions obtained from multivariate Ornstein-Uhlenbeck processes.
Graphical continuous Lyapunov models offer a novel framework for statistical modeling of correlated multivariate data. These models define the covariance matrix through a continuous Lyapunov equation, parameterized by the drift matrix of the underlying dynamic process. In this talk, I will discuss key results on the defining equations of these models and explore the challenge of structural identifiability. Specifically, I will present conditions under which models derived from different directed acyclic graphs (DAGs) are equivalent and provide a transformational characterization of such equivalences. This is based on ongoing work with Carlos Améndola, Tobias Boege and Ben Hollering.
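Concretely, in the standard formulation of these models the covariance matrix $\Sigma$ is the unique solution of the continuous Lyapunov equation
\[
M\Sigma + \Sigma M^\top + C = 0,
\]
where the stable drift matrix $M$ has support prescribed by the graph and $C$ is a fixed positive definite matrix (often taken to be $2\,\mathrm{Id}$ in the literature).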
Toric models and their ML degrees are an important branch of algebraic statistics. The association of such models with polytopes motivates the study of the following ML degree monotonicity. Given a polytope, we verify that the ML degree of its associated model is an upper bound on the ML degrees of the models defined by all of its faces. To this end, we first study the well-known problem of maximum likelihood estimation in the presence of data zeros. This is joint work with Carlos Améndola and Maximilian Wiesmann.
We study the problem of transforming a multi-way contingency table into an equivalent table with uniform margins and the same dependence structure. This is an old question which relates to recent developments in copula modeling for discrete random vectors. Here, we focus on d-dimensional binary tables and show how the zero-patterns affect the transformation as well as its statistical interpretability in terms of dependence structure. We illustrate the theory through some examples and conclude with a discussion on the topic and future research directions. This is based on joint work with Roberto Fontana (Politecnico di Torino) and Fabio Rapallo (University of Genoa).
We will consider the classical algebraic statistics problem of constructing exact goodness-of-fit tests for discrete exponential family models. Perhaps surprisingly, this classical problem remains practically unsolved for many types of structured or sparse data, as it rests on a computationally difficult core task: producing a reliable sample from the lattice points in a high-dimensional polytope.
This talk overviews two recent approaches to the problem, both seeking new ways to learn to sample the polytope. The first translates the problem into a Markov decision process and uses reinforcement learning for sampling, with provable convergence to an optimal sampling policy. The second is motivated by a conjecture of Henry Wynn about multilevel methods in algebraic statistics. Joint work with Ivan Gvozdanović and Nathan Kirk.
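As a baseline for what "sampling the polytope" accomplishes, here is a minimal Diaconis--Sturmfels-style sketch for a two-way table with fixed margins, where the basic $2 \times 2$ moves form a Markov basis; the proposal scheme and test statistic are illustrative choices, not the methods of the talk.
\begin{verbatim}
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)

def log_hypergeom(table):
    # Conditional (hypergeometric) log-probability given the margins,
    # up to an additive constant.
    return -gammaln(table + 1).sum()

def chi2(table):
    expected = np.outer(table.sum(1), table.sum(0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()

def exact_pvalue(observed, steps=50_000):
    # Metropolis walk over the fiber of tables sharing the margins of
    # `observed`, using the 2x2 basic moves (a Markov basis for the
    # independence model).
    table = observed.copy()
    t_obs, hits = chi2(observed), 0
    I, J = observed.shape
    for _ in range(steps):
        i1, i2 = rng.choice(I, size=2, replace=False)
        j1, j2 = rng.choice(J, size=2, replace=False)
        move = np.zeros_like(table)
        move[i1, j1] = move[i2, j2] = 1
        move[i1, j2] = move[i2, j1] = -1
        proposal = table + move
        if proposal.min() >= 0:
            if np.log(rng.random()) < log_hypergeom(proposal) - log_hypergeom(table):
                table = proposal
        hits += chi2(table) >= t_obs
    return hits / steps

observed = np.array([[10, 2, 3], [1, 8, 4], [2, 3, 9]])
print(exact_pvalue(observed))
\end{verbatim}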
We study causality in systems that allow for feedback loops among the variables via models of cross-sectional data from a dynamical system. Specifically, we consider the set of distributions that appear as steady-state distributions of a stochastic differential equation (SDE) whose drift matrix is parametrised by a directed graph. The nth order cumulant of the steady-state distribution satisfies the corresponding nth order continuous Lyapunov equation. Under the assumptions that the driving Lévy process of the SDE is not a Brownian motion (so that the steady-state distribution is non-Gaussian) and that its coordinates are independent, we prove generic identifiability for any connected graph from the second- and third-order Lyapunov equations, while allowing the cumulants of the driving process to be unknown and diagonal.
The main task of causal discovery is to learn direct causal relationships among observed random variables. These relationships are usually depicted via a directed graph whose vertices are the variables of interest and whose edges represent direct causal effects. In this talk we will discuss the problem of learning such a directed graph for a linear causal model. I will specifically address the case where the graph may have hidden variables or directed cycles. In general, the causal graph cannot be learned uniquely from observational data. However, in the special case of linear non-Gaussian acyclic causal models, the directed graph can be found uniquely. When cycles are allowed the graph can be learned up to an equivalence class. We characterize the equivalence classes of such cyclic graphs and we propose algorithms for causal discovery. Our methods are based on using algebraic relationships among the second and higher order moments of the random vector. We show that such algebraic relationships are enough to identify the graph. I will conclude with an overview of some of our other projects in the field of causal discovery, specifically those concerning cyclic graphs and time-evolving variables.
Polynomial neural networks are implemented in a range of applications and present an advantageous framework for theoretical machine learning. In this talk, we introduce the notion of the activation degree threshold of a network architecture. This expresses when the dimension of a neurovariety achieves its theoretical maximum. We show that activation degree thresholds of polynomial neural networks exist and provide an upper bound, resolving a conjecture on the dimension of neurovarieties associated to networks with high activation degree. Along the way, we will see several illustrative examples. This is joint work with Bella Finkel, Chenxi Wu, and Thomas Yahl.
The introduction of conditional independence for multivariate extremes from threshold exceedances has inspired a new line of research in extremal dependence modeling. In this talk we summarize recent developments and try to highlight connections with related fields. In particular we discuss directed and undirected graphical models for multivariate extremes from threshold exceedances, as well as approaches for structure and parameter learning. For the parametric family of Hüsler--Reiss distributions, which can be considered as an analogue of the Gaussian in extremes, extremal conditional independence can be described parametrically. This enables a parametric encoding of extremal graphical models and gives rise to notions of extremal conditional independence ideals and extremal ML degrees.
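For concreteness, recall the parametric characterization in the Hüsler--Reiss case (following Engelke and Hitz; stated here from memory as background, not as a result of this talk): with variogram matrix $\Gamma$, set
\[
\Sigma^{(k)}_{ij} = \tfrac{1}{2}\left(\Gamma_{ik} + \Gamma_{jk} - \Gamma_{ij}\right), \qquad i,j \neq k;
\]
then the extremal conditional independence $Y_i \perp_e Y_j \mid Y_{\setminus \{i,j\}}$ holds if and only if $\big((\Sigma^{(k)})^{-1}\big)_{ij} = 0$ for one (equivalently, every) $k \neq i,j$.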
When modeling causal systems with directed graphs, methods for recovering the causal graph face a natural issue: Without any additional modeling assumptions, the graph is generally unidentifiable from only observational data. Consequently, costly experiments are often needed to identify the causal system and build causally-informed predictive models. However, structural identifiability typically improves when additional constraints are learned, such as model parameter homogeneities or context-specific invariances. One can then search a space of submodels defined by a choice of these additional constraints, returning more exact estimates of the causal graph without the need for experimental data. We will exhibit these methods via a pair of causal discovery algorithms for two cases: large-scale categorical data and linear Gaussian models. Both of these model types are commonplace in industry, while also being cases where structural identifiability remains a theoretical challenge. In contrast to previous results, structural identifiability, as well as computational efficiency, for these models is closely tied to the combinatorial and algebraic geometry of the submodels of interest.
We discuss determinantal varieties for symmetric matrices that have zero blocks along the main diagonal. This project originated from recent developments in theoretical cosmology, so my talk also brings a dose of physics to the algebraic statistics audience.
In this talk, we establish connections between hypersurface arrangements and likelihood geometry. Thereby arises a new description of the prime ideal of the likelihood correspondence of a parametrised statistical model. The description rests on the Rees algebra of the likelihood module of the arrangement, a module that is closely related to the module of logarithmic derivations introduced by Saito for a general hypersurface. Our new description is often computationally advantageous. Moreover, in contrast to most previous work in algebraic statistics, our perspective considers statistical models parametrically, which is more natural for many statistical applications.
The log canonical threshold (lct) is a fundamental invariant in birational geometry, crucial for understanding the complexity of singularities in algebraic varieties. Its real counterpart, the real log canonical threshold (rlct) — also known as learning coefficient — has gained significance in statistics and machine learning, where it plays a key role in model selection and error estimation for singular statistical models. This talk presents new results on the rlct and its multiplicity for real (not necessarily reduced) hyperplane arrangements. We derive explicit formulas for these invariants based solely on the combinatorics and linear algebra of the arrangement.
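To recall the statistical role of these invariants (Watanabe's singular learning theory, stated as background): the Bayesian free energy of a singular model admits the asymptotic expansion
\[
F_n = nS_n + \lambda \log n - (m-1)\log\log n + O_p(1),
\]
where $S_n$ is the empirical entropy, $\lambda$ the rlct, and $m$ its multiplicity, so explicit formulas for $(\lambda, m)$ translate directly into model selection criteria.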
Max-linear Bayesian networks have been shown to be highly applicable for causal inference on extreme-value data. We use the latent tree problem as inspiration to develop a statistical method for estimating the Kleene star parameters of a directed acyclic graph (DAG) based on observed data. We apply a recursive structural equation model similar to the one proposed by Gissibl and Klüppelberg. In particular, we focus on the challenges that arise in estimating parameters when nodes have multiple parents and when noise is present in the data. Observing the relationship between max-linear Bayesian networks and extreme value theory (EVT), we model this problem through the lens of risk analysis and account for cases of extreme behavior and Gaussian noise. Through theoretical derivations and numerical experiments, we show that our method produces estimates of the Kleene star parameters. We examine the graph's path structure and apply principles of tropical geometry and EVT to derive consistent estimators for the parameters.
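A minimal sketch of the tropical object being estimated (the indexing convention and example weights are ours, for illustration only): over the max-times semiring, the Kleene star of the DAG's edge-weight matrix records the maximal path weight between every pair of nodes, and it can be computed by repeated tropical squaring.
\begin{verbatim}
import numpy as np

def max_times(A, B):
    # Tropical (max-times) matrix product: (A*B)[i,j] = max_k A[i,k]*B[k,j].
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

def kleene_star(C):
    # Kleene star of a nonnegative DAG weight matrix C: entry [i, j] is the
    # maximal weight of a directed path j -> i (with 1s on the diagonal).
    # Repeated squaring of (I + C) stabilizes since C is acyclic.
    n = C.shape[0]
    star = np.maximum(C, np.eye(n))
    for _ in range(max(int(np.ceil(np.log2(n))), 1)):
        star = max_times(star, star)
    return star

# Edge weights of the DAG 0 -> 1 -> 2 and 0 -> 2; entry [child, parent].
C = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.3, 0.5, 0.0]])
print(kleene_star(C))  # entry [2, 0] is max(0.3, 0.8 * 0.5) = 0.4
\end{verbatim}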
We study scattering equations of hyperplane arrangements from the perspective of combinatorial commutative algebra and numerical algebraic geometry. We formulate the problem as linear equations on a reciprocal linear space and develop a degeneration-based homotopy algorithm for solving them. We investigate the Hilbert regularity of the corresponding homogeneous ideal and apply our methods to CHY scattering equations.
We investigate maxout polytopes, which arise from neural networks with a maxout activation function. Given a monotone maxout neural network, the corresponding polytope is constructed by moving layer by layer through the network such that the polytope at layer i+1 is the convex hull of two non-negative Minkowski combinations of the polytopes at layer i. Each fixed sequence of layer sizes determines a class of maxout polytopes. For example, the class of maxout polytopes coming from maxout networks with d-dimensional input and one hidden layer with n nodes is exactly the set of d-dimensional zonotopes with n generators. We study the combinatorial structures of maxout polytopes for small networks and relate them to other known polytopal constructions, such as neighborly cubical polytopes.
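To make the layer step concrete, a toy sketch in the plane (vertex-set representation; the particular weights are arbitrary illustrations): a maxout node takes two non-negative Minkowski combinations $aP + bQ$ and $cP + dQ$ of the current polytopes and forms the convex hull of their union.
\begin{verbatim}
import numpy as np
from scipy.spatial import ConvexHull

def minkowski_combination(a, P, b, Q):
    # Point set generating a*P + b*Q for a, b >= 0: all pairwise sums
    # of the scaled vertex sets.
    return np.array([a * p + b * q for p in P for q in Q])

def maxout_layer(P, Q, weights):
    # One maxout node: conv( (a*P + b*Q)  union  (c*P + d*Q) ).
    (a, b), (c, d) = weights
    points = np.vstack([minkowski_combination(a, P, b, Q),
                        minkowski_combination(c, P, d, Q)])
    hull = ConvexHull(points)
    return points[hull.vertices]

# Input polytopes: a segment and the unit square.
P = np.array([[0.0, 0.0], [1.0, 1.0]])
Q = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(maxout_layer(P, Q, weights=((1.0, 0.5), (0.25, 2.0))))
\end{verbatim}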
OSCAR is a relatively new computer algebra system built on top of several well-established systems for mathematical research. We give an overview of the functionality OSCAR provides for working with graphical models, specifically Gaussian and discrete graphical models as well as Markov models on trees. This is part of the new Algebraic Statistics section of OSCAR and is joint work with Benjamin Hollering, Marina Garrote López and Tobias Boege.
Graphical models for extremes are a recently introduced way of exploring the dependence structure in the extreme setting. A particularly interesting parametric model to study is the Hüsler–Reiss distribution. In this ongoing work we explore ways to characterize and better understand the theoretical properties of these models. We will use Cayley–Menger matrices to simplify working theoretically with such models, and we will discuss how the related resistance-based invariants can help characterize extremal conditional independence. Finally, we comment on other characterizations, such as using total correlation (or multiinformation), a tool from information theory. This is based on joint work with Frank Röttger.
Max-linear Bayesian networks (MLBNs) are a type of structural equation model describing the behaviour of several variables and their large observed values. In particular, they admit an interpretable description as a weighted directed acyclic graph. Due to their max-linear nature, samples coming from an MLBN can be characterised by spans of the maximal-flow matrix of the underlying graph, which are polytropes. Since the underlying graphs are acyclic, these matrices are even lower-triangular. Following this observation, we describe the tropical combinatorics of MLBNs in terms of complete flags of linear spaces. This generalises previous work of Tran from 2017, which dealt with the enumeration of polytropes in the case where the underlying graph is the complete graph.
Beta-stochastic blockmodels form a class of log-linear exponential random graph models that combines the undirected block model with the degree-based beta model. These statistical network models are useful in describing relational data that exhibit homophily, the tendency for certain individuals to group together. Individuals are represented by nodes in an undirected graph, which are grouped into blocks based on shared characteristics. We consider the maximum likelihood degree of the beta-stochastic blockmodel, culminating in a multiplicative formula.
Asymptotic goodness-of-fit methods in contingency table analysis can struggle with sparse data, especially in multi-way tables where it can be infeasible to meet sample size requirements for a robust application of distributional assumptions. However, algebraic statistics provides exact alternatives to these classical asymptotic methods that remain viable even with sparse data. We apply these methods to a context in psychometrics and education research that leads naturally to multi-way contingency tables: the analysis of differential item functioning (DIF). We explain concretely how to apply the exact methods of algebraic statistics to DIF analysis using the R package algstat, and we compare their performance to that of classical asymptotic methods.
Recently, Sturma, Drton, and Leung proposed a general stochastic method for a hypothesis test of any model defined by polynomial equality and inequality constraints. This method can be used even near irregular points, including singularities and boundaries that challenge traditional testing frameworks. While its asymptotic properties are well established, the practical performance of the method remains less explored. In this talk, we present an empirical evaluation of the method using a series of illustrative example models. Our results highlight the method's robustness and versatility but also uncover critical considerations for practitioners, including computational challenges, sensitivity to parameter choices, and practical limitations.
Neural networks are parameterized families of functions with an enormous range of applications in the sciences due to their versatility in approximating functions. The corresponding function space, also known as the neuromanifold, can be studied algebraically as it can be naturally described by polynomial equations and inequalities. Recent work in this area has provided remarkable insight into the geometry of these networks, when the activation function is linear or polynomial. Due to the inherent restrictions of polynomials in approximation theory, rational activation functions provide a significant improvement. In this talk we describe the neurovariety for several classes of rational neural networks, giving concrete generating sets for the corresponding ideals. Furthermore, we explain how the image of the neural network naturally relates to a set of tensors and we propose a novel algebraic technique to perform tensor decomposition. This talk is based on joint work with Elina Robeva and Maksym Zubkov.
One of the main problems in phylogenetics is to estimate which tree best describes the evolutionary history of some given DNA or protein data. A common approach to reconstructing the phylogenetic tree is to assume a substitution model that explains how characters are substituted at each site of the sequence according to biochemical properties. Classically, the selection of a suitable evolutionary model is based on heuristics or relies on the choice of an approximate input tree. In the 90s, several authors suggested that certain linear equations satisfied by the expected probabilities of patterns observed at the leaves of the tree could be used for model selection. It remained an open question, however, whether these equations were sufficient. Using techniques from algebraic geometry and group theory, Casanellas et al. (2012) proved that mixtures of distributions on phylogenetic trees under an equivariant model for DNA form a linear space that fully characterizes the model under consideration. Kedzierska et al. (2011) successfully implemented a method for model selection using the linear equations that describe the space of phylogenetic mixtures, which outperformed approaches existing at the time. However, the models studied from the algebraic viewpoint up to now are either too general or too restrictive. If this model selection approach is to be used in practice, a first step requires computing the linear equations of the space of phylogenetic mixtures beyond equivariant models (also for protein data). We provide a framework to study algebraic time-reversible (ATR) models and apply these techniques to the Tamura-Nei model (and its submodels) for DNA and to simple ATR models for proteins. Based on joint work with Marta Casanellas, Angélica Torres, Annachiara Korchmaros, Jennifer Garbet, Danai Deligeorgaki, Gökçen Dilaver, Niharika Paul.
In the study of (possibly cyclic) directed graphical models, one aims to understand how the structure of a directed graph relates to the structure of an associated model. Many graphs yield the same model — a feature called distribution equivalence. In the acyclic case, equivalent graphs are related via a sequence of covered edge flips. We translate these edge flips into generators of a binomial ideal and apply the theory of binomial and toric ideals to study distribution equivalence of cyclic graphical models.
We study the maximum likelihood (ML) degree of discrete exponential independence models and models defined by the second hypersimplex. For models with two independent variables, we show that the ML degree is an invariant of a matroid associated to the model. We use this description to explore ML degrees via hyperplane arrangements. For independence models with more variables, we investigate the connection between the vanishing of factors of its principal A-determinant and its ML degree. Similarly, for models defined by the second hypersimplex, we determine its principal A-determinant and give computational evidence towards a conjectured lower bound of its ML degree.
Rational neural networks have recently gained attention for their ability to approximate complex nonlinearities and asymptotic behaviors more flexibly and efficiently than standard ReLU architectures. However, their optimization landscape remains comparatively less understood. In this work, we investigate the training dynamics of shallow rational neural networks with fixed versus trainable rational activations. We characterize the stationary points of these networks and examine how poles, factorization symmetries, and higher-dimensional parameter spaces complicate gradient-based optimization. Notably, we show that spurious valleys (connected components of sub-level sets that exclude a global minimum) arising in architectures with fixed rational activations can be eliminated by allowing the rational coefficients to be updated during training. We demonstrate the theoretical findings with numerical experiments.
We can directly sample from the conditional distribution of any log-affine model (Electron. J. Stat. 11: 4452–4487, 2017). The algorithm is a Markov chain on a bounded integer lattice, and its transition probability is the ratio of the UMVUE (uniformly minimum variance unbiased estimator) of the expected counts to the total number of counts. The computation of the UMVUE accounts for most of the computational cost, which makes the implementation challenging. Here, we investigate an approximate algorithm that replaces the UMVUE with the MLE (maximum likelihood estimator). Although it is generally not exact, it is efficient and easy to implement; no preliminary analysis, such as computing the connection matrices of the holonomic ideal required by the original algorithm, is needed. The preprint is http://arxiv.org/abs/2502.00812.
There is a growing interest in the problem of 'causal abstraction', identifying a simplified model of an underlying complex system that nevertheless preserves useful causal relationships. We offer a new perspective on this problem, relating the space of causal abstractions to the partition refinement lattice. In addition to mathematical results (for example, characterizing when the abstraction lattice is distributive), we will discuss how this perspective leads to an algorithm to directly and efficiently learn causal abstractions from data. This is joint work with Pratik Misra.
Max-linear Bayesian networks are a class of directed acyclic graph (DAG) models which are of interest to statistics and data science due to their relevance to causality and probabilistic inference, particularly of extreme events. They differ from the more extensively studied Gaussian Bayesian networks in that the structural equations governing the model are tropical polynomials in the random variables. This difference leads to several novel challenges in the task of causal discovery, i.e. the reconstruction of the true DAG underlying a given empirical distribution. More specifically, the combinatorial criteria for separation in the graph equating to conditional independence in the distribution are such that there is no longer a well-defined notion of Markov equivalence. In this poster, we explain how the PC algorithm for causal discovery in Gaussian Bayesian networks fails in the max-linear setting, and discuss how it may be modified so that the output is a well-defined subgraph of the true DAG which encodes its most significant causal relationships.
The signature of a parametrized path is a sequence of tensors whose entries are iterated integrals. This construction is central to the theory of rough paths in stochastic analysis. The set of all signatures, as the path varies, satisfies natural constraints that arise from a Lie group structure. Further constraints encode certain classes of paths, for example piecewise linear paths up to a certain number of segments or polynomial paths up to a certain degree. At the level of individual tensors, this gives rise to polynomial equations and hence to algebraic varieties. Likewise, certain statistical models for time dependent events, like Brownian motion with drift and covariance, can be encoded by expected signature varieties. The study of these varieties has emerged in recent years as a fruitful bridge between rough analysis and algebraic geometry. The poster aims to highlight how techniques inspired by algebraic statistics are used in this study, and how expected signatures in particular could be a new tool for algebraic statisticians.
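For reference, the basic objects (standard definitions): the level-$k$ signature tensor of a path $X \colon [0,1] \to \mathbb{R}^d$ has entries
\[
\sigma^{(i_1,\dots,i_k)}(X) = \int_{0 \leq t_1 \leq \cdots \leq t_k \leq 1} dX^{(i_1)}_{t_1} \cdots dX^{(i_k)}_{t_k},
\]
and the Lie group structure mentioned above manifests at the level of tensor entries as the quadratic shuffle relations among iterated integrals.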
We consider points sampled from a uniform distribution on a convex body in high-dimensional real space with unknown location. In this case, the maximum likelihood estimator set is a convex body containing the true location parameter, and hence has a volume and diameter. We estimate these quantities, in terms of dimension and number of samples, by introducing upper and lower bounds. These bounds are different depending on the geometry of the convex body. We arrive at our results by employing algebraic, probabilistic and statistical techniques.
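The geometry behind this is a short computation: with known body $K$ and unknown location $\theta$, the likelihood of a sample $X_1, \dots, X_n$ equals $\mathrm{vol}(K)^{-n}$ wherever it is positive, so the maximum likelihood estimator set is the convex body
\[
\widehat{\Theta}_n = \{\theta : X_i \in \theta + K \text{ for all } i\} = \bigcap_{i=1}^n (X_i - K),
\]
whose volume and diameter are exactly the quantities being bounded.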
The formulation of covariance models for multivariate Gaussian random fields is of paramount importance, as they serve as the foundation for modeling and prediction. These models have been studied from a geometric perspective by Adler and Taylor (2009) and from an algebraic perspective by Améndola and Pham (2022). However, these multivariate models involve an increase in the number of parameters, resulting in higher computational costs in terms of time and memory usage for estimation and prediction. This poster addresses regularized approaches, such as LASSO, for the estimation of multivariate random fields. This strategy reduces the computational burden while identifying significant attributes of the random field, such as determining when two variables are correlated, thereby reducing the problem's dimensionality. We focus on estimating the multivariate Matérn model, applying a regularization term to the parameter that models the correlation between variables.
The expected signature of a family of paths need not be a signature of a path itself. Motivated by this, we consider the notion of a Lie group barycenter introduced by Buser and Karcher to propose a barycenter on signature tensors. We investigate affine algebraic varieties arising from barycenters of several families of samples in path space, and use path learning techniques (Pfeffer, Seigal, Sturmfels) to recover the underlying path associated to the Lie group barycenter. This is joint work with Carlos Améndola.
We model causal relations among time-series data using a path-dependent stochastic differential equation (SDE). Any function of a path can be approximated by some linear functional of its signatures – i.e. some linear functional of the iterated integrals of the path. We leverage this fact to model the causal relations among the time-series data using an SDE with a finite number of parameters. We then propose a causal discovery method, where these parameters are estimated by solving a polynomial system of equations. We show that this parameter estimation method is consistent under certain conditions on the driving noise. This is joint work with Darrick Lee, Vincent Guan, and Elina Robeva.
Factor analysis is a statistical technique that explains correlations among observed random variables with the help of a smaller number of unobserved factors. In traditional full factor analysis, each observed variable is influenced by every factor. However, many applications exhibit interesting sparsity patterns, that is, each observed variable only depends on a subset of the factors. In this talk, we will discuss parameter identifiability of sparse factor analysis models. In particular, we present a sufficient condition for parameter identifiability that generalizes the well-known Anderson-Rubin condition and is tailored to the sparse setup. This is joint work with Mathias Drton, Miriam Kranzlmüller and Irem Portakal.
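Concretely, the covariance structure is the classical factor analysis decomposition, now with a prescribed zero pattern in the loadings:
\[
\Sigma = \Lambda \Lambda^\top + \Psi,
\]
where $\Lambda$ is the sparse factor loading matrix and $\Psi$ is diagonal; parameter identifiability asks when $(\Lambda, \Psi)$ can be recovered from $\Sigma$ up to the model's trivial symmetries (such as sign changes of the factors).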
Implicit parameterization of unobserved confounding in structural equation models perturbs the off-diagonal entries of the idiosyncratic covariance matrix, thus avoiding explicit modeling of the confounders in the data synthesis process. However, state-of-the-art methods introduce extra diagonal-dominance constraints that restrict the coverage of the space of distributions. To eliminate this limitation, we propose an alternative data synthesis method via explicit modeling of the unobserved confounding, which also allows heterogeneous graph structures by generating a ground-truth DAG hierarchically. This new protocol offers more refined control over the ground-truth graph structure and enables hiding confounders according to their topological order. Empirically, we show that causal discovery algorithms designed for unobserved confounding, such as FCI, GIN, and ParceLiNGAM, perform differently when unobserved confounders are children of observables than when they are root variables. Via theoretical analysis, we show the connection between the implicit and explicit synthetic observational distributions.
Fréchet means generalize the classical notion of a mean to arbitrary metric spaces, defined as any point that minimizes the sum of squared distances to the data points. In particular, Fréchet means often coincide with the maximum likelihood estimators of the mean of generalised Gaussian distributions, and are well understood in terms of the Alexandrov curvature of a space. However, when our metric is not strictly convex (such as the taxicab, supremum, or tropical metrics), Fréchet means are not so well behaved; in fact, they will not generally be unique. In this talk, we present the behaviour of Fréchet means on vector spaces with a norm given by some symmetric polytope; we identify the threshold sample size at which our Fréchet means become unique with positive probability, and we prove a central limit theorem for i.i.d. samples. Finally, we demonstrate the statistical applicability of polytope Fréchet means for hierarchical modelling.
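For reference, the defining minimization (standard): a Fréchet mean of $x_1, \dots, x_n$ in a metric space $(M, d)$ is any point in
\[
\operatorname*{arg\,min}_{y \in M} \; \sum_{i=1}^n d(y, x_i)^2,
\]
a set which, for the polytopal norms considered here, can fail to be a singleton.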
A 2018 paper by Zhang, Naitzat, and Lim showed that neural networks under ReLU activation can be represented algebraically as tropical rational functions, demonstrating the potential for these architectures to be studied geometrically using the language of tropical geometry. Here, I will present an extension of this work to a particular class of networks known as transformers, which form the backbone of large language models such as ChatGPT. I will show that certain transformers can also be represented as tropical rational functions and use this fact to discuss the geometric nature of these structures. We'll see that the tropical function associated to a transformer admits a geometric representation comprised of zonotopes, and also that the tropical hypersurface of such a function contains the decision boundary of the associated transformer. I'll briefly discuss the combinatorics of this hypersurface and conclude by explaining how tropical representations of a transformer could provide a framework for constructing a rigorous theory of the behavior of these network models.
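The representation underlying all of this (from Zhang, Naitzat, and Lim) is that a ReLU network computes a difference of two convex piecewise-linear maps, i.e. a tropical rational function
\[
\nu(x) = p(x) - q(x), \qquad p(x) = \max_i \left(a_i + \langle u_i, x \rangle\right), \quad q(x) = \max_j \left(b_j + \langle v_j, x \rangle\right),
\]
with $p$ and $q$ tropical (max-plus) polynomials; the zonotopes mentioned above arise as Newton polytopes attached to these tropical polynomials.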
Rational neural networks are feedforward neural networks with a rational activation function. These networks find application in approximating the solutions of PDEs, as they are able to learn the poles of meromorphic functions. In this talk we will consider deep neural networks with a learnable rational activation function. We will study the expressivity of such architectures (which functions we can learn) through the geometry of their neurovariety, i.e., the algebraic variety obtained as the Zariski closure of the set of functions these architectures represent.