A mathematical model is identifiable if its parameters can be recovered from data.
Here we investigate, for linear compartmental models, whether local generic identifiability is preserved when parts of the model -- specifically, inputs, outputs, leaks, and edges -- are moved, added, or deleted. Our results are as follows.First, for certain catenary, cycle, and mammillary models, moving or deleting the leak preserves identifiability. Next, for cycle models with up to one leak, moving inputs or outputs preserves identifiability. Thus, every cycle model with up to one leak (and at least one input and at least one output) is identifiable. Finally, for certain cycle models with no leaks, adding specific edges again preserves identifiability. Our proofs, which are algebraic and combinatorial in nature, rely on results on elementary symmetric polynomials and the theory of input-output equations for linear compartmental models.
Algebraic Statistics 2020
University of Hawai'i at Manoa
Honolulu, HI
Home | Program | Registration | Local Information
Online conference schedule
Full Program with Abstracts
Monday | Tuesday | Wednesday | Thursday | Friday
Monday, June 22
7:00 am HST/1:00 pm EST
Speaker: Guido Montúfar (UCLA)
Title: Minimum Optimal Transport Distance Estimation on a Variety
Abstract:
We study the problem of minimizing the Wasserstein distance between a probability distribution and an algebraic variety. We consider the setting of finite state spaces and describe the solution depending on the choice of the ground metric and the given distribution. The Wasserstein distance between the distribution and the variety is the minimum of a linear functional over a union of transportation polytopes. We obtain a description in terms of the solutions of a finite number of systems of polynomial equations. A detailed analysis is given for the two bit independence model.
This is based on joint work with Türkü Özlüm Çelik, Asgar Jamneshan, Bernd Sturmfels, Lorenzo Venturello
https://arxiv.org/abs/1909.11716
https://arxiv.org/abs/2003.06725
7:30 am HST/1:30 pm EST
Speaker: Isabelle Shankar (UC Berkeley)
Title: Symmetry Adapted Gram Spectrahedra
Abstract:
Sum of squares (SOS) relaxations are often used to certify nonnegativity of polynomials and are equivalent to solving a semidefinite program (SDP). The feasible region of the SDP for a given polynomial is the Gram Spectrahedron. For symmetric polynomials, there are reductions to the problem size that can be done using tools from representation theory. In recent work with Serkan Hosten, we investigate the geometric structure of the spectrahedra that arise in the study of symmetric SOS polynomials, the Symmetry Adapted PSD cone and the Symmetry Adapted Gram Spectrahedron.
8:00 am HST/2:00 pm EST
Speaker: Carlos Améndola (TU Munich)
Title: Conditional Independence in Max-linear Bayesian Networks
Abstract:
Motivated by extreme value theory, max-linear Bayesian networks have been recently introduced and studied as an alternative to linear structural equation models. However, for max-linear systems the classical independence results for Bayesian networks are far from exhausting valid conditional independence statements. We use tropical linear algebra to derive a compact representation of the conditional distribution given a partial observation, and exploit this to obtain a complete description of all conditional independence relations. We also introduce the notion of an impact graph which describes how extreme events spread deterministically through the network and we give a complete characterization of such impact graphs. Our analysis opens up several interesting questions concerning conditional independence and tropical geometry. This is joint work with Claudia Klüppelberg, Steffen Lauritzen and Ngoc Tran.
8:30 am HST/2:30 pm EST
Speaker: Colby Long (Wooster College)
Title: Hypothesis Testing with Rank Conditions in Phylogenetics
Abstract:
A phylogenetic model of sequence evolution for a set of n taxa is a collection of probability distributions on the 4^n possible site patterns that may be observed in their aligned DNA sequences. One can arrange the entries of these probability distributions into flattening matrices, where each flattening is constructed according to a possible split in the tree parameter of the model. For splits displayed by the tree, the corresponding flattening matrices are known to satisfy certain rank conditions. Methods such as ErikSVD and SVDQuartets take advantage of this observation by applying singular value decomposition to flattening matrices of empirical data. Each possible split is assigned an "SVD score'' based on how close the flattening is to the set of matrices of the predicted rank. When choosing among possible splits, the one with the lowest score is inferred to be the split in the evolutionary tree of the taxa. In this talk, we explore using the SVD score as a test statistic to determine if a split is present in a phylogenetic tree. To do so, we use several results to approximate the distribution of the SVD score and to give upper bounds on the p-value of the associated hypothesis tests. We also apply these hypothesis tests to phylogenetic data and discuss the implications of the results for interpreting SVD scores in other rank-based inference methods.
Tuesday, June 23
7:00 am HST/1:00 pm EST
Speaker: Liam Solus (KTH)
Title: Interventional Varieties
Abstract:
A driving question in algebraic statistics is the following: Given a statistical model what (if any) is its underlying algebraic structure? Furthermore, if this structure exists, can we demonstrate that the desirable properties of the statistical model are intrinsic to this algebraic structure? A classic example of this sort of phenomenon arises in the study of discrete DAG models, where one sees that the given model admits efficient implementation of exact inference algorithms precisely when the variety defining the model agrees with its (canonically) associated toric variety. More recently, DAG models have become the basis for modern approaches to causal inference that rely on a mixture of observational and interventional data. In this talk we will see how the global algebraic structure for these interventional DAG models are varieties living naturally in multiprojective space, and we will describe how the nice properties observed for classical DAG models generalize to this new context. This talk is based on ongoing work with Eliana Duarte.
7:30 am HST/1:30 pm EST
Speaker: Seth Gerberding (University of South Dakota)
Title: Identifiability of Linear Compartmental Models: The Effect of Moving Inputs, Outputs, and Leaks
Abstract:
8:00 am HST/2:00 pm EST
Speaker: Hector Baños (Georgia Tech)
Title: Identifiability of species networks using the log-det distance
Abstract:
It is known that hybridization plays an important role during the evolutionary process of some species. Therefore phylogenetic trees are sometimes insufficient to describe species-level relationships. We show that most topological features of a level-1 species network are identifiable under the network multi-species coalescent model (NMSC) using the log-det distance between aligned DNA sequences of concatenated genes. This identifiability result depends on understanding the relationships of expected site-pattern frequencies.
8:30 am HST/2:30 pm EST
Speaker: Jane Coons (North Carolina State)
Title: Quasi-Independence Models with Rational Maximum Likelihood Estimate
Abstract:
Let X and Y be random variables. Quasi-independence models are log-linear models that describe a situation in which some states of X and Y cannot occur together, but X and Y are otherwise independent. We characterize which quasi-independence models have rational maximum likelihood estimate based on combinatorial features of the bipartite graph associated to the model. In this case, we give an explicit formula for the maximum likelihood estimate.
Wednesday, June 24
7:00 am HST/1:00 pm EST
Speaker: Ruriko Yoshida (Naval Postgraduate School)
Title: Tropical Support Vector Machines
Abstract:
Most data in genome-wide phylogenetic analysis is essentially multidimensional, posing a major challenge to human comprehension and computational analysis. Also, we cannot directly apply statistical learning models in data science to a set of phylogenetic trees since tree space is not Euclidean. In fact, the space of phylogenetic trees is a tropical Grassmanian in the max-plus algebra. Therefore, to classify multi-locus data sets for phylogenetic analysis, we propose tropical Support Vector Machines (SVMs) over the space of phylogenetic trees. Like classical SVMs, a tropical SVM is a discriminative classifier defined by a tropical hyperplane which maximizes the minimum tropical distance from data points to itself to separate these data points into sectors. We show that, even though we have to solve exponentially many linear programming problems to find tropical SVMs, many instances are infeasible. Then we show the necessary and sufficient conditions for each instance to be feasible. In each case, there is an explicit formula for the optimal solution for the feasible linear programming problem. Based on our theorems, we develop novel methods to compute tropical SVMs, and computational experiments that show our methods work well.
7:30 am HST/1:30 pm EST
Speaker: Alexandros Grosdos (Osnabrück University)
Title: Exact Solutions in Log-Concave Maximum Likelihood Estimation
Abstract:
In nonparametric statistics we no longer assume that our data belong to a specific statistical model, but we impose restrictions on the shape of the distribution. Given a sample of points, the best log-concave distribution fitting the data has a probability density function whose logarithm is piecewise-linear. Numerical techniques used so far in statistics often do not allow us to answer mathematical questions, like determining the number of regions of linearity of such a function. Using this problem as our starting point, we equip ourselves with tools from algebra, combinatorics and numerical algebraic geometry on our quest to find and capture exact solutions.
This is joint work with Alex Heaton, Kaie Kubjas, Olga Kuznetsova, Georgy Scholten and Miruna-Stefana Sorea.
8:00 am HST/2:00 pm EST
Speaker: Pratik Misra (North Carolina State)
Title: Gaussian graphical models with toric vanishing ideals
Abstract:
Gaussian graphical models are semi-algebraic subsets of the cone of positive definite covariance matrices. They are widely used throughout natural sciences, computational biology and many other fields. Computing the vanishing ideal of the model gives us an implicit description of the model. In this presentation, we will talk about resolving two conjectures given by Sturmfels and Uhler. In particular, we characterize those graphs for which the vanishing ideal of the Gaussian graphical model is generated in degree 1 and 2. These turn out to be the Gaussian graphical models whose ideals are toric ideals, and the resulting graphs are the 1-clique sums of complete graphs.
8:30 am HST/2:30 pm EST
Speaker: Daniel Bernstein (MIT)
Title: Rigidity of symmetry-forced frameworks
Abstract:
The fundamental problem in rigidity theory is to determine whether a given immersion of a graph into R^d can be continuously deformed, treating the edges as rigid struts that can move freely about their incident vertices. For a fixed graph G, either all generic immersions of G into R^d are rigid, in which case we say that G is generically rigid in R^d, or all generic immerisions of G into R^d are not rigid, in which case we say that G is generically flexible in R^d. Perhaps the most well-known result in this area is Laman's theorem, which is a combinatorial characterization of the graphs that are generically minimally rigid in R^2. In applications to crystallography, the relevant frameworks have symmetry that continuous deformations must preserve. The main result of this talk is a Laman-like theorem for symmetric frameworks with certain kinds of symmetry.
Thursday, June 25
7:00 am HST/1:00 pm EST
Speaker: Marta Casanellas (Universitat Politècnica de Catalunya)
Title: The embedding problem for Markov matrices
Abstract:
Characterizing whether a Markov process of discrete random variables has an homogeneous continuous-time realization is a hard problem. This is known as the embedding problem and reduces to deciding whether a given Markov matrix can be written as the exponential of some rate matrix. This is an old question which has been only completely solved for matrices of size 2x2 or 3x3. We have recently addressed this problem and have established a criterion for deciding whether a generic Markov matrix of any size is embeddable. For the case of matrices of size 4x4 (which is of interest for nucleotide substitution models) we solve the embedding problem for any Markov matrix. The solution in this case is more concise as the embeddability is given in terms of a single condition. In this talk we shall explain this joint work with Jesús Fernández-Sánchez and Jordi Roca-Lacostena.
7:30 am HST/1:30 pm EST
Speaker: Naoki Hayashi (Tokyo Institute of Tech / NTT DATA Mathematical Systems Inc.)
Title: Bayesian Generalization Error and Real Log Canonical Threshold in Non-negative Matrix Factorization and Latent Dirichlet Allocation
Abstract:
Singular learning theory is statistics theory for singular models, which can be interpreted as an intersection between algebraic statistics and machine learning theory. Typical hierarchical models and latent variable models, such as matrix factorization and topic model, are statistically singular since a map from a parameter to a probability density function is not one-to-one. Clarifying generalization behaviors in singular models is an important problem to estimate sufficient sample sizes and to tune hyperparameters; however, conventional statistics theory cannot be applied because their likelihood function cannot be approximated by any Gaussian function. Singular learning theory provides a general theory for this problem; birational invariants of an analytic set defined by zero points of a Kullback-Leibler (KL) divergence determines the generalization error.
One of such invariants is a real log canonical threshold (RLCT). An RLCT is a negative maximum pole of a zeta function defined by an integral of a KL divergence from the true distribution to a statistical model. Determining an RLCT of a given concrete model is performed by resolution of singularities. In fact, algebraic statistician and machine learning researchers have been derived the exact values or an upper bounds of the RLCTs of several singular models (also known as learning machines), which is useful in statistical model section such as Drton's sBIC.
In this virtual talk, we will introduce recent studies for determining RLCTs of singular models whose parameter regions are restricted. We have derived an upper bound of the RLCT in non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA). Zeta functions of NMF and LDA have similar forms as reduced rank regression (RRR); however, their maximum poles are different from that of RRR since NMF has a non-negatively restricted parameter region and LDA also has a simplex restricted one.
8:00 am HST/2:00 pm EST
Speaker: Frank Röttger (MPI MiS Leipzig)
Title: A central limit theorem for the two-sided descent statistic on Coxeter groups
Abstract: Coxeter statistics are maps which assign natural numbers to each element of a Coxeter group. When we pick an element at random, for example according to the uniform distribution, a Coxeter statistic generates a random variable.
A sequence of uniformly chosen elements from Coxeter groups of growing rank therefore gives rise to a sequence of random variables. We study the statistic which assigns to an element of a nite Coxeter group the number of descents plus the number of descents of its inverse. Our main result shows that this statistic is asymptotically Gaussian when the rank of the non-dihedral irreducible components tends to innity. This is achieved by a combination of already known results for the irreducible types (e.g. the symmetric and hyperoctahedral groups) and an application of Lindeberg's theorem. Our work answers a question of Kahle--Stump and builds upon work of Chatterjee--Diaconis, Özdemir and Röttger. This is joint work with Benjamin Brück.
8:30 am HST/2:30 pm EST
Speaker: Muhammad Ardiyansyah (Aalto University)
Title: Model Embeddability for Symmetric Group-Based Models
Abstract:
We study model embeddability, which is a variation of the famous embedding problem in probability theory, when apart from the requirement that the Markov matrix is the matrix exponential of a rate matrix, we additionally ask that the rate matrix follows the model structure. We provide a characterisation of model embeddable Markov matrices corresponding to symmetric group-based phylogenetic models. In particular we provide necessary and sufficient conditions in terms of the eigenvalues of symmetric group-based matrices. To showcase our main result on model embeddability, we provide an application to hachimoji models, which are eight-state models for synthetic DNA. Moreover, our main result on model embeddability, enables us to compute the volume of the set of model embeddable Markov matrices relative to the volume of other relevant sets of Markov matrices within the model.
Friday, June 26
7:00 am HST/1:00 pm EST
Speaker: Donald Richards (Penn State)
Title: Sums and Powers of Totally Positive Functions
Abstract:
This talk is based on joint work with Satoshi Kuriki (ISM, Tokyo) and Ishan Muzumdar (Penn State University).
The total positivity properties of products of totally positive (TP) functions are well-known to follow from the Binet-Cauchy formula. By contrast, the total positivity properties of sums or arbitrary powers of TP functions appears to be more difficult, and results obtained hitherto apply only to functions that are TP of order 2. In this talk, we present results for functions that are TP of order 3 and higher. Even in the case of order 3, the results obtained require intricate algebraic calculations that indicate the difficulties inherent in the general case.
7:30 am HST/1:30 pm EST
Speaker: Toru Imai (Kyoto University)
Title: Estimating real log canonical thresholds
Abstract:
Evaluation of the marginal likelihood of singular models is a challenging task. The singular Bayesian information criterion (sBIC; Drton and Plummer (2017)) gives the state-of-the-art approximation to the log marginal likelihood, which can be applied to both regular and singular models. However, sBIC requires the theoretical values of the real log canonical thresholds, but only few real log canonical thresholds are known. In this talk, we propose a new estimator of the real log canonical thresholds using MCMC simulations, which leads to make sBIC widely applicable.
8:00 am HST/2:00 pm EST
Speaker: Ben Hollering (North Carolina State)
Title: Identifiability in Phylogenetics using Algebraic Matroids
Abstract:
Identifiability is a crucial property for a statistical model since it implies that distributions in the model uniquely determine the parameters that produce them. In phylogenetics, the identifiability of the tree parameter is of particular interest since it means that phylogenetic models can be used to infer evolutionary histories from data. Typical strategies for proving identifiability results require Gröbner basis computations which become untenable as the size of the model grows. In this talk I'll give some background on phylogenetic algebraic geometry and then discuss a new computational strategy for proving the identifiability of discrete parameters in algebraic statistical models that uses algebraic matroids naturally associated to the models. This algorithm allows us to avoid computing Gröbner bases and is also parallelizable.
8:30 am HST/2:30 pm EST
Speaker: Elina Robeva (University of British Columbia)
Title: Superresolution Imaging and Total Positivity
Abstract:
One of the most practically important yet theoretically challenging problems in signal processing is that of superresolution imaging. Every imaging device (including our eyes) blurs the point sources of light it images. In mathematical terms this can be expressed as convolving the original true image with a point spread function (e.g. a Gaussian bump). In addition, in many applications, such as astronomy and microscopy, the image is of low resolution. We study the inverse problem of recovering the point source locations from a low-resolution blurred sparse image. We show that a certain optimization program can achieve this. In the noiseless setting, we show this can be done even if the point sources are arbitrarily close. Our proof technique relies on a total positivity condition being satisfied by the point spread function.