Geometry of Nonnegative Rank

Geometry of nonnegative rank is funded from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 748354.

Objectives

This project is centered on nonnegative rank - a notion that modifies the definition of matrix rank. Our goal is to do basic research aimed at better understanding the geometry of matrices and tensors of nonnegative rank at most r. Nonnegative rank appears in various applications such as statistics, machine learning, audio processing, image compression and document analysis. Understanding the geometry of nonnegative rank is fundamental in its own right and important for the theory and algorithms behind the applications. The overall objectives of this project are investigation of zero patterns in nonnegative matrix factorizations, understanding the boundaries and semialgebraic descriptions of the set of matrices and tensors of nonnegative rank r, and implications of this theory to applications in statistics and other fields. Computing and understanding nonnegative rank is interdisciplinary. Completing this project requires techniques from polyhedral geometry, real algebraic geometry, optimization, statistics, symbolic algebra, and numerics.

Results

  • In collaboration with Robert Krone, we used ideas from rigidity theory to study uniqueness of nonnegative matrix factorizations in the case when nonnegative rank of a matrix is equal to its rank. Nonnegative matrix factorizations are often encountered in data mining applications where they are used to explain datasets by a small number of parts. For many of these applications it is desirable that there exists a unique nonnegative matrix factorization up to trivial modifications given by scalings and permutations. This means that model parameters are uniquely identifiable from the data. We characterize infinitesimally rigid nonnegative factorizations, prove that a nonnegative factorization is infinitesimally rigid if and only if it is locally rigid and a certain matrix achieves its maximal possible Kruskal rank, and show that locally rigid nonnegative factorizations can be extended to globally rigid nonnegative factorizations. We also explore connections between rigidity of nonnegative factorizations and boundaries of the set of matrices of fixed nonnegative rank. Finally we extend these results from nonnegative factorizations to completely positive factorizations. These results appear in a preprint on arXiv and are submitted for publication. This article is openly available at https://arxiv.org/abs/1902.02868.
  • One direction in the study of nonnegative rank is motivated by linear relaxations for hard combinatorial optimization problems. Nonnegative rank of the slack matrix of the feasible region of the relaxation measures the complexity of the linear relaxation and hence provides lower bounds for the complexity of the original optimization problem. In joint article with Per Austrin and Petteri Kaski, we study another restricted model of computation which is given by tensor networks for evaluating multilinear maps. We show that this model of computation captures the best algorithms for several problems, such as matrix multiplication, discrete Fourier transform, (3t)-clique counting and computing the permanent of a matrix. For counting homomorphisms of a general pattern graph P into a host graph on n vertices we obtain an upper bound that essentially matches the bound for counting cliques and yields small improvements over previous algorithms for many choices of P. This article is openly available at https://arxiv.org/abs/1712.09630 and is published Proceedings of 10th Innovations in Theoretical Computer Science Conference.
  • The spatial organization of the DNA in the cell nucleus plays an important role for gene regulation, DNA replication, and genomic integrity. Through the development of chromosome capture experiments (such as 3C, 4C, Hi-C) it is now possible to obtain the contact frequencies of the DNA at the whole-genome level. In the joint paper with Anastasiya Belyaeva, Lawrence J. Sun and Caroline Uhler, we study the problem of reconstructing the 3D organization of the genome from such whole-genome contact frequencies. A standard approach is to transform the contact frequencies into noisy distance measurements and then apply semidefinite programming (SDP) formulations to obtain the 3D configurations. However, neglected in such reconstructions is the fact that most eukaryotes including humans are diploid and therefore contain two copies of each genomic locus. We prove that the 3D organization of the DNA is not identifiable from pairwise distance measurements derived from Hi-C for diploid organisms. In fact, there are infinitely many solutions even in the noise-free setting. We then discuss various additional biologically relevant constraints (including distances between neighboring genomic loci and to the nucleus center or higher-order interactions) and prove identifiability under these conditions. Finally, we provide SDP formulations for computing the 3D embedding of the DNA with these additional constraints and show that we can recover the true 3D embedding with high accuracy also from both noiseless and noisy measurements. These formulations minimize the trace of a Gram matrix as an approximation of rank minimization and finally use eigendecomposition to get a rank three approximation of the Gram matrix. These results are in the last stages of preparation and will be posted in the near future.
  • In joint article with Elizabeth Allman, Hector Banos Cervantes, Robin Evans, Serkan Hosten, Daniel Lemke, John Rhodes and Piotr Zwiernik, we study binary tensors of nonnegative rank at most two and three. We give the boundary stratification of binary tensors of nonnegative rank at most two. For small tensors, we show that this stratification can be used for exact computation of maximum likelihood estimates (MLEs). This method guarantees finding the MLE whereas the EM algorithm does not. We show how the EM fixed point ideal provides an alternative method for obtaining the boundary decomposition and for computing MLEs. For 2x2x2 tensors we explicitly show that there is a correspondence between the boundary poset and the minimal primes of the EM fixed point ideal. Based on this correspondence for nonnegative rank at most two, we conjecture that the minimal primes of the EM fixed point ideal for 2x2x2 tensors of nonnegative rank at most three are in correspondence with the boundary components and singular locus of this set. This conjecture is proven by a simultaneous paper by Seigal and Montufar, who characterize the boundary poset of 2x2x2 tensors of nonnegative rank at most three. This article is openly available at https://arxiv.org/abs/1710.01696.
  • In joint article with Carlos Amendola and Dimitra Kosta, we study the maximum likelihood estimation problem for several classes of toric Fano models. We start by exploring the maximum likelihood degree for all 2-dimensional Gorenstein toric Fano varieties. We show that the ML degree is equal to the degree of the surface in every case except for the quintic del Pezzo surface with two singular points of type A1 and provide explicit expressions that allow to compute the maximum likelihood estimate in closed form whenever the ML degree is less than 5. We then explore the reasons for the ML degree drop using A-discriminants and intersection theory. Finally, we show that toric Fano varieties associated to 3-valent phylogenetic trees have ML degree one and provide a formula for the maximum likelihood estimate. We prove it as a corollary to a more general result about the multiplicativity of ML degrees of codimension zero toric fiber products, and it also follows from a connection to a recent result about staged trees. This article is openly available at https://arxiv.org/abs/1905.07396.