My PhD investigates the relevance of flag manifolds to statistical learning.
A flag is a sequence of nested subspaces of increasing dimension. The sequence of dimensions is called the signature of the flag.
Here is illustrated a flag of signature (1, 2, 3).
Equivalently, a flag can be seen as a sequence of mutually orthogonal subspaces. The sequence of their dimensions (the increments of the signature) is called the type of the flag.
Here is illustrated the same flag, of type (1, 1, 1). For instance, the eigenspaces of a symmetric matrix form a flag, whose type is the sequence of eigenvalue multiplicities.
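To make the eigenspace example concrete, here is a minimal numpy sketch (the function name and the grouping tolerance are illustrative choices, not taken from the papers below) that recovers the type and signature of the flag of eigenspaces of a symmetric matrix.

```python
# Minimal sketch (numpy only): the flag of eigenspaces of a symmetric matrix.
# The grouping tolerance `tol` is an illustrative choice, not a canonical one.
import numpy as np

def eigenspace_flag(S, tol=1e-8):
    """Group the eigenvectors of a symmetric matrix S by eigenvalue multiplicity.

    Returns the type (multiplicities), the signature (cumulative dimensions)
    and one orthonormal basis per eigenspace, ordered by decreasing eigenvalue.
    """
    eigvals, eigvecs = np.linalg.eigh(S)                 # ascending eigenvalues
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder descending
    blocks, start = [], 0
    for i in range(1, len(eigvals) + 1):
        if i == len(eigvals) or abs(eigvals[i] - eigvals[start]) > tol:
            blocks.append(eigvecs[:, start:i])           # one eigenspace basis
            start = i
    type_ = tuple(B.shape[1] for B in blocks)                  # e.g. (1, 1, 1)
    signature = tuple(int(c) for c in np.cumsum(type_))        # e.g. (1, 2, 3)
    return type_, signature, blocks

# Example: a 3x3 symmetric matrix with distinct eigenvalues gives type (1, 1, 1).
S = np.diag([3.0, 2.0, 1.0])
print(eigenspace_flag(S)[:2])   # ((1, 1, 1), (1, 2, 3))
```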
The set of all flags of a given type is a manifold (smooth, compact, connected, Riemannian).
Flag manifolds generalize Grassmannian and Stiefel manifolds, which are widely used in applied research.
The goal of my PhD is thus to demonstrate the usefulness of flag manifolds in statistical learning.
So far, I have pursued this goal through several contributions:
Parsimonious covariance models parameterized with flag manifolds. See the curse of isotropy, eigengap sparsity and the mixtures of principal subspace analyzers.
Nested subspace learning via optimization on flag manifolds. See the flag trick.
Geodesic distance and interpolation on flag manifolds. See the GSI'23 paper.
Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matrices in high-dimensional spaces, spherical GMMs (with isotropic covariance matrices) lack the flexibility to fit anisotropic distributions. Connecting these two extremes, we introduce a new family of parsimonious GMMs with piecewise-constant covariance eigenvalue profiles. These extend several low-rank models, like the celebrated mixtures of probabilistic principal component analyzers (MPPCA), by enabling any possible sequence of eigenvalue multiplicities. If the latter are prespecified, then we can naturally derive an expectation-maximization (EM) algorithm to learn the mixture parameters. Otherwise, to address the notoriously challenging issue of jointly learning the mixture parameters and hyperparameters, we propose a componentwise penalized EM algorithm, whose monotonicity is proven. We show the superior likelihood-parsimony tradeoffs achieved by our models on a variety of unsupervised experiments: density fitting, clustering and single-image denoising. Link to the paper (in revision for a statistics journal).
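As an illustration of the covariance structure behind these models, here is a minimal numpy sketch (the function name and the plain block-averaging are illustrative choices; the actual EM algorithm is in the paper) that constrains a sample covariance to a prespecified sequence of eigenvalue multiplicities by keeping its eigenvectors and block-averaging its eigenvalues.

```python
# A minimal sketch (numpy) of a piecewise-constant eigenvalue profile:
# eigenvalues are block-averaged according to a prespecified type (eigenvalue
# multiplicities). Function name is illustrative, not from the paper's code.
import numpy as np

def piecewise_constant_covariance(S, type_):
    """Constrain a covariance matrix S to eigenvalue multiplicities `type_`
    (e.g. (2, 1, 7) in dimension 10): eigenvectors are kept, eigenvalues are
    averaged within each block."""
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    assert sum(type_) == len(eigvals)
    starts = np.cumsum((0,) + tuple(type_[:-1]))
    profile = np.concatenate(
        [np.full(d, eigvals[s:s + d].mean()) for s, d in zip(starts, type_)]
    )
    return eigvecs @ np.diag(profile) @ eigvecs.T

# Example: type (1, p-1) gives a PPCA-like profile (one leading eigenvalue,
# the remaining ones equalized into an isotropic noise level).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
S = np.cov(X, rowvar=False)
Sigma = piecewise_constant_covariance(S, (1, 4))
print(np.linalg.eigvalsh(Sigma).round(3))
```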
Covariance estimation is a central problem in statistics. An important issue is that there are rarely enough samples n to accurately estimate the p(p+1)/2 coefficients in dimension p. Parsimonious covariance models are therefore preferred, but the discrete nature of model selection makes inference computationally challenging. In this paper, we propose a relaxation of covariance parsimony termed "eigengap sparsity" and motivated by the good accuracy-parsimony tradeoff of eigenvalue equalization in covariance matrices. This new penalty can be included in a penalized-likelihood framework that we propose to solve with a projected gradient descent on a monotone cone. The algorithm turns out to resemble an isotonic regression of mutually attracted sample eigenvalues, drawing an interesting link between covariance parsimony and shrinkage. Link to the paper (accepted at GSI'25 with oral presentation).
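To give a feel for the isotonic-regression connection, here is a minimal numpy sketch of the classical pool-adjacent-violators algorithm, which computes the Euclidean projection of a vector of eigenvalues onto the monotone (decreasing) cone. This is only the projection step used as an ingredient of such schemes, not the paper's full penalized algorithm.

```python
# A minimal sketch (numpy) of projection onto the monotone cone via the
# pool-adjacent-violators algorithm: violating adjacent eigenvalues get pooled
# (averaged), which is the "equalization" flavour mentioned above.
import numpy as np

def project_decreasing(lam):
    """Euclidean projection of a vector onto {x : x_1 >= x_2 >= ... >= x_p}."""
    pools = []  # list of [sum, count] pools
    for v in lam:
        pools.append([v, 1])
        # Merge pools while the decreasing constraint is violated.
        while len(pools) > 1 and pools[-2][0] / pools[-2][1] < pools[-1][0] / pools[-1][1]:
            s, c = pools.pop()
            pools[-1][0] += s
            pools[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in pools])

# Example: a non-monotone eigenvalue estimate gets its violating entries pooled.
print(project_decreasing(np.array([3.0, 1.0, 2.0, 0.5])))  # [3.  1.5 1.5 0.5]
```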
Many machine learning methods look for low-dimensional representations of the data. The underlying subspace can be estimated by first choosing a dimension q and then optimizing a certain objective function over the space of q-dimensional subspaces (the Grassmannian). Trying different values of q generally yields non-nested subspaces, which raises an important issue of consistency between the data representations. In this paper, we propose a simple trick to enforce nestedness in subspace learning methods. It consists in lifting Grassmannian optimization problems to flag manifolds (the space of nested subspaces of increasing dimension) via nested projectors. Link to the paper (in revision for a machine learning journal).
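Here is a minimal numpy sketch of the nested-projector idea. The uniform averaging over projectors and the toy PCA-like objective are illustrative choices, not necessarily the weighting and criterion used in the paper.

```python
# A minimal sketch (numpy) of nested projectors: from one orthonormal basis U
# and a signature (q_1 < ... < q_d), build the projectors onto the nested
# subspaces and lift a projector-based objective by averaging it over them.
import numpy as np

def nested_projectors(U, signature):
    """Projectors onto span(U[:, :q]) for each dimension q in the signature."""
    return [U[:, :q] @ U[:, :q].T for q in signature]

def lifted_objective(f, U, signature):
    """Average a Grassmannian objective f (a function of a projector) over the
    nested projectors encoded by U and the signature."""
    return np.mean([f(P) for P in nested_projectors(U, signature)])

# Example with a PCA-like objective f(P) = trace(P S) on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
S = np.cov(X, rowvar=False)
U, _ = np.linalg.qr(rng.normal(size=(6, 3)))   # 3 orthonormal columns in R^6
print(lifted_objective(lambda P: np.trace(P @ S), U, signature=(1, 2, 3)))
```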
Principal component analysis is a ubiquitous method in data analysis, widely used by applied scientists for interpretability purposes. In this work, we raise an important issue about the interpretation of principal components with close eigenvalues. Such components may indeed suffer from substantial rotational variability, which is a pitfall for interpretation. Through the lens of a probabilistic covariance model parameterized with flags, we show that relatively close eigenvalues should be block-averaged, and we propose in this context to transition from ill-defined principal components to more interpretable principal subspaces. Link to the paper (in revision for a statistics journal).
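The rotational-variability phenomenon is easy to reproduce numerically. Here is a minimal numpy sketch (the covariance, sample size and seed are arbitrary choices) contrasting the instability of the first principal component with the stability of the corresponding two-dimensional principal subspace.

```python
# A minimal sketch (numpy): with two close covariance eigenvalues, the leading
# principal component can rotate a lot from one sample to another, while the
# leading 2D principal subspace stays stable.
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.diag([1.05, 1.0, 0.1])        # two close eigenvalues, one small

def top_eigvecs(X, k):
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return vecs[:, ::-1][:, :k]          # k leading eigenvectors (descending)

V1 = top_eigvecs(rng.multivariate_normal(np.zeros(3), Sigma, size=500), 2)
V2 = top_eigvecs(rng.multivariate_normal(np.zeros(3), Sigma, size=500), 2)

# Angle between the two estimated first principal components (often large)...
cosine = np.clip(abs(V1[:, 0] @ V2[:, 0]), 0.0, 1.0)
pc_angle = np.degrees(np.arccos(cosine))
# ...versus the gap between the two estimated 2D principal subspaces (small).
subspace_gap = np.linalg.norm(V1 @ V1.T - V2 @ V2.T)
print(f"PC1 angle: {pc_angle:.1f} deg, subspace gap: {subspace_gap:.3f}")
```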
This work proposes an algorithm for approximating the Riemannian logarithm and geodesic distance on flag manifolds. These are central tools in geometric statistics, but their simple explicit expressions on Grassmannians do not seem to generalize to flag manifolds. We rethink their computation as an alignment problem on equivalence classes of orthogonal matrices and find a closed-form solution to a relaxed version. This approximation brings drastic improvements over a previously proposed algorithm. Link to the paper (accepted at GSI'23 with oral presentation). See more in the dedicated blog post.
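To give a rough idea of the alignment viewpoint, here is a hypothetical numpy sketch in which two flag representatives are aligned block by block with orthogonal Procrustes and compared with a chordal (extrinsic) discrepancy. This is not the paper's algorithm, which derives a closed-form approximation of the Riemannian logarithm; it only illustrates the kind of quotient structure involved.

```python
# A rough, hypothetical sketch (numpy): two orthogonal matrices represent the
# same flag up to right multiplication by a block-diagonal orthogonal matrix,
# so comparing flags involves aligning over those blocks. Alignment here is
# per-block orthogonal Procrustes, scored with a chordal discrepancy; this is
# NOT the paper's closed-form Riemannian approximation.
import numpy as np

def align_flag_representatives(Q1, Q2, type_):
    """Find a block-diagonal orthogonal R (blocks given by `type_`, completed
    by the orthogonal complement) minimizing ||Q1 - Q2 R||_F, block by block."""
    p = Q1.shape[0]
    sizes = list(type_) + [p - sum(type_)]
    R, start = np.zeros((p, p)), 0
    for d in sizes:
        idx = slice(start, start + d)
        # Orthogonal Procrustes on each diagonal block.
        U, _, Vt = np.linalg.svd(Q2[:, idx].T @ Q1[:, idx])
        R[idx, idx] = U @ Vt
        start += d
    return Q2 @ R, np.linalg.norm(Q1 - Q2 @ R)

# Example with two random orthogonal representatives of flags of type (1, 2) in R^4.
rng = np.random.default_rng(0)
Q1, _ = np.linalg.qr(rng.normal(size=(4, 4)))
Q2, _ = np.linalg.qr(rng.normal(size=(4, 4)))
aligned, chordal_gap = align_flag_representatives(Q1, Q2, type_=(1, 2))
print(round(chordal_gap, 3))
```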