Session Chair: Arthur Tenenhaus, L2S CNRS
08:30–09:20
Nikos Sidiropoulos, University of Virginia
Abstract: Canonical correlation analysis (CCA) and its variants/generalizations are well-known and often used for data analysis. But are they really well understood? Tensor generalizations of CCA have been proposed, but without sufficient justification starting from first principles. In this talk, I will review the foundations of CCA and its connection to coupled factorization with partially common and "private" factors and shared subspace discovery, with emphasis on identifiability. I will then discuss my thoughts on what is the proper way to extend CCA from matrices to tensors. I will focus on a 3-slide talk that Richard Harshman gave at TRICAP 2006. I have been trying to understand these slides and get into Richard's train of thought. I think I have partially understood his proposal now, and I can explain it starting from basic principles.
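For readers less familiar with the matrix case the talk starts from, classical two-view CCA can be sketched as whitening each view and taking an SVD of the whitened cross-covariance. This is a minimal illustration of the textbook method, not the tensor extension discussed in the talk. The function name `cca`, the small ridge term `reg`, and the synthetic data with one shared and two "private" factors per view are all assumptions made for this example.

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """Classical two-view CCA via whitening and an SVD.

    X (n x p) and Y (n x q) hold n paired samples in their rows.
    Returns the canonical correlations (descending) and the
    projection matrices for each view.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample covariances; `reg` is a tiny ridge for numerical stability.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse symmetric square root of a positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    return s, Wx @ U, Wy @ Vt.T

# Synthetic example (hypothetical): one shared factor z and two private
# factors per view, mixed by orthogonal matrices, plus small noise.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal((n, 1))
Qx = np.linalg.qr(rng.standard_normal((3, 3)))[0]
Qy = np.linalg.qr(rng.standard_normal((3, 3)))[0]
X = np.hstack([z, rng.standard_normal((n, 2))]) @ Qx
Y = np.hstack([z, rng.standard_normal((n, 2))]) @ Qy
X = X + 0.1 * rng.standard_normal((n, 3))
Y = Y + 0.1 * rng.standard_normal((n, 3))

corrs, Ax, Ay = cca(X, Y)
print(corrs)
```

In this setup only the first canonical correlation should be large, reflecting the single shared factor; the remaining ones come from the private factors and noise and should be small.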
Coffee Break 09:30–09:50
09:50–10:40
Katrijn Van Deun, Tilburg University
Abstract: Non-observable constructs such as personality, intelligence, and well-being are at the core of research on human behaviour and cognition. Latent variable methods (e.g., factor analysis, structural equation modelling) are therefore an indispensable tool for research in the social and behavioural sciences. These methods are known to work well when the number of parameters to estimate is relatively small compared to the sample size. However, modern research relies on large collections of data, including multidisciplinary approaches where several blocks of variables have been measured on the same persons. Currently available latent variable methods are restrictive in their use: they often cannot handle high-dimensional data and/or do not take the multi-block structure into account. Here, we propose a regularized latent variable method that addresses these issues by relying on an approximate factor analysis approach and a strong computational framework inspired by alternating optimization and the alternating direction method of multipliers. We illustrate the (superior) performance of our method with simulated and real data, including psychological questionnaire data and large genetic data obtained from patients with several psychiatric disorders.
10:50–11:40
Danny Dunlavy, Sandia National Laboratories
Abstract: We introduce the Generalized Multilinear Model (GMLM), a novel modeling framework that extends the Generalized Linear Model (GLM) and Tensor-on-Tensor Regression (ToTR) for regression problems involving tensor, or multidimensional array, data. As with GLMs, GMLMs allow a linear model to relate expected response variables following arbitrary distributions to covariates via a general link function, providing flexibility in solving problems beyond typical identity link/Gaussian-response regression. As in ToTR, GMLMs allow for tensor covariates and responses, providing models that can leverage the multilinear structure inherent in many datasets, which is often discarded when the data are vectorized and modeled entry-wise using scalar-response GLMs. Vectorizing the data often leads to an ill-posed problem unless the sample size is large and grows with the product of the sizes of the covariate and response tensors. Instead, we impose low-rank tensor structure on the GMLM parameter tensor, thus requiring fewer samples and leading to a well-posed inference problem. In this talk, we will discuss the extensions of GLMs and ToTR that lead to GMLMs, introduce an algorithmic framework for solving the GMLM parameter inference problem when the low-rank structure imposed on the parameter tensor is the Canonical Polyadic (CP) model, and illustrate multiple uses of GMLMs on simulated and real-world application data.
This is joint work with Carlos Llosa, Jeremy Myers, and Rich Lehoucq.
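The benefit of a CP-structured parameter tensor described above can be illustrated with a small sketch. It is not the speakers' algorithm: it assumes a scalar response (the abstract also covers tensor responses), a single three-way covariate tensor, and a Poisson-style log link chosen purely as an example. All sizes and names (`I`, `J`, `K`, `R`, `A`, `Bf`, `C`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 6, 5, 4, 2  # hypothetical covariate sizes and CP rank

# CP factor matrices; the parameter tensor is
# B[i, j, k] = sum_r A[i, r] * Bf[j, r] * C[k, r].
A = 0.3 * rng.standard_normal((I, R))
Bf = 0.3 * rng.standard_normal((J, R))
C = 0.3 * rng.standard_normal((K, R))
B_full = np.einsum('ir,jr,kr->ijk', A, Bf, C)

X = rng.standard_normal((I, J, K))  # one tensor-valued covariate

# Linear predictor as the inner product <X, B>, computed two ways:
# from the full tensor, and directly from the CP factors.
eta_full = np.sum(X * B_full)
eta_cp = np.einsum('ijk,ir,jr,kr->', X, A, Bf, C)

# Example inverse link (assumption): log link, so the mean is exp(eta).
mu = np.exp(eta_cp)

# Free parameters: unstructured tensor versus CP factors.
n_params_full = I * J * K       # 120
n_params_cp = R * (I + J + K)   # 30
print(n_params_full, n_params_cp)
```

The two ways of computing the linear predictor agree, while the CP form needs a number of parameters that grows with the sum of the mode sizes rather than their product.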
Lunch 12:00–13:00
Session Chair: Katrijn Van Deun, Tilburg University
13:15–14:05
Roel van der Ploeg, University of Amsterdam
Abstract: The rapid growth of high-dimensional biological data has necessitated advanced data fusion techniques to integrate and interpret complex multi-omics and longitudinal datasets. Advanced Coupled Matrix and Tensor Factorization (ACMTF) has emerged as a powerful framework for uncovering global, local, and distinct sources of variation across datasets. However, standard ACMTF lacks the ability to model variation linked to a dependent variable, limiting its applicability to studies investigating biological phenotypes. To address this limitation, we introduce ACMTF-Regression (ACMTF-R), an extension of ACMTF that incorporates a regression term, allowing for the simultaneous decomposition of multi-way data while explicitly capturing variation associated with an outcome variable.
We present a detailed mathematical formulation of ACMTF-R, including its optimization framework and implementation. Through extensive simulations, we systematically evaluate its performance under varying conditions, examining its robustness to noise, the effect of sparsity constraints to induce the common, local and distinct structure of the model, and the impact of the tuning parameter ( ), which controls the balance between data exploration and outcome prediction. Our results demonstrate that ACMTF-R accurately recovers underlying variation structures and provides flexible model tuning, distinguishing it from existing approaches such as N-way Partial Least Squares (N-PLS) and traditional ACMTF.
To validate its applicability in a real-world setting, we apply ACMTF-R to a multi-omics dataset integrating human milk microbiome, human milk metabolome, and infant faecal microbiome data, investigating how maternal pre-pregnancy BMI and infant growth affect microbial and metabolic signatures. ACMTF-R successfully identifies novel relationships between maternal BMI and microbiome-metabolome interactions, underscoring its utility in multi-omics research. Our findings establish ACMTF-R as a versatile tool for multi-way data fusion, offering new insights into complex biological systems by integrating common, local, and distinct variation in the context of a dependent variable.
14:15–15:05
Fred White, University of Amsterdam
Abstract: The concept of common, local, and distinct (CLD) structure is increasingly recognised as pivotal in interpretable analysis of multi-omics data, where heterogeneous datasets harbour both shared and unshared variation. Traditional fusion methods such as JIVE, DISCO, and PESCA, though insightful, lack the capability to incorporate a response variable. SCD-CoVR, sJIVE, and OnPLS can incorporate a response but require multiple rounds of cross-validation or face combinatorial challenges when incorporating several blocks of data. To address this gap, we introduce PESCAR, an extension of PESCA designed to simultaneously uncover CLD structure in the context of an outcome, and in particular to find features across the data blocks that (partially) share some pattern with the response. In our approach, we distinguish between ‘fuzzy’ components, which allow some degree of overlap, and ‘crispy’ components, where strict signal separation is expected, to explore how conventional orthogonality constraints, while mathematically convenient, may oversimplify complex biological processes.
PESCAR has two branches. The first branch treats Y as an additional data block, which increases the desired signal strength in the model and enables the detection of smaller Y-related components in the data. The second branch incorporates a response-related penalty, thereby simultaneously enabling the prediction of Y as well as estimation of Y-related CLD patterns across multiple data blocks. We validate our approach through simulations comparing the recovery of scores and loadings against known simulated values across a gradient of fuzzy-crispy components. These simulations highlight the utility of PESCAR in detecting subtle Y-related signals across several components, which remain elusive with PESCA alone.
We further demonstrate the utility of PESCAR with an application to a multi-omics dataset that investigates the relationships between tomato plants and their root microbiome under nitrogen deficiency. This yields biologically meaningful results, identifying sets of genes, metabolites, and microbes that are known to be involved in abiotic stress mitigation. Overall, PESCAR represents a promising approach to supervised multi-block data integration in 2-way data and introduces a novel approach to feature selection in loading space.
Coffee Break 15:15–15:35
15:35–16:25
Arthur Tenenhaus, L2S CNRS
Abstract: In this talk, we present structural equation modeling with factors and composites within the framework of the basic design. We introduce two strategies for parameter estimation. The first is a non-iterative, SVD-based algorithm that yields consistent and asymptotically normal (CAN) estimators, providing a statistically and computationally sound framework. The second strategy is based on the (restricted) maximum likelihood approach (RML-SEM), which yields estimators that are both CAN and asymptotically efficient. SVD-SEM can serve as an initial solution for RML-SEM, facilitating the convergence of the algorithm to a relevant solution. To demonstrate the performance of these methods, we present a Monte Carlo simulation involving a nonrecursive model with both factors and composites.
16:35–17:25
Tom Wilderjans, Leiden University
Abstract: Subtyping patients based on differential brain functioning represents a promising approach in neuroscience for elucidating heterogeneity in brain diseases such as dementia and schizophrenia. In this talk, a framework is presented that integrates clustering techniques with advanced neuroimaging analyses to identify homogeneous groups of subjects characterized by distinct brain processes. Three key neurobiological processes underlying brain functioning are targeted: (1) functional connectivity (FC) patterns derived from single-session fMRI data using Independent Component Analysis (ICA), which identify correlated brain regions with synchronous activity; (2) longitudinal changes in FC patterns, assessed via Independent Vector Analysis (IVA) applied to repeated fMRI sessions, enabling the detection of temporally dependent alterations in network organization; and (3) relationships between fMRI-derived features (such as ALFF) and structural metrics (e.g., gray matter density and diffusion MRI measures) by multimodal integration of functional and structural data through Joint-ICA.
The proposed framework combines K-means clustering with ICA, IVA, and Joint-ICA to delineate subject clusters exhibiting similar brain functioning. This approach not only reveals inter-subject variability by clustering based on differences in brain functioning but also extracts the characteristic neurobiological processes defining each cluster. The effectiveness of these combined techniques is evaluated through comprehensive simulation studies and application to clinical data from patients with Alzheimer’s disease. The results demonstrate the utility of this framework in uncovering biologically meaningful subtypes, which may inform personalized diagnostic and therapeutic strategies. Additionally, software tools developed to facilitate the implementation of these analyses are discussed; they are intended to enhance reproducibility and support broader adoption.
In summary, this work advances neuroimaging-based patient stratification by leveraging multimodal and longitudinal brain data, providing a robust approach to capture complex brain dynamics and heterogeneity in neurological disorders.
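As a point of reference for the clustering ingredient named in this abstract, plain K-means (Lloyd's algorithm) on per-subject feature vectors can be sketched as follows. This is only the generic clustering step, not the combined clustering-plus-ICA/IVA/Joint-ICA framework of the talk. The function name `kmeans`, the farthest-point initialization, and the synthetic two-group data are assumptions made for this example.

```python
import numpy as np

def kmeans(F, k, n_iter=50):
    """Plain K-means on the rows of F (subjects x features).

    Uses a deterministic farthest-point initialization (an assumption
    made here so the example is reproducible).
    """
    centers = [F[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        d = np.min([np.sum((F - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assignment step: nearest center for each subject.
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: mean of each cluster (empty clusters keep their center).
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = F[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic subject features (hypothetical): two well-separated groups.
rng = np.random.default_rng(0)
F = np.vstack([rng.normal(0.0, 0.5, (20, 3)),
               rng.normal(5.0, 0.5, (20, 3))])
labels, centers = kmeans(F, 2)
print(labels)
```

In the framework described above, the features entering such a clustering step would come from the ICA-type decompositions rather than from raw synthetic vectors as in this sketch.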
Dinner 18:30–20:30