1-Canonical Correlation Analysis (CCA)
The basic CCA problem has been studied for more than eight decades, since the seminal paper by Hotelling in 1936. CCA describes the linear relation between two sets of data by defining a new orthogonal coordinate system for each set, chosen so that the pair of new coordinate systems is optimal in maximizing the correlations between the two sets. The new coordinates are linear combinations of the original ones. The aim of CCA is thus to map the input data into a low-dimensional feature space in order to extract common features from a pair of multivariate data sets. CCA then investigates the linear relationship between the two sets of variables in order to elucidate complex dependency structures and to identify modules of interacting variables in multivariate data.
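For concreteness, classical CCA can be computed by whitening each data set and taking a singular value decomposition of the cross-covariance matrix. The following is a minimal sketch (the function name `cca` and the synthetic example are illustrative, not from any specific library):

```python
import numpy as np

def cca(X, Y):
    """Classical CCA: whiten each view, then SVD the cross-covariance.

    Returns the canonical correlations (rho) and the canonical weight
    matrices A, B whose columns define the new coordinate systems.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition
        # (assumes S is full rank, i.e. more samples than variables).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    U, rho, Vt = np.linalg.svd(K, full_matrices=False)
    A = inv_sqrt(Sxx) @ U     # canonical weights for X
    B = inv_sqrt(Syy) @ Vt.T  # canonical weights for Y
    return rho, A, B
```

On two data sets driven by a single shared latent signal, the first canonical correlation comes out close to one while the remaining ones reflect only noise.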
2-Robust Canonical Correlation Analysis (RCCA) [Outliers]
3-Sparse Canonical Correlation Analysis (SCCA) [High dimension]
For modern high-dimensional data, the regular CCA algorithm has been modified by introducing sparsity. In the high-dimensional case, where the number of variables exceeds the sample size or where the variables are highly correlated, regular CCA is no longer appropriate. In 2011, Hardoon and Shawe-Taylor proposed a sparse CCA which minimises the number of features used in both the primal and dual projections while maximising the correlation between the two views. In 2015, Wilms et al. designed a sparse CCA in which sparse estimation produces linear combinations of only a subset of variables from each data set, thereby increasing the interpretability of the canonical variates. To facilitate large-scale computation, Cruz-Cano and Lee designed a fast regularized CCA in 2014, whilst Ma et al. proposed a scalable CCA in 2015.
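One common way to obtain sparse canonical vectors, in the spirit of the penalized approaches cited above, is to alternate soft-thresholded least-squares updates between the two views, ignoring the within-set covariance structure apart from a variance normalization. The sketch below is illustrative (the function names and the penalty parameter `lam` are assumptions, not any particular author's implementation):

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft-thresholding operator (lasso proximal step)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_cca_pair(X, Y, lam=0.1, n_iter=100):
    """First sparse canonical pair via alternating penalized updates.

    Ignores within-set covariances except for a variance normalization,
    a simplification used by several sparse CCA methods to remain
    workable when the number of variables exceeds the sample size.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    b = np.ones(Y.shape[1]) / np.sqrt(Y.shape[1])  # simple initialization
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Correlate X with the current Y-side variate, then threshold.
        a = soft_threshold(X.T @ (Y @ b) / n, lam)
        s = np.sqrt((X @ a) @ (X @ a) / n)
        if s > 0:
            a = a / s          # scale the variate X @ a to unit variance
        b = soft_threshold(Y.T @ (X @ a) / n, lam)
        s = np.sqrt((Y @ b) @ (Y @ b) / n)
        if s > 0:
            b = b / s
    return a, b
```

When only a few variables in each set carry the shared signal, the soft-thresholding step zeroes out the noise variables, which is exactly the interpretability gain described above.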
4-Probabilistic Canonical Correlation Analysis (PCCA)
CCA is a fundamental and highly versatile statistical approach. Despite its pervasiveness, its interpretation has been somewhat elusive; only relatively recently has this changed, by viewing CCA as a probabilistic model. In 2005, Bach and Jordan were the first to propose a latent variable model for CCA. Probabilistic models not only allow one to derive the regular CCA algorithm from a density estimation perspective but also provide an avenue for Bayesian variants of CCA (Wang, 2007). Klami and Kaski (2007) extended probabilistic CCA by introducing additional latent variables to model low-rank approximations of the non-shared covariances. However, the separation of shared and non-shared variability in these models requires prior information and is typically not identifiable from the observed data alone. In 2018, Jendoubi and Strimmer revisited PCCA from the perspective of whitening of random variables, designing a flexible probabilistic model for CCA that links together multivariate regression, latent variable models, and high-dimensional estimation.
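The latent variable formulation of Bach and Jordan can be written compactly: a single Gaussian latent vector is shared by both views, and conditional on it the two views are independent, so all cross-view correlation is channeled through the shared latent variable.

```latex
% Probabilistic CCA (Bach and Jordan, 2005): a shared latent z
% generates both views x in R^p and y in R^q.
\begin{align*}
  z &\sim \mathcal{N}(0, I_d), \qquad 1 \le d \le \min(p, q), \\
  x \mid z &\sim \mathcal{N}(W_x z + \mu_x,\ \Psi_x), \\
  y \mid z &\sim \mathcal{N}(W_y z + \mu_y,\ \Psi_y).
\end{align*}
```

Here $W_x, W_y$ are loading matrices and $\Psi_x, \Psi_y$ are view-specific covariance matrices; Bach and Jordan showed that the maximum likelihood estimates of the loadings span the same subspaces as the first $d$ canonical weight vectors of classical CCA, which is what gives regular CCA its density estimation interpretation.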
5-Kernel Canonical Correlation Analysis (KCCA)
In complex situations, CCA fails to extract useful features because of its linearity. In 2001, Akaho investigated the effectiveness of applying the kernel method to CCA. In the literature, a small number of kernels have been used in CCA to capture a relationship that is nonlinear in the data space but linear in some higher-dimensional feature space. However, little work has been done to investigate their relative performances through simulation or from the viewpoint of sensitivity. In 2008, Alam et al. compared the performance of kernel canonical correlation coefficients (Gaussian, Laplacian, and polynomial) with that of classical and robust CCA coefficient measures using simulation and influence functions. They observed that the class of kernel estimators generally performs better than the classical and robust CCA estimators, with the Laplacian kernel estimator showing the best performance for large sample sizes.
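As a concrete illustration of the idea (not Alam et al.'s implementation), the sketch below estimates the first kernel canonical correlation with Gaussian kernels via a regularized generalized eigenproblem in the dual space; the function names and the regularization parameter `reg` are assumptions:

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def kcca_first_corr(X, Y, sigma=1.0, reg=0.05):
    """First kernel canonical correlation with Gaussian kernels.

    Solves a regularized generalized eigenproblem over the centered
    kernel matrices; regularization is essential, because without it
    the kernel canonical correlations degenerate to one.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kx = H @ gaussian_kernel(X, sigma) @ H
    Ky = H @ gaussian_kernel(Y, sigma) @ H
    Z = np.zeros((n, n))
    jitter = 1e-8 * np.eye(n)                # numerical stabilizer
    # Block eigenproblem  A w = rho B w  with w = (alpha, beta).
    A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])
    B = np.block([[Kx @ Kx + reg * n * Kx + jitter, Z],
                  [Z, Ky @ Ky + reg * n * Ky + jitter]])
    # Reduce to a standard symmetric eigenproblem via Cholesky of B.
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    vals = np.linalg.eigvalsh(Linv @ A @ Linv.T)
    return float(vals[-1])                   # largest eigenvalue = rho
```

On a purely quadratic relationship the linear correlation is near zero, while the kernel canonical correlation remains substantial, which is precisely the nonlinearity argument made above.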
6-Generalized Canonical Correlation Analysis (GCCA)