Minimum Curvilinearity II (January 2013)

Minimum Curvilinear Embedding (MCE) as an novel form of nonlinear-kernel principal component analysis (PCA) and its application for topological prediction of protein-protein interactions

MCE was previously proposed as unsupervised machine learning for nonlinear dimensionality reduction (Cannistraci, et al., Bioinformatics 2010). The algorithm is parameter-free and time efficient, and was presented as a new form of nonlinear multidimensional scaling (MDS). It consists of a first step in which a nonlinear distance matrix (the minimum curvilinear distance matrix, MC-matrix) is calculated as pair-wise sample distances over the minimum spanning tree (MST) in the feature space. The MST is computed according to the Euclidean distance or the Correlation distance (or any other preferred distance) in the feature space. In case of network embedding, the MST is directly computed over the network. In the second step, the embedding transformation is performed by classical MDS of the MC-matrix.

Now, we propose an algorithmic innovation and interpret the MCE algorithm as a novel form of nonlinear-kernel principal component analysis (PCA).

In this new version, the MC-matrix is interpreted as a nonlinear and parameter-free kernel, and MC is a nonlinear and parameter-free measure, which produces a distance transformation stored in the MC-kernel (earlier referred to as MC-matrix). The first step of the new algorithm remains unaltered, while the embedding of the MC-kernel is executed by singular value decomposition (SVD) according to the following procedure: 1) centring of the MC-kernel 2) SVD decomposition of the centred MC-kernel followed by the embedding in an arbitrary dimension.

The new algorithm is still time efficient (and of course parameter-free), however it additionally offers a new attractive advantage, which consists in the fact that the centring of the MC-kernel can be also not omitted. In practice this generates a different dimension reduction device, that we named non-centred MCE (ncMCE).

We discuss the efficiency of ncMCE in comparison with MCE and several other techniques for nonlinear dimension reduction and topological prediction of protein-protein interactions, in the following study:

Cannistraci et al., 2013, Bioinformatics 29 (13), i199-i209: Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding

Here below we provide:

- the MATLAB code for computing MCE (MDS based), MCE (SVD based), ncMCE (SVD based)

- the R code for computing MCE (SVD based) and ncMCE (SVD based)