PCA reduces the size of the feature vector and eliminates redundant features. This helps to improve the robustness of the model. If you would like to know its role in present-day data science, this document will help.
Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. PCA is one of the most widely used tools in exploratory data analysis and in machine learning for predictive models.
The main idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of many variables that are correlated with each other, either heavily or lightly, while retaining as much of the variation present in the dataset as possible.
This is done by transforming the variables into a new set of variables, known as the principal components (or simply, the PCs), which are orthogonal and ordered so that the amount of variation they retain from the original variables decreases as we move down the order. In this way, the 1st principal component retains the maximum variation that was present in the original variables. The principal components are the eigenvectors of the covariance matrix, which is symmetric, and hence they are orthogonal.
(Figure: the 2-D data points before PCA, together with the line onto which they are projected.)
Note the line that represents the 2-D points in 1-D. This line is derived from an eigenvector of the data's covariance matrix, and that eigenvector is used as the principal component.
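As a minimal sketch of this idea (the 2-D data below is synthetic and purely illustrative), we can compute the covariance matrix, take the eigenvector with the largest eigenvalue, and project the points onto it:

```python
import numpy as np

# Synthetic, correlated 2-D data (illustrative assumption, not real data)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=200)

X_centered = X - X.mean(axis=0)          # zero-mean the data
cov = np.cov(X_centered, rowvar=False)   # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix -> real eigenvalues

pc1 = eigvecs[:, np.argmax(eigvals)]     # eigenvector with the largest eigenvalue
projection_1d = X_centered @ pc1         # the 2-D points represented in 1-D
print(projection_1d.shape)               # (200,)
```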
PCA replaces the original variables with new variables, called principal components, which are orthogonal (i.e. they have zero covariance with each other) and have variances (called eigenvalues) in decreasing order.
As an example, below is the covariance matrix of three variables. Their variances are on the diagonal, and the sum of the three diagonal values (3.448) is the overall variability.
 1.343730519  -0.160152268   0.186470243
-0.160152268   0.619205620  -0.126684273
 0.186470243  -0.126684273   1.485549631
Now, after PCA, the covariance matrix of the principal components extracted from the above data is shown below. Note that the covariances have become zero.
 1.651354285   0.000000000   0.000000000
 0.000000000   1.220288343   0.000000000
 0.000000000   0.000000000   0.576843142
Note that the diagonal sum is still 3.448, which means that the 3 components together account for all of the multivariate variability. The 1st principal component accounts for, or "explains", 1.651/3.448 = 47.9% of the overall variability; the 2nd one explains 1.220/3.448 = 35.4% of it; the 3rd one explains 0.577/3.448 = 16.7% of it.
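The same behaviour can be reproduced with scikit-learn; the sketch below uses synthetic data that only approximates the covariance matrix above, so the exact numbers will differ slightly:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-variable data whose covariance roughly matches the example above
rng = np.random.default_rng(42)
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[1.34, -0.16, 0.19],
                                 [-0.16, 0.62, -0.13],
                                 [0.19, -0.13, 1.49]],
                            size=5000)

pca = PCA(n_components=3)
scores = pca.fit_transform(X)

print(np.round(np.cov(scores, rowvar=False), 3))  # off-diagonal entries are ~0
print(pca.explained_variance_ratio_)              # roughly [0.48, 0.35, 0.17]
print(pca.explained_variance_ratio_.sum())        # ~1.0
```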
The eigenvectors and eigenvalues of a covariance (or correlation) matrix represent the "core" of a PCA. The eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude (the variance along each direction). In the PCA transformation, they are ordered by decreasing eigenvalue.
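A minimal sketch of this ordering step (on synthetic data), following the usual eigendecomposition recipe: decompose the covariance matrix, sort the eigenpairs by decreasing eigenvalue, and keep the top components as the projection matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 5))        # synthetic data with 5 features
X = X - X.mean(axis=0)                   # centre the data

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order

order = np.argsort(eigvals)[::-1]        # re-order: largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

W = eigvecs[:, :2]                       # keep the top-2 principal components
X_projected = X @ W                      # data expressed in the new feature space
print(eigvals)                           # now in decreasing order
```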
What if the eigenvalues are not real numbers, but complex numbers?
The eigenvalues of a covariance matrix are always real numbers, so this case does not arise. Note that a covariance matrix is symmetric and positive semi-definite (non-negative definite), and therefore its eigenvalues are always real and non-negative.
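A quick numerical check of this property (on made-up data): numpy's eigvalsh, which assumes a symmetric matrix, returns real and, up to floating-point error, non-negative eigenvalues for a covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))      # made-up data with 4 features
cov = np.cov(X, rowvar=False)          # symmetric, positive semi-definite

eigvals = np.linalg.eigvalsh(cov)      # real eigenvalues of a symmetric matrix
print(eigvals)                         # all >= 0 (up to floating-point error)
print(np.all(eigvals >= -1e-10))       # True
```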
The ability to generalize correctly becomes exponentially harder as the dimensionality of the training dataset grows. PCA helps here by reducing the dimension.
PCA can also be used as a filtering approach for noisy data. The idea is this: any components with variance much larger than the effect of the noise should be relatively unaffected by the noise. So if you reconstruct the data using just the largest subset of principal components, you should be preferentially keeping the signal and throwing out the noise.
Refer to the Colab example here (sourced from here).
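A minimal sketch of this noise-filtering idea, with made-up data and noise levels (assumptions, not the linked notebook's setup): fit PCA on the noisy data, keep only the high-variance components, and reconstruct:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
# Made-up "signal": 10 features, only the first 3 carry large variance
signal = rng.multivariate_normal(
    np.zeros(10),
    np.diag([5, 4, 3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),
    size=500)
noisy = signal + rng.normal(scale=0.5, size=signal.shape)  # add isotropic noise

pca = PCA(n_components=3).fit(noisy)                   # keep the 3 largest components
denoised = pca.inverse_transform(pca.transform(noisy))

# Reconstruction error against the clean signal is typically lower after filtering
print(np.mean((noisy - signal) ** 2), np.mean((denoised - signal) ** 2))
```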
PCA affects a key statistical property of the dataset, namely its variance. Note that PCA transforms the original variables into a new set of variables; during this transformation, the variance of the dataset changes.
When reducing the dimensions of data, it's important not to lose more information than is necessary. The variation in a data set can be seen as representing the information that we would like to keep. Principal Component Analysis (PCA) is a well-established mathematical technique for reducing the dimensionality of data while keeping as much variation as possible. Selecting the new-dimension hyperparameter (the number of components to keep) is important: the higher the number, the more variation is retained, but the smaller the dimensionality reduction.
Low-valued components have low variances (note that their eigenvalues will be low). These components can safely be approximated by their mean value (since their scatter is low), so the loss of information is not significant. Hence, dropping the low-valued components is fine. This is the reason behind the power of this algorithm: dimensionality reduction along with good performance.
We use PCA to reduce the dimensions of our dataset so that when the resulting dataset is fed to a machine learning algorithm, the computational time needed to train it decreases.
One problem with the standard implementation of PCA is that it requires the whole training set to fit in memory in order for the SVD algorithm to run.
Incremental PCA (IPCA) algorithms can be used to split the training set into mini-batches and feed an IPCA algorithm one mini-batch at a time.
This is useful for large training sets, and also to apply PCA online (i.e., on the fly, as new instances arrive).
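A minimal sketch of this with scikit-learn's IncrementalPCA (the batch sizes and data here are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(3)
n_batches, batch_size, n_features = 20, 500, 50

ipca = IncrementalPCA(n_components=10)
for _ in range(n_batches):
    # Stand-in for a mini-batch read from disk or arriving online
    batch = rng.standard_normal((batch_size, n_features))
    ipca.partial_fit(batch)                # fit one mini-batch at a time

X_new = rng.standard_normal((5, n_features))
print(ipca.transform(X_new).shape)         # (5, 10)
```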
Although the algorithm takes the number of components as input, the actual need is usually expressed differently. In that case, it is better to decide on the percentage of variance that needs to be preserved.
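With scikit-learn's PCA, passing a float between 0 and 1 as n_components asks for enough components to preserve that fraction of the variance; a minimal sketch (using the digits dataset purely as an example):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                        # 64-dimensional digit images

pca = PCA(n_components=0.95)                  # keep 95% of the variance
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                      # number of components actually chosen
print(pca.explained_variance_ratio_.sum())    # >= 0.95
```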
PCA is not linear regression, and it captures only linear relationships between variables, so check the dataset for roughly linear structure using data visualisation approaches before applying it.
Zero mean normalisation of features is important before applying PCA.
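A minimal sketch of this preprocessing step (with made-up features on very different scales); note that scikit-learn's PCA centres the data itself, but explicit standardisation also puts the features on a comparable scale:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = np.column_stack([rng.normal(100, 20, 300),   # made-up feature on a large scale
                     rng.normal(0, 1, 300)])     # made-up feature on a small scale

X_std = StandardScaler().fit_transform(X)        # zero mean, unit variance per feature
print(np.round(X_std.mean(axis=0), 6))           # ~[0. 0.]

X_pca = PCA(n_components=2).fit_transform(X_std)
```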
References
https://youtu.be/JlmJ5PEmIOo
https://en.wikipedia.org/wiki/Principal_component_analysis
https://www.dezyre.com/data-science-in-python-tutorial/principal-component-analysis-tutorial
https://www.geeksforgeeks.org/ml-principal-component-analysispca/
https://www.researchgate.net/post/How-can-PCA-reduce-the-size-of-the-feature-vector-and-eliminate-the-redundant-features
https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained
https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues/140579#140579?newreg=61a6aab73cbe4720b00fbf52bd8b9afd
https://www.youtube.com/watch?v=6Pv2txQVhxA
https://www.qlucore.com/news/the-benefits-of-principal-component-analysis-pca
https://medium.com/analytics-vidhya/merging-principal-component-analysis-pca-with-artificial-neural-networks-1ea6dad2c095
https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.09-Principal-Component-Analysis.ipynb#scrollTo=Ao1YFn0OcBD8
https://stats.stackexchange.com/questions/247260/principal-component-analysis-eliminate-noise-in-the-data
https://medium.com/@dareyadewumi650/understanding-the-role-of-eigenvectors-and-eigenvalues-in-pca-dimensionality-reduction-10186dad0c5c
https://sebastianraschka.com/Articles/2015_pca_in_3_steps.html
https://youtu.be/YSqFB7Srx-4
https://youtu.be/YSqFB7Srx-4?t=1082
https://math.stackexchange.com/questions/2026480/covariance-matrix-with-complex-eigenvalues
https://youtu.be/YSqFB7Srx-4?t=2205
https://coursera.org/share/45a54686085922d1458cdc86d8a086eb
https://stackoverflow.com/questions/24729447/what-does-it-mean-to-have-zero-mean-in-the-data
https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291