PCA reduces the size of the feature vector and eliminates redundant features. This helps to improve the robustness of the model. If you would like to know its role in present-day data science, this document will help.
Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. PCA is one of the most widely used tools in exploratory data analysis and in machine learning for predictive models.
The main idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of many variables that are correlated with each other, either heavily or lightly, while retaining as much of the variation present in the dataset as possible.
This is done by transforming the variables into a new set of variables, known as the principal components (or simply, the PCs), which are orthogonal and ordered so that the amount of variation they retain from the original variables decreases as we move down the order. In this way, the 1st principal component retains the maximum variation that was present in the original variables. The principal components are the eigenvectors of the covariance matrix, which is symmetric, and hence they are orthogonal.
(Figure: the 2-D data points before PCA, together with the line onto which they are projected.)
Note the line that represents the 2-D points in 1-D. This line is derived from an eigenvector of the data's covariance matrix, and that eigenvector is used as the principal component.
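As a minimal sketch of this idea (the 2-D data below is synthetic and purely illustrative), we can compute the covariance matrix, take the eigenvector with the largest eigenvalue, and project the points onto it:

```python
import numpy as np

# Synthetic, correlated 2-D data (illustrative assumption, not real data)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=200)

X_centered = X - X.mean(axis=0)          # zero-mean the data
cov = np.cov(X_centered, rowvar=False)   # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix -> real eigenvalues

pc1 = eigvecs[:, np.argmax(eigvals)]     # eigenvector with the largest eigenvalue
projection_1d = X_centered @ pc1         # the 2-D points represented in 1-D
print(projection_1d.shape)               # (200,)
```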
PCA replaces the original variables with new variables, called principal components, which are orthogonal (i.e. they have zero covariance with each other) and have variances (called eigenvalues) in decreasing order.
As an example, below is the covariance matrix of three variables. Their variances are on the diagonal, and the sum of the three diagonal values (3.448) is the overall variability.
 1.343730519  -0.160152268   0.186470243
-0.160152268   0.619205620  -0.126684273
 0.186470243  -0.126684273   1.485549631
Now, after PCA, the covariance matrix of the principal components extracted from the above data is shown below. Note that the covariances have become zero.
 1.651354285   0.000000000   0.000000000
 0.000000000   1.220288343   0.000000000
 0.000000000   0.000000000   0.576843142
Note that the diagonal sum is still 3.448, which means that the 3 components together account for all of the multivariate variability. The 1st principal component accounts for, or "explains", 1.651/3.448 = 47.9% of the overall variability; the 2nd one explains 1.220/3.448 = 35.4% of it; the 3rd one explains 0.577/3.448 = 16.7% of it.
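The same behaviour can be reproduced with scikit-learn; the sketch below uses synthetic data that only approximates the covariance matrix above, so the exact numbers will differ slightly:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-variable data whose covariance roughly matches the example above
rng = np.random.default_rng(42)
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[1.34, -0.16, 0.19],
                                 [-0.16, 0.62, -0.13],
                                 [0.19, -0.13, 1.49]],
                            size=5000)

pca = PCA(n_components=3)
scores = pca.fit_transform(X)

print(np.round(np.cov(scores, rowvar=False), 3))  # off-diagonal entries are ~0
print(pca.explained_variance_ratio_)              # roughly [0.48, 0.35, 0.17]
print(pca.explained_variance_ratio_.sum())        # ~1.0
```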
The eigenvectors and eigenvalues of a covariance (or correlation) matrix represent the "core" of a PCA. The eigenvectors (principal components) determine the directions of the new feature space, and the eigenvalues determine their magnitude (the variance along each direction). In the PCA transformation, they are ordered by decreasing eigenvalue.
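A minimal sketch of this ordering step (on synthetic data), following the usual eigendecomposition recipe: decompose the covariance matrix, sort the eigenpairs by decreasing eigenvalue, and keep the top components as the projection matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 5))        # synthetic data with 5 features
X = X - X.mean(axis=0)                   # centre the data

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order

order = np.argsort(eigvals)[::-1]        # re-order: largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

W = eigvecs[:, :2]                       # keep the top-2 principal components
X_projected = X @ W                      # data expressed in the new feature space
print(eigvals)                           # now in decreasing order
```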
What if the eigenvalues are not real numbers, but complex numbers?
The eigenvalues of a covariance matrix are always real numbers, so this case does not arise. Note that a covariance matrix is symmetric and positive semi-definite (non-negative definite), and therefore its eigenvalues are always real and non-negative.
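A quick numerical check of this property (on made-up data): numpy's eigvalsh, which assumes a symmetric matrix, returns real and, up to floating-point error, non-negative eigenvalues for a covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))      # made-up data with 4 features
cov = np.cov(X, rowvar=False)          # symmetric, positive semi-definite

eigvals = np.linalg.eigvalsh(cov)      # real eigenvalues of a symmetric matrix
print(eigvals)                         # all >= 0 (up to floating-point error)
print(np.all(eigvals >= -1e-10))       # True
```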
The ability to generalize correctly becomes exponentially harder as the dimensionality of the training dataset grows. PCA helps here by reducing the dimension.
PCA can also be used as a filtering approach for noisy data. The idea is this: any components with variance much larger than the effect of the noise should be relatively unaffected by the noise. So if you reconstruct the data using just the largest subset of principal components, you should be preferentially keeping the signal and throwing out the noise.
Refer to the Colab example here (sourced from here).
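A minimal sketch of this noise-filtering idea, with made-up data and noise levels (assumptions, not the linked notebook's setup): fit PCA on the noisy data, keep only the high-variance components, and reconstruct:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
# Made-up "signal": 10 features, only the first 3 carry large variance
signal = rng.multivariate_normal(
    np.zeros(10),
    np.diag([5, 4, 3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),
    size=500)
noisy = signal + rng.normal(scale=0.5, size=signal.shape)  # add isotropic noise

pca = PCA(n_components=3).fit(noisy)                   # keep the 3 largest components
denoised = pca.inverse_transform(pca.transform(noisy))

# Reconstruction error against the clean signal is typically lower after filtering
print(np.mean((noisy - signal) ** 2), np.mean((denoised - signal) ** 2))
```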
PCA affects a key statistical property of the dataset, namely its variance. Note that PCA transforms the original variables into a new set of variables; during this transformation, the variance of the dataset changes.
When reducing the dimensions of data, it's important not to lose more information than is necessary. The variation in a data set can be seen as representing the information that we would like to keep. Principal Component Analysis (PCA) is a well-established mathematical technique for reducing the dimensionality of data while keeping as much variation as possible. Selecting the new-dimension hyperparameter (the number of components to keep) is important: the higher the number, the more variation is retained, but the smaller the dimensionality reduction.
Low-valued components have low variances (note that their eigenvalues will be low). These components can safely be approximated by their mean value (since their scatter is low), so the loss of information is not significant. Hence, dropping the low-valued components is fine. This is the reason behind the power of this algorithm: dimensionality reduction along with good performance.
We use PCA to reduce the dimensions of our dataset so that when the resulting dataset is fed to a machine learning algorithm, the computational time needed to train it decreases.
One problem with the standard implementation of PCA is that it requires the whole training set to fit in memory in order for the SVD algorithm to run.
Incremental PCA (IPCA) algorithms can be used to split the training set into mini-batches and feed an IPCA algorithm one mini-batch at a time.
This is useful for large training sets, and also to apply PCA online (i.e., on the fly, as new instances arrive).
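A minimal sketch of this with scikit-learn's IncrementalPCA (the batch sizes and data here are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(3)
n_batches, batch_size, n_features = 20, 500, 50

ipca = IncrementalPCA(n_components=10)
for _ in range(n_batches):
    # Stand-in for a mini-batch read from disk or arriving online
    batch = rng.standard_normal((batch_size, n_features))
    ipca.partial_fit(batch)                # fit one mini-batch at a time

X_new = rng.standard_normal((5, n_features))
print(ipca.transform(X_new).shape)         # (5, 10)
```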
Although the algorithm takes the number of components as input, the actual need is usually expressed differently. In that case, it is better to decide on the percentage of variance that needs to be preserved.
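With scikit-learn's PCA, passing a float between 0 and 1 as n_components asks for enough components to preserve that fraction of the variance; a minimal sketch (using the digits dataset purely as an example):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                        # 64-dimensional digit images

pca = PCA(n_components=0.95)                  # keep 95% of the variance
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                      # number of components actually chosen
print(pca.explained_variance_ratio_.sum())    # >= 0.95
```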
PCA is not linear regression, and it captures only linear relationships between variables, so check the dataset for roughly linear structure using data visualisation approaches before applying it.
Zero mean normalisation of features is important before applying PCA.
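A minimal sketch of this preprocessing step (with made-up features on very different scales); note that scikit-learn's PCA centres the data itself, but explicit standardisation also puts the features on a comparable scale:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = np.column_stack([rng.normal(100, 20, 300),   # made-up feature on a large scale
                     rng.normal(0, 1, 300)])     # made-up feature on a small scale

X_std = StandardScaler().fit_transform(X)        # zero mean, unit variance per feature
print(np.round(X_std.mean(axis=0), 6))           # ~[0. 0.]

X_pca = PCA(n_components=2).fit_transform(X_std)
```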
References
https://youtu.be/JlmJ5PEmIOo
https://en.wikipedia.org/wiki/Principal_component_analysis
https://www.dezyre.com/data-science-in-python-tutorial/principal-component-analysis-tutorial
https://www.geeksforgeeks.org/ml-principal-component-analysispca/
https://www.researchgate.net/post/How-can-PCA-reduce-the-size-of-the-feature-vector-and-eliminate-the-redundant-features
https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained
https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues/140579#140579?newreg=61a6aab73cbe4720b00fbf52bd8b9afd
https://www.youtube.com/watch?v=6Pv2txQVhxA
https://www.qlucore.com/news/the-benefits-of-principal-component-analysis-pca
https://medium.com/analytics-vidhya/merging-principal-component-analysis-pca-with-artificial-neural-networks-1ea6dad2c095
https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.09-Principal-Component-Analysis.ipynb#scrollTo=Ao1YFn0OcBD8
https://stats.stackexchange.com/questions/247260/principal-component-analysis-eliminate-noise-in-the-data
https://medium.com/@dareyadewumi650/understanding-the-role-of-eigenvectors-and-eigenvalues-in-pca-dimensionality-reduction-10186dad0c5c
https://sebastianraschka.com/Articles/2015_pca_in_3_steps.html
https://youtu.be/YSqFB7Srx-4
https://youtu.be/YSqFB7Srx-4?t=1082
https://math.stackexchange.com/questions/2026480/covariance-matrix-with-complex-eigenvalues
https://youtu.be/YSqFB7Srx-4?t=2205
https://coursera.org/share/45a54686085922d1458cdc86d8a086eb
https://stackoverflow.com/questions/24729447/what-does-it-mean-to-have-zero-mean-in-the-data
https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291