PCA

Principal Component Analysis (PCA)


Given a set of multi-dimensional points, PCA first calculates the middle position (the mean) of all the data points and shifts every point so that this mean becomes the new origin. The middle along the x axis is simply the average of all points' x coordinates, and likewise for each other axis.
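
A minimal sketch of this centering step, assuming NumPy and a small made-up `points` array (the values are illustrative only):

```python
import numpy as np

# Made-up 2-D data: one row per point, one column per dimension.
points = np.array([[2.5, 2.4],
                   [0.5, 0.7],
                   [2.2, 2.9],
                   [1.9, 2.2],
                   [3.1, 3.0]])

center = points.mean(axis=0)   # per-axis averages (the "middle position")
centered = points - center     # shift so the mean becomes the new origin
print(centered.mean(axis=0))   # approximately [0, 0]
```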


Secondly, it draws a line through the origin and rotates it to find the line that best fits the data, i.e. the one that minimizes the sum of squared distances from the points to the line (equivalently, the one that maximizes the variance of the points projected onto it). This first line is called principal component 1 (PC1). Its direction is scaled to a unit vector with length 1, and the components of that direction vector scale accordingly.
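
A sketch of finding PC1, assuming the same kind of made-up data as above. One standard way to get the best-fit direction is the top right-singular vector of the centered data matrix, which NumPy's SVD returns already scaled to unit length:

```python
import numpy as np

points = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                   [1.9, 2.2], [3.1, 3.0]])
centered = points - points.mean(axis=0)

# Rows of vt are candidate directions; the first row is the unit direction
# that minimizes the squared fitting error (and maximizes projected variance).
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]
print(pc1, np.linalg.norm(pc1))   # the norm is 1.0 -- a unit vector
```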


Then it draws a second line through the origin, which must be orthogonal (perpendicular) to the first one. Again, it rotates this line within the remaining orthogonal space to find the best-fitting position, the one that minimizes the sum of squared distances of all data points. This line is PC2.


This process continues with the 3rd line, the 4th line, and so on; the i-th line must be orthogonal to all of the previous lines. All lines are scaled to unit vectors so they are numerically comparable. A sketch of computing all of them at once follows below.
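
A sketch, assuming made-up random 3-D data: the SVD gives all principal components in one shot, and the rows of `vt` are exactly such an orthonormal set, each one a unit vector and every pair perpendicular:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))        # made-up 3-D data
centered = points - points.mean(axis=0)

_, _, vt = np.linalg.svd(centered, full_matrices=False)   # vt has shape (3, 3)

# vt @ vt.T is the identity matrix: every row has length 1 and any two
# different rows are orthogonal, matching the rules described above.
print(np.round(vt @ vt.T, 6))
```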


When the principal components are determined (as many as you want), we can compute each data point's distance (actually squared distance) from the origin along the direction of each principal component. The sum of these squared distances, divided by n - 1, is the variance along that principal component; it is also the eigenvalue of the data's covariance matrix for that component. The square root of the raw sum of squared distances is the corresponding singular value.
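
A sketch of this bookkeeping on made-up data, checking the hand-computed quantities against NumPy's own SVD and eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
centered = points - points.mean(axis=0)
n = len(points)

_, s, vt = np.linalg.svd(centered, full_matrices=False)

proj = centered @ vt[0]        # signed distances from the origin along PC1
ss = np.sum(proj ** 2)         # sum of squared distances
variance = ss / (n - 1)        # variance along PC1

print(np.isclose(np.sqrt(ss), s[0]))       # matches PC1's singular value
eigvals = np.linalg.eigvalsh(np.cov(centered.T))
print(np.isclose(variance, eigvals[-1]))   # matches the largest eigenvalue
```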


Then we can plot a bar chart of all principal components' variances (eigenvalues), often called a scree plot. A large variance means the component carries a lot of power for separating/distinguishing the data. PC1 has the largest variance, then PC2, PC3, etc. In a nice PCA the variances drop off very quickly from one component to the next; in a less nice PCA they drop slowly, which means the data set is harder to separate using PCA.
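
A sketch of such a scree plot, assuming matplotlib and made-up data whose columns are deliberately given different scales so the variances drop off; the bars are shown as percentages of the total variance:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
centered = points - points.mean(axis=0)

_, s, _ = np.linalg.svd(centered, full_matrices=False)
variances = s ** 2 / (len(points) - 1)          # one variance per component
percent = 100 * variances / variances.sum()     # share of total variance

plt.bar([f"PC{i + 1}" for i in range(len(percent))], percent)
plt.ylabel("% of total variance")
plt.title("Scree plot")
plt.show()
```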


Finally, one can keep only the top k most significant components to represent the data, achieving dimension reduction. For visualisation purposes, e.g. drawing a 3-D dataset on a 2-D plane, we can use only the first two components, which carry the most separating power.
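
A sketch of the reduction itself, again on made-up 3-D data: keep only the first k unit directions and project the centered points onto them, giving k coordinates per point:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
centered = points - points.mean(axis=0)

_, _, vt = np.linalg.svd(centered, full_matrices=False)

k = 2
scores = centered @ vt[:k].T    # shape (100, 2): coordinates along PC1 and PC2
print(scores.shape)             # these 2-D scores can be drawn on a flat plane
```

For comparison, `sklearn.decomposition.PCA(n_components=2).fit_transform(points)` gives the same scores up to the sign of each component, if scikit-learn is available.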