Correlation is the association between two variables, and correlation coefficients measure how strong that relationship is.
The most popular correlation coefficient is Pearson’s Correlation Coefficient. It is very commonly used in linear regression.
Pearson correlation measures the linear relationship between a continuous variable X and a continuous variable Y and takes a value between -1 and 1. It can be pictured as drawing a line of best fit through the scatter of the two variables: the coefficient indicates how closely the data points cluster around that line. Values near 1 or -1 mean the points lie close to the line, and values near 0 mean there is little linear relationship.
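To make the definition concrete, here is a minimal sketch that computes the coefficient by hand as the covariance of X and Y divided by the product of their standard deviations. The function name `pearson_r` and the data are made up for illustration and do not come from any of the linked sources.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each point from the mean
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sum of cross-products, normalised by the spread of each variable
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Illustrative data (assumed, not from a real dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_linear = [2.0, 4.0, 6.0, 8.0, 10.0]    # points exactly on a rising line
y_noisy = [2.1, 3.9, 6.2, 7.8, 10.1]     # points close to a rising line
y_reversed = [10.0, 8.0, 6.0, 4.0, 2.0]  # points exactly on a falling line

r_linear = pearson_r(x, y_linear)
r_noisy = pearson_r(x, y_noisy)
r_reversed = pearson_r(x, y_reversed)
print(r_linear, r_noisy, r_reversed)
```

Points that sit exactly on a rising line give 1, points on a falling line give -1, and the noisy series gives a value just below 1. On Python 3.10 or later, `statistics.correlation` should give the same results.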
Both variables X and Y should satisfy the following properties:
Data should be derived from random, or at least representative, samples in order to draw a meaningful statistical inference.
Both variables should be continuous and normally distributed.
There should be homoscedasticity, which means the variance of the data around the line of best fit should be similar across the range of X.
There should be no extreme outliers, because they strongly influence the Pearson correlation coefficient.
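The outlier point can be illustrated with a small sketch. The data is invented so that X and Y are uncorrelated, then a single extreme point is appended. `pearson_r` is a hypothetical helper defined here, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Invented data with no linear relationship between x and y
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 3, 6, 4, 5, 3, 6, 4]
r_without_outlier = pearson_r(x, y)

# Append one extreme point far from the rest of the data
r_with_outlier = pearson_r(x + [30], y + [30])

print(r_without_outlier, r_with_outlier)
```

With these numbers the coefficient moves from zero to above 0.9 because of one point, which is why a scatter plot is worth checking before the coefficient is trusted.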
PCC vs LASSO
??
PCC vs PCA
??
Role in machine learning
Many machine learning algorithms assume that the continuous input variables are not strongly correlated with each other. When they are, the phenomenon is called multicollinearity. Multicollinearity adversely impacts the model training process (verify what this means: https://www.pluralsight.com/guides/estimate-correlation-coefficient-in-azure-machine-learning-studio)
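One common way to screen for this is to compute the Pearson coefficient for every pair of input features and flag the pairs whose absolute value exceeds a threshold. The sketch below assumes a threshold of 0.9 and uses invented feature names and values. `highly_correlated_pairs` is a hypothetical helper, not part of any library.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def highly_correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    flagged = []
    for name_a, name_b in combinations(features, 2):
        r = pearson_r(features[name_a], features[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Invented feature columns: height_in is just height_cm in other units
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
features = {
    "height_cm": height_cm,
    "height_in": [h / 2.54 for h in height_cm],
    "age": [40.0, 25.0, 33.0, 52.0, 29.0],
}

pairs = highly_correlated_pairs(features)
for name_a, name_b, r in pairs:
    print(name_a, name_b, round(r, 3))
```

Here only the two height columns are flagged, since one is a rescaled copy of the other. What to do with a flagged pair (drop one feature, combine them, or keep both) depends on the model and is not decided by this check.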
Use in neural networks
https://youtu.be/6fUYt1alA1U
https://www.analyticsvidhya.com/blog/2021/01/beginners-guide-to-pearsons-correlation-coefficient/
https://towardsdatascience.com/what-it-takes-to-be-correlated-ce41ad0d8d7f
https://www.pluralsight.com/guides/estimate-correlation-coefficient-in-azure-machine-learning-studio