A dataset y = [y1, y2, ... yn]
A prediction set f = [f1, f2, ... fn]
Now we want to check how well the prediction f fits the original dataset y.
First, define the residuals e = y - f,
so [e1, e2, ... en] = [y1-f1, y2-f2, ... yn-fn].
The residual sum of squares SUM(e^2) = e1^2 + e2^2 + ... + en^2 measures how far the predictions f fall from the data y.
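As a minimal sketch in Python (the values of y and f below are illustrative, not from the text above):

```python
y = [3.0, 5.0, 7.0, 9.0]   # observed data (illustrative)
f = [2.8, 5.3, 6.9, 9.4]   # predictions (illustrative)

e = [yi - fi for yi, fi in zip(y, f)]   # residuals e = y - f
ss_res = sum(ei ** 2 for ei in e)       # SUM(e^2)
print(ss_res)                           # about 0.3 for these values
```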
Second, define the deviations from the mean, v = y - mean(y),
so [v1, v2, ... vn] = [y1-y_mean, y2-y_mean, ... yn-y_mean].
The total sum of squares SUM(v^2) = v1^2 + v2^2 + ... + vn^2 measures how much y varies around its own mean.
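Continuing the sketch with the same illustrative y:

```python
y = [3.0, 5.0, 7.0, 9.0]                # observed data (illustrative)

y_mean = sum(y) / len(y)                # mean(y) = 6.0 here
v = [yi - y_mean for yi in y]           # deviations v = y - mean(y)
ss_tot = sum(vi ** 2 for vi in v)       # SUM(v^2)
print(ss_tot)                           # 20.0 for these values
```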
R2 = 1 - SUM(e^2) / SUM(v^2)
It compares the squared errors (y - f) to the squared deviations from the mean (y - mean(y)).
If the prediction f fits y well, the errors are small, SUM(e^2) / SUM(v^2) is close to 0, and R2 is close to 1.
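Putting the two sums together (r2_score is just an illustrative name for a helper that applies the formula above, not a library function):

```python
def r2_score(y, f):
    # R2 = 1 - SUM(e^2) / SUM(v^2)
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

y = [3.0, 5.0, 7.0, 9.0]                # observed data (illustrative)
f = [2.9, 5.1, 7.0, 9.2]                # a close fit (illustrative)
print(r2_score(y, f))                   # ~0.997, close to 1
```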
If the prediction f carries no real information, the best it can do is guess the mean of y, so the errors (y - f)
are comparable to the deviations (y - mean(y)). Then SUM(e^2) / SUM(v^2) is close to 1, and R2 is close to 0.
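This baseline is easy to check: predict the mean of y for every point, and the residuals equal the deviations exactly (illustrative values again):

```python
y = [3.0, 5.0, 7.0, 9.0]                # observed data (illustrative)
y_mean = sum(y) / len(y)
f = [y_mean] * len(y)                   # constant prediction: always mean(y)

ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
print(1 - ss_res / ss_tot)              # exactly 0.0, since e equals v
```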
If someone makes a really bad prediction, so bad that it is even worse than guessing the mean, the errors are
larger than the deviations of y (e.g. many nonsensical, extreme values), so SUM(e^2) / SUM(v^2) exceeds 1, and R2 becomes negative.
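For example, predictions far outside the range of y (illustrative values) push the ratio well above 1:

```python
y = [3.0, 5.0, 7.0, 9.0]                # observed data (illustrative)
f = [100.0, -50.0, 80.0, -20.0]         # nonsense predictions (illustrative)

y_mean = sum(y) / len(y)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
print(1 - ss_res / ss_tot)              # about -929, far below 0
```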