You can transform the numeric estimate produced by a linear regression into a probability, which better describes how well a class fits an observation:

probability of a class = exp(r) / (1 + exp(r))

Here r is the regression result. A linear regression that uses this formula to transform its results into probabilities is a logistic regression.
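The transformation above is the logistic (sigmoid) function. A minimal sketch of how it maps any regression result onto the (0, 1) interval (the function name `to_probability` is illustrative, not from a library):

```python
import numpy as np

def to_probability(r):
    # exp(r) / (1 + exp(r)): the logistic (sigmoid) function
    return np.exp(r) / (1 + np.exp(r))

# A large positive r maps close to 1, a large negative r close to 0,
# and r = 0 maps to exactly 0.5.
print(to_probability(0.0))   # prints 0.5
print(to_probability(4.0))   # close to 1
print(to_probability(-4.0))  # close to 0
```

No matter how extreme the regression estimate, the output always stays between 0 and 1, which is what allows it to be read as a probability.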
Logistic regression is similar to linear regression, the only difference being the y data, which should contain integer values indicating the class associated with each observation.
To make the example easier to work with, leave a single observation out so that you can later use it to test the efficacy of the logistic regression model.
In contrast to linear regression, logistic regression doesn't just output the resulting class (in this case, class 2); it also estimates the probability of the observation belonging to each of the three classes. Based on the observation used for prediction, logistic regression estimates a probability of 75.4% that it comes from class 2: a high probability, but not a perfect score, so a margin of uncertainty remains.
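The procedure can be sketched as follows. This assumes a three-class dataset such as scikit-learn's Iris (the source doesn't name the dataset, so treat it as an assumption); the exact probabilities you see depend on the data and solver, so they won't necessarily match the 75.4% quoted above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: the three-class Iris dataset stands in for the book's example data.
X, y = load_iris(return_X_y=True)

# Leave the last observation out so we can test the fitted model on it.
logistic = LogisticRegression(max_iter=1000)
logistic.fit(X[:-1], y[:-1])

# predict() returns the class; predict_proba() returns a probability
# for each of the three classes, summing to 1.
print('Predicted class:', logistic.predict(X[-1:]))
print('Class probabilities:', logistic.predict_proba(X[-1:]))
```

Notice that `predict_proba` returns one probability per class, which is exactly the margin-of-uncertainty information that a plain class label hides.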
SVD on Homes Database
Using homes.csv, try to do the following:
Set the matrix A to all the columns in homes. (You can use .values to convert the DataFrame to a NumPy array.) Then print it.
Perform SVD on matrix A. Then print the matrix U, the singular values s, and the matrix Vh.
Delete the last 3 columns of matrix U and adjust s and Vh accordingly (drop the last 3 entries of s and the last 3 rows of Vh). Then multiply the three together and compare the result with the original homes table.
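The steps above can be sketched as follows. Since homes.csv isn't available here, a small random matrix stands in for it; with the real file you would instead set `A = pd.read_csv('homes.csv').values`:

```python
import numpy as np

# Stand-in for the homes data; replace with pd.read_csv('homes.csv').values.
rng = np.random.default_rng(0)
A = rng.random((8, 5))

# Full SVD: A = U @ diag(s) @ Vh
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Drop the last 3 columns of U, the last 3 entries of s,
# and the last 3 rows of Vh, keeping only the top k components.
k = s.size - 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

# The truncated product approximates A; the difference shows what was lost.
print('Max reconstruction error:', np.abs(A - A_k).max())
```

Because the dropped singular values are the smallest ones, the truncated product is the best rank-k approximation of A, and the difference from the original table reveals how much information those discarded components carried.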