3. Linear Regression

In regression tasks, the target value is continuous (numerical)
Creating feature and target
- Scikit-learn expects the feature and target values in separate arrays, X and y
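
As a minimal sketch of that split, the snippet below separates a small made-up table into X and y. The numbers and the column meanings (fertility, GDP, life expectancy) are assumptions for illustration only, not real data.

```python
import numpy as np

# Hypothetical toy table: each row is a sample,
# columns are [fertility, gdp, life] (made-up values)
data = np.array([
    [1.5, 40.0, 80.1],
    [2.1, 25.0, 77.3],
    [3.4, 10.0, 71.0],
    [4.8,  4.0, 64.2],
    [5.9,  2.0, 58.7],
])

# Features: 2-D array of shape (n_samples, n_features)
X = data[:, :2]

# Target: 1-D array with one value per sample
y = data[:, 2]

# A single feature sliced out is 1-D, so reshape it into one column
# before passing it to a scikit-learn estimator
X_fertility = data[:, 0].reshape(-1, 1)

print(X.shape, y.shape, X_fertility.shape)
```

The reshape matters because scikit-learn estimators expect a 2-D feature array even when there is only one feature.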
Regression mechanics
- y = ax + b
- y = target
- x = single feature
- a, b = parameters of model
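
To make the roles of a and b concrete, here is a small sketch that fits a line to made-up points lying exactly on y = 2x + 1 and reads the two parameters back from the fitted model. The data is an assumption chosen so the answer is known in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

reg = LinearRegression()
reg.fit(x.reshape(-1, 1), y)  # single feature must be a 2-D column

a = reg.coef_[0]    # slope (one coefficient per feature)
b = reg.intercept_  # intercept

print(a, b)  # approximately 2.0 and 1.0
```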
Linear regression in higher dimensions
- y = a1x1 + a2x2 + a3x3 + ... + anxn + b (n features, one coefficient ai per feature xi)
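
The same idea with several features can be sketched as follows. The synthetic data and the "true" coefficients are assumptions made up for this example; the data is noiseless, so the fit should recover them almost exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noiseless data with three features (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_coefs = np.array([1.5, -2.0, 0.5])  # a1, a2, a3
y = X @ true_coefs + 4.0                 # b = 4.0

reg = LinearRegression().fit(X, y)

print(reg.coef_)       # one coefficient per feature
print(reg.intercept_)  # the constant term b
```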

The line should be as close to the actual data points as possible
Goal: minimize the vertical distance between the fit and the data
For each data point, calculate the vertical distance between the point and the line -> residual
Minimize the sum of the squared residuals (ordinary least squares, OLS); in a plain sum, positive and negative residuals would cancel
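
A short sketch of the residual calculation, using made-up points. It illustrates why least squares works with squared residuals: for a fitted line with an intercept the raw residuals sum to roughly zero, so the quantity being minimized is the sum of their squares.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data, roughly linear
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

reg = LinearRegression().fit(X, y)

# Residual = actual value minus value predicted by the line
residuals = y - reg.predict(X)
rss_fit = np.sum(residuals ** 2)  # residual sum of squares of the fit

# Any other line, e.g. one with a slightly steeper slope, has a larger RSS
a, b = reg.coef_[0], reg.intercept_
rss_other = np.sum((y - ((a + 0.1) * X[:, 0] + b)) ** 2)

print(residuals.sum())     # close to 0: positives and negatives cancel
print(rss_fit, rss_other)  # rss_fit is the smaller of the two
```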

import numpy as np # Needed for np.linspace

import matplotlib.pyplot as plt # Needed for plotting

from sklearn.linear_model import LinearRegression # Import LinearRegression

reg = LinearRegression() # Create the regressor

prediction_space = np.linspace(X_fertility.min(), X_fertility.max()).reshape(-1,1) # Create the prediction space (assumes X_fertility is a NumPy array)

reg.fit(X_fertility, y) # Fit the model to the data

y_pred = reg.predict(prediction_space) # Compute predictions over the prediction space: y_pred

print(reg.score(X_fertility, y)) # Print R^2

plt.plot(prediction_space, y_pred, color='black', linewidth=3) # Plot regression line

plt.show()
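
For a regressor, `score` returns R^2, the fraction of the variance in the target that the model explains. The sketch below recomputes it by hand on made-up data and compares it with `reg.score`; the data values are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data, roughly linear
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.3])

reg = LinearRegression().fit(X, y)
y_hat = reg.predict(X)

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

r2_sklearn = reg.score(X, y)

print(r2_manual, r2_sklearn)  # the two values agree
```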

import numpy as np # Needed for np.mean

from sklearn.linear_model import LinearRegression

from sklearn.model_selection import cross_val_score

reg = LinearRegression() # Create a linear regression object

cv_scores = cross_val_score(reg, X, y, cv=5) # Compute 5-fold cross-validation scores

print(cv_scores)

print("Average 5-Fold CV Score: {}".format(np.mean(cv_scores)))

cvscores_3 = cross_val_score(reg, X, y, cv=3) # Perform 3-fold CV

print(np.mean(cvscores_3))

cvscores_10 = cross_val_score(reg, X, y, cv=10) # Perform 10-fold CV

print(np.mean(cvscores_10))
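
To show what `cross_val_score` does under the hood, the sketch below reproduces 5-fold cross-validation with an explicit `KFold` loop on synthetic data. The data is an assumption for illustration. For a regressor, an integer `cv` corresponds to unshuffled `KFold`, so the two sets of scores should match.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data with some noise (illustrative assumption)
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 2))
y = X @ np.array([3.0, -1.0]) + 2.0 + rng.normal(scale=0.5, size=60)

reg = LinearRegression()
kf = KFold(n_splits=5)  # unshuffled, consecutive folds

manual_scores = []
for train_idx, test_idx in kf.split(X):
    # Fit on four folds, score (R^2) on the held-out fold
    reg.fit(X[train_idx], y[train_idx])
    manual_scores.append(reg.score(X[test_idx], y[test_idx]))

cv_scores = cross_val_score(reg, X, y, cv=5)

print(np.array(manual_scores))
print(cv_scores)
```

Each fold requires a separate fit, so increasing the number of folds increases the computation roughly in proportion.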
