A linear model is a model in which the relationship between the features (predictors, inputs, etc.) and the label (output) is expressed in a linear form, y = β^T x + b. Here β is the vector of parameters and b the intercept, both of which need to be estimated. Likewise, in polynomial regression we use a polynomial, y = β_1 x + β_2 x^2 + … + β_d x^d + b, to express the relation between the feature (predictor) x and the label y.
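As a minimal sketch of these two forms (synthetic values; all names and numbers here are illustrative), note that a polynomial model is still linear in the transformed features (x, x^2, …):

```python
import numpy as np

# Linear model: y_hat = beta^T x + b
beta = np.array([2.0, -1.0])        # illustrative parameter values
b = 0.5                             # intercept
x = np.array([3.0, 4.0])            # one sample with two features
y_hat_linear = beta @ x + b         # 2*3 - 1*4 + 0.5 = 2.5

# Polynomial regression: linear in the feature map (x, x^2) for degree 2
x_scalar = 3.0
phi = np.array([x_scalar, x_scalar**2])
beta_poly = np.array([1.0, 0.2])
y_hat_poly = beta_poly @ phi + b
```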
For instance, consider the celebrated Fama-French three-factor model, where the (excess) return of any asset is expressed in terms of three factors: the excess market return, small market capitalization minus big market capitalization (SMB), and high book-to-market ratio minus low book-to-market ratio (HML). The latter two measure the historic excess returns of small caps over big caps and of value stocks over growth stocks, respectively. The data are available from Kenneth French's data library at the link below.
• r: the risk-free rate; R: the market return, so that R − r is the excess market return (the first factor)
• SMB stands for "Small [market capitalization] Minus Big"
• HML stands for "High [book-to-market ratio] Minus Low"
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
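As an illustrative sketch of fitting the three-factor model (with randomly generated series standing in for the real factor and return data from the library above; every value here is a stand-in):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_months = 120

# Stand-in factor series (in practice, downloaded from French's data library)
mkt_excess = rng.normal(0.01, 0.04, n_months)   # R - r
smb = rng.normal(0.002, 0.03, n_months)
hml = rng.normal(0.003, 0.03, n_months)
X = np.column_stack([mkt_excess, smb, hml])

# Stand-in excess returns of one asset, built from known loadings plus noise
asset_excess = 0.9 * mkt_excess + 0.3 * smb - 0.1 * hml + rng.normal(0, 0.02, n_months)

model = LinearRegression().fit(X, asset_excess)
print("factor loadings:", model.coef_)   # estimates of the three betas
print("alpha:", model.intercept_)
```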
Now we have to do the estimation in order to set up the model. In ML terminology, instead of 'estimating' we say 'training'. As we have said before, the values of the parameters β and b are not important in themselves; what matters is the predictions the model makes. So we have two major tasks. First, we have to make sure the training is done correctly; second, we have to check whether the trained (or fitted) model can predict well on data it has not met. The method used is called least squares: we find β and b so that the sum of squared errors, that is, the squared differences between the true values y and the model predictions β^T x + b, is minimized. This is done on the data the model can see (called the training data).
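A minimal sketch of this training step, on synthetic data, solving the least-squares problem directly with NumPy (all names and the true parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                           # 100 training samples, 2 features
y = X @ np.array([1.5, -2.0]) + 0.7 + rng.normal(0, 0.1, 100)

# Append a column of ones so the intercept b is estimated along with beta
X_aug = np.column_stack([X, np.ones(len(X))])

# Least squares: choose parameters minimizing sum of (y - X_aug @ params)^2
params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
beta_hat, b_hat = params[:-1], params[-1]
print(beta_hat, b_hat)   # should be close to [1.5, -2.0] and 0.7
```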
In the next step, we examine whether the obtained model also performs well on data that was not seen during training. In the figure, the model is trained on the blue dots, and then it is presented with the red dots, data it did not see at the training stage. This data is called the validation set. If the model predicts the relation between x (the feature) and y (the label) well on it, then we can approve the model.
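A sketch of this train/validation workflow (again on synthetic data, using scikit-learn's train_test_split; the split fraction is an arbitrary choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + 0.7 + rng.normal(0, 0.1, 200)

# Hold out 25% of the data as a validation set (the "red dots")
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)   # training on the "blue dots"
val_mse = mean_squared_error(y_val, model.predict(X_val))
print("validation MSE:", val_mse)                  # low error -> approve the model
```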
Another important subject in ML is regularization. As has been pointed out, ML is the statistics of many dimensions. The complexity we associate with high dimension increases the model capacity, simply because a richer model class covers the simpler models and goes beyond them. But at the same time, high capacity increases the chance of higher variance, overfitting, computational errors, etc. For that reason, while increasing model capacity is a major practice in ML, we also need to control it. That means we need to 'regularize' the models and control their capacity. Any practice that controls model capacity is called regularization. Regularization can happen in many different ways, which we will discuss in this course. But as a first example, consider two of them for linear regression, where the size of the parameters is penalized: one can add either the absolute values of the parameters (the lasso, or L1 penalty) or their squares (ridge regression, or L2 penalty) to the error function. This means we do not easily let the values of the parameters grow.
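A sketch of these two penalties using scikit-learn's Ridge and Lasso (synthetic data where only two of ten features matter; the alpha values, which set the penalty strength, are arbitrary choices):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))                     # 10 features, only 2 are relevant
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, 100)

# L2 penalty (ridge): adds alpha * sum(beta_j^2) to the squared error
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 penalty (lasso): adds alpha * sum(|beta_j|); tends to zero out coefficients
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge coefs:", np.round(ridge.coef_, 2))    # all shrunk toward zero
print("lasso coefs:", np.round(lasso.coef_, 2))    # many entries exactly zero
```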