Justify the model and the data using learning curve analysis

Post date: Aug 15, 2012 12:56:56 AM

To judge whether our model and the number of examples suit each other, i.e. whether the model is overfitting or underfitting the data, we can analyze the learning (error) curves of the cross-validation (cv) set and the training set, which show how each error changes as the number of training examples grows.

If there is a big gap between Error_{cv} and Error_{train}, with Error_{cv} well above Error_{train}, then we say the model is "overfitting".

If the gap between the two error curves is small but both errors are high, then we say the model is "underfitting".
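The two rules of thumb above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is a made-up noisy sine wave, the model is a polynomial fitted with NumPy's `polyfit`, the training-set size is called m, and the helper names `mse`, `learning_curve` and `diagnose` as well as the tolerances `gap_tol` and `error_tol` are hypothetical choices, not part of any standard recipe.

```python
import numpy as np

def mse(coeffs, x, y):
    """Mean squared error of a polynomial (np.polyfit coefficients) on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

def learning_curve(x_train, y_train, x_cv, y_cv, degree, sizes):
    """Return (train_errors, cv_errors), one entry per training-set size m."""
    train_errors, cv_errors = [], []
    for m in sizes:
        # Fit on the first m training examples only.
        coeffs = np.polyfit(x_train[:m], y_train[:m], degree)
        train_errors.append(mse(coeffs, x_train[:m], y_train[:m]))
        # The cross-validation set is always evaluated in full.
        cv_errors.append(mse(coeffs, x_cv, y_cv))
    return train_errors, cv_errors

def diagnose(train_error, cv_error, gap_tol, error_tol):
    """Apply the two rules of thumb; both tolerances are problem-specific guesses."""
    if cv_error - train_error > gap_tol:
        return "overfitting"   # big gap between Error_cv and Error_train
    if train_error > error_tol:
        return "underfitting"  # small gap, but the error itself is high
    return "ok"

# Hypothetical data: a noisy sine wave, split into training and cv sets.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1000)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 1000)
x_train, y_train, x_cv, y_cv = x[:500], y[:500], x[500:], y[500:]

sizes = [20, 50, 100, 200, 500]
# A straight line (degree 1) is deliberately too simple for a sine wave.
train_errors, cv_errors = learning_curve(x_train, y_train, x_cv, y_cv, degree=1, sizes=sizes)
for m, e_train, e_cv in zip(sizes, train_errors, cv_errors):
    print(f"m={m:4d}  Error_train={e_train:.3f}  Error_cv={e_cv:.3f}")
print(diagnose(train_errors[-1], cv_errors[-1], gap_tol=0.1, error_tol=0.1))
```

Plotting `train_errors` and `cv_errors` against `sizes` gives the two learning curves. The thresholds have to be picked relative to the error level you consider acceptable for your own problem.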

Now also consider the curves of error vs. the number of features (d). These curves tell you whether or not your choice of d is optimal: a good d sits near the minimum of Error_{cv}.
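A rough way to look at error vs. d is to sweep d and keep the value with the lowest cross-validation error. The sketch below assumes that the d features are the polynomial terms x, x^2, ..., x^d of a single input, which is just one convenient stand-in for "number of features"; the data and the function name `error_vs_d` are likewise made up for illustration.

```python
import numpy as np

def error_vs_d(x_train, y_train, x_cv, y_cv, d_values):
    """Return (train_errors, cv_errors) as arrays, one entry per value of d."""
    train_errors, cv_errors = [], []
    for d in d_values:
        # Assumption: d features means the polynomial terms x, x^2, ..., x^d.
        coeffs = np.polyfit(x_train, y_train, d)
        train_errors.append(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
        cv_errors.append(np.mean((np.polyval(coeffs, x_cv) - y_cv) ** 2))
    return np.array(train_errors), np.array(cv_errors)

# Hypothetical data: a noisy sine wave, half for training and half for cv.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 400)

d_values = list(range(1, 9))
train_errors, cv_errors = error_vs_d(x[:200], y[:200], x[200:], y[200:], d_values)

# Pick d by the cv error; the training error alone would always favor a larger d.
best_d = d_values[int(np.argmin(cv_errors))]
for d, e_train, e_cv in zip(d_values, train_errors, cv_errors):
    print(f"d={d}  Error_train={e_train:.4f}  Error_cv={e_cv:.4f}")
print("best d by cross-validation error:", best_d)
```

The training error can only go down as d grows, because each larger model contains the smaller ones, so it is the cv curve that shows where adding features stops helping.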

It would be interesting to plot d, m (the number of training examples) and the error in a single figure, which would give a unified point of view.
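One possible way to get such a unified view is to compute the cross-validation error on a grid of (d, m) pairs and then draw the grid as a surface or heat map. The following is a minimal sketch, again assuming polynomial features and invented data; `cv_error_grid` is a hypothetical helper name.

```python
import numpy as np

def cv_error_grid(x_train, y_train, x_cv, y_cv, d_values, m_values):
    """Return E with E[i, j] = cv error for d_values[i] features and m_values[j] examples."""
    grid = np.empty((len(d_values), len(m_values)))
    for i, d in enumerate(d_values):
        for j, m in enumerate(m_values):
            # Fit a degree-d polynomial on the first m training examples.
            coeffs = np.polyfit(x_train[:m], y_train[:m], d)
            grid[i, j] = np.mean((np.polyval(coeffs, x_cv) - y_cv) ** 2)
    return grid

# Hypothetical data: a noisy sine wave, half for training and half for cv.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 600)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 600)

d_values = [1, 2, 3, 4, 5, 6]
m_values = [20, 50, 100, 300]
grid = cv_error_grid(x[:300], y[:300], x[300:], y[300:], d_values, m_values)

# Print the grid as a small table: one row per d, one column per m.
print("d\\m " + "".join(f"{m:>9d}" for m in m_values))
for d, row in zip(d_values, grid):
    print(f"{d:<4d}" + "".join(f"{e:9.4f}" for e in row))
```

The resulting array can be handed to a plotting library, for example matplotlib's `contourf` or `plot_surface`, with d on one axis, m on the other, and the error as color or height.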