Cross-validation combines (averages) measures of predictive fitness to derive a more accurate estimate of a machine learning model's prediction performance. Without it, a model can appear to learn the training data well while failing to perform on unseen, real-world data. If you would like to understand this behaviour, this document will help.
The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give insight into how the model will generalise to an independent dataset (i.e., an unseen dataset, for instance from a real-world problem).
Our main objective is for the model to work well on real-world data. Although the training dataset is itself real-world data, it represents only a small sample of all the possible data points (examples) out there. To know the model's true score, it should be tested on data it has never seen before.
Cross-validation is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data.
In k-fold cross-validation, you split the input data into k subsets of data (also known as folds). You train an ML model on all but one (k-1) of the subsets, and then evaluate the model on the subset that was not used for training. This process is repeated k times, with a different subset reserved for evaluation (and excluded from training) each time.
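As a concrete illustration, here is a minimal sketch of 5-fold cross-validation with scikit-learn; the dataset, the logistic regression model, and the choice of k = 5 are illustrative assumptions rather than anything prescribed above.

```python
# A minimal k-fold cross-validation sketch (assumed: scikit-learn, iris data,
# logistic regression, k = 5).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])      # train on k-1 folds
    preds = model.predict(X[val_idx])          # evaluate on the held-out fold
    scores.append(accuracy_score(y[val_idx], preds))

print("Per-fold accuracy:", np.round(scores, 3))
print("Mean accuracy:", np.mean(scores))
```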
Leave-p-out cross-validation, by contrast, is exhaustive: it trains and tests on every possible way of holding out p examples from the dataset. As a result, it can become computationally expensive for large values of p.
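scikit-learn exposes this exhaustive scheme as LeavePOut. The sketch below assumes p = 2 and deliberately uses a small subsample, since the number of train/test combinations grows combinatorially with p.

```python
# A sketch of exhaustive leave-p-out cross-validation with p = 2 (assumed:
# scikit-learn, a 30-sample subset of iris, logistic regression).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeavePOut, cross_val_score

X, y = load_iris(return_X_y=True)
X, y = X[::5], y[::5]            # 30 samples -> C(30, 2) = 435 train/test splits

lpo = LeavePOut(p=2)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=lpo)
print("Number of train/test combinations:", lpo.get_n_splits(X))
print("Mean accuracy:", scores.mean())
```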
Stratified K-fold maintains the class proportions by splitting the dataset in such a way that each fold contains approximately the same proportion of labels as the original dataset.
This strategy ensures that when the dataset is imbalanced, no single class is over-represented in any fold.
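A small sketch of how stratification preserves label proportions; the 90/10 class imbalance below is an assumed example, not data from the text.

```python
# Stratified K-fold on an imbalanced label vector: every validation fold keeps
# roughly the original 90/10 class proportions.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)      # imbalanced: 90% class 0, 10% class 1

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    print(f"fold {fold}: class counts in validation fold =",
          np.bincount(y[val_idx]))
```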
As a rule, the test set should never be used to change your model (e.g., its hyperparameters).
In the simplest scenario, you would collect one dataset and train your model via cross-validation to create your best model. Then you would collect another, completely independent dataset and test your model on it.
If you have a smaller dataset, you may not be able to afford a separate test set; in that case, validation is performed on every fold and your validation metric is aggregated across the folds, as sketched below.
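A minimal sketch of that workflow, assuming scikit-learn: hold out a test set up front, aggregate the validation metric across the cross-validation folds, and touch the test set only once at the end. The dataset and model are illustrative choices.

```python
# Hold-out test set + cross-validation on the training portion (assumed:
# scikit-learn, breast-cancer data, a scaled logistic regression pipeline).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Validation metric aggregated across the folds (no test data involved).
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("Mean CV accuracy:", cv_scores.mean())

# Final, one-off check on the untouched test set.
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```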
When we have very little data, splitting it into a training and a test set might leave us with a very small test set, on which we could get almost any performance purely by chance. If we use cross-validation in this case, we build K different models, so we are able to make out-of-sample predictions on all of our data.
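One way to obtain such predictions on all of the data is scikit-learn's cross_val_predict, sketched below with an illustrative dataset and model; every prediction comes from a model that never saw that example during training.

```python
# Out-of-sample predictions for every example, made by K different models
# (assumed: scikit-learn, iris data, logistic regression, K = 5).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)

# Each prediction is produced by a model that did not train on that example.
oof_preds = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
print("Out-of-fold accuracy on all of the data:", accuracy_score(y, oof_preds))
```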
Sometimes we want to (or have to) build a pipeline of models to solve something, where the critical part is that our second model must learn from the predictions of our first model. We can't train both models on the same dataset, because then the second model would learn from predictions on examples the first model has already seen. By using cross-validation, we can train and evaluate the two models on different subsets of the data, as sketched below.
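A sketch of this idea using scikit-learn's StackingClassifier, which trains its final estimator on cross-validated (out-of-fold) predictions of the base models; the particular base and final estimators below are illustrative assumptions, not the author's pipeline.

```python
# Two-stage pipeline: the second-stage model learns only from out-of-fold
# predictions of the first-stage models (assumed: scikit-learn, breast-cancer
# data, random forest + SVM as base models, logistic regression on top).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # the meta-model is trained on out-of-fold predictions only
)
print("Stacked model CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```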
Most learning algorithms require some hyperparameter tuning: the number of trees in a gradient boosting classifier, the hidden layer sizes or activation functions in a neural network, the type of kernel in an SVM, and many more. There are many methods for this, whether a manual search, a grid search, or some more sophisticated optimization. In all those cases, however, we can't tune on our training set alone, and of course not on our test set. We have to use a third set, a validation set, and cross-validation can serve this role.
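Here is a sketch of hyperparameter tuning where cross-validation stands in for a fixed validation set, using GridSearchCV over an assumed grid of SVM kernels and C values; the test set is touched only once, at the very end.

```python
# Hyperparameter tuning with cross-validation instead of a fixed validation
# set (assumed: scikit-learn, iris data, an illustrative SVM parameter grid).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}
search = GridSearchCV(SVC(), param_grid, cv=5)   # tuning uses only training folds
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
print("Test accuracy (used only once):", search.score(X_test, y_test))
```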
Helps detect overfitting without relying on a single held-out test split
Enables effective training and evaluation with a small dataset
Is used for hyperparameter tuning
In ensemble learning, is used to train the various models
Refer to the Colab notebook linked in the references below for an example with training, validation, and test sets.
Cross-validation is essentially a practical and reliable technique to gauge the quality of a particular neural network. Knowing the quality of a neural network allows you to identify when over-fitting has occurred.
When applied to several neural networks with different free hyper-parameter values (such as the number of hidden nodes, back-propagation learning rate, and so on), the results of cross-validation can be used to select the best set of parameter values.
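A sketch of that idea with scikit-learn's MLPClassifier, using cross-validated grid search to pick between candidate hidden-layer sizes and learning rates; the candidate values below are illustrative assumptions, not ones given in the text.

```python
# Using cross-validation to choose between neural-network configurations
# (assumed: scikit-learn MLPClassifier on the digits data, an illustrative
# grid of hidden-layer sizes and initial learning rates).
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0))
param_grid = {
    "mlpclassifier__hidden_layer_sizes": [(32,), (64,), (64, 32)],
    "mlpclassifier__learning_rate_init": [0.001, 0.01],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print("Best network configuration:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```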
There are various reasons for overfitting; the one mentioned above is only one of them.
https://en.wikipedia.org/wiki/Cross-validation_(statistics)
https://docs.aws.amazon.com/machine-learning/latest/dg/cross-validation.html
https://www.geeksforgeeks.org/cross-validation-machine-learning/
https://www.mygreatlearning.com/blog/cross-validation/
https://images.app.goo.gl/DDZRzmbEdEGpud5h8
https://towardsdatascience.com/5-reasons-why-you-should-use-cross-validation-in-your-data-science-project-8163311a1e79
https://images.app.goo.gl/BER935pexXtez8ep6
https://towardsdatascience.com/cross-validation-in-machine-learning-72924a69872f
https://towardsdatascience.com/understanding-8-types-of-cross-validation-80c935a4976d
https://images.app.goo.gl/U1gZzUyUaTCDqGwU7
https://aiaspirant.com/cross-validation/
https://images.app.goo.gl/eQJh54HAzimL5GeV7
https://images.app.goo.gl/6RnwF5APkgiRJC2E9
https://stats.stackexchange.com/questions/148688/cross-validation-with-test-data-set
https://visualstudiomagazine.com/articles/2013/10/01/understanding-and-using-kfold.aspx
https://machinelearningmastery.com/how-to-create-a-random-split-cross-validation-and-bagging-ensemble-for-deep-learning-in-keras/
https://www.researchgate.net/post/Is_cross_validation_necessary_in_neural_network_training_and_testing
https://youtu.be/MyBSkmUeIEs
https://colab.research.google.com/drive/1_J2MrBSvsJfOcVmYAN2-WSp36BtsFZCa?usp=sharing