When facing the huge number of available ML algorithms, the most frequent question is: “Which algorithm is the right solution for the given problem?” The answer varies depending on many factors, including 1) the size, quality, and nature of the domain data; 2) the available computational time; 3) the urgency of the task; and 4) the aim of the analysis. In many cases, no one can tell which algorithm will perform best without trying several of them after a thoughtful data examination. A concrete algorithm is therefore usually chosen based on data characteristics and exploratory data analysis. As is generally true for DM using an ML approach, the performance of data models depends strongly on the representativeness of the provided data set. The complementarity of methods encourages trying different options from the wide spectrum of available modelling methods. To reach maximum performance, it is often necessary to train each model multiple times with different parameters and options. It can also be suitable to combine several independent models of different types, because each type can be strong in fitting different cases. The full potential of the data can be tapped by a cooperation of partial weak models, e.g. using ensemble learning methods (so-called model ensembling) based on principles such as voting, record weighting, multiple training processes or random selection. Hence, a proper combination of several types of models with different advantages and disadvantages can be used to reach maximum accuracy and stability in predictions.
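The voting principle mentioned above can be illustrated with a minimal sketch of hard (majority) voting. The base-model predictions below are hypothetical placeholders, not taken from the document; the combiner itself is generic:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions by hard voting.

    predictions: list of lists, one list of class labels per base model,
    all of the same length (one label per record).
    """
    combined = []
    for labels in zip(*predictions):  # labels predicted for one record
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Three hypothetical base models that disagree on some records:
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

print(majority_vote([model_a, model_b, model_c]))  # -> [1, 0, 1, 1]
```

Because each base model errs on different records, the combined vote can be more accurate and more stable than any single model, which is the rationale for combining weak models stated above.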
The simplest and most customary way is to categorize ML algorithms into supervised, unsupervised, and semi-supervised learning [Goodfellow 2016], as follows.
It is interesting to note that ML algorithms have no strict categorization, i.e. a given method can belong to more than one category. For example, NNs can be trained in a supervised manner for some problems and in an unsupervised manner for others. Although the problem of algorithm categorization is interesting, it is out of the scope of this document.
Pre-processing and post-processing algorithms can also be categorized into a number of subcategories such as dimensionality reduction, sampling (subsampling, oversampling), linear methods, statistical testing, and feature engineering with feature extraction, feature encoding, feature transformation and feature selection (e.g. mutual information, chi-square (χ2) statistics). Many more algorithms can be listed here for overfitting prevention (e.g. regularization, threshold setting, pruning, dropout), model selection and performance optimization (e.g. hyper-parameter tuning, grid search, local minimum search, bio-inspired optimization) and model evaluation (e.g. cross-validation, k-fold, holdout) with various metrics such as accuracy (ACC), precision, recall, F1, Matthews correlation coefficient (MCC), receiver operating characteristic (ROC), area under the curve (ROC AUC), mean absolute error (MAE), mean squared error (MSE), and root-mean-square error (RMSE).
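Several of the evaluation concepts listed above can be sketched compactly. The following minimal, self-contained example computes the binary-classification metrics ACC, precision, recall, F1 and MCC from their standard definitions, and builds contiguous k-fold index splits; the sample label vectors are hypothetical:

```python
import math

def confusion_counts(y_true, y_pred):
    """Counts for a binary confusion matrix (positive class = 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def evaluation_metrics(y_true, y_pred):
    """ACC, precision, recall, F1 and MCC from the textbook formulas."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "precision": precision, "recall": recall,
            "F1": f1, "MCC": mcc}

def k_fold_indices(n_records, k):
    """Split record indices 0..n_records-1 into k contiguous folds;
    in practice the records would be shuffled or stratified first."""
    folds, start = [], 0
    for i in range(k):
        size = n_records // k + (1 if i < n_records % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# Hypothetical labels and predictions:
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(evaluation_metrics(y_true, y_pred))
print(k_fold_indices(6, 3))  # -> [[0, 1], [2, 3], [4, 5]]
```

In k-fold cross-validation each fold serves once as the held-out test set while the remaining folds are used for training, and the metrics above are averaged over the k runs.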
Fig. 3 provides a comprehensive graphical overview of ML methods for modelling as well as for pre-processing and post-processing. However, this overview is subject to change as the number of ML algorithms increases continually.