Initially, to test the entire workflow, a simple linear SVM model was trained on the standardized data, as outlined in the Algorithm below.
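A baseline of this kind can be sketched in a few lines with scikit-learn. This is a minimal illustration only, not the authors' implementation: the synthetic dataset from `make_classification`, the 75/25 split, and the `max_iter` setting are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption): the real features are not shown here.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardize the features, then fit a linear SVM.
# Wrapping both steps in one pipeline ensures the scaler is fit on training data only.
baseline = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
baseline.fit(X_train, y_train)

# Accuracy on the held-out test split.
test_accuracy = baseline.score(X_test, y_test)
print(f"baseline test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means the same standardization is applied automatically at prediction time.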
In the final computational pipeline of Super.Complex, the training dataset is run through an AutoML algorithm, TPOT (Olson et al.), which evaluates several preprocessors and machine learning models and yields cross-validation (CV) scores on the training dataset for each pipeline (a combination of one or more preprocessors and a machine learning model).
The preprocessors and ML models evaluated in our experiments are among the most commonly used in the Python sklearn ML library and are listed below:
The motivation for listing these is to provide a short summary of the most common ML techniques.
The pipelines with high CV scores are then evaluated on the test dataset to find the best pipeline for our data, which is used later for prediction in the sampling stage.
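The selection procedure described above (score candidate pipelines by CV on the training set, then compare the top scorers on the test set) can be illustrated as follows. This sketch does not use TPOT: it replaces TPOT's automated search with a small hand-written set of scikit-learn pipelines. The candidate list, the synthetic data, the 5-fold CV, and the choice to keep the top two pipelines are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hand-picked candidate pipelines (assumption); TPOT would search these automatically.
candidates = {
    "standardize + linear SVM": make_pipeline(
        StandardScaler(), LinearSVC(max_iter=10000)
    ),
    "min-max + logistic regression": make_pipeline(
        MinMaxScaler(), LogisticRegression(max_iter=1000)
    ),
    "random forest": make_pipeline(
        RandomForestClassifier(n_estimators=50, random_state=0)
    ),
}

# Step 1: mean 5-fold CV accuracy on the training set for each pipeline.
cv_scores = {
    name: cross_val_score(pipe, X_train, y_train, cv=5).mean()
    for name, pipe in candidates.items()
}

# Step 2: keep the pipelines with the highest CV scores (here, the top two).
top_names = sorted(cv_scores, key=cv_scores.get, reverse=True)[:2]

# Step 3: refit each on the full training set and score it on the test set.
test_scores = {
    name: candidates[name].fit(X_train, y_train).score(X_test, y_test)
    for name in top_names
}

# Step 4: the pipeline with the best test score is kept for later prediction.
best_name = max(test_scores, key=test_scores.get)
best_pipeline = candidates[best_name]
print(best_name, test_scores[best_name])
```

Because `fit` modifies the pipeline in place, `best_pipeline` is already trained and can be used directly with `predict`.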