Performance was compared across a series of experiments.
For our predictions, we decided to go with six different models and algorithms (a combined setup sketch follows the list):
Decision Tree Classifier: Since we had already decided to split our data into smaller segments, a decision tree seemed a natural fit for our prediction task. It is also a reliable, fast, and easy-to-implement solution.
Random Forest Classifier: A random forest is more robust than a single decision tree, as ensemble averaging avoids the instability problems of individual trees. It also tends to achieve stronger accuracy than many other classification algorithms, with a lower risk of overfitting.
MLP Classifier: Most of our data is labeled and the task is classification, which is exactly the supervised setting an MLP is designed to handle.
KNN Classifier: After applying the K-modes algorithm in our descriptive analysis, the data appeared similar and repetitive. Since the KNN algorithm uses 'feature similarity' to predict the value of any data point, it was chosen for that specific purpose.
SVC Classifier: The Linear Support Vector Classifier (SVC) applies a linear kernel function to perform classification, and it performs well when there are many samples.
Naive Bayes: A Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. We originally used it during our "post feature selection" phase to verify that our feature selection was sound.
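To illustrate, below is a minimal sketch of how these six classifiers might be instantiated and compared with scikit-learn. The synthetic X and y are hypothetical stand-ins for our prepared feature matrix and labels, and the hyperparameters are illustrative defaults rather than the values tuned in our experiments.

    # Minimal sketch: instantiate the six classifiers and compare them with
    # 5-fold cross-validated accuracy. X and y are synthetic stand-ins
    # (hypothetical) for the project's prepared features and labels.
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import LinearSVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=42),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "MLP": MLPClassifier(max_iter=500, random_state=42),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Linear SVC": LinearSVC(max_iter=5000, random_state=42),
        "Naive Bayes": GaussianNB(),
    }

    for name, model in models.items():
        # Cross-validated accuracy as a simple, uniform yardstick
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")

Cross-validated accuracy is used here only as a uniform comparison metric; in practice each model's hyperparameters would be tuned separately before running the experimental comparison.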