Different performance metrics are used to evaluate different machine learning algorithms. Because this project focuses on classification problems, we apply classification performance metrics such as accuracy and precision.
The models show no notable bias, because the difference between precision and recall is not significant.
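As a minimal sketch of how these metrics relate, the snippet below computes accuracy, precision, and recall from confusion-matrix counts, and reports the gap between precision and recall. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute basic classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        # A small gap suggests the model is not strongly skewed toward
        # over-predicting or under-predicting the positive class.
        "precision_recall_gap": abs(precision - recall),
    }


# Hypothetical counts (assumption, not taken from the report).
metrics = classification_metrics(tp=60, fp=20, fn=25, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

How small a gap counts as "not significant" is a judgment call; the report does not state a threshold.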
The table below shows the accuracy and precision percentages before and after tuning:
Bar chart: Comparison of accuracy before and after tuning for Decision Tree
Based on the bar chart, the accuracy for every split ratio increases after hyperparameter tuning. In RapidMiner, the 70:30 split has the highest accuracy, at 70.56%. In Python, however, the 50:50 split has the highest accuracy, at 67.16%.
Bar chart: Comparison of accuracy before and after tuning for Random Forest
Based on the bar chart, the accuracy for every split ratio increases after hyperparameter tuning, and the Random Forest model improves considerably more in RapidMiner than in Python. The 70:30 split has the highest accuracy in RapidMiner, at 74.31%, whereas the highest accuracy in Python is 66.67%.
Result Analysis:
Best Model Accuracy score before tuning:
RapidMiner - Random Forest (70.4%)
Python - Random Forest (66.23%)
Best Model Accuracy score after tuning:
RapidMiner - Random Forest (74.31%)
Python - Decision Tree (67.16%)
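The gain from tuning can be worked out directly from the best-model scores listed above. The short sketch below does that arithmetic in percentage points. Note that for Python the best model changes from Random Forest to Decision Tree, so that figure compares best against best rather than the same model before and after. The variable names are illustrative.

```python
# Best-model accuracy (%) as reported above, before and after tuning.
best_before = {
    "RapidMiner": ("Random Forest", 70.4),
    "Python": ("Random Forest", 66.23),
}
best_after = {
    "RapidMiner": ("Random Forest", 74.31),
    "Python": ("Decision Tree", 67.16),
}

gains = {}
for tool in best_before:
    _, acc_before = best_before[tool]
    _, acc_after = best_after[tool]
    # Difference in percentage points, rounded to avoid floating-point noise.
    gains[tool] = round(acc_after - acc_before, 2)

for tool, gain in gains.items():
    print(f"{tool}: +{gain:.2f} percentage points")
# RapidMiner: +3.91 percentage points
# Python: +0.93 percentage points
```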
In terms of accuracy before tuning, every model trained with the 70:30 split scores higher than the same model trained with the 50:50 or 30:70 split. This is because the amount of training data plays a big role in the accuracy score: when the training data is sufficiently large, accuracy tends to be higher. Hyperparameter tuning increases the accuracy and precision of every model. The Decision Tree in Python with the 50:50 split, at 67.16% accuracy, is used for deployment.
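The comparison described above (three split ratios, each scored before and after tuning) could be reproduced in Python roughly as follows. This is only a sketch under stated assumptions: it assumes scikit-learn is the Python library in use, replaces the flight data with a synthetic dataset, and uses a hypothetical parameter grid, since the report does not list the actual search space.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the flight data (assumption: the real dataset is not shown).
X, y = make_classification(n_samples=600, n_features=8, random_state=0)

# Hypothetical search space; the grid used in the report is not stated.
param_grid = {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 10]}

results = {}
for train_share in (0.7, 0.5, 0.3):  # 70:30, 50:50 and 30:70 splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_share, stratify=y, random_state=0
    )

    # Baseline: default hyperparameters, scored on the held-out test set.
    baseline = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

    # Tuned: 5-fold cross-validated grid search on the training set only.
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid,
        cv=5,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)

    results[train_share] = {
        "before": baseline.score(X_test, y_test),
        "after": search.score(X_test, y_test),
        "best_params": search.best_params_,
    }

for share, row in results.items():
    print(f"train share {share:.0%}: before={row['before']:.4f}, "
          f"after={row['after']:.4f}, best={row['best_params']}")
```

Because tuning is selected by cross-validation on the training data, the test accuracy after tuning is not guaranteed to exceed the baseline on every dataset or split.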
Conclusion
A few insights can be obtained through data visualization. To get a better-performing model, data preprocessing has to be repeated iteratively. Using a machine learning model, we can predict whether a flight will be on time or delayed based on several attributes, such as the service type, the flight destination, and others. Hyperparameter tuning improved accuracy by roughly 1 to 4 percentage points. This predictive model will be useful for our stakeholders, who can use it to predict flight delays efficiently, limit the negative impact on the company, and take precautionary steps to avoid delays.