Titanic Prediction Using Ensemble Learning

15/07/2018

Introduction

This Kaggle competition was my very first machine learning project.

The objective was "to apply the tools of machine learning to predict which passengers survived the tragedy". A full description is available on Kaggle.

To solve this problem, I performed feature analysis in Tableau, using visualizations of some basic ad-hoc A/B tests to find the most statistically significant features.

For the final model, an ensemble hard voting classifier was built from two tree-based algorithms (XGBoost and Random Forest) and a Support Vector Machine:

  • XGBoost
  • Random Forest
  • Support Vector Machines
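
To illustrate what hard voting means, here is a minimal sketch in plain Python: each model predicts a label for every passenger and the ensemble keeps the label that most models agree on. The function name hard_vote and the three prediction lists are hypothetical and for illustration only; they are not taken from the notebook.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label predictions by majority vote.

    predictions: list of equal-length label sequences, one per model.
    Returns a list with one winning label per sample.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")

    combined = []
    # zip(*predictions) groups the votes that all models cast for one sample.
    for votes in zip(*predictions):
        # most_common(1) returns the (label, count) pair with the highest count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of the three models for four passengers (1 = survived).
xgb_pred = [1, 0, 1, 0]
rf_pred = [1, 1, 0, 0]
svm_pred = [0, 1, 1, 0]

final_pred = hard_vote([xgb_pred, rf_pred, svm_pred])
print(final_pred)
```

With three models and two classes a tie cannot occur, which is one reason an odd number of voters is convenient. In scikit-learn the same behavior is available through VotingClassifier with voting="hard"; whether the notebook relies on that class or on a custom implementation is not stated here.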

Notebook

Conclusions

The results show that ticket class, sex, age, fare, port of embarkation, family size and title were the features with the greatest impact on the survival rate.
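
Family size and title are not columns of the raw dataset, so they have to be derived. The sketch below shows one common way to do it, assuming the Kaggle column names Name, SibSp and Parch and the "Surname, Title. Given names" name format. The helper names extract_title and family_size are hypothetical, and the notebook may derive these features differently.

```python
import re

# Captures the text between the comma and the first period,
# e.g. "Mr" in "Braund, Mr. Owen Harris".
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the title embedded in a passenger name, or "Unknown"."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else "Unknown"


def family_size(sibsp, parch):
    """Siblings/spouses plus parents/children plus the passenger."""
    return sibsp + parch + 1


# One sample record in the dataset's format.
passenger = {"Name": "Braund, Mr. Owen Harris", "SibSp": 1, "Parch": 0}

title = extract_title(passenger["Name"])
size = family_size(passenger["SibSp"], passenger["Parch"])
print(title, size)
```

Rare titles are often grouped into a single category before modeling; that step is left out here because the grouping used in the notebook is not described.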

Also, it was possible to predict whether a passenger survived with an accuracy of more than 80% and a standard deviation of less than 3%.
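
An accuracy reported together with a standard deviation usually comes from k-fold cross-validation: the model is scored on k held-out folds, and the mean and spread of those scores are reported. The sketch below assumes that reading. It uses toy labels (not Titanic data) and a deliberately trivial majority-class "model", so only the mechanics carry over; all names in it are hypothetical.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def majority_label(labels):
    """Toy stand-in for a model: always predict the most frequent training label."""
    return max(set(labels), key=labels.count)


def cross_val_accuracy(labels, k):
    """Return the accuracy obtained on each of the k held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        guess = majority_label([labels[i] for i in train_idx])
        correct = sum(1 for i in test_idx if labels[i] == guess)
        scores.append(correct / len(test_idx))
    return scores


# Toy labels for illustration only (1 = survived).
y = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

scores = cross_val_accuracy(y, k=5)
mean_acc = statistics.mean(scores)
std_acc = statistics.stdev(scores)  # sample standard deviation across folds
print(scores, mean_acc, std_acc)
```

In scikit-learn, cross_val_score returns the same kind of per-fold score list for a real estimator. Whether the figure above is a sample or a population standard deviation is not specified.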

Skills used in the final solution:

  • Feature engineering (missing values, feature extraction)
  • Ensemble learning (hard voting classifier implementation)
  • Modeling and hyperparameter tuning (XGBoost, Random Forest, Support Vector Machines)
  • Cross-validation test implementation
  • Tableau