This project tackles the Flu Shot Learning competition hosted on DrivenData.org (https://www.drivendata.org/competitions/66/flu-shot-learning/page/210/).
The competition asks you to predict how likely individuals are to receive their H1N1 and seasonal flu vaccines, based on 35 features per person.
The problem has two targets:
Whether the respondent received the H1N1 flu vaccine
Whether the respondent received the seasonal flu vaccine
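Some respondents didn't get either vaccine, others got only one, and some got both. The task is therefore formulated as a multilabel (not multiclass) problem: each target is predicted separately, and a respondent can be positive for neither, one, or both.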
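To make the multilabel framing concrete, here is a small illustrative sketch in plain Python. The toy rows and the names `respondents`, `h1n1_target`, `seasonal_target` and `to_multiclass` are invented for illustration and are not taken from the competition data or from the project code. It contrasts two independent binary target columns (multilabel) with a single four-class column (multiclass).

```python
# Hypothetical toy labels: (H1N1 vaccine, seasonal vaccine) per respondent.
respondents = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 1)]

# Multilabel view: two independent binary target columns,
# each of which gets its own predicted probability.
h1n1_target = [h1n1 for h1n1, _ in respondents]
seasonal_target = [seasonal for _, seasonal in respondents]


def to_multiclass(h1n1, seasonal):
    """Collapse both labels into one of four mutually exclusive classes."""
    return 2 * h1n1 + seasonal


# Multiclass view (NOT used here): one column with four possible values.
multiclass_target = [to_multiclass(h, s) for h, s in respondents]

print(h1n1_target)        # [0, 0, 1, 1, 0]
print(seasonal_target)    # [0, 1, 0, 1, 1]
print(multiclass_target)  # [0, 1, 2, 3, 1]
```

In the multilabel view a model outputs one probability per vaccine, which matches the goal of estimating how likely each vaccine is on its own.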
The core model is XGBoost, but a vertical ensemble with a DNN, RandomForest and GradientBoostingClassifier was considered as well.
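How the models were combined is not detailed here, so the following is only a rough sketch under an assumption: that the ensemble averages the probabilities predicted by each model (often called soft voting). The function `average_ensemble`, the model keys and all probability values are hypothetical.

```python
def average_ensemble(model_probs, weights=None):
    """Combine per-model probabilities by (optionally weighted) averaging.

    model_probs maps a model name to its list of predicted probabilities,
    one entry per respondent. This is an assumed combination rule.
    """
    names = list(model_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    n_rows = len(model_probs[names[0]])
    return [
        sum(weights[name] * model_probs[name][i] for name in names) / total_weight
        for i in range(n_rows)
    ]


# Hypothetical probabilities for one target and three respondents.
model_probs = {
    "xgboost": [0.9, 0.2, 0.6],
    "random_forest": [0.8, 0.3, 0.5],
    "gradient_boosting": [0.85, 0.25, 0.55],
    "dnn": [0.7, 0.1, 0.4],
}

ensemble_probs = average_ensemble(model_probs)
print([round(p, 4) for p in ensemble_probs])
```

When the individual models already agree closely, as in this toy input, the averaged probabilities barely differ from those of any single model.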
The table below reports the accuracy score for all the models considered:
All of them behave similarly, and the ensemble offers little advantage over the individual models.
XGBoost appears to be the best model, and its ROC curve is quite good as well:
Hyperparameter tuning and feature selection were performed on the XGBoost model, but the result is comparable with the one reported above.
My result was submitted to the competition:
Ranked 66 out of 1000 participants (AUROC 0.8607)
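For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as half. The sketch below computes it that way in plain Python. The function `auroc` and the toy labels and scores are invented for illustration, and averaging the two per-target scores into a single number is an assumption about how one figure is obtained for two targets.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy labels and predicted probabilities for each target.
h1n1_true = [0, 0, 1, 1]
h1n1_score = [0.1, 0.4, 0.35, 0.8]
seasonal_true = [0, 1, 1, 0]
seasonal_score = [0.2, 0.9, 0.6, 0.3]

h1n1_auc = auroc(h1n1_true, h1n1_score)
seasonal_auc = auroc(seasonal_true, seasonal_score)
mean_auc = (h1n1_auc + seasonal_auc) / 2  # assumed: mean over both targets

print(h1n1_auc, seasonal_auc, mean_auc)  # 0.75 1.0 0.875
```

The pairwise loop is quadratic in the number of rows, which is fine for a small illustration; on real data a library routine is the more practical choice.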
The code can be found in my GitHub repository: