Backward feature elimination and forward feature construction were performed with in-house code (CIG_best_features.py). In backward feature elimination, all features are included in the model in the first iteration, and in each subsequent round the feature that contributes least to the performance of the model is removed. In forward feature construction, by contrast, the model initially contains no features, and in each iteration the feature that gives the best performance in combination with the features already in the model is added. Performance is measured by the shortest distance from the ROC curve to the upper-left corner of the ROC plot (false positive rate 0, true positive rate 1); a smaller distance indicates better performance.
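The two greedy procedures described above can be illustrated with a minimal Python sketch. It is not the code in CIG_best_features.py. The function names, the synthetic data, the single held-out split used for scoring, and the default logistic regression settings are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split


def roc_corner_distance(y_true, scores):
    """Shortest Euclidean distance from the ROC curve to the corner (FPR=0, TPR=1)."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    return float(np.min(np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)))


def score_subset(X_tr, y_tr, X_te, y_te, features):
    """Fit on the training split with the given feature columns; lower score is better."""
    model = LogisticRegression(max_iter=1000)  # assumed settings, illustration only
    model.fit(X_tr[:, features], y_tr)
    scores = model.predict_proba(X_te[:, features])[:, 1]
    return roc_corner_distance(y_te, scores)


def forward_construction(X_tr, y_tr, X_te, y_te, n_select):
    """Start with no features; add the feature that gives the best score each round."""
    selected, remaining, history = [], list(range(X_tr.shape[1])), []
    while remaining and len(selected) < n_select:
        trial = {f: score_subset(X_tr, y_tr, X_te, y_te, selected + [f])
                 for f in remaining}
        best = min(trial, key=trial.get)
        selected.append(best)
        remaining.remove(best)
        history.append(trial[best])
    return selected, history


def backward_elimination(X_tr, y_tr, X_te, y_te, n_keep):
    """Start with all features; drop the one whose removal hurts the score least."""
    selected = list(range(X_tr.shape[1]))
    while len(selected) > n_keep:
        trial = {f: score_subset(X_tr, y_tr, X_te, y_te,
                                 [g for g in selected if g != f])
                 for f in selected}
        selected.remove(min(trial, key=trial.get))
    return selected


# Synthetic data, used only to show how the functions are called.
X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
forward_selected, forward_history = forward_construction(X_tr, y_tr, X_te, y_te, 3)
backward_kept = backward_elimination(X_tr, y_tr, X_te, y_te, 3)
```

In this sketch the importance of a feature in the backward pass is judged by how little the score degrades when that feature is left out, which is one plausible reading of "least importance to the performance of the model".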
The logistic regression model was built with the logistic regression implementation in scikit-learn (sklearn). Data were centered and normalized with the sklearn preprocessing module. ROC curves were generated with in-house Python code based on sklearn, and the final curve is the average over 100 rounds of cross-validation on the test datasets. An L1 penalty was used in all experiments to prevent overfitting. P values were calculated following Hanley's method [4].
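As an illustration of this setup, the following Python sketch standardizes the data, fits an L1-penalized logistic regression, and averages the test-set ROC curve over 100 repetitions. It is not the original in-house code. The use of repeated stratified random splits, the 30% test fraction, the liblinear solver, C=1.0, the 101-point FPR grid, and the synthetic data are assumptions, since the text does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def mean_test_roc(X, y, n_repeats=100, test_size=0.3, seed=0):
    """Average the test-set ROC curve over repeated random train/test splits."""
    fpr_grid = np.linspace(0.0, 1.0, 101)  # common FPR axis for averaging
    splitter = StratifiedShuffleSplit(n_splits=n_repeats, test_size=test_size,
                                      random_state=seed)
    tprs = []
    for train_idx, test_idx in splitter.split(X, y):
        # The scaler is fitted on the training split only, then applied to the test split.
        model = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
        )
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        fpr, tpr, _ = roc_curve(y[test_idx], scores)
        tpr_interp = np.interp(fpr_grid, fpr, tpr)  # resample onto the common grid
        tpr_interp[0] = 0.0  # force the curve to start at (0, 0)
        tprs.append(tpr_interp)
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0  # force the curve to end at (1, 1)
    return fpr_grid, mean_tpr, auc(fpr_grid, mean_tpr)


# Synthetic data, used only to show how the function is called.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
fpr_grid, mean_tpr, mean_auc = mean_test_roc(X, y)
```

Placing the scaler inside the pipeline keeps the centering and normalization parameters from being estimated on test samples, which is a design choice of this sketch rather than something stated in the text.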