GOAL: Implement scikit-learn functions for trees and forests.
Learning experience: This assignment taught me how to use decision trees and random forests to build models and evaluate their performance, including methods like Out-of-Bag (OOB) error estimation. I also learned how to handle different features and improve model performance using boosting techniques like XGBoost and LightGBM. Overall, this assignment enriched my knowledge in machine learning and will be beneficial for my future learning and work.
Working environment:
OS: Windows 11 Home
CPU: Intel Core i9-13900K
GPU: NVIDIA RTX 4090
Python version: 3.12.2
Development environment: Jupyter Notebook
14.0 Introduction
Tree-based learning algorithms are a broad and popular family of related non-parametric, supervised methods for both classification and regression. The basis of tree-based learners is the decision tree, wherein a series of decision rules (e.g., “If a person’s credit score is greater than 720…”) are chained. The result looks vaguely like an upside-down tree, with the first decision rule at the top and subsequent decision rules spreading out below. In a decision tree, every decision rule occurs at a decision node, with the rule creating branches leading to new nodes. A branch without a decision rule at the end is called a leaf.
One reason for the popularity of tree-based models is their interpretability. In fact, decision trees can literally be drawn out in their complete form (see Recipe 14.3) to create a highly intuitive model. From this basic tree system comes a wide variety of extensions from random forests to stacking. In this chapter we will cover how to train, handle, adjust, visualize, and evaluate a number of tree-based models.
14.1 Training a Decision Tree Classifier
This section will train a classifier using a decision tree.
class sklearn.tree.DecisionTreeClassifier(*, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, monotonic_cst=None), A decision tree classifier. [2]
In 1 uses the scikit-learn library to create a decision tree classifier and train it on the Iris dataset.
Firstly, the necessary libraries are imported. This includes importing the DecisionTreeClassifier class from Scikit-learn and loading the Iris dataset from the datasets module.
Next, the dataset is loaded and divided into two parts: features and target. Features contain four measurements of iris flowers, while the target represents the corresponding category for each sample.
Then, an object of DecisionTreeClassifier is created and named decisiontree. In this example, no parameters are set, so default values are used, such as using Gini impurity as the criterion for splitting.
Finally, the fit() method is used to train the model with the data. The parameters passed to fit() are features and target.
In scikit-learn, DecisionTreeClassifier works like other learning methods. After the model is trained with fit(), we can use it to predict the class of an observation. In 2, a new observation named observation is created. It is a list of four feature values representing the measurements of an iris flower.
Then, the model's predict() method is used to predict the class of this observation. predict() returns an array with one predicted class per observation. Here it returns array([1]), meaning the observation is predicted to belong to class 1 (versicolor).
In 3, we also look at the predicted class probabilities of the observation using predict_proba().
In 4, we use the criterion parameter to choose a different impurity measure. First, a new instance of the DecisionTreeClassifier class is created and named decisiontree_entropy. The parameter criterion='entropy' specifies that entropy should be used to choose the split at each node of the tree. Entropy is a measure of impurity in a set of samples.
Then, the model is trained using the fit() method with the features and target data. The trained model is stored in the variable model_entropy.
In 5 and 6, we can see that the prediction does not change.
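The steps of In 1 to In 6 can be sketched as follows. This is a minimal sketch, not the notebook's exact cells. It adds random_state=0 for repeatability, and it uses a hypothetical observation [[5, 4, 3, 2]] because the text does not list the notebook's values.

```python
# Sketch: train Gini and entropy decision trees on Iris and compare predictions.
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
features, target = iris.data, iris.target

# Default criterion is "gini"; random_state is an added assumption for repeatability.
decisiontree = DecisionTreeClassifier(random_state=0)
model = decisiontree.fit(features, target)

# Same learner, but splits are chosen by entropy (information gain).
decisiontree_entropy = DecisionTreeClassifier(criterion="entropy", random_state=0)
model_entropy = decisiontree_entropy.fit(features, target)

# Hypothetical flower: sepal length, sepal width, petal length, petal width (cm).
observation = [[5, 4, 3, 2]]

print("Gini prediction:   ", model.predict(observation))
print("Entropy prediction:", model_entropy.predict(observation))
# predict_proba returns one row per observation and one column per class.
print("Gini class probabilities:", model.predict_proba(observation))
```

Comparing the two predictions side by side is the quickest way to check whether the choice of impurity measure matters for a given observation.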
14.2 Training a Decision Tree Regressor
This section will train a regression model using a decision tree.
class sklearn.tree.DecisionTreeRegressor(*, criterion='squared_error', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, ccp_alpha=0.0, monotonic_cst=None), A decision tree regressor. [3]
In 7 builds a decision tree regressor with the scikit-learn library and trains it on the diabetes dataset.
Firstly, the necessary libraries are imported. This includes importing the DecisionTreeRegressor class from Scikit-learn and loading the diabetes dataset from the datasets module.
Next, the diabetes dataset is loaded, and only two of its ten features are kept. The dataset is divided into two parts: features and target. The features hold two measurements for each diabetes patient, while the target is a quantitative measure of disease progression for each sample.
Then, an object of DecisionTreeRegressor is created and named decisiontree. In this example, no parameters are set, so default values are used.
Finally, the fit() method is used to train the model with the data. The parameters passed to fit() are features and target.
Once we have trained a decision tree, we can use it to predict the target value of an observation, as shown in In 8. First, a new observation named observation is created from the features of the first sample in the diabetes dataset.
Then, the predict() method of the model is used to predict the value of this observation. The predict() method returns the predicted value based on the features of the observation.
In 9, we use the criterion parameter to select the desired measure of split quality. For example, we can construct a tree whose splits reduce mean absolute error (MAE).
Firstly, a new instance of the DecisionTreeRegressor class is created, named decisiontree_mae. The parameter criterion="absolute_error" specifies that MAE should be used as the criterion for making decisions at each node of the tree. MAE is a measure of the average absolute deviation of the predictions from the actual values.
Then, the model is trained using the fit() method with the features and target data. The trained model is stored in the variable model_mae.
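Note: calling predict_proba() on the regressor fails with the error 'DecisionTreeRegressor' object has no attribute 'predict_proba'. A regressor predicts continuous values, not class probabilities.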
14.3 Visualizing a Decision Tree Model
This section will visualize a model created by a decision tree learning algorithm.
sklearn.tree.export_graphviz(decision_tree, out_file=None, *, max_depth=None, feature_names=None, class_names=None, label='all', filled=False, leaves_parallel=False, impurity=True, node_ids=False, proportion=False, rotate=False, rounded=False, special_characters=False, precision=3, fontname='helvetica'), Export a decision tree in DOT format. [4]
In 3 visualizes the decision tree classifier model using the Iris dataset.
Firstly, the necessary libraries are imported. This includes importing pydotplus for generating the graph, DecisionTreeClassifier class from Scikit-learn, datasets from Scikit-learn to load the Iris dataset, Image from IPython.display to display the image inline, and tree from Scikit-learn for exporting the graph data.
Then, the Iris dataset is loaded and divided into features and target variables.
Next, a decision tree classifier object decisiontree is created and trained using the fit() method with the features and target data.
After that, the DOT data for the decision tree is created using the export_graphviz() function from the tree module, specifying the decision tree classifier, feature names, and class names.
Subsequently, the DOT data is used to create a graph using pydotplus.graph_from_dot_data() function.
Finally, the graph is displayed using Image() from IPython.display, showing the decision tree visualization.
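The export steps above can be sketched without pydotplus. When out_file=None, export_graphviz returns the DOT source as a string, so this part runs even when Graphviz is not installed. The rendering lines are left as comments because they need pydotplus and the Graphviz executables. random_state=0 is an added assumption.

```python
# Sketch: produce the DOT description of an Iris decision tree.
from sklearn import datasets, tree
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
features, target = iris.data, iris.target

decisiontree = DecisionTreeClassifier(random_state=0)
model = decisiontree.fit(features, target)

# out_file=None makes export_graphviz return the DOT source instead of writing a file.
dot_data = tree.export_graphviz(
    decisiontree,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
)
print(dot_data[:200])

# Rendering (requires pydotplus and Graphviz), e.g. in Jupyter:
#   import pydotplus
#   from IPython.display import Image
#   graph = pydotplus.graph_from_dot_data(dot_data)
#   Image(graph.create_png())
```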
If we look at the root node, we can see the decision rule is that if petal widths are less than or equal to 0.8 cm, then go to the left branch; if not, go to the right branch. We can also see the Gini impurity index (0.667), the number of observations (150), the number of observations in each class ([50,50,50]), and the class the observations would be predicted to be if we stopped at that node (setosa). We can also see that at that node the learner found that a single decision rule (petal width (cm) <= 0.8) was able to perfectly identify all of the setosa class observations. Furthermore, with one more decision rule with the same feature (petal width (cm) <= 1.75) the decision tree is able to correctly classify 144 of 150 observations. This makes petal width a very important feature!
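You can also read the root-node numbers described above directly from the fitted tree's low-level tree_ arrays instead of from the picture. This is a sketch with an added random_state=0. Both petal length and petal width can separate setosa perfectly, so the reported root feature may differ from the figure. The contents of value may be counts or proportions depending on the scikit-learn version.

```python
# Sketch: inspect the root node of a fitted Iris decision tree.
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
decisiontree = DecisionTreeClassifier(random_state=0)
decisiontree.fit(iris.data, iris.target)

t = decisiontree.tree_  # array-based representation of the fitted tree
root = 0                # node 0 is always the root

print("feature  :", iris.feature_names[t.feature[root]])
print("threshold:", t.threshold[root])
print("gini     :", round(t.impurity[root], 3))  # 0.667
print("samples  :", t.n_node_samples[root])      # 150
print("value    :", t.value[root])               # counts or fractions, version-dependent
```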
In 4 and 5, we export the visualization to a PDF file and a PNG image.
In 9, the same approach is used to visualize a decision tree regressor. The output is shown below; see the linked PNG for the full tree.
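Note: if the error No module named 'pydotplus' appears, install the package with pip install pydotplus.
If the error GraphViz's executables not found appears, install Graphviz and add its bin directory to the PATH environment variable.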
14.4 Training a Random Forest Classifier
This section will train a classification model using a “forest” of randomized decision trees.
class sklearn.ensemble.RandomForestClassifier(n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt', max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None, monotonic_cst=None), A random forest classifier. [5]
In 10 demonstrates how to create and train a random forest classifier model using the Iris dataset.
Firstly, the necessary libraries are imported. This includes importing the RandomForestClassifier class from Scikit-learn and loading the Iris dataset from the datasets module.
Next, the Iris dataset is loaded, which includes both features and target variables.
Then, a random forest classifier object randomforest is created. Parameters such as random_state=0 ensure reproducibility of results, while n_jobs=-1 utilizes all available CPU cores for parallel processing during training.
After that, the model is trained using the fit() method with the features and target data.
A common problem with decision trees is that they tend to fit the training data too closely (i.e., overfitting). This has motivated the widespread use of an ensemble learning method called random forest. In a random forest, many decision trees are trained. Each tree receives only a bootstrapped sample of observations (i.e., a random sample drawn with replacement that matches the original number of observations), and each node considers only a subset of features when determining the best split. This forest of randomized decision trees (hence the name) votes to determine the predicted class.
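The two sources of randomness described above can be illustrated with plain NumPy. This is an illustration of the idea, not scikit-learn's internal code, and the sizes n and p are hypothetical. On average, a bootstrap sample contains about 1 - 1/e (roughly 63%) of the distinct rows. The remaining rows are the out-of-bag observations used in Recipe 14.6.

```python
# Sketch: bootstrap sampling of rows and random feature subsets per node.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical number of training observations
p = 4       # hypothetical number of features (as in Iris)

# Bootstrap: draw n row indices with replacement.
bootstrap_idx = rng.integers(0, n, size=n)
unique_fraction = np.unique(bootstrap_idx).size / n
print(f"distinct rows in bootstrap sample: {unique_fraction:.3f}")
print(f"out-of-bag rows:                   {1 - unique_fraction:.3f}")

# Feature subsampling at one node: sqrt(p) features chosen without replacement.
node_features = rng.choice(p, size=int(np.sqrt(p)), replace=False)
print("features considered at this node:", node_features)
```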
In 11, comparing this solution to Recipe 14.1 shows that scikit-learn's RandomForestClassifier works similarly to DecisionTreeClassifier.
In 12, we see that RandomForestClassifier also uses many of the same parameters as DecisionTreeClassifier. For example, we can change the measure of split quality used.
However, because it is a forest rather than an individual decision tree, RandomForestClassifier has certain parameters that are either unique to random forests or particularly important. First, the max_features parameter determines the maximum number of features to be considered at each node. It accepts integers (number of features), floats (fraction of features), and "sqrt" (square root of the number of features). By default, max_features is set to "sqrt". Older scikit-learn versions used "auto", which acted the same as "sqrt" for classifiers and has since been removed. Second, the bootstrap parameter sets whether the subset of observations considered for a tree is created by sampling with replacement (the default) or without replacement. Third, n_estimators sets the number of decision trees to include in the forest. Finally, this is not specific to random forest classifiers, but because we are effectively training many decision tree models, it is often useful to use all available cores by setting n_jobs=-1.
In 19, we plot individual trees of the forest. The forest contains 100 decision trees; trees 95 and 99 are shown below.
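The following sketch covers the forest parameters discussed above and the per-tree access used in In 19. The specific argument values, such as criterion="entropy", are illustrative choices, not the notebook's. Each fitted tree is an ordinary DecisionTreeClassifier stored in estimators_, so a single tree can be exported the same way as in Recipe 14.3.

```python
# Sketch: random forest parameters and access to individual trees.
from sklearn import datasets, tree
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
features, target = iris.data, iris.target

randomforest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    max_features="sqrt",  # features considered at each split
    bootstrap=True,       # sample observations with replacement
    criterion="entropy",  # same split-quality options as a single tree
    random_state=0,
    n_jobs=-1,            # use all CPU cores
)
model = randomforest.fit(features, target)
print(model.predict([[5, 4, 3, 2]]))  # hypothetical observation

# Export tree number 95 to DOT, as done for a single tree in Recipe 14.3.
dot_95 = tree.export_graphviz(
    model.estimators_[95],
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
)
print(len(model.estimators_), "trees; DOT length of tree 95:", len(dot_95))
```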
14.5 Training a Random Forest Regressor
This section will train a regression model using a “forest” of randomized decision trees.
class sklearn.ensemble.RandomForestRegressor(n_estimators=100, *, criterion='squared_error', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=1.0, max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, ccp_alpha=0.0, max_samples=None, monotonic_cst=None), A random forest regressor. [6]
In 14 demonstrates how to create and train a random forest regressor model using the diabetes dataset with only two features.
Firstly, the necessary libraries are imported. This includes importing the RandomForestRegressor class from Scikit-learn and loading the diabetes dataset from the datasets module.
Next, the diabetes dataset is loaded, keeping only two of its features, together with the corresponding target variable.
Then, a random forest regressor object randomforest is created. Parameters such as random_state=0 ensure reproducibility of results, while n_jobs=-1 utilizes all available CPU cores for parallel processing during training.
After that, the model is trained using the fit() method with the features and target data.
Just as we can make a forest of decision tree classifiers, we can make a forest of decision tree regressors. Each tree uses a bootstrapped subset of observations. When max_features is set below the total number of features, the decision rule at each node considers only a subset of features. As with RandomForestClassifier, there are certain important parameters:
max_features: Sets the maximum number of features to consider at each node. Defaults to 1.0, i.e., all p features, where p is the total number of features.
bootstrap: Sets whether or not to sample with replacement. Defaults to True.
n_estimators: Sets the number of decision trees to construct. Defaults to 100 (10 in scikit-learn versions before 0.22).
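The defaults listed above can be checked with get_params() before fitting, as in this sketch. It assumes the two kept diabetes features are the first two columns. The printed max_features value depends on the installed scikit-learn version.

```python
# Sketch: inspect RandomForestRegressor defaults, then fit on two diabetes features.
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor

diabetes = datasets.load_diabetes()
features = diabetes.data[:, 0:2]  # assumption: first two of the ten features
target = diabetes.target

randomforest = RandomForestRegressor(random_state=0, n_jobs=-1)
params = randomforest.get_params()
print("n_estimators:", params["n_estimators"])
print("bootstrap:   ", params["bootstrap"])
print("max_features:", params["max_features"])

model = randomforest.fit(features, target)
print("prediction for first sample:", model.predict(features[0:1]))
```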
In 20, we plot individual trees of the regression forest. It also contains 100 decision trees; trees 80 and 98 are shown below.
14.6 Evaluating Random Forests with Out-of-Bag Errors
This section will evaluate a random forest model without using cross-validation.
In 21 trains a random forest classifier on the Iris dataset and then assesses its out-of-bag (OOB) performance.
Firstly, the necessary libraries are imported, including the RandomForestClassifier class from Scikit-learn and the datasets module to load the Iris dataset.
Next, the Iris dataset is loaded, consisting of features and target variables.
Then, a Random Forest Classifier object randomforest is created. In this example, several parameters are set:
random_state=0: Ensures reproducibility of results.
n_estimators=1000: Specifies the number of trees in the forest as 1000.
oob_score=True: Enables calculation of the out-of-bag score.
n_jobs=-1: Utilizes all available CPU cores for parallel processing during training.
The model is then trained using the fit() method with the features and target data.
Finally, the out-of-bag score is read from the randomforest.oob_score_ attribute. For a classifier this is the OOB accuracy, so the OOB error rate is 1 - oob_score_.
In random forests, each decision tree is trained using a bootstrapped subset of observations. This means that for every tree there is a separate subset of observations not being used to train that tree. These are called out-of-bag (OOB) observations. We can use OOB observations as a test set to evaluate the performance of our random forest.
For every observation, the learning algorithm compares the observation’s true value with the prediction from a subset of trees not trained using that observation. The overall score is calculated and provides a single measure of a random forest’s performance. OOB score estimation is an alternative to cross-validation.
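A sketch of the OOB evaluation in In 21 follows, using the same parameters as that cell. It also reads oob_decision_function_. That attribute holds, for each training row, the class probabilities averaged over only the trees that did not see that row. It is the per-observation quantity behind the single oob_score_ number.

```python
# Sketch: out-of-bag evaluation of a random forest on Iris.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
features, target = iris.data, iris.target

randomforest = RandomForestClassifier(
    random_state=0, n_estimators=1000, oob_score=True, n_jobs=-1
)
model = randomforest.fit(features, target)

# oob_score_ is the OOB accuracy; the OOB error rate is its complement.
oob_accuracy = model.oob_score_
oob_error = 1 - oob_accuracy
print(f"OOB accuracy: {oob_accuracy:.4f}, OOB error: {oob_error:.4f}")

# Per-row class probabilities from trees that did not train on that row.
oob_proba = model.oob_decision_function_
print(oob_proba.shape)  # (150, 3)
```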
14.7 Identifying Important Features in Random Forests
This section identifies which features are most important in a random forest model.
Feature importances with a forest of trees [7]
In 22 visualizes the feature importances of each feature in the Iris dataset using a bar plot.
First, the necessary libraries are imported, including numpy for numerical computations and matplotlib.pyplot for plotting.
Then, the Iris dataset is loaded, consisting of features and target variables.
A Random Forest Classifier object randomforest is created with specified parameters such as random_state=0 for reproducibility and n_jobs=-1 for utilizing all available CPU cores during training.
The model is trained using the fit() method with the features and target data.
Next, the feature importances are calculated using the feature_importances_ attribute of the trained model.
The feature importances are sorted in descending order, and the indices of the sorted importances are stored.
The feature names are rearranged to match the sorted feature importances.
A bar plot is then created, where the x-axis represents the feature indices, the y-axis represents the feature importances, and the feature names are added as x-axis labels.
Finally, the plot is displayed using plt.show(). The output shows that petal length and petal width are the most important features.
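The steps of In 22 can be sketched as follows. This version uses the non-interactive Agg backend and saves the figure to a hypothetical file, feature_importance.png, so it also runs as a plain script. In Jupyter, drop the matplotlib.use line and call plt.show() instead.

```python
# Sketch: sorted random forest feature importances as a bar plot.
import matplotlib
matplotlib.use("Agg")  # script-friendly backend; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
features, target = iris.data, iris.target

randomforest = RandomForestClassifier(random_state=0, n_jobs=-1)
model = randomforest.fit(features, target)

importances = model.feature_importances_
indices = np.argsort(importances)[::-1]          # descending order
names = [iris.feature_names[i] for i in indices]  # names in the same order

plt.figure()
plt.title("Feature Importance")
plt.bar(range(features.shape[1]), importances[indices])
plt.xticks(range(features.shape[1]), names, rotation=90)
plt.tight_layout()
plt.savefig("feature_importance.png")  # hypothetical file name
plt.close()

for name, value in zip(names, importances[indices]):
    print(f"{name:20s} {value:.3f}")
```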
One of the major benefits of decision trees is interpretability. Specifically, we can visualize the entire model (see Recipe 14.3). However, a random forest model is composed of tens, hundreds, or even thousands of decision trees. This makes a simple, intuitive visualization of a random forest model impractical. That said, there is another option: we can compare (and visualize) the relative importance of each feature.
In Recipe 14.3, we visualized a decision tree classifier model and saw that decision rules based only on petal width were able to classify many observations correctly. Intuitively, we can say this means that petal width is an important feature in our classifier. More formally, features with splits that have the greater mean decrease in impurity (e.g., Gini impurity or entropy in classifiers and variance in regressors) are considered more important.
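The mean decrease in impurity can be worked out by hand for the root split from Recipe 14.3 (petal width <= 0.8). That split sends the 50 setosa samples left and the 100 other samples right. scikit-learn's feature_importances_ adds up such sample-weighted decreases for every split on a feature, across all nodes (and trees), and then normalizes the totals. The helper function gini below is written here for illustration.

```python
# Sketch: Gini impurity decrease of the Iris root split, computed by hand.
import numpy as np

def gini(counts):
    """Gini impurity 1 - sum(p_k^2) of a node with the given class counts."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

parent = [50, 50, 50]  # root: 50 samples of each class
left = [50, 0, 0]      # petal width <= 0.8: all setosa
right = [0, 50, 50]    # versicolor and virginica

n, n_l, n_r = sum(parent), sum(left), sum(right)
decrease = gini(parent) - (n_l / n) * gini(left) - (n_r / n) * gini(right)
print(round(gini(parent), 3), round(decrease, 3))  # 0.667 0.333
```

So this single split removes half of the root's impurity (from 2/3 down to 1/3), which is why petal width collects a large share of the importance.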
However, there are two things to keep in mind regarding feature importance. First, scikit-learn requires that we break up nominal categorical features into multiple binary features. This has the effect of spreading the importance of that feature across all of the binary features and can make each feature appear to be unimportant even when the original nominal categorical feature is highly important. Second, if two features are highly correlated, one feature will claim much of the importance, making the other feature appear to be far less important, which has implications for interpretation if not considered.
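The correlated-feature caveat can be explored with a sketch that appends a hypothetical fifth column: petal width plus a little noise. The original feature and its near-copy then compete for the same splits, so the importance that petal width alone would receive is divided between them. How the credit is split depends on the random seed, so no particular output is claimed here.

```python
# Sketch: importance sharing between two highly correlated features.
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
rng = np.random.default_rng(0)

# Hypothetical extra column: petal width with small Gaussian noise.
petal_width = iris.data[:, 3]
noisy_copy = petal_width + rng.normal(scale=0.01, size=petal_width.shape)
features = np.column_stack([iris.data, noisy_copy])
names = iris.feature_names + ["petal width copy (cm)"]

model = RandomForestClassifier(random_state=0, n_jobs=-1).fit(features, iris.target)
for name, value in zip(names, model.feature_importances_):
    print(f"{name:22s} {value:.3f}")
```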
14.8 Selecting Important Features in Random Forests
This section will conduct feature selection on a random forest.
class sklearn.feature_selection.SelectFromModel(estimator, *, threshold=None, prefit=False, norm_order=1, max_features=None, importance_getter='auto'), Meta-transformer for selecting features based on importance weights. [8]
In 26 demonstrates feature selection using the Random Forest Classifier with out-of-bag (OOB) score enabled.
Firstly, the necessary libraries are imported, including RandomForestClassifier from Scikit-learn's ensemble module, datasets to load the Iris dataset, and SelectFromModel from Scikit-learn's feature_selection module.
The Iris dataset is then loaded, containing features and target variables.
A Random Forest Classifier object randomforest is created with specified parameters such as random_state=0 for reproducibility, n_jobs=-1 for utilizing all available CPU cores during training, and oob_score=True to enable the calculation of the out-of-bag (OOB) score.
Next, an object selector is created using SelectFromModel, which selects features with importance greater than or equal to a specified threshold. In this example, a threshold of 0.3 is chosen.
A new feature matrix features_important is then created using the fit_transform() method of selector, which selects and transforms the original features based on the specified threshold.
Finally, the Random Forest Classifier model model is trained using the most important features identified by selector.
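The selection steps of In 26 can be sketched as follows. SelectFromModel fits its own clone of the forest to compute importances, so randomforest is still unfitted afterwards and can be trained on the reduced matrix. get_support() is used here to report which Iris features passed the 0.3 threshold.

```python
# Sketch: keep features with importance >= 0.3, then retrain the forest on them.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = datasets.load_iris()
features, target = iris.data, iris.target

randomforest = RandomForestClassifier(random_state=0, n_jobs=-1, oob_score=True)

# Fits a clone of the forest internally and keeps features with importance >= 0.3.
selector = SelectFromModel(randomforest, threshold=0.3)
features_important = selector.fit_transform(features, target)

kept = [name for name, keep in zip(iris.feature_names, selector.get_support()) if keep]
print("kept features:", kept)

model = randomforest.fit(features_important, target)
print("OOB score on selected features:", model.oob_score_)
```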
In 27, we can see that the OOB score rises from 0.9533 to 0.9666.
We must note two caveats to this approach. First, nominal categorical features that have been one-hot encoded will see the feature importance diluted across the binary features. Second, the feature importance of highly correlated features will be effectively assigned to one feature and not evenly distributed across both features.
14.9 Handling Imbalanced Classes
This section covers training a random forest model when the target vector has highly imbalanced classes.
In 32 demonstrates how to handle class imbalance in a classification problem using the Random Forest Classifier.
Firstly, the necessary libraries are imported, including numpy for numerical computations and RandomForestClassifier from Scikit-learn's ensemble module.
The Iris dataset is then loaded, containing features and target variables.
To simulate class imbalance, the first 40 observations (all of class 0, setosa) are removed from both the features and target arrays. This leaves only 10 samples of class 0, which becomes the minority class.
Next, a target vector is created indicating if a sample belongs to class 0 or class 1. Samples belonging to class 0 are labeled as 0, while samples belonging to any other class are labeled as 1.
A Random Forest Classifier object randomforest is created with specified parameters such as random_state=0 for reproducibility, n_jobs=-1 for utilizing all available CPU cores during training, and class_weight="balanced" to automatically adjust class weights inversely proportional to class frequencies in the input data.
Finally, the model is trained using the fit() method with the modified features and target data. The resulting OOB score is 1.0.
Imbalanced classes are a common problem when we are doing machine learning in the real world. We can set RandomForestClassifier to correct for imbalanced classes using the class_weight parameter. However, often a more useful argument is balanced, in which classes are automatically weighted in inverse proportion to how frequently they appear in the data.
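The "balanced" weighting can be written as w_j = n / (k * n_j), where n is the number of observations, k the number of classes and n_j the count of class j. The sketch below rebuilds the imbalanced target of In 32: 110 rows, of which 10 are class 0 and 100 are class 1. It computes the weights by hand and with scikit-learn's compute_class_weight, then fits the balanced forest.

```python
# Sketch: "balanced" class weights on the imbalanced Iris target from In 32.
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

iris = datasets.load_iris()
# Drop the first 40 rows (all setosa); binary target: 0 = setosa, 1 = other.
features = iris.data[40:, :]
target = np.where(iris.target[40:] == 0, 0, 1)

# Balanced weights: w_j = n / (k * n_j)
n, k = len(target), 2
manual = n / (k * np.bincount(target))
balanced = compute_class_weight("balanced", classes=np.array([0, 1]), y=target)
print(manual, balanced)  # both give 5.5 for class 0 and 0.55 for class 1

randomforest = RandomForestClassifier(random_state=0, n_jobs=-1, class_weight="balanced")
model = randomforest.fit(features, target)
```

The minority class therefore counts ten times as much per sample as the majority class, which offsets the 10:100 imbalance.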
14.10 Controlling Tree Size
This section will manually determine the structure and size of a decision tree.
In 45 demonstrates how to create and train a decision tree classifier with specific hyperparameters using Scikit-learn's DecisionTreeClassifier.
Firstly, the necessary libraries are imported, including DecisionTreeClassifier from Scikit-learn's tree module and datasets to load the Iris dataset.
The Iris dataset is then loaded, consisting of features and target variables.
A decision tree classifier object decisiontree is created with specific hyperparameters set:
random_state=0: Ensures reproducibility of results.
max_depth=None: Indicates that the maximum depth of the tree is not limited.
min_samples_split=2: Specifies the minimum number of samples required to split an internal node.
min_samples_leaf=1: Specifies the minimum number of samples required to be at a leaf node.
min_weight_fraction_leaf=0: Specifies the minimum weighted fraction of the sum total of weights required to be at a leaf node.
max_leaf_nodes=None: Indicates that the maximum number of leaf nodes is not limited.
min_impurity_decrease=0: Specifies the minimum impurity decrease required for a split to happen.
Finally, the model is trained using the fit() method with the features and target data.
While it is useful to know these parameters exist, most likely we will only be using max_depth and min_impurity_decrease (which replaced the now-removed min_impurity_split). Shallower trees are simpler models and thus have lower variance; the shallowest, with a single split, are called stumps.
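The effect of the size parameters can be checked with get_depth() and get_n_leaves(), as in this sketch. The values max_depth=2 and min_impurity_decrease=0.01 are illustrative choices, not taken from In 45.

```python
# Sketch: compare tree sizes under different size-control parameters.
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
features, target = iris.data, iris.target

full = DecisionTreeClassifier(random_state=0).fit(features, target)
shallow = DecisionTreeClassifier(random_state=0, max_depth=2).fit(features, target)
stump = DecisionTreeClassifier(random_state=0, max_depth=1).fit(features, target)
pruned = DecisionTreeClassifier(random_state=0, min_impurity_decrease=0.01).fit(features, target)

for name, m in [
    ("full tree", full),
    ("max_depth=2", shallow),
    ("stump (max_depth=1)", stump),
    ("min_impurity_decrease=0.01", pruned),
]:
    print(f"{name:28s} depth={m.get_depth()} leaves={m.get_n_leaves()}")
```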
14.11 Improving Performance Through Boosting
This section covers what to do when you need a model with better performance than decision trees or random forests.
class sklearn.ensemble.AdaBoostClassifier(estimator=None, *, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None), An AdaBoost classifier. [9]
AdaBoost, short for Adaptive Boosting, is a statistical classification meta-algorithm formulated by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. Usually, AdaBoost is presented for binary classification, although it can be generalized to multiple classes or bounded intervals on the real line. [10]
In 51 demonstrates how to create and train an AdaBoost classifier using Scikit-learn's AdaBoostClassifier.
Firstly, the necessary libraries are imported, including AdaBoostClassifier from Scikit-learn's ensemble module and datasets to load the Iris dataset.
The Iris dataset is then loaded, consisting of features and target variables.
An AdaBoost classifier object adaboost is created with a specific parameter random_state=0, which ensures reproducibility of results.
Finally, the model is trained using the fit() method with the features and target data.
In a random forest, an ensemble (group) of randomized decision trees predicts the target vector. An alternative, and often more powerful, approach is called boosting. In one form of boosting called AdaBoost, we iteratively train a series of weak models (most often a shallow decision tree, sometimes called a stump), each iteration giving higher priority to observations the previous model predicted incorrectly.
In scikit-learn, we can implement AdaBoost using AdaBoostClassifier or AdaBoostRegressor. The most important parameters are estimator, n_estimators, learning_rate, and loss:
estimator: The learning algorithm used to train the weak models (named base_estimator in scikit-learn versions before 1.2). The most common learner to use with AdaBoost is a decision tree, which is the parameter's default.
n_estimators: The number of models to train iteratively.
learning_rate: The contribution of each model to the weights; it defaults to 1. Reducing the learning rate means the weights are increased or decreased to a smaller degree, forcing the model to train more slowly (but sometimes resulting in better performance scores).
loss: Exclusive to AdaBoostRegressor; sets the loss function used when updating weights. It defaults to a linear loss function but can be changed to square or exponential.
14.12 Training an XGBoost Model
This section covers training a tree-based model with high predictive power.
In 58 demonstrates how to train an XGBoost classifier and generate a classification report to evaluate its performance.
First, the necessary libraries are imported, including xgboost for XGBoost, datasets to load the Iris dataset, preprocessing for data preprocessing, and classification_report for generating a classification report.
The Iris dataset is loaded, consisting of features and target variables.
Next, a dataset xgb_train is created using the xgb.DMatrix function from XGBoost, which is a data structure optimized for XGBoost training.
Parameters for the XGBoost model are defined in the param dictionary, including the objective function multi:softprob for multiclass classification and the number of classes num_class set to 3.
The model is trained using the xgb.train function with the specified parameters.
Predictions are obtained using the trained model by applying the predict function on the xgb_train dataset, and then selecting the class with the highest probability using argmax.
Finally, a classification report is generated using the classification_report function, which provides precision, recall, F1-score, and support for each class. The accuracy is 0.99. Note that this is measured on the training data, since the predictions were made on xgb_train.
XGBoost (which stands for Extreme Gradient Boosting) is a very popular gradient boosting algorithm in the machine learning space.
14.13 Improving Real-Time Performance with LightGBM
This section will train a gradient boosted tree-based model that is computationally optimized.
In 63 demonstrates how to train a LightGBM classifier and generate a classification report to evaluate its performance.
First, the necessary libraries are imported, including lightgbm for LightGBM, datasets to load the Iris dataset, preprocessing for data preprocessing, and classification_report for generating a classification report.
The Iris dataset is loaded, consisting of features and target variables.
Next, a dataset lgb_train is created using the lgb.Dataset function from LightGBM.
Parameters for the LightGBM model are defined in the params dictionary, including the objective function multiclass for multiclass classification and the number of classes num_class set to 3. The verbose parameter is set to -1 to suppress model training logs.
The model is trained using the lgb.train function with the specified parameters.
Predictions are obtained using the trained model by applying the predict function on the input features, and then selecting the class with the highest probability using argmax.
Finally, a classification report is generated using the classification_report function, which provides precision, recall, F1-score, and support for each class. The accuracy is 1.00, again measured on the same data used to train the model.
The lightgbm library is used for gradient boosted machines and is highly optimized for training time, inference, and GPU support. As a result of its computational efficiency, it’s often used in production and in large scale settings.
In 9 uses DecisionTreeClassifier from sklearn to classify the digits dataset.
First, it imports the digits dataset from sklearn.datasets, which contains images of handwritten digits along with their corresponding labels.
Then, it splits the dataset into training and testing sets using the train_test_split function from sklearn.model_selection. The testing set comprises 20% of the total dataset, with a fixed random state of 42 to ensure reproducibility.
Next, a DecisionTreeClassifier object is instantiated with random_state=0 to ensure consistent results across different runs.
The classifier is then trained using the training set.
Subsequently, predictions are made on the testing set using the trained model, and the predicted labels are stored in y_pred.
Finally, the accuracy of the model on the testing set is calculated using the accuracy_score function from sklearn.metrics, and the result is printed out.
We can see that the accuracy is 0.8472222.
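The digits experiment in In 9 and In 10 can be sketched as a loop over both criteria. The document reports 0.8472 (gini) and 0.8833 (entropy). The exact values may differ across scikit-learn versions, so none are asserted here.

```python
# Sketch: decision tree on the digits dataset with gini and entropy criteria.
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

results = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[criterion] = accuracy_score(y_test, y_pred)
    print(criterion, results[criterion])
```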
In 10, we change the criterion to entropy and get a better result: an accuracy of 0.8833.
The resulting decision tree model can be viewed here.
In 29, we switch to RandomForestClassifier, and the accuracy rises to 0.9777!
In 60, we change many parameters but still do not get a better accuracy.
In 72, we identify the important features. Based on this output, we choose threshold = 0.005, which raises the accuracy to 0.9805.
In 103, we first switch to AdaBoostClassifier but get a poor result. Following the hint, we change the algorithm to SAMME and then tune the parameters (especially n_estimators and learning_rate), finally reaching an accuracy of 0.9.
In 143, we use XGBoost with n_estimators=1000 and learning_rate=0.1 and get an accuracy of 0.9722; In 147 uses the default settings.
In 163, we switch to LightGBM and first get an accuracy of 0.975. In In 164, we set num_boost_round = 2000 and get an accuracy of 0.983!
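One way to structure the feature selection of In 72 is a pipeline, sketched below. It is an assumed structure, not necessarily the notebook's. Placing SelectFromModel inside the pipeline means the pixel importances are computed from the training split only, so the test set does not influence which pixels are kept. The threshold of 0.005 is the one chosen in the text.

```python
# Sketch: importance-based pixel selection followed by a random forest on digits.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

pipeline = make_pipeline(
    # Step 1: drop pixels whose forest importance is below 0.005.
    SelectFromModel(RandomForestClassifier(random_state=0, n_jobs=-1), threshold=0.005),
    # Step 2: train the final forest on the remaining pixels only.
    RandomForestClassifier(random_state=0, n_jobs=-1),
)
pipeline.fit(X_train, y_train)

n_kept = int(pipeline[0].get_support().sum())
accuracy = accuracy_score(y_test, pipeline.predict(X_test))
print(f"kept {n_kept} of {digits.data.shape[1]} pixels, accuracy {accuracy:.4f}")
```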
[1] Chapter 14. Trees and Forests, Machine Learning with Python - Theory and Implementation.
[2] sklearn.tree.DecisionTreeClassifier, Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[3] sklearn.tree.DecisionTreeRegressor, Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[4] sklearn.tree.export_graphviz, Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[5] sklearn.ensemble.RandomForestClassifier, Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[6] sklearn.ensemble.RandomForestRegressor, Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[7] Feature importances with a forest of trees, Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[8] sklearn.feature_selection.SelectFromModel, Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[9] sklearn.ensemble.AdaBoostClassifier, Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[10] AdaBoost, Wikipedia.