What do trees do?
Classification
Predictions
Feature Selection
Idea behind Trees
Recursive partitioning (growing the tree)
Pruning (cutting the tree back)
Dichotomy: every split is binary, dividing a node into two branches
Terminology
Decision node: an internal node where a split is made
Terminal node (leaf): a node where the final conclusion (prediction) is made
We use the iris dataset here.
First we consider only the sepal features. We fit a decision tree with max depth = 2, get the result below, and then map the splits back onto the scatter plot.
So, basically, the algorithm tries to split the dataset into several regions, and every observation in a region is assigned that region's majority class.
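A minimal sketch of this setup, assuming scikit-learn and its bundled iris data:

```python
# Depth-2 decision tree on the two sepal features of iris.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, :2]   # sepal length, sepal width only
y = iris.target

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

# Print the learned splits; each leaf predicts its majority class.
print(export_text(clf, feature_names=iris.feature_names[:2]))
```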
The idea is the same for prediction (regression) trees, but the prediction for a leaf is the mean of the training observations in that leaf (the group mean).
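A tiny sketch of the group-mean idea, with made-up toy data:

```python
# A regression tree's prediction for a leaf is the mean of the
# training targets that fall into that leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])

reg = DecisionTreeRegressor(max_depth=1).fit(X, y)

# One split -> two leaves; each prediction is that leaf's group mean.
print(reg.predict([[2.0], [11.0]]))   # [1. 5.]
```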
Steps: at each node, try all candidate splits, keep the one with the best information gain (see the criteria below), and recurse on the children until a stopping rule (e.g. max depth) is hit.
Information gain criteria:
Entropy: how mixed the classes are within a node; entropy = -Σ p_k log2(p_k), where p_k is the proportion of class k in the node
near 1 (two-class case): high entropy, the node is an even mix of classes
near 0: low entropy, the observations in the node mostly share one class
Gini: the impurity of a node; Gini = 1 - Σ p_k²
near 0: the node is nearly pure (observations within it are similar)
near the maximum (0.5 for two classes): the classes within the node are evenly mixed
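A small sketch computing both measures from a node's class proportions, matching the formulas above:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))

def gini(p):
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # 1.0 0.5  (evenly mixed node)
print(entropy([1.0, 0.0]), gini([1.0, 0.0]))  # 0.0 0.0  (pure node)
```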
Why prune the tree?
To avoid overfitting: a fully grown tree tends to memorize the training data.
Steps: grow the full tree, then collapse weak branches from the bottom up, comparing subtrees by their cost complexity.
How to choose the best pruned tree?
Cost Complexity (CCT)
CCT = R + α L
R : misclassification rate
L : number of leaves
α : complexity parameter; larger α penalizes trees with more leaves
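A sketch of cost-complexity pruning with scikit-learn, where ccp_alpha plays the role of α above; the candidate alphas come from the tree's own pruning path:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Alphas at which leaves get pruned off the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}")
```

In practice α is usually picked by cross-validating over these candidates.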
Evaluation here is similar to logistic regression, so we will not go deep into it; the standard tools are:
Accuracy Rate
Confusion Matrix, sensitivity and specificity
ROC and AUC.
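A sketch of these checks on a held-out split (binary breast-cancer data here, since ROC/AUC needs two classes):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print(accuracy_score(y_te, clf.predict(X_te)))            # accuracy rate
print(confusion_matrix(y_te, clf.predict(X_te)))          # for sensitivity/specificity
print(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])) # AUC
```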
Works for both classification and prediction (regression)
Feature importances
relative importance of each predictor in building the tree
can help us choose predictors in some situations
we can do variable selection by feature importance
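A sketch of reading feature_importances_ and using it for a rough variable selection; the 0.05 cutoff is arbitrary, just for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Relative importance of each predictor in building the tree.
for name, imp in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")

# Keep only features above an (arbitrary) importance threshold.
keep = clf.feature_importances_ > 0.05
X_selected = iris.data[:, keep]
```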
Can handle missing data (classic CART does this via surrogate splits, though not every implementation supports it).
Easy to understand and explain.
Performs badly when the true decision boundary is linear: axis-aligned splits can only approximate it with a staircase.
Requires a fairly large dataset; with few observations the splits are unstable.