Decision Tree

Expected entropy of a categorical attribute with probability distribution Π = (π_1, ..., π_K):
H(Π) = -Σ_i π_i log π_i

e.g. for a training set containing p positive and n negative examples, we have:
H(p/(p+n),n/(p+n)) = - p/(p+n) log(p/(p+n)) - n/(p+n) log(n/(p+n))
a.k.a. the Shannon entropy (the same functional form as the Gibbs entropy in statistical mechanics)
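The entropy formula above can be sketched in Python for the two-class (positive/negative) case; the function name is illustrative:

```python
import math

def entropy(p, n):
    """Shannon entropy H(p/(p+n), n/(p+n)) of a set with
    p positive and n negative examples, in bits."""
    total = p + n
    h = 0.0
    for count in (p, n):
        q = count / total
        if q > 0:               # 0 log 0 is taken as 0
            h -= q * math.log2(q)
    return h
```

For example, a 50/50 split gives the maximum of 1 bit, and a pure set gives 0 bits.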

How to pick attributes?
Attribute A, with K distinct values, divides the training set E into subsets E_1, ..., E_K.

Expected entropy remaining after testing attribute A (with branches k = 1, 2, ..., K):
EH(A) = Σ_{k=1}^{K} (p_k+n_k)/(p+n) · H(p_k/(p_k+n_k), n_k/(p_k+n_k))
i.e., each branch's entropy weighted by the fraction of examples that take that branch,

where p_k + n_k is the number of examples (positive plus negative) in the kth child.

Information gain for this attribute is:
I(A) = H(p/(p+n), n/(p+n)) - EH(A)

Pick the attribute with the largest I(A)!
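The attribute-selection rule can be sketched in Python; the representation of examples as `(attribute_dict, label)` pairs and the function names are illustrative:

```python
import math
from collections import defaultdict

def entropy(p, n):
    """Shannon entropy of a set with p positive and n negative examples."""
    total = p + n
    h = 0.0
    for count in (p, n):
        q = count / total
        if q > 0:
            h -= q * math.log2(q)
    return h

def information_gain(examples, attr):
    """I(A) = H(parent) - EH(A); examples is a list of
    (attribute_dict, label) pairs with boolean labels."""
    p = sum(1 for _, y in examples if y)
    n = len(examples) - p
    # Partition examples by the attribute's value (branches E_1..E_K).
    branches = defaultdict(lambda: [0, 0])     # value -> [p_k, n_k]
    for x, y in examples:
        branches[x[attr]][0 if y else 1] += 1
    eh = sum((pk + nk) / (p + n) * entropy(pk, nk)
             for pk, nk in branches.values())
    return entropy(p, n) - eh

# Pick the attribute with the largest gain:
# best = max(attributes, key=lambda a: information_gain(examples, a))
```

An attribute that perfectly separates the classes gets gain equal to the parent entropy; an attribute independent of the label gets gain 0.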

Once the attribute is chosen, move down to each child and repeat the process.

If the data is continuous, treat each data point as a candidate cut point: use the projections of the points onto the x and y directions as attributes.

In the continuous case, pick random x, y values as attributes; examples with a smaller value go to the left child and those with a larger value go to the right.

Use the highest information gain, as before, to decide which attribute to split on first.

But if we had, say, 20,000-dimensional vectors, we would go for a random forest instead.

To do unsupervised clustering with a random forest, fit a Gaussian (mean, variance) to each side of each candidate split of each attribute; the split with the highest information gain is selected.
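One way to read the unsupervised split rule above is as a density-forest-style score using the differential entropy of a fitted 1-D Gaussian, H = ½ log(2πe σ²). This is a sketch under that assumption; the function names are illustrative:

```python
import math

def gaussian_entropy(values):
    """Differential entropy of a 1-D Gaussian fit to values:
    H = 1/2 * log(2*pi*e*variance)."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return 0.5 * math.log(2 * math.pi * math.e * max(var, 1e-12))

def split_score(values, t):
    """Unsupervised gain of cutting at threshold t: parent entropy
    minus the size-weighted entropies of the two sides."""
    left = [v for v in values if v <= t]
    right = [v for v in values if v > t]
    if not left or not right:
        return float("-inf")       # degenerate split
    n = len(values)
    return (gaussian_entropy(values)
            - len(left) / n * gaussian_entropy(left)
            - len(right) / n * gaussian_entropy(right))
```

A cut between two well-separated clusters scores higher than a cut inside one cluster, since both sides end up with small variance.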

To avoid overfitting, prune:
Until pruning is harmful:
    For each subtree:
        Temporarily replace it with a leaf labeled by its majority class
        Evaluate on the validation set
    Permanently remove the subtree whose replacement gives the largest accuracy on the validation set
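The pruning loop above (reduced-error pruning) can be sketched as follows, assuming a simple dict-based tree: internal nodes are `{"attr", "children", "majority"}` and leaves are `{"label"}`; all names are illustrative:

```python
def predict(node, x):
    while "label" not in node:
        node = node["children"].get(x[node["attr"]],
                                    {"label": node["majority"]})
    return node["label"]

def accuracy(tree, val):
    return sum(predict(tree, x) == y for x, y in val) / len(val)

def internal_nodes(node, out=None):
    """Collect every prunable (non-leaf) subtree."""
    if out is None:
        out = []
    if "label" not in node:
        out.append(node)
        for child in node["children"].values():
            internal_nodes(child, out)
    return out

def prune(tree, val):
    """Reduced-error pruning: until pruning hurts, replace the subtree
    whose conversion to a majority-class leaf best helps accuracy."""
    while True:
        base = accuracy(tree, val)
        best, best_acc = None, base
        for node in internal_nodes(tree):
            saved = dict(node)                 # snapshot the subtree
            node.clear()
            node["label"] = saved["majority"]  # temporary replacement
            acc = accuracy(tree, val)
            node.clear()
            node.update(saved)                 # restore
            if acc >= best_acc:                # ties favor the simpler tree
                best, best_acc = node, acc
        if best is None:
            return tree
        maj = best["majority"]
        best.clear()
        best["label"] = maj                    # permanent removal
```

A subtree whose split only fits training noise is collapsed to its majority class, since the leaf does at least as well on the validation set.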

For continuous features, sort the values; candidate cut points are all observed values or the means of consecutive values.
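The candidate-threshold rule for continuous features (midpoints of consecutive sorted values) can be sketched as:

```python
def candidate_thresholds(values):
    """Candidate cut points for a continuous feature: midpoints of
    consecutive distinct sorted values."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]
```

Each threshold t then defines a binary split (x <= t goes left, x > t goes right), scored by information gain like any other attribute.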