For those features, determine the split point that yields the highest information gain. (The space is split into more balanced regions.)
Pick a small random subset of features, e.g. a few out of the 100,000 features you might have.
The tree is built in a greedy fashion to avoid a combinatorial explosion in the number of possible trees.
Which attribute should we pick first?
A good attribute splits the data into subsets that are (ideally) all positive or all negative.
Then, depending on whether the feature value is greater or smaller than the threshold, the class probability is estimated from the relative class populations in that branch. Keep splitting at each node as you go down the tree.
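The split criterion above can be sketched in plain Python. This is a minimal illustration (function names like `information_gain` are mine, not from the notes): Shannon entropy measures class impurity, and a threshold split's gain is the entropy of the parent minus the weighted entropy of the two children.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Gain from splitting samples at `threshold` on one feature."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# A perfect split: all negatives below 0.5, all positives above.
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [0, 0, 0, 1, 1, 1]
print(information_gain(x, y, 0.5))  # prints 1.0 (entropy drops from 1 bit to 0)
```

A "good attribute" in the sense of the note above is one whose best threshold maximizes this gain, i.e. leaves the children as close to pure as possible.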
With many trees (a random forest), it becomes a good classifier.
It can be trained in a distributed fashion, since the trees are independent of each other.
[each tree can be trained on a random subset (a bootstrap sample) of the total training set we have]
It is a supervised classifier.