Random Forest:
· Ensemble of decision trees
· In a decision tree we have only one tree, and objects are classified based on that single tree
· A random forest instead grows multiple trees and takes its decision from the ensemble of all these trees
· More consistent predictions (lower variance) than a single tree
· For example, 500 trees may be grown, and the classification is done based on all of these models
· For a discrete (categorical) target, the predicted class is the majority vote across the trees (Classification Tree); see the sketch after this list
· For a continuous target, the prediction is the average of all the trees' predicted values (Regression Tree)
· Improves predictive accuracy
· Generates a large number of bootstrapped trees
· The final predicted outcome is obtained by combining the results across all of the trees
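As an illustration of the bootstrap-and-majority-vote idea, here is a minimal sketch using plain rpart trees on R's built-in iris data (an assumption made so the sketch is self-contained; a true random forest additionally samples a random subset of variables at each split, which this sketch omits):
>library(rpart) #ordinary decision trees used as ensemble members
>set.seed(42)
>trees=lapply(1:25,function(i) rpart(Species~.,data=iris[sample(nrow(iris),replace=TRUE),],method="class")) #25 trees, each grown on its own bootstrap sample
>votes=sapply(trees,function(t) as.character(predict(t,iris,type="class"))) #one column of class votes per tree
>majority=apply(votes,1,function(v) names(which.max(table(v)))) #majority vote = ensemble prediction
>mean(majority==iris$Species) #agreement with the true classes; for a continuous target one would average the trees' numeric predictions instead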
R Code:
>library(randomForest)
>Iris <- read.csv("E:/Dataset/Iris.csv") #Loading Training Data
>mydata=Iris #Moving the Iris data set into mydata
>mydata$Class=as.factor(mydata$Class) #make sure the target is a factor so randomForest fits a classification forest (read.csv in R >= 4.0 no longer converts strings to factors)
>mymodel=randomForest(Class ~ sepal.length+sepal.width+petal.length+petal.width,data=mydata) #model preparation
>mymodel
Call:
randomForest(formula = Class ~ sepal.length + sepal.width + petal.length + petal.width, data = mydata)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 4.67%
Confusion matrix:
                Iris-setosa Iris-versicolor Iris-virginica class.error
Iris-setosa              50               0              0        0.00
Iris-versicolor           0              47              3        0.06
Iris-virginica            0               4             46        0.08
Interpretation: The forest uses the defaults ntree = 500 and, for classification, mtry = floor(sqrt(p)) = 2 variables tried at each split. The out-of-bag (OOB) error estimate is 4.67%: all setosa observations are classified correctly, while 3 versicolor and 4 virginica are misclassified, i.e. (3 + 4)/150 ≈ 4.67%.
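The printed OOB estimate can also be inspected directly; a quick sketch, assuming the fitted mymodel object from above (err.rate and ntree are components stored by randomForest for classification forests):
>plot(mymodel) #OOB and per-class error rates versus the number of trees
>mymodel$err.rate[mymodel$ntree,"OOB"] #final OOB error, approximately 0.0467 here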
>newdata=read.csv("E:/Dataset/Iris_test.csv") #Loading Test Data
>View(newdata) #inspect the test data (note the capital V in View)
>pred=predict(mymodel,newdata=newdata) #validation
>mytable=table(newdata$Class,pred) #confusion matrix
>mytable
                pred
                 Iris-setosa Iris-versicolor Iris-virginica
Iris-setosa               49               0              0
Iris-versicolor            0              15              0
Iris-virginica             0               0              2
Interpretation: On the test set every observation falls on the diagonal of the confusion matrix: all 49 setosa, 15 versicolor, and 2 virginica cases are predicted correctly, i.e. 100% test accuracy (66/66). Accuracy can be computed directly from the table, as sketched below.
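A one-liner for the accuracy, assuming the mytable object from above:
>sum(diag(mytable))/sum(mytable) #correct predictions / total = 66/66 = 1 on this test set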