The complete RapidMiner process for implementing the Decision Tree model is shown in this figure.
Retrieve the StudentEvent Resample dataset and drag it to the process panel.
Select the "Select Attributes" operator and drag it to the process panel.
In the "Parameters" panel, click the "Attribute Filter Type" dropdown and pick "Subset" so that a subset of the dataset's attributes can be selected.
In the "Parameters" panel, click "Attributes" > "Select Attributes", then pick StudentID, marks and marksbin, as these attributes are not needed.
Next, tick "Invert Selection" to indicate that StudentID, marks and marksbin should be excluded from the actual calculation.
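Outside RapidMiner, the same attribute exclusion can be sketched in Python with pandas; the small DataFrame below is a made-up stand-in for the StudentEvent Resample dataset:

```python
import pandas as pd

# Hypothetical sample rows standing in for the StudentEvent Resample dataset.
df = pd.DataFrame({
    "StudentID": [1, 2, 3],
    "marks": [78, 65, 82],
    "marksbin": ["70-80", "60-70", "80-90"],
    "attendance": [0.90, 0.70, 0.95],
    "Grade": ["A-", "B+", "A-"],
})

# Equivalent of Select Attributes with "Invert Selection":
# exclude StudentID, marks and marksbin from the modelling data.
excluded = ["StudentID", "marks", "marksbin"]
df_selected = df.drop(columns=excluded)
print(list(df_selected.columns))  # only the attributes kept for modelling
```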
Select the "Set Role" operator and drag it to the right of the Select Attributes operator. Connect both operators via their exa ports.
In the "Parameters" panel, click the "Attribute Name" dropdown and pick Grade.
Click the "Target Role" dropdown menu and pick "label". This makes the Grade variable act as the target attribute.
Select the "Split Data" operator and drag it next to the Set Role operator. Connect them via their exa ports.
In the "Parameters" panel, click "Partitions" > "Edit Enumeration", click "Add Entry", and insert 0.8 and 0.2 as the training and testing ratios.
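The 0.8/0.2 partition can be illustrated with a short pure-Python sketch (a plain shuffled split standing in for the Split Data operator; the row values are synthetic):

```python
import random

def split_data(rows, train_ratio=0.8, seed=42):
    """Shuffle and partition rows, as Split Data does with ratios 0.8/0.2."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))
train, test = split_data(rows)
print(len(train), len(test))  # 80 20
```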
Select the "Decision Tree" operator, drag it next to the "Split Data" operator, and connect them via the par and tra ports.
In the "Parameters" panel, click the "Criterion" dropdown and select "gain_ratio".
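For reference, the gain_ratio criterion is information gain normalised by the split's own entropy; a minimal pure-Python sketch (with a toy attribute and grade list as stand-ins) is:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(attribute_values, labels):
    """Information gain of splitting on an attribute, divided by split info."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info else 0.0

# Toy example: the attribute perfectly separates the two grades.
attr = ["low", "low", "high", "high"]
grades = ["B+", "B+", "A-", "A-"]
print(round(gain_ratio(attr, grades), 3))  # 1.0
```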
Next, select the Apply Model operator and drag it below the Decision Tree operator. Connect the two mod ports, and connect the Split Data par port to the Apply Model unl port.
Lastly, select the Performance (Classification) operator and drag it next to the Decision Tree operator. Connect the Apply Model lab port to the Performance lab port.
In the "Parameters" panel, tick the "Accuracy" and "Classification Error" checkboxes.
Connect the Performance per port to a res port, and the exa port to another res port.
Click the Run button or press F11 to execute the process.
In the results, the Performance Vector shows the accuracy and classification-error percentages.
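The whole process can be approximated in scikit-learn. This is an illustrative sketch on synthetic data, not the actual StudentEvent dataset, and scikit-learn's DecisionTreeClassifier offers criterion="entropy" (information gain) rather than RapidMiner's gain_ratio:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the prepared dataset (105 rows, 2 classes).
X, y = make_classification(n_samples=105, n_informative=5, random_state=0)

# Split Data operator: 0.8 training / 0.2 testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

# Decision Tree operator (entropy as the closest available criterion).
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, y_train)

# Apply Model operator: predict labels for the test partition.
y_pred = model.predict(X_test)

# Performance (Classification) operator: accuracy and classification error.
accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy={accuracy:.4f}, classification error={1 - accuracy:.4f}")
```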
The modeling block builds the decision tree using the training dataset. The Apply Model block predicts the class label of the test dataset using the developed model and appends the predicted label to the dataset. The predicted dataset is one of the three outputs of the process and is shown in this figure. Note that the predicted test dataset has both the predicted and the original class label. The model has predicted the correct class for 19 of the records. The 2 incorrect predictions are highlighted in this figure.
This figure shows the accuracy results of the model and the confusion matrix. It is evident that the model was able to get 19 of the 21 class predictions correct and 2 of the 21 (in boxes) wrong, for predictions B+ and A-, which translates to about 90.48% accuracy and 9.52% classification error.
The modeling block builds the decision tree using the training dataset. The Apply Model block predicts the class label of the test dataset using the developed model and appends the predicted label to the dataset. The predicted dataset is one of the three outputs of the process and is shown in this figure. Note that the predicted test dataset has both the predicted and the original class label. The model has predicted the correct class for 25 of the records. The 5 incorrect predictions are highlighted in this figure.
This figure shows the accuracy results of the model and the confusion matrix. It is evident that the model was able to get 25 of the 30 class predictions correct and 5 of the 30 (in boxes) wrong, for predictions A- and B+, which translates to only 83.33% accuracy and 16.67% classification error.
The modeling block builds the decision tree using the training dataset. The Apply Model block predicts the class label of the test dataset using the developed model and appends the predicted label to the dataset. The predicted dataset is one of the three outputs of the process and is shown in this figure. Note that the predicted test dataset has both the predicted and the original class label. The model has predicted the correct class for 33 of the records. The 7 incorrect predictions are highlighted in this figure.
This figure shows the accuracy results of the model and the confusion matrix. It is evident that the model was able to get 33 of the 40 class predictions correct and 7 of the 40 (in boxes) wrong, for predictions A- and B+, which translates to only 82.50% accuracy and 17.50% classification error.
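The percentages quoted in the three result figures follow directly from the correct/total counts; a quick arithmetic check:

```python
# accuracy = correct / total; classification error = 1 - accuracy.
for correct, total in [(19, 21), (25, 30), (33, 40)]:
    accuracy = correct / total
    print(f"{correct}/{total}: accuracy {accuracy:.2%}, "
          f"error {1 - accuracy:.2%}")
# 19/21: accuracy 90.48%, error 9.52%
# 25/30: accuracy 83.33%, error 16.67%
# 33/40: accuracy 82.50%, error 17.50%
```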