In this step, we use the StudentEvent_Resample dataset from the local repository. The dataset contains 100 rows and 11 columns.
Create an empty process and drag the StudentEvent_Resample dataset into it. This creates a Retrieve operator.
Add a Select Attributes operator to choose which attributes will be analyzed. Connect the Retrieve operator to the Select Attributes operator and select the attributes. In this analysis, we do not analyze the StudentID, Marks, and MarksBin attributes.
Choose the subset filter type and enable the invert selection option.
Move the unwanted attributes to the right-hand list.
Add a Set Role operator and connect it to the Select Attributes operator.
Set Grade as the label.
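RapidMiner performs these steps entirely in the GUI, so there is nothing to type; for readers who prefer a scripted view, here is a rough pandas equivalent of Retrieve, Select Attributes, and Set Role. The CSV filename is hypothetical and assumes the dataset has been exported from the repository:

```python
import pandas as pd

# Load the exported dataset (hypothetical filename).
data = pd.read_csv("StudentEvent_Resample.csv")

# ~ Select Attributes with "subset" + "invert selection":
# drop the attributes we do not analyze.
data = data.drop(columns=["StudentID", "Marks", "MarksBin"])

# ~ Set Role: treat Grade as the label.
X = data.drop(columns=["Grade"])
y = data["Grade"]
```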
Add a Split Data operator and connect the Set Role operator to it. This operator partitions the given ExampleSet into the desired number of subsets according to the specified relative sizes.
Set the sampling type to automatic.
Set the ratio to 30:70.
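As a sketch of what Split Data does here, a scikit-learn equivalent is shown below. The post does not say which partition trains the model, so treating the 30% subset as training data is an assumption:

```python
from sklearn.model_selection import train_test_split

# 30:70 partition; the 30% subset is assumed to be the training data.
# stratify keeps the Grade distribution similar in both subsets,
# roughly comparable to RapidMiner's "automatic" sampling type.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.3, stratify=y, random_state=42
)
```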
Add a Decision Tree operator and connect one Split Data output to it. This operator generates a decision tree model, which can be used for classification and regression. Set the parameters as below:
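As a cross-check in code, the same training step could be sketched in scikit-learn. Note that sklearn's split criteria differ from RapidMiner's, so "entropy" stands in for information gain, and the depth limit is illustrative rather than the exact value from the screenshot:

```python
from sklearn.tree import DecisionTreeClassifier

# Train the tree on the training partition (assumes the remaining
# attributes are numeric). criterion and max_depth are illustrative.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=10,
                              random_state=42)
tree.fit(X_train, y_train)
```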
Add an Apply Model operator. Connect the other Split Data output and the Decision Tree model output as inputs to the Apply Model operator.
Add a Performance operator and connect the Apply Model output to it. Connect the Performance output to the result port.
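Apply Model and Performance together amount to scoring the held-out partition and computing metrics; a minimal sketch, continuing from the code above:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# ~ Apply Model: score the held-out partition.
y_pred = tree.predict(X_test)

# ~ Performance: overall accuracy plus the confusion matrix that
# backs the per-class precision values.
print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```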
Below is the accuracy of our model, which is 80.65%; it predicts that all students will pass this online course.
This is the visualization of the decision tree. The root node is Forum, and the tree predicts that all students will pass this online course.
For our model using the 50:50 ratio, the accuracy is 84%, which is higher than with the 30:70 ratio. However, the class precision is 100% only for predicting Grades B, F, and C.
Below is the decision tree generated for the 50:50 ratio; its root is also Forum. From this tree, we can see that it predicts some students will fail this online course.
Using the same Decision Tree model and the same dataset, we simply add an Optimize Parameters (Grid) operator to tune the Decision Tree parameters, aiming to increase prediction accuracy and improve the performance values.
The Optimize Parameters (Grid) operator is a nested operator. It executes the subprocess for all combinations of selected values of the parameters and then delivers the optimal parameter values through the parameter set port.
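In scikit-learn terms, this is what a grid search does. The sketch below uses a placeholder grid just to show the mechanics (exhaustive combinations, optional parallel execution, best parameter set returned); the parameters we actually tune are listed further down:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Conceptual equivalent of Optimize Parameters (Grid): evaluate every
# combination in the grid and keep the best-performing one.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 4, 6, 8, 10]},  # placeholder grid
    scoring="accuracy",
    n_jobs=-1,  # parallel execution, like the operator's option
)
search.fit(X_train, y_train)  # X_train/y_train from the earlier sketch
print(search.best_params_)    # ~ the "parameter set" output
```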
This is the view of our main process. Since Optimize Parameters (Grid) is a nested operator, all operators related to the Decision Tree model are placed inside its subprocess.
This is the view of the subprocess inside Optimize Parameters (Grid).
This is the most important part of optimizing the Decision Tree parameters.
For this optimizer, we enable log performance, log all criteria, and parallel execution.
This is where we set which parameters will be tuned by Optimize Parameters (Grid). For the Decision Tree model, there are 8 more parameters that can be optimized to increase model performance, but for this project we choose only two: DecisionTree.criterion and DecisionTree.maximal_depth.
1. DecisionTree.criterion
Selects the criterion on which Attributes will be selected for splitting. For each of these criteria, the split value is optimized with regard to the chosen criterion. It can have one of the following values:
information_gain: The entropies of all the Attributes are calculated, and the Attribute with the least entropy (i.e., the highest information gain) is selected for the split. This method has a bias towards selecting Attributes with a large number of values.
gain_ratio: A variant of information gain that adjusts the gain for each Attribute to account for the breadth and uniformity of its values, reducing the bias towards Attributes with many values.
gini_index: A measure of inequality between the distributions of label characteristics. Splitting on a chosen Attribute results in a reduction in the average gini index of the resulting subsets.
accuracy: An Attribute is selected for splitting, which maximizes the accuracy of the whole tree.
least_square: An Attribute is selected for splitting which minimizes the squared distance between the average of the values in the node and the true value.
2. DecisionTree.maximal_depth
The depth of a tree varies depending upon the size and characteristics of the ExampleSet. This parameter is used to restrict the depth of the decision tree. If its value is set to '-1', the maximal depth parameter puts no bound on the depth of the tree. In this case the tree is built until other stopping criteria are met. If its value is set to '1', a tree with a single node is generated.
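Translated into a scikit-learn grid, the two chosen parameters might look like the sketch below. sklearn implements only a subset of RapidMiner's criteria, so this approximates the search space rather than reproducing the exact optimizer configuration:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# The two tuned parameters as a scikit-learn grid. sklearn offers no
# gain_ratio, accuracy, or least_square criteria, so the grid only
# approximates the RapidMiner choices; max_depth=None plays the role
# of maximal_depth = -1 (no bound on tree depth).
param_grid = {
    "criterion": ["entropy", "gini"],          # ~ DecisionTree.criterion
    "max_depth": [None] + list(range(1, 21)),  # ~ DecisionTree.maximal_depth
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```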
From the result below, the accuracy of our model after tuning is 100%. This means every prediction from this model is correct!
Below is the tree generated for our model; the root node is now Assignment.
Below is the result for our model using the 50:50 ratio; the accuracy increases to 96%. All class precision percentages are 100% except those for predicting Grades B and B+.
From the tree below, we can see that Quiz is the most important attribute, followed by Assignment.
For the Decision Tree in RapidMiner, we can see that the best model uses the 30:70 ratio.