In this step, we use the StudentEvent_Resample dataset from the local repository. The dataset contains 100 rows and 11 columns.
Create an empty process and drag the StudentEvent_Resample dataset into the blank process; this creates a Retrieve operator.
Add the Select Attributes operator to choose the attributes to be analyzed, and connect the Retrieve operator to it. In this analysis, we exclude the StudentID, Marks, and MarksBin attributes.
Set the attribute filter type to subset, enable the invert selection option, and move the unwanted attributes to the right side.
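The attribute selection above can be sketched outside RapidMiner with pandas. This is a hypothetical illustration: the small DataFrame below stands in for the dataset, and only the three column names dropped in the tutorial (StudentID, Marks, MarksBin) are taken from it; the other columns are invented.

```python
import pandas as pd

# Toy stand-in for StudentEvent_Resample; only the dropped column
# names come from the tutorial, the rest are illustrative.
df = pd.DataFrame({
    "StudentID": [1, 2, 3],
    "Marks": [55, 72, 88],
    "MarksBin": ["50-59", "70-79", "80-89"],
    "Attendance": [0.8, 0.9, 1.0],
    "Grade": ["C", "B", "A"],
})

# "Subset + invert selection" = keep everything EXCEPT the listed attributes.
unwanted = ["StudentID", "Marks", "MarksBin"]
selected = df.drop(columns=unwanted)
print(list(selected.columns))
```

This mirrors the invert-selection behavior: the named subset is removed and every remaining attribute is kept.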
Add the Set Role operator and connect it to the Select Attributes operator, then use it to set Grade as the label.
Add the Split Data operator and connect the Set Role operator to it. This operator partitions the given ExampleSet into the desired number of subsets according to the specified relative sizes.
Set the sample type to automatic and use a 30:70 split ratio.
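The 30:70 partition can be reproduced with scikit-learn's `train_test_split` as a minimal sketch. The arrays below are synthetic placeholders for the 100-row dataset; which side of the 30:70 split is used for training is an assumption here.

```python
from sklearn.model_selection import train_test_split

# Synthetic placeholders: 100 rows, like the tutorial dataset.
X = [[i] for i in range(100)]
y = ["A" if i % 2 else "B" for i in range(100)]

# 30% in the first partition, 70% in the second, mirroring the 30:70 ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.3, random_state=42, stratify=y)
print(len(X_train), len(X_test))  # 30 70
```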
Add the Naive Bayes operator and connect a Split Data output to it. This operator generates a Naive Bayes classification model. Set the parameters as below:
Add the Apply Model operator. Connect the outputs from the Split Data and Naive Bayes operators as inputs to the Apply Model operator.
Add the Performance operator, connect Apply Model to it, and connect the Performance output to the result port.
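The whole chain (Split Data → Naive Bayes → Apply Model → Performance) can be sketched with scikit-learn's `CategoricalNB` standing in for RapidMiner's Naive Bayes operator. All data here is randomly generated, so the printed accuracy is illustrative only, not the tutorial's result.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in: 100 rows, 8 categorical attributes, a 3-class label.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 8))
y = rng.choice(["A", "B", "C"], size=100)

# Split Data operator: 30:70 partition.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.3, random_state=0)

# Naive Bayes operator (min_categories guards against unseen test values).
model = CategoricalNB(min_categories=3).fit(X_train, y_train)
pred = model.predict(X_test)          # Apply Model operator
acc = accuracy_score(y_test, pred)    # Performance operator
print(f"accuracy: {acc:.2%}")
```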
Distribution Table
Using the same Naive Bayes model and the same dataset, we add the Optimize Parameters (Grid) operator to tune the Naive Bayes parameters, increasing our prediction accuracy and performance values.
The Optimize Parameters (Grid) operator is a nested operator. It executes the subprocess for all combinations of selected values of the parameters and then delivers the optimal parameter values through the parameter set port.
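The grid search that Optimize Parameters (Grid) performs can be sketched with scikit-learn's `GridSearchCV`. Here `CategoricalNB`'s `alpha` plays the role of RapidMiner's `laplace_correction` setting (a tiny alpha approximates "off", alpha=1 is add-one smoothing); the data is synthetic, so this only illustrates the mechanism.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(100, 8))
y = rng.choice(["A", "B", "C"], size=100)

# Try every combination in the grid, then report the best one, like the
# parameter set delivered through the operator's parameter set port.
grid = GridSearchCV(
    CategoricalNB(min_categories=3),
    param_grid={"alpha": [1e-10, 1.0]},  # laplace_correction off vs. on
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```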
This is the view of our main process. Since Optimize Parameters (Grid) is a nested operator, all the operators related to the Naive Bayes model are placed inside its subprocess.
This is the view of the subprocess inside Optimize Parameters (Grid).
This is the most important part of optimizing the Naive Bayes parameters.
We choose to check log performance and log all criteria, and enable parallel execution. For error handling, choose fail on error.
This is where we set which parameters will be tuned by Optimize Parameters (Grid). For the Naive Bayes model, only one parameter can be optimized to increase model performance: Laplace correction.
laplace_correction
The simplicity of Naive Bayes includes a weakness: if within the training data a given Attribute value never occurs in the context of a given class, then the conditional probability is set to zero. When this zero value is multiplied together with other probabilities, those values are also set to zero, and the results will be misleading. Laplace correction is a simple trick to avoid this problem, adding one to each count to avoid the occurrence of zero values. For most training sets, adding one to each count has only a negligible effect on the estimated probabilities.
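The zero-frequency problem and its add-one fix can be shown with a small worked example. The counts below are invented for illustration; the helper function is hypothetical, not part of RapidMiner.

```python
def cond_prob(count, class_total, n_values, laplace=True):
    """P(attr=value | class) with optional add-one (Laplace) smoothing.

    count:       times this value occurs with the class in training data
    class_total: training rows belonging to the class
    n_values:    number of possible values for the attribute
    """
    if laplace:
        return (count + 1) / (class_total + n_values)
    return count / class_total

# Suppose Grade=F has 10 training rows, the attribute has 4 possible values,
# and one value never occurs together with Grade=F:
print(cond_prob(0, 10, 4, laplace=False))  # 0.0 -> zeroes out the whole product
print(cond_prob(0, 10, 4, laplace=True))   # 1/14, small but nonzero
```

With smoothing, the never-seen value gets a small nonzero probability, so a single missing combination no longer drives the entire class probability to zero.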
Based on the result below, we can see that the accuracy for Naive Bayes increases to 80.65%. Class precision is 100% only for predicting Grades A-, B-, and C.
For Naive Bayes with the 50:50 ratio, the accuracy also increased, to 70%, but it is still below 80%. Class precision is 100% only for predicting Grades B-, F, and C.
This is the SimpleDistribution that summarizes the distribution of our target, Grade. All grades have 7 distributions, but with different values.
For Naive Bayes in RapidMiner, we can see that our model improves after tuning. But for the model using the 50:50 ratio, the accuracy is still below 80%, which means the model's error rate is 30%. For Naive Bayes in RapidMiner, the best performance is 80.65%, using the 30:70 ratio.