The complete RapidMiner process for implementing the Naive Bayes model is shown in this figure.
Retrieve the StudentEvent Normalize Resample dataset and drag it into the Process panel.
Select the "Select Attributes" operator and drag it next to the StudentEvent dataset. Connect the out port to the exa port.
In the "Parameters" panel, click the "attribute filter type" dropdown and pick "subset" so that we can select any subset of attributes in the dataset.
In the "Parameters" panel, click "Select Attributes", then pick StudentID, marks, and marksbin, as we do not need these attributes.
Next, tick "invert selection" to indicate that we want to exclude StudentID, marks, and marksbin from the actual calculation.
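What Select Attributes does with "invert selection" can be sketched in plain Python. This is an illustration only, not RapidMiner's implementation; the "attendance" field below is a made-up example attribute, while the excluded names come from the steps above:

```python
def exclude_attributes(record, excluded):
    """Drop the listed attributes from one record, mimicking
    Select Attributes in 'subset' mode with 'invert selection' ticked."""
    return {k: v for k, v in record.items() if k not in excluded}

# Hypothetical record; only StudentID/marks/marksbin/Grade are named in the text.
row = {"StudentID": 1, "marks": 78, "marksbin": "70-80", "Grade": "B+", "attendance": 0.9}
filtered = exclude_attributes(row, {"StudentID", "marks", "marksbin"})
print(filtered)  # {'Grade': 'B+', 'attendance': 0.9}
```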
Select the "Set Role" operator and drag it to the right of the Select Attributes operator. Connect the two operators via their exa ports.
In the "Parameters" panel, click the "attribute name" dropdown and pick Grade.
Click the "target role" dropdown menu and pick "label". This makes the Grade variable act as the target attribute.
Select the "Split Data" operator and drag it next to the Set Role operator. Connect them via their exa ports.
In the "Parameters" panel, click "partitions" > "Edit Enumeration", click "Add Entry" twice, and insert 0.8 and 0.2 as the training and testing ratios.
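The 0.8/0.2 partition performed by Split Data can be sketched in a few lines of Python. This is a minimal illustration under assumed data (the records below are fabricated, not the StudentEvent dataset):

```python
import random

def split_data(records, train_ratio=0.8, seed=42):
    """Shuffle the records and partition them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# 100 hypothetical records -> 80 for training, 20 for testing.
records = [{"Grade": g} for g in ["A", "B+", "B", "A", "B+"] * 20]
train, test = split_data(records)
print(len(train), len(test))  # 80 20
```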
Select the "Naive Bayes" operator, drag it next to Split Data, and connect the par port to the tra port.
In the "Parameters" panel, tick "laplace correction". This is highly recommended, because the dataset may not contain every combination of attribute value and class value, and without the correction any unseen combination gets a zero probability.
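The effect of the Laplace correction can be sketched as follows. This is a minimal illustration with made-up counts, not RapidMiner's internal implementation:

```python
def conditional_prob(count_value_and_class, count_class, n_values, laplace=True):
    """Estimate P(attribute = value | class), optionally with
    Laplace (add-one) smoothing over the attribute's n_values values."""
    if laplace:
        # Add 1 to every count so unseen combinations never get probability 0.
        return (count_value_and_class + 1) / (count_class + n_values)
    return count_value_and_class / count_class

# Suppose no B+ student ever had the value "low" (count = 0) among 10 B+ records,
# and the attribute takes 3 distinct values.
p_raw = conditional_prob(0, 10, 3, laplace=False)
p_smooth = conditional_prob(0, 10, 3, laplace=True)
print(p_raw)     # 0.0 -- a single zero wipes out the whole Naive Bayes product
print(p_smooth)  # 1/13, a small but nonzero probability
```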
Add an "Apply Model" operator: connect the Naive Bayes mod port to its mod port, and the second par port of Split Data (the test partition) to its unl port. Lastly, select the Performance (Classification) operator, drag it next to the Apply Model operator, and connect the Apply Model lab port to the Performance lab port.
In the "Parameters" panel, tick the "accuracy" and "classification error" checkboxes.
Connect the Performance per port to a res port and its exa port to a second res port.
Click the Run button or press F11 to execute the process.
In the results, we can look at the Performance Vector to find the accuracy and classification error percentages.
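What the Performance (Classification) operator reports can be mirrored in a few lines of Python. The label lists below are hypothetical, chosen only to show the calculation:

```python
def accuracy_and_error(actual, predicted):
    """Return (accuracy, classification error) as fractions in [0, 1]."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    acc = correct / len(actual)
    return acc, 1 - acc

# Made-up labels: 3 of 5 predictions match the actual class.
actual    = ["A", "B+", "B+", "A", "B"]
predicted = ["A", "B",  "B+", "A", "B+"]
acc, err = accuracy_and_error(actual, predicted)
print(f"accuracy={acc:.2%}, classification error={err:.2%}")  # 60.00% and 40.00%
```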
The modeling block builds the Naive Bayes model using the training dataset. The Apply Model block predicts the class label of the test dataset using the developed model and appends the predicted label to the dataset. The predicted dataset is one of the three outputs of the process and is shown in this figure. Note that the prediction test dataset has both the predicted and the original class label. The model has predicted the correct class for only 9 of the records. The 12 incorrect predictions are highlighted in this figure.
This figure shows the accuracy results of the model and the confusion matrix. It is evident that the model got 9 of the 21 class predictions correct and 12 of the 21 wrong (in boxes), mostly for the B+ class, which translates to only about 42.86% accuracy and 57.14% classification error.
The modeling block builds the Naive Bayes model using the training dataset. The Apply Model block predicts the class label of the test dataset using the developed model and appends the predicted label to the dataset. The predicted dataset is one of the three outputs of the process and is shown in this figure. Note that the prediction test dataset has both the predicted and the original class label. The model has predicted the correct class for only 15 of the records. The 15 incorrect predictions are highlighted in this figure.
This figure shows the accuracy results of the model and the confusion matrix. It is evident that the model got 15 of the 30 class predictions correct and 15 of the 30 wrong (in boxes), mostly for the B+ class, which translates to only 50.00% accuracy and 50.00% classification error.
The modeling block builds the Naive Bayes model using the training dataset. The Apply Model block predicts the class label of the test dataset using the developed model and appends the predicted label to the dataset. The predicted dataset is one of the three outputs of the process and is shown in this figure. Note that the prediction test dataset has both the predicted and the original class label. The model has predicted the correct class for only 22 of the records. The 18 incorrect predictions are highlighted in this figure.
This figure shows the accuracy results of the model and the confusion matrix. It is evident that the model got 22 of the 40 class predictions correct and 18 of the 40 wrong (in boxes), mostly for the A and B+ classes, which translates to only about 55.00% accuracy and 45.00% classification error.
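The reported percentages in all three runs follow directly from the confusion-matrix counts, which we can verify with a short calculation:

```python
# (correct, total) counts from the three runs reported above.
runs = [(9, 21), (15, 30), (22, 40)]
for correct, total in runs:
    acc = correct / total
    print(f"{correct}/{total}: accuracy={acc:.2%}, classification error={1 - acc:.2%}")
# 9/21  -> 42.86% accuracy, 57.14% error
# 15/30 -> 50.00% accuracy, 50.00% error
# 22/40 -> 55.00% accuracy, 45.00% error
```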