The complete RapidMiner process for implementing the Naive Bayes model is shown in this figure.
Retrieve the StudentEvent Discretize Resample dataset and drag it into the Process panel.
Select the "Select Attributes" operator and drag it next to the StudentEvent dataset. Connect the dataset's out port to the operator's exa port.
In the "Parameters" panel, click the "attribute filter type" dropdown and pick "subset" so that specific attributes in the dataset can be selected.
Still in the "Parameters" panel, click "Select Attributes" and pick StudentID, marks, and marksbin, as these attributes are not needed.
Next, tick "Invert Selection" to indicate that StudentID, marks, and marksbin should be excluded from the actual calculation.
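The effect of the subset filter with invert selection (dropping StudentID, marks, and marksbin before modeling) can be sketched in plain Python; the sample rows below are hypothetical, not taken from the StudentEvent dataset:

```python
# Each record is a dict of attribute -> value; the excluded attributes are
# removed before modeling, mirroring Select Attributes with Invert Selection.
EXCLUDED = {"StudentID", "marks", "marksbin"}

def exclude_attributes(records, excluded=EXCLUDED):
    """Return copies of the records without the excluded attributes."""
    return [{k: v for k, v in r.items() if k not in excluded} for r in records]

# Hypothetical sample rows for illustration only.
rows = [
    {"StudentID": 1, "marks": 72, "marksbin": "high", "attendance": "good", "Grade": "A"},
    {"StudentID": 2, "marks": 48, "marksbin": "low", "attendance": "poor", "Grade": "B"},
]
cleaned = exclude_attributes(rows)
print(cleaned[0])  # {'attendance': 'good', 'Grade': 'A'}
```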
Select the "Set Role" operator and drag it to the right of the Select Attributes operator. Connect the two operators via their exa ports.
In the "Parameters" panel, click the "attribute name" dropdown and pick Grade.
Click the "target role" dropdown and pick label. This makes the Grade variable act as the target attribute.
Select the "Split Data" operator and drag it next to the Set Role operator. Connect them via their exa ports.
In the "Parameters" panel, click "partitions" > "Edit Enumeration", click "Add Entry" twice, and insert 0.8 and 0.2 as the training and testing ratios.
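RapidMiner performs the partitioning internally; as a rough illustration, an 80/20 split like the one configured above can be sketched in Python (the shuffled-split strategy and fixed seed here are assumptions for reproducibility, not RapidMiner's exact sampling):

```python
import random

def split_data(records, train_ratio=0.8, seed=42):
    """Shuffle the records and split them into training and testing partitions."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_data(list(range(100)))
print(len(train), len(test))  # 80 20
```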
Select the "Naive Bayes" operator, drag it next to Split Data, and connect the Split Data par port (the first partition, the training set) to the Naive Bayes tra port.
In the "Parameters" panel, tick "laplace correction". This is highly recommended, because a dataset may not contain every combination of attribute values for each class value, and without the correction such unseen combinations would receive zero probability.
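The role of the Laplace correction can be illustrated outside RapidMiner. The following plain-Python sketch of categorical Naive Bayes adds 1 to every count so that an attribute value never observed with a class still gets a small non-zero probability; the attendance/Grade rows are hypothetical and not from the StudentEvent dataset:

```python
from collections import Counter, defaultdict

def train_nb(records, label="Grade"):
    """Estimate class counts and per-(attribute, value, class) counts."""
    class_counts = Counter(r[label] for r in records)
    value_counts = Counter()          # (attribute, value, class) -> count
    attr_values = defaultdict(set)    # attribute -> set of observed values
    for r in records:
        for a, v in r.items():
            if a == label:
                continue
            value_counts[(a, v, r[label])] += 1
            attr_values[a].add(v)
    return class_counts, value_counts, attr_values, len(records)

def predict_nb(model, example):
    """Pick the class maximizing prior * product of smoothed conditionals."""
    class_counts, value_counts, attr_values, n = model
    best, best_p = None, -1.0
    for c, cc in class_counts.items():
        p = cc / n  # class prior
        for a, v in example.items():
            # Laplace correction: +1 in the numerator, +|values of a| in the
            # denominator, so unseen combinations never zero out the product.
            p *= (value_counts[(a, v, c)] + 1) / (cc + len(attr_values[a]))
        if p > best_p:
            best, best_p = c, p
    return best

# Hypothetical training rows for illustration only.
rows = [
    {"attendance": "good", "Grade": "A"},
    {"attendance": "good", "Grade": "A"},
    {"attendance": "poor", "Grade": "B"},
]
model = train_nb(rows)
print(predict_nb(model, {"attendance": "good"}))  # A
```

Without the "+1", the combination (attendance = good, Grade = B) would have probability zero, since it never occurs in the training rows; the correction keeps it small but non-zero.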
Next, select the "Apply Model" operator, connecting the Naive Bayes mod port to its mod port and the Split Data second par port (the test partition) to its unl port. Lastly, select the Performance (Classification) operator and drag it next to the Apply Model operator. Connect the Apply Model lab port to the Performance lab port.
In the "Parameters" panel, tick the "accuracy" and "classification error" checkboxes.
Connect the Performance per port to a res port and the exa port to another res port.
Click the Run button or press F11 to execute the process.
In the results, open the Performance Vector to find the accuracy and classification error percentages.
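Both metrics read from the Performance Vector are straightforward to compute from the predicted and original labels; a minimal sketch (the label lists here are hypothetical examples):

```python
def accuracy_and_error(true_labels, predicted):
    """Accuracy = correct / total; classification error = 1 - accuracy."""
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    acc = correct / len(true_labels)
    return acc, 1 - acc

acc, err = accuracy_and_error(["A", "A", "B", "B"], ["A", "B", "B", "B"])
print(f"{acc:.2%} accuracy, {err:.2%} classification error")
# 75.00% accuracy, 25.00% classification error
```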
The modeling block builds the Naive Bayes model using the training dataset. The Apply Model block predicts the class label of the test dataset using the developed model and appends the predicted label to the dataset. The predicted dataset is one of the three outputs of the process and is shown in this figure. Note that the prediction test dataset has both the predicted and the original class label. The model predicted the correct class for all 21 records.
This figure shows the accuracy results of the model and the confusion matrix. It is evident that the model got all 21 of the 21 class predictions correct, which translates to 100.00% accuracy and 0.00% classification error.
As before, the model is built on the training dataset, applied to the test dataset, and the predicted dataset (with both the predicted and the original class label) is shown in this figure. The model predicted the correct class for 28 of the 30 records. The 2 incorrect predictions are highlighted in this figure.
This figure shows the accuracy results of the model and the confusion matrix. It is evident that the model got 28 of the 30 class predictions correct and 2 of the 30 (in boxes) wrong for class A, which translates to about 93.33% accuracy and 6.67% classification error.
Again, the model is built on the training dataset, applied to the test dataset, and the predicted dataset (with both the predicted and the original class label) is shown in this figure. The model predicted the correct class for 37 of the 40 records. The 3 incorrect predictions are highlighted in this figure.
This figure shows the accuracy results of the model and the confusion matrix. It is evident that the model got 37 of the 40 class predictions correct and 3 of the 40 (in boxes) wrong for class A, which translates to about 92.50% accuracy and 7.50% classification error.
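The figures reported for the three runs can be checked arithmetically, since accuracy is simply correct predictions divided by total predictions:

```python
# Correct/total counts for the three runs reported above.
for correct, total in [(21, 21), (28, 30), (37, 40)]:
    acc = correct / total
    print(f"{correct}/{total}: {acc:.2%} accuracy, {1 - acc:.2%} error")
# 21/21: 100.00% accuracy, 0.00% error
# 28/30: 93.33% accuracy, 6.67% error
# 37/40: 92.50% accuracy, 7.50% error
```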