The complete RapidMiner process for implementing the Naive Bayes model is shown in this figure.
Retrieve the StudentEvent Resample dataset and drag it into the Process panel.
Select the "Select Attributes" operator and drag it next to the StudentEvent dataset. Connect the Exa port to the Out port.
In the "Parameters" panel, click the "attribute filter type" dropdown and pick "Subset" so that any combination of attributes in the dataset can be selected.
In the Parameters panel, click "Attributes" > "Select Attributes", then pick StudentID, marks, and marksbin, as we do not need these attributes.
Next, tick "Invert Selection" to indicate that StudentID, marks, and marksbin should be excluded from the actual calculation.
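Outside RapidMiner, the "Subset" + "Invert Selection" combination amounts to dropping the listed columns from each record. A minimal Python sketch of that idea (the `attendance` attribute is a hypothetical stand-in for the dataset's remaining feature columns):

```python
# Each record is a dict of attribute -> value; the subset below is excluded,
# mirroring "Subset" + "Invert Selection" in the Select Attributes operator.
excluded = {"StudentID", "marks", "marksbin"}

# Hypothetical record; "attendance" stands in for the real feature columns.
record = {"StudentID": 101, "marks": 78, "marksbin": "mid",
          "attendance": 0.9, "Grade": "A-"}

# Keep only the attributes that are NOT in the excluded subset.
selected = {k: v for k, v in record.items() if k not in excluded}
```

Dropping identifier columns like StudentID matters because they carry no predictive signal, and marks/marksbin would leak the answer the Grade label encodes.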
Select the "Set Role" operator and drag it to the right of the Select Attributes operator. Connect both operators via their Exa ports.
In the "Parameters" panel, click the "attribute name" dropdown and pick Grade.
Click the "target role" dropdown and pick Label. This makes the Grade variable act as the target attribute.
Select the "Split Data" operator and drag it next to the Set Role operator. Connect them via their Exa ports.
In the "Parameters" panel, click "Partitions" > "Edit Enumeration", click "Add Entry", and insert 0.8 and 0.2 as the training and testing ratios.
Select the "Naive Bayes" operator, drag it next to Split Data, and connect the Par port to the Tra port.
In the "Parameters" panel, tick "laplace correction". This is highly recommended, because the dataset may not contain every combination of attribute value and class value.
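The reason the correction matters: without it, an attribute value that never co-occurs with a class gets probability zero, which wipes out the entire Naive Bayes product for that class. A sketch of add-one (Laplace) smoothing for a single conditional probability (function and parameter names are illustrative):

```python
from collections import Counter

def conditional_prob(value, attribute_values, laplace=True, k=1, n_values=3):
    """Estimate P(attribute=value | class) from the values observed
    within one class, with optional Laplace (add-k) smoothing.

    n_values is the number of distinct values the attribute can take.
    """
    counts = Counter(attribute_values)
    if laplace:
        return (counts[value] + k) / (len(attribute_values) + k * n_values)
    return counts[value] / len(attribute_values)

# Suppose "high" was never observed among records of this class:
seen = ["low", "low", "mid"]
p_raw = conditional_prob("high", seen, laplace=False)   # exactly 0.0
p_smooth = conditional_prob("high", seen, laplace=True) # small but non-zero
```

With smoothing, the unseen combination contributes a small probability ((0 + 1) / (3 + 3) = 1/6 here) instead of zeroing out the class.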
Lastly, select the Performance (Classification) operator and drag it next to the Naive Bayes operator. Connect the Apply Model Lab port to the Performance Lab port.
In the "Parameters" panel, tick the "accuracy" and "classification error" checkboxes.
Connect the Performance Per port to a Res port, and the Exa port to another Res port.
Click the Run button or press F11 to execute the process.
In the results, look at the Performance Vector to find the accuracy and classification error percentages.
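The two metrics are complementary: accuracy is the fraction of test records whose predicted label matches the true label, and classification error is simply one minus accuracy. A minimal sketch (the grade labels below are illustrative):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Illustrative labels: 3 of 5 predictions match the true grade.
y_true = ["A", "B+", "A-", "A", "B+"]
y_pred = ["A", "A-", "A-", "B+", "B+"]

acc = accuracy(y_true, y_pred)   # 3/5 = 0.6, i.e. 60% accuracy
error = 1 - acc                  # 0.4, i.e. 40% classification error
```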
The modeling block builds the Naive Bayes model using the training dataset. The Apply Model block predicts the class label of the test dataset using the developed model and appends the predicted label to the dataset. The predicted dataset is one of the three outputs of the process and is shown in this figure. Note that the prediction test dataset has both the predicted and the original class labels. The model predicted the correct class for only 5 of the records. The 16 incorrect predictions are highlighted in this figure.
This figure shows the model's accuracy and the confusion matrix. The model got 5 of the 21 class predictions correct and 16 of the 21 (in boxes) wrong across the A, B+, and A- predictions, which translates to only about 23.81% accuracy and 76.19% classification error.
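The accuracy figure can be verified directly from the confusion matrix: correct predictions sit on the diagonal, and accuracy is the diagonal sum divided by the total count. A sketch under assumed numbers (only the totals, 5 correct of 21, come from the text; the individual off-diagonal cell values below are hypothetical):

```python
# Hypothetical 3x3 confusion matrix for classes A, B+, A-
# (rows = true class, columns = predicted class).
# Only the diagonal sum (5) and grand total (21) match the reported run.
confusion = [
    [2, 3, 2],   # true A
    [4, 2, 1],   # true B+
    [3, 3, 1],   # true A-
]

total = sum(sum(row) for row in confusion)           # 21 test records
correct = sum(confusion[i][i] for i in range(3))     # 5 on the diagonal

accuracy_pct = round(100 * correct / total, 2)       # 23.81
error_pct = round(100 - accuracy_pct, 2)             # 76.19
```

This confirms the reported 23.81% accuracy and 76.19% classification error from the 5-of-21 result.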
As before, the modeling block builds the Naive Bayes model on the training dataset, and the Apply Model block appends the predicted label to the test dataset, which is shown in this figure alongside the original class label. In this run, the model predicted the correct class for only 9 of the records. The 21 incorrect predictions are highlighted in this figure.
This figure shows the model's accuracy and the confusion matrix. The model got 9 of the 30 class predictions correct and 21 of the 30 (in boxes) wrong across the A, B+, and A- predictions, which translates to only about 30.00% accuracy and 70.00% classification error.
Again, the modeling block builds the Naive Bayes model on the training dataset, and the Apply Model block appends the predicted label to the test dataset, which is shown in this figure alongside the original class label. In this run, the model predicted the correct class for only 14 of the records. The 26 incorrect predictions are highlighted in this figure.
This figure shows the model's accuracy and the confusion matrix. The model got 14 of the 40 class predictions correct and 26 of the 40 (in boxes) wrong across the A, B+, and A- predictions, which translates to only about 35.00% accuracy and 65.00% classification error.