Predictive analytics is the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing what has happened to provide the best assessment of what will happen in the future. In this project, we apply the Decision Tree and Naive Bayes algorithms, building each model in both RapidMiner and Python.
For predictive analytics, we used the StudentEvent_Resample dataset, which we obtained by resampling our StudentEvent dataset.
Since the StudentEvent dataset has only 35 rows, it is not large enough to test a machine learning model. Therefore, we resampled StudentEvent into StudentEvent_Resample, which has a total of 100 rows, using Bootstrap Sampling in RapidMiner. Bootstrap sampling draws rows at random with replacement, so the resampled dataset contains repeated copies of the original rows rather than new observations.
Dataset before resampling
Dataset after resampling
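The resampling step can be sketched in plain Python. This is only an illustration of bootstrap sampling (drawing with replacement), not a reproduction of the RapidMiner process used in the project. The function name bootstrap_resample, the seed, and the placeholder rows are assumptions made for the example; the real StudentEvent columns are not shown here.

```python
import random

def bootstrap_resample(rows, n_samples, seed=None):
    """Draw n_samples rows uniformly at random WITH replacement."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(n_samples)]

# Placeholder rows standing in for the 35-row StudentEvent dataset
# (hypothetical: only a row_id, not the real attributes).
original = [{"row_id": i} for i in range(35)]

# Grow the dataset to 100 rows, as StudentEvent_Resample does.
resampled = bootstrap_resample(original, n_samples=100, seed=42)

distinct_rows = {row["row_id"] for row in resampled}
print(len(original), len(resampled), len(distinct_rows))
```

Because every drawn row comes from the original 35, the number of distinct rows in the result can never exceed 35, which is worth keeping in mind when interpreting accuracy measured on resampled data.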
The Decision Tree algorithm belongs to the family of supervised learning algorithms. Unlike some other supervised learning algorithms, it can be used to solve both regression and classification problems. The goal of using a Decision Tree is to create a model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data). To predict a class label for a record, the tree starts at its root and compares the root attribute with the record’s value for that attribute. It then follows the branch that matches the value and repeats the comparison at each node below until it reaches a leaf, whose label is the prediction.
For this project, we build a Decision Tree model using RapidMiner and Python.
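To make the idea of learned decision rules and root-to-leaf prediction concrete, here is a minimal sketch of an entropy-based tree for categorical attributes, written with the standard library only. It is not the model built in the project: the attribute names (attendance, fee), the labels, and the six training rows are invented for illustration, and a real implementation would normally rely on a library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def build_tree(rows, labels, features):
    """Recursively split on the attribute that leaves the lowest weighted entropy."""
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: every label agrees, or there is nothing left to split on.
    if len(set(labels)) == 1 or not features:
        return majority

    def split_entropy(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(features, key=split_entropy)  # equivalent to highest information gain
    node = {"feature": best, "children": {}, "default": majority}
    remaining = [f for f in features if f != best]
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*pairs)
        node["children"][value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return node

def predict(tree, record):
    """Walk from the root, comparing each node's attribute with the record's value."""
    while isinstance(tree, dict):
        tree = tree["children"].get(record[tree["feature"]], tree["default"])
    return tree

# Hypothetical training data (NOT the StudentEvent dataset).
rows = [
    {"attendance": "high", "fee": "free"},
    {"attendance": "high", "fee": "paid"},
    {"attendance": "high", "fee": "paid"},
    {"attendance": "low", "fee": "free"},
    {"attendance": "low", "fee": "paid"},
    {"attendance": "low", "fee": "paid"},
]
labels = ["yes", "yes", "yes", "yes", "no", "no"]

tree = build_tree(rows, labels, ["attendance", "fee"])
print(tree["feature"])
print(predict(tree, {"attendance": "low", "fee": "paid"}))
```

On this toy data the root splits on attendance, because that split leaves less label uncertainty than splitting on fee; records with low attendance are then separated by fee.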
The Naive Bayes Algorithm is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
For this project, we build the Naive Bayes model using RapidMiner and Python.
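The independence assumption means the score of a class is its prior probability multiplied by one per-feature probability for each attribute. The sketch below shows this for categorical attributes using only the standard library. It is an illustration under stated assumptions, not the project's model: the function names, the attributes (attendance, fee), the training rows, and the use of add-one (Laplace) smoothing are all choices made for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count classes and, per (feature, class), how often each value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature, class) -> value -> count
    feature_values = defaultdict(set)     # feature -> distinct values seen
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(feature, label)][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values

def predict_naive_bayes(model, record):
    """Return the class with the highest log prior + sum of log likelihoods."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_score = math.log(count / total)  # log P(class)
        for feature, value in record.items():
            # Add-one smoothing so an unseen value never yields probability 0.
            numerator = value_counts[(feature, label)][value] + 1
            denominator = count + len(feature_values[feature])
            log_score += math.log(numerator / denominator)  # log P(value | class)
        scores[label] = log_score
    return max(scores, key=scores.get)

# Hypothetical training data (NOT the StudentEvent dataset).
rows = [
    {"attendance": "high", "fee": "free"},
    {"attendance": "high", "fee": "paid"},
    {"attendance": "high", "fee": "paid"},
    {"attendance": "low", "fee": "free"},
    {"attendance": "low", "fee": "paid"},
    {"attendance": "low", "fee": "paid"},
]
labels = ["yes", "yes", "yes", "yes", "no", "no"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, {"attendance": "low", "fee": "paid"}))
print(predict_naive_bayes(model, {"attendance": "high", "fee": "free"}))
```

Working in log space avoids multiplying many small probabilities together, which can underflow when a dataset has many attributes.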