MTU Classification - Decision Tree and Random Forest

Authors

Chetan Badgujar

Kishen Saravanan

Mitul Shah

Rajasekar Kamaraj

Decision Tree

A decision tree is a supervised learning algorithm (that is, one with a predefined target variable) that is mostly used for classification problems. It works with both categorical and continuous input and output variables.
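Here is a minimal Python sketch of how a single decision node can handle both kinds of input. The `goes_left` function and its `(kind, argument)` rule format are hypothetical and exist only for illustration. CART-style trees typically route a continuous input by comparing it with a threshold, and a categorical input by checking whether its level belongs to a chosen group of levels.

```python
def goes_left(value, rule):
    """Route one input value at a decision node.

    rule is a hypothetical (kind, argument) pair:
    ("threshold", t) for a continuous input, or
    ("in_set", levels) for a categorical input.
    """
    kind, arg = rule
    if kind == "threshold":
        return value <= arg  # continuous: compare against a cut point
    if kind == "in_set":
        return value in arg  # categorical: check membership in a group of levels
    raise ValueError(f"unknown rule kind: {kind}")


print(goes_left(37.5, ("threshold", 40.0)))  # True
print(goes_left("Engineer", ("in_set", {"Doctor", "Lawyer"})))  # False
```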

Types of Decision Trees

The type of decision tree depends on the type of target variable. There are two types:

Categorical Variable Decision Tree: a decision tree with a categorical target variable. Example: a student problem where the target variable is “Student will play cricket or not”, i.e. YES or NO.
Continuous Variable Decision Tree: a decision tree with a continuous target variable.

Example: suppose we want to predict whether a customer will pay the renewal premium to an insurance company (yes/no). We know that customer income is a significant variable, but the insurance company does not have income details for all customers. Because income is important, we can build a decision tree that predicts customer income from occupation, product, and various other variables. In this case, we are predicting values of a continuous variable.
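The two types differ in what a leaf predicts. The minimal Python sketch below assumes the usual CART-style rule; it is not taken from the MTUClassification module. A categorical tree returns the most common class among the training rows in a leaf, and a continuous tree returns their mean. The function names and leaf contents are hypothetical.

```python
from collections import Counter
from statistics import mean


def predict_categorical_leaf(targets):
    """Categorical target: the leaf predicts its most common class."""
    return Counter(targets).most_common(1)[0][0]


def predict_continuous_leaf(targets):
    """Continuous target: the leaf predicts the mean of its values."""
    return mean(targets)


# Hypothetical leaf contents, for illustration only.
plays_cricket = ["YES", "NO", "YES", "YES"]
incomes = [42000.0, 51000.0, 48000.0]

print(predict_categorical_leaf(plays_cricket))  # YES
print(predict_continuous_leaf(incomes))  # 47000.0
```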

Terminology related to Decision Trees

Root Node: represents the entire population or sample, which is further divided into two or more homogeneous sets.
Splitting: the process of dividing a node into two or more sub-nodes.
Decision Node: a sub-node that splits into further sub-nodes.
Leaf / Terminal Node: a node that does not split.
Pruning: removing the sub-nodes of a decision node; it is the opposite of splitting.
Branch / Sub-Tree: a subsection of the entire tree.
Parent and Child Node: a node that is divided into sub-nodes is the parent of those sub-nodes, and the sub-nodes are its children.
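Splitting aims to make child nodes more homogeneous than their parent. Gini impurity is a common measure of this and is the default splitting criterion for classification in R's `rpart` package. The sketch below is a simplified illustration, not the module's implementation. It scores every threshold on one numeric feature and keeps the split with the lowest weighted impurity. The `min_bucket` argument is a hypothetical stand-in modeled on `rpart`'s `minbucket` (the minimum number of observations allowed in a terminal node). We assume this is what the Minbucket parameter in the JAMOVI steps refers to.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a perfectly homogeneous node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys, min_bucket=1):
    """Try every threshold on one numeric feature and return the
    (threshold, weighted_gini) pair whose child nodes are most
    homogeneous. Splits leaving fewer than min_bucket rows in a
    child are skipped; returns None if no split is allowed."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right child empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_bucket or len(right) < min_bucket:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best


xs = [1, 2, 3, 4, 5, 6]
ys = ["NO", "NO", "NO", "YES", "YES", "YES"]
print(gini(ys))  # 0.5 at the root
print(best_split(xs, ys))  # (3, 0.0): two pure child nodes
print(best_split([1, 2, 3, 4], ["A", "A", "B", "B"], min_bucket=3))  # None
```

With `min_bucket` raised to 3 on a four-row node, no split can leave at least three rows on each side. The function therefore returns `None`, and the node stays a leaf.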

How does a Decision Tree work in JAMOVI?

Step 1: Install the JAMOVI application and set the working directory to the active JAMOVI project.

Step 2: Open JAMOVI on your laptop/PC, go to the "File" tab, and open the data set you want to use for the decision tree classification model.

Step 3: Once the data is loaded, click the "MTUClassification" icon and choose the "Classification using Decision Tree" option.

Step 4: The variables will be loaded as shown in the figure above. Choose the required ratio of test to training data, the method, and the Minbucket parameter.

Because no variables have been assigned yet, JAMOVI shows an error for the unavailable parameters.
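The test/training ratio determines how many rows are used to build the tree and how many are held back to check its predictions. This document does not describe the MTUClassification module's own splitting procedure, so the Python sketch below is hypothetical: the function name, the shuffling, and the fixed seed are all assumptions. It shows one way a ratio-based split could work:

```python
import random


def train_test_split(rows, train_ratio=0.7, seed=42):
    """Hypothetical helper: shuffle rows reproducibly, then put the
    first train_ratio share into the training set and the rest into
    the test set."""
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for 10 data rows
train, test = train_test_split(rows, train_ratio=0.8)
print(len(train), len(test))  # 8 2
```

Fixing the seed makes the split reproducible, so results from different parameter settings can be compared on the same held-out rows.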

Step 5: Choose the dependent and independent variables as shown in the figure. The output is displayed in the right pane.

Random Forest

Random Forest is a versatile machine learning method that can perform both regression and classification tasks. It is also used for dimensionality reduction, for treating missing values and outliers, and for other essential data-exploration steps, and it does a fairly good job at them. It is an ensemble learning method, in which a group of weak models is combined to form a powerful model.

To classify a new object based on its attributes, each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the classification with the most votes over all the trees in the forest. For regression, it takes the average of the outputs of the different trees.
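The voting rule above can be sketched in a few lines of Python. The sketch also includes bootstrap sampling (bagging), the standard way a random forest gives each tree a different random sample of the training rows. The function names and example votes are hypothetical, and the code covers only sampling and aggregation, not tree building.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) rows with replacement, so each tree
    sees a slightly different version of the training data."""
    return [rng.choice(rows) for _ in rows]


def forest_classify(tree_votes):
    """Classification: the class with the most votes wins
    (Counter breaks ties in favour of the class seen first)."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Regression: average the outputs of the individual trees."""
    return mean(tree_outputs)


rng = random.Random(0)
sample = bootstrap_sample(["r1", "r2", "r3", "r4", "r5"], rng)
print(len(sample))  # 5
print(forest_classify(["YES", "NO", "YES"]))  # YES
print(forest_regress([10.0, 12.0, 14.0]))  # 12.0
```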

How does a Random Forest work in JAMOVI?

Step 1: Install the JAMOVI application and set the working directory to the active JAMOVI project.

Step 2: Open JAMOVI on your laptop/PC, go to the "File" tab, and open the data set you want to use for the random forest classification model.

Step 3: Once the data is loaded, click the "MTUClassification" icon and choose the "Classification using Random Forest" option.

Step 4: The variables will be loaded as shown in the figure above. Choose the required ratio of test to training data, the method, and the Minbucket parameter.

Because no variables have been assigned yet, JAMOVI shows an error for the unavailable parameters.

Step 5: Choose the dependent and independent variables as shown in the figure. The results are displayed in the right pane.

Step 6: Try different values for the test/train split ratio and the number of trees.
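The experiment in Step 6 can also be illustrated outside JAMOVI. The Python sketch below is a deliberately simplified random forest, not the module's implementation. Each tree is a one-split stump trained on a bootstrap sample using one randomly chosen feature. (A full random forest grows deeper trees and samples features at every split.) The data are synthetic, and all names and parameter values are hypothetical. The sketch reports test accuracy for each combination of train ratio and number of trees.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, rng):
    """One-split tree on a randomly chosen feature; rows are
    (features, label) pairs. Keeps the threshold with fewest errors."""
    f = rng.randrange(len(rows[0][0]))
    best = None
    for t in sorted({x[f] for x, _ in rows}):
        left = [y for x, y in rows if x[f] <= t]
        right = [y for x, y in rows if x[f] > t] or left  # empty right child
        lpred, rpred = majority(left), majority(right)
        errors = sum(y != (lpred if x[f] <= t else rpred) for x, y in rows)
        if best is None or errors < best[0]:
            best = (errors, f, t, lpred, rpred)
    _, f, t, lpred, rpred = best
    return lambda x: lpred if x[f] <= t else rpred


def fit_forest(rows, n_trees, rng):
    """Train n_trees stumps, each on its own bootstrap sample."""
    return [fit_stump([rng.choice(rows) for _ in rows], rng) for _ in range(n_trees)]


def accuracy(trees, rows):
    """Share of rows where the forest's majority vote is correct."""
    return sum(majority([tree(x) for tree in trees]) == y for x, y in rows) / len(rows)


def sweep(rows, ratios=(0.6, 0.7, 0.8), tree_counts=(1, 5, 25), seed=1):
    """Test accuracy for each (train ratio, number of trees) pair."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    results = {}
    for ratio in ratios:
        cut = round(len(shuffled) * ratio)
        train, test = shuffled[:cut], shuffled[cut:]
        for n in tree_counts:
            forest = fit_forest(train, n, random.Random(seed))
            results[(ratio, n)] = accuracy(forest, test)
    return results


# Synthetic toy data (hypothetical): label is "YES" when feature 0 > 5.
data_rng = random.Random(0)
rows = []
for _ in range(100):
    x = (data_rng.uniform(0, 10), data_rng.uniform(0, 10))
    rows.append((x, "YES" if x[0] > 5 else "NO"))

for (ratio, n), acc in sorted(sweep(rows).items()):
    print(f"train ratio {ratio:.1f}, {n:2d} trees: test accuracy {acc:.2f}")
```

Each forest is rebuilt with the same seed, so accuracy differences come only from the train ratio and the number of trees, not from a different random draw.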

Hands-on demo of JAMOVI implementing classification - Decision tree & Random forest
