Demonstration Video
The dataset used in the Random Forest model above is the preprocessed dataset. The Strength label is assigned in the Set Role operator, since we want to predict whether a player's strength position is Offensive or Defensive. The features are selected based on the overall performance attributes in the Power BI dashboard, namely BLK (Block), STL (Steal), REB (Rebound), MADE, and FG (TOT ATT). For the Discretize operator, different numbers of bins were tested, for example 2 and 3. The figure below shows the comparison of results using different bin values.
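As a rough illustration of this binning step outside RapidMiner, the sketch below discretizes the same attributes with 2 and then 3 equal-width bins using pandas; the file name `nba_preprocessed.csv` is a hypothetical placeholder for the preprocessed dataset.

```python
import pandas as pd

df = pd.read_csv("nba_preprocessed.csv")          # hypothetical file name
features = ["BLK", "STL", "REB", "MADE", "FG(TOT ATT)"]

# equal-width binning, roughly matching Discretize by Binning with 2 and 3 bins
for n_bins in (2, 3):
    binned = df[features].apply(lambda col: pd.cut(col, bins=n_bins, labels=False))
    print(f"{n_bins} bins:")
    print(binned.head())
```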
For the Split Data operator, we trained on 0.7 of the data and tested on the remaining 0.3. In the Random Forest parameters we set accuracy as the criterion and the maximal depth to 10. To visualize the forecast result, the Model Simulator operator was selected. The Correlation Matrix shows pairs of strongly correlated attributes; we initially used 2 bins because with a larger number of bins the correlation matrix result cannot be seen clearly.
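A minimal scikit-learn sketch of the Split Data and Random Forest steps just described is shown below, assuming the same hypothetical `nba_preprocessed.csv` file; scikit-learn has no "accuracy" split criterion, so the default "gini" criterion stands in for it.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("nba_preprocessed.csv")          # hypothetical file name
features = ["BLK", "STL", "REB", "MADE", "FG(TOT ATT)"]

# 0.7 training / 0.3 testing split, matching the Split Data operator
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Strength"], train_size=0.7, random_state=42)

# maximal depth 10 as in the report; default "gini" criterion used instead of "accuracy"
rf = RandomForestClassifier(max_depth=10, random_state=42)
rf.fit(X_train, y_train)
print("Hold-out accuracy:", rf.score(X_test, y_test))
```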
The Correlation Matrix is an operator used to show pairs of strongly correlated attributes. The line of 1s running diagonally from the top left to the bottom right indicates that each variable is perfectly correlated with itself; for example, the REB attribute has a correlation of 1 with itself. The result is shown in several visualizations, such as Data, Pairwise Table, and Matrix Visualization.
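The same pairwise table can be approximated with pandas, as in the short sketch below (again assuming the hypothetical `nba_preprocessed.csv`); the diagonal is all 1s for the reason described above.

```python
import pandas as pd

df = pd.read_csv("nba_preprocessed.csv")          # hypothetical file name
features = ["BLK", "STL", "REB", "MADE", "FG(TOT ATT)"]

# pairwise correlations; each attribute is perfectly correlated with itself
print(df[features].corr().round(2))
```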
From this tree we can observe that the root is BLK, followed by the other nodes, and most of the leaves produce Offensive strength. Defensive strength only occurs when the FG (TOT ATT) node falls in the range [-∞, 98.500].
The Decision Tree prediction result shows an accuracy of 100%, with a confidence of 89.85% for the decision. This is because the biggest support comes from the BLK attribute: when we change the BLK range from [-∞, 5.500] to [5.500, ∞] and the FG (TOT ATT) range from [-∞, 98.500] to [98.500, ∞] without changing any other attribute, the prediction becomes 100% Defensive. However, when we change all the attributes, the prediction becomes Offensive. This is because BLK and FG (TOT ATT) are the attributes with the largest influence on the prediction.
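One way to inspect such split thresholds outside RapidMiner is to fit a single decision tree and print its rules, as in the hedged sketch below; the exact cut points (e.g. BLK <= 5.5, FG (TOT ATT) <= 98.5) depend on the data and need not reproduce the tree in the report.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("nba_preprocessed.csv")          # hypothetical file name
features = ["BLK", "STL", "REB", "MADE", "FG(TOT ATT)"]

# fit a single tree and print its split rules for inspection
tree = DecisionTreeClassifier(max_depth=10, random_state=42)
tree.fit(df[features], df["Strength"])
print(export_text(tree, feature_names=features))
```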
A cross-validation model is used to test the ability of the model to predict new results, giving a better approximation of the accuracy with less overfitting to the data. The selected attributes and Set Role are the same as in the previous Random Forest model: BLK (Block), STL (Steal), REB (Rebound), MADE, and FG (TOT ATT) as features, with Strength as the label.
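A minimal cross-validation sketch mirroring the RapidMiner Cross Validation operator is shown below; the 10-fold setting is an assumption, since the report does not state the number of folds.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("nba_preprocessed.csv")          # hypothetical file name
features = ["BLK", "STL", "REB", "MADE", "FG(TOT ATT)"]

# 10 folds is an assumption; the report does not state the fold count
scores = cross_val_score(RandomForestClassifier(max_depth=10, random_state=42),
                         df[features], df["Strength"], cv=10, scoring="accuracy")
print(f"Accuracy: {scores.mean():.2%} +/- {scores.std():.2%}")
```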
The Performance Vector (Performance) result shows that the model accuracy is 85.00% +/- 24.15% (micro average: 83.33%).
The confusion matrix shows that 3 examples predicted as Defensive are actually Defensive, while 1 example predicted as Offensive is actually Defensive and 2 examples predicted as Defensive are actually Offensive. Finally, 12 examples predicted as Offensive are correctly Offensive.
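This confusion matrix could be reproduced roughly as sketched below, using out-of-fold predictions from the cross-validated model under the same assumptions as the previous sketches.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

df = pd.read_csv("nba_preprocessed.csv")          # hypothetical file name
features = ["BLK", "STL", "REB", "MADE", "FG(TOT ATT)"]

y_true = df["Strength"]
# out-of-fold predictions from the cross-validated model
y_pred = cross_val_predict(RandomForestClassifier(max_depth=10, random_state=42),
                           df[features], y_true, cv=10)
# rows = actual class, columns = predicted class
print(confusion_matrix(y_true, y_pred, labels=["Defensive", "Offensive"]))
```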
Optimize Parameters (Grid) is used to execute the model and deliver the optimal result based on the selected attributes.
This model is built the same way as the Decision Tree model, except that the machine learning operator is replaced with Random Forest. The selected attributes and Set Role are the same as in the previous Random Forest model: BLK (Block), STL (Steal), REB (Rebound), MADE, and FG (TOT ATT) as features, with Strength as the label. The Discretize by Binning operator can be placed either inside or outside Optimize Parameters, since both produce the same result. The ratio used for Split Data is 0.7:0.3; however, we also ran a comparison using other ratios, namely 0.5:0.5 and 0.3:0.7. The comparison figure is shown below the result.
In the Optimize Parameters setting, we select criterion, confidence, minimal_gain, maximal_depth, and minimal_size_for_split as the parameters. Only the criterion setting is varied: its grid range is set to accuracy, gain_ratio, information_gain, and gini_index, while the other parameters keep their predefined values.
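A hedged GridSearchCV sketch standing in for Optimize Parameters (Grid) is given below; scikit-learn only exposes "gini" and "entropy" (information gain) as split criteria, so they act as stand-ins for RapidMiner's accuracy, gain_ratio, information_gain, and gini_index options.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv("nba_preprocessed.csv")          # hypothetical file name
features = ["BLK", "STL", "REB", "MADE", "FG(TOT ATT)"]

# only the criterion is varied, as in the report; other parameters stay fixed
param_grid = {
    "criterion": ["gini", "entropy"],   # stand-ins for the RapidMiner criteria
    "max_depth": [10],
    "min_samples_split": [2],
}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid.fit(df[features], df["Strength"])
print(grid.best_params_, grid.best_score_)
```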
The Performance Vector (Performance) result shows that the model accuracy is 100.00%.
The confusion matrix shows that 1 example predicted as Defensive is actually Defensive, with no examples predicted as Offensive that are actually Defensive and none predicted as Defensive that are actually Offensive. Finally, 4 examples predicted as Offensive are correctly Offensive.
The comparison using various Split Data ratios shows that, without hyperparameter tuning, the accuracy decreases when less data is used for training and more for testing. However, the 0.5:0.5 ratio gives a lower result than the 0.3:0.7 ratio.
On the other hand, the accuracy with hyperparameter tuning remains the same, but the differences can be traced in the confusion matrix of each parameter set.
Confusion matrix by various Split Data
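As a rough check of this split-ratio comparison, the sketch below retrains the same hypothetical pipeline with 0.7:0.3, 0.5:0.5, and 0.3:0.7 train:test splits and reports the hold-out accuracy for each.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("nba_preprocessed.csv")          # hypothetical file name
features = ["BLK", "STL", "REB", "MADE", "FG(TOT ATT)"]

# retrain with 0.7:0.3, 0.5:0.5 and 0.3:0.7 train:test splits
for train_ratio in (0.7, 0.5, 0.3):
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[features], df["Strength"], train_size=train_ratio, random_state=42)
    rf = RandomForestClassifier(max_depth=10, random_state=42).fit(X_tr, y_tr)
    print(f"train {train_ratio:.1f}: accuracy {rf.score(X_te, y_te):.2%}")
```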