K-Means Clustering

Demonstration Video

For the K-Means Clustering model, operators such as Select Attributes, Set Role, and Discretize were set up the same way as for the decision tree and random forest models. The Nominal to Numerical operator converts all selected attributes to numerical values so that the process runs without errors. Inside the K-Means Clustering operator, we agreed to use 3 as the value for k, so the result contains 3 clusters. The Cluster Model Visualizer then shows the average distance for each cluster built.
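The process described above can be approximated outside RapidMiner. The following is a minimal sketch, not the actual RapidMiner operators: it assumes equal-width binning for the Discretize step and a 0/1 indicator encoding for Nominal to Numerical, and the player statistics, function names, and seed are all hypothetical.

```python
import random

def discretize(values, n_bins):
    """Equal-width binning: return a bin index in 0..n_bins-1 per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def one_hot(bin_index, n_bins):
    """Nominal-to-numerical step: encode a bin label as 0/1 indicators."""
    return [1.0 if i == bin_index else 0.0 for i in range(n_bins)]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Give each point the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def k_means(points, k, iterations=20, seed=0):
    """Plain k-means: alternate assignment and centroid update."""
    centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids), centroids

# Hypothetical player statistics (not the data used on this page).
stats = {
    "STL": [0, 1, 1, 2, 5, 6, 0, 1],
    "BLK": [0, 0, 1, 1, 3, 4, 0, 2],
    "REB": [2, 3, 4, 10, 12, 15, 1, 9],
}
N_BINS, K = 2, 3  # number of bins and k, as chosen in the process

binned = {name: discretize(col, N_BINS) for name, col in stats.items()}
points = [
    [x for name in stats for x in one_hot(binned[name][i], N_BINS)]
    for i in range(len(stats["STL"]))
]
labels, centroids = k_means(points, K)
print(labels)
```

The cluster numbering depends on the random starting centroids, so the labels from this sketch will not necessarily match the Cluster 0, 1, 2 numbering in RapidMiner.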

Cluster Model Visualizer Object

The Cluster Model Visualizer object shows 3 clusters: Cluster 0 with 11 items and an average distance of 0.298, Cluster 1 with 4 items and an average distance of 1, and Cluster 2 with 2 items and an average distance of 0.444.
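To make the two reported figures concrete, here is a small sketch of how the item count and average distance of each cluster could be computed. It assumes that "average distance" means the mean Euclidean distance from each item to its cluster centroid; the exact formula RapidMiner uses may differ, and the points and the `cluster_summary` helper are hypothetical.

```python
import math

def cluster_summary(points, labels):
    """Return {cluster: (item count, mean distance to the cluster centroid)}."""
    summary = {}
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        avg = sum(math.dist(p, centroid) for p in members) / len(members)
        summary[c] = (len(members), avg)
    return summary

# Hypothetical two-attribute points and cluster labels.
points = [[0, 0], [2, 0], [10, 10]]
labels = [0, 0, 1]
for cluster, (size, avg) in cluster_summary(points, labels).items():
    print(f"Cluster {cluster}: {size} items, average distance {avg:.3f}")
```

Under this definition, a smaller average distance means the items in a cluster sit closer to their centroid, that is, the cluster is more compact.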

Based on the heatmap, we can assume that most players in Cluster 0 are low or bottom players, because they recorded few steals (STL), blocks (BLK), field goals (FG(TOT ATT), made shots (MADE), and rebounds (REB). Players in Cluster 1 can be called medium or average players, considering their performance in MADE, REB, FG(TOT ATT, and BLK. The top players are in Cluster 2, which has the highest STL result at 500%. Referring back to the PowerBI dashboard, the players who managed a high STL are An Chong 40, Liam 46 and Kamal 26.

The Centroid Chart identifies the attributes that are important for a given cluster. In the figure above, the important attributes appear at the peaks of the centroid chart.
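Reading the peaks off a centroid chart amounts to finding, for each cluster, the attribute with the largest centroid value. A minimal sketch of that idea follows; the centroid values and the `peak_attributes` helper are hypothetical and are not taken from the chart on this page.

```python
def peak_attributes(centroids, attribute_names):
    """For each cluster, return the attribute with the highest centroid value."""
    peaks = {}
    for cluster, centroid in centroids.items():
        top = max(range(len(centroid)), key=lambda i: centroid[i])
        peaks[cluster] = attribute_names[top]
    return peaks

# Hypothetical centroid values, one list per cluster.
attribute_names = ["STL", "BLK", "REB"]
centroids = {
    0: [0.1, 0.2, 0.3],
    1: [0.4, 0.7, 0.6],
    2: [0.9, 0.5, 0.2],
}
print(peak_attributes(centroids, attribute_names))
```

This only makes sense when the attributes are on a comparable scale, which is one reason to normalize or discretize them before clustering.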

scat plot cluster.pptx

This shows the scatter plot for each cluster. The jitter amount slider can be used to spread out overlapping points. The blue dots represent Cluster 0, the green dots Cluster 1, and the red dots Cluster 2.

Comparison using Number of Bins

Here we can see the differences produced by different number-of-bins values: the number of items (players) per cluster varies, and so does the average distance. With number of bins 2, Cluster 0 has 11 items, Cluster 1 has 4 items, and Cluster 2 has 3 items. With number of bins 3, the value range of each attribute is divided into more intervals, and Cluster 0 has 9 items, Cluster 1 has 5 items, and Cluster 2 has 4 items.
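The effect of the number of bins can be illustrated with a short sketch. It assumes the Discretize step uses equal-width binning, which may not match the operator configured in the process, and the steal counts are hypothetical.

```python
from collections import Counter

def discretize(values, n_bins):
    """Equal-width binning: return a bin index in 0..n_bins-1 per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

steals = [0, 1, 2, 3, 4, 9]  # hypothetical STL values for six players
for n_bins in (2, 3):
    bins = discretize(steals, n_bins)
    print(n_bins, bins, dict(Counter(bins)))
```

With these values, the players with 3 and 4 steals share the lowest bin with the weakest players when 2 bins are used, but move to a separate middle bin when 3 bins are used. Because K-Means only sees the binned values, a shift like this can move players between clusters.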

Based on the heatmap, using a different number of bins results in different color shades and different percentage scales for the attributes. With number of bins 2, practically all attributes in Cluster 0 are in pink shades, which indicate low values. The same holds for Cluster 0 with number of bins 3.

In Cluster 1, with number of bins 2, there are greenish shades for the attributes BLK, FG(TOT ATT, MADE, REB and STL. With number of bins 3, Cluster 1 looks slightly different: BLK turns to pink shades, while other attributes such as FG(TOT ATT and REB appear in darker green shades with high percentages.

Cluster 2 also changes with the number of bins. With number of bins 3, the attributes in darker green shades are BLK and STL, in both ranges. However, FG(TOT ATT turns to pink shades with number of bins 3.
