In this step, we use the StudentEvent dataset from the local repository. The values in this dataset have been standardized. The dataset contains 35 rows and 11 columns.
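The text does not say which standardization was applied. As a minimal sketch, assuming a z-score transformation (zero mean, unit standard deviation per attribute), standardizing one column could look like the following. The `forum_counts` values are made up for illustration and are not taken from the StudentEvent dataset.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw access counts for one event type (assumption, not source data).
forum_counts = [2, 4, 4, 4, 5, 5, 7, 9]
forum_z = standardize(forum_counts)
print(forum_z)
```

Standardizing matters for K-Means because the algorithm relies on distances: an attribute with a larger numeric range would otherwise dominate the clustering.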
Create an empty process and drag the StudentEvent dataset into it. This creates a Retrieve operator.
Add the Select Attributes operator to choose the attributes to be analyzed. Connect the Retrieve operator to the Select Attributes operator and select the attributes. In this analysis, the Marks and MarksBin attributes are excluded.
Choose the subset filter type and enable the invert selection option.
Move the unwanted attributes to the right side.
Add the Set Role operator and connect it to the Select Attributes operator. Use the Set Role operator to set Student ID as id (identifier) and Grade as label. With these roles assigned, the two attributes are not used as features in the analysis.
Set Grade as label.
Set Student ID as id.
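For readers who want to see the effect of the attribute selection and role assignment outside RapidMiner, here is a minimal plain-Python sketch. It is an illustration only, not what RapidMiner runs internally. The two sample rows and their values are hypothetical, and only a few of the 11 columns are shown.

```python
# Hypothetical rows; the real StudentEvent dataset has 35 rows and 11 columns.
rows = [
    {"Student ID": "S01", "Forum": 0.8, "Quiz": 1.1, "LectureNote": 0.5,
     "Marks": 78, "MarksBin": "High", "Grade": "A"},
    {"Student ID": "S02", "Forum": -0.4, "Quiz": -0.9, "LectureNote": 0.2,
     "Marks": 55, "MarksBin": "Low", "Grade": "C"},
]

# Attributes removed by the inverted subset selection.
EXCLUDED = {"Marks", "MarksBin"}
# Attributes that receive a special role and are therefore not features.
ROLES = {"Student ID": "id", "Grade": "label"}


def split_roles(row):
    """Return (id, label, features) for one row."""
    features = {name: value for name, value in row.items()
                if name not in EXCLUDED and name not in ROLES}
    return row["Student ID"], row["Grade"], features


prepared = [split_roles(row) for row in rows]
print(prepared[0])
```

Only the `features` part of each tuple would be passed to the clustering step; the id and label are kept alongside so the clusters can be interpreted afterwards.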
To apply K-Means in RapidMiner, use the Cluster operator. Connect the Set Role operator to the Cluster operator, which performs clustering with the k-means algorithm. Clustering groups together Examples that are similar to each other. For this operator, we set k to 6.
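To make the algorithm itself concrete, the following is a minimal sketch of standard k-means (Lloyd's algorithm) in plain Python. It is not RapidMiner's implementation; the function names, the random initialization from sampled points, and the tiny two-group demo data are all assumptions made for illustration. With the StudentEvent features, `k` would be 6.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(values) / len(points) for values in zip(*points)]


def k_means(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return centroids, assignments


# Hypothetical demo data: two well-separated groups, clustered with k=2.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centroids, assignments = k_means(points, k=2)
print(assignments)
```

The cluster indices themselves are arbitrary labels: which group ends up as cluster 0 depends on the initialization, which is why cluster numbers in a result should be read together with the centroid values.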
The K-Means results can be visualized with the Cluster Model Visualizer. This operator provides visualization tools for centroid-based cluster models that capture the essential details of each cluster:
Overview: shows the size of all found clusters, together with some information about the clusters and their quality.
Heat map: shows how the attribute values differ across the clusters.
Tree: displays a decision tree describing the main difference between the clusters.
Centroid Chart: shows the values for the cluster centroids in a parallel chart.
Centroid table: shows the values for the cluster centroids in a table.
Scatter plot: for a chosen cluster, displays a scatter plot of the two most important Attributes.
Connect the Cluster operator to the Cluster Model Visualizer operator, and connect the visualizer to the result port to complete the K-Means model in RapidMiner.
This is the overview of the K-Means visualization for k=6. The biggest cluster is Cluster 0 with 22 members, followed by Cluster 2 with 8 members. The overview shows that Forum, Quiz, and LectureNote are the events that students in Cluster 0 access most, while Assignment, LectureNote, and Activity are the most accessed events in Cluster 2.
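The kind of reading done in this overview (cluster sizes, then the most prominent events per cluster) can be sketched in plain Python. This assumes that "most accessed" corresponds to the highest centroid values, which the text does not state explicitly. The assignments and centroid values below are hypothetical and do not reproduce the actual result.

```python
from collections import Counter

# Hypothetical clustering output: one cluster index per student, and one
# centroid (attribute -> standardized mean) per cluster.
assignments = [0, 0, 2, 0, 1, 2, 0]
centroids = {
    0: {"Forum": 1.2, "Quiz": 0.9, "LectureNote": 0.7, "Assignment": -0.3},
    1: {"Forum": -0.5, "Quiz": 0.1, "LectureNote": -0.2, "Assignment": 0.4},
    2: {"Forum": -0.1, "Quiz": -0.6, "LectureNote": 0.8, "Assignment": 1.0},
}


def cluster_sizes(assignments):
    """Return (cluster, size) pairs, largest cluster first."""
    return Counter(assignments).most_common()


def top_attributes(centroid, n=3):
    """Return the n attribute names with the highest centroid values."""
    return sorted(centroid, key=centroid.get, reverse=True)[:n]


for cluster, size in cluster_sizes(assignments):
    events = top_attributes(centroids[cluster])
    print(f"Cluster {cluster}: {size} members, top events: {events}")
```

Because the attributes are standardized, a high centroid value means the cluster's students access that event more than the average student, not that the raw count is large in absolute terms.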
Members of each cluster