In this step, we use the StudentEvent dataset from the local repository. The values in this dataset have been standardized. The dataset contains 35 rows and 11 columns.
Create an empty process and drag the StudentEvent dataset into it. This creates a Retrieve operator.
Add the Select Attributes operator to choose the attributes to be analyzed. Connect the Retrieve operator to the Select Attributes operator and select the attributes. In this analysis, we exclude the Marks and MarksBin attributes.
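As a rough illustration of what "standardized" means here, the sketch below applies a z-score transformation (subtract the mean, divide by the standard deviation) to one column. This assumes z-score standardization with the population standard deviation; the tutorial does not say which method was used, and the `scores` values are made up.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of numbers: (value - mean) / standard deviation.

    Assumes the values are not all identical, otherwise the standard
    deviation is zero and the division fails.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


# Made-up example column, not taken from the StudentEvent dataset.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(scores)
```

After this transformation the column has mean 0 and standard deviation 1, so attributes measured on different scales contribute comparably to distance calculations.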
Choose the subset option and enable invert selection.
Move the unwanted attributes to the right side.
Add the Set Role operator and connect it to the Select Attributes operator. Use the Set Role operator to set Student ID as id (identifier) and Grade as label. With these roles assigned, the two attributes are not used as features in the analysis.
Set Grade as label.
Set Student ID as id.
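The attribute selection and role assignment can be pictured outside RapidMiner with a small plain-Python sketch. It is only an analogy for what the operators do, not their implementation. The rows and the `Quiz` and `Forum` columns are hypothetical; only Student ID, Grade, Marks and MarksBin come from the tutorial.

```python
# Hypothetical rows standing in for the StudentEvent dataset.
rows = [
    {"Student ID": "S01", "Grade": "A", "Marks": 85, "MarksBin": "high",
     "Quiz": 0.9, "Forum": -0.2},
    {"Student ID": "S02", "Grade": "C", "Marks": 52, "MarksBin": "low",
     "Quiz": -1.1, "Forum": 0.4},
]

# Attributes dropped by the inverted subset selection.
EXCLUDED = {"Marks", "MarksBin"}
# Special roles: these attributes are kept but not used as features.
ROLES = {"Student ID": "id", "Grade": "label"}


def split_roles(row):
    """Return (id, label, features) for one row."""
    kept = {name: value for name, value in row.items() if name not in EXCLUDED}
    special = {role: kept.pop(name) for name, role in ROLES.items()}
    return special["id"], special["label"], kept


prepared = [split_roles(row) for row in rows]
```

Only the remaining feature dictionary would be passed to the clustering step; the id and label travel alongside each row so the clusters can be inspected afterwards.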
Connect the Set Role operator to the Cluster (Agglomerative Clustering) operator. This operator performs agglomerative clustering, which is a bottom-up strategy for hierarchical clustering.
The Flatten Clustering operator creates a flat cluster model from the given hierarchical cluster model by expanding nodes in the order of their distance until the desired number of clusters (specified by the number of clusters parameter) is reached. For this operator, we set the number of clusters to 4.
This is the view of the Agglomerative Clustering model after it has been flattened to 6 clusters. From here we can see that Cluster 1 is the largest cluster, with 29 members.
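To make the two clustering steps concrete, here is a minimal plain-Python sketch of bottom-up agglomerative clustering followed by flattening. It assumes single linkage and Euclidean distance, which may differ from the settings used in the process above, and the `points` are made-up 2D values rather than StudentEvent rows. The function names are illustrative and are not RapidMiner APIs.

```python
import math
from itertools import combinations


def leaf(index):
    """A cluster holding one example; leaves have no children."""
    return {"members": [index], "distance": 0.0, "children": None}


def single_link(a, b, points):
    """Smallest Euclidean distance between any member of a and any member of b."""
    return min(math.dist(points[i], points[j])
               for i in a["members"] for j in b["members"])


def agglomerate(points):
    """Bottom-up: repeatedly merge the two closest clusters until one remains."""
    clusters = [leaf(i) for i in range(len(points))]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_link(pair[0], pair[1], points))
        merged = {"members": a["members"] + b["members"],
                  "distance": single_link(a, b, points),
                  "children": (a, b)}
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(merged)
    return clusters[0]  # root of the hierarchy


def flatten(root, k):
    """Top-down: expand the node with the largest merge distance until k clusters."""
    flat = [root]
    while len(flat) < k:
        widest = max(flat, key=lambda node: node["distance"])
        if widest["children"] is None:  # only leaves remain, cannot split further
            break
        flat = [node for node in flat if node is not widest]
        flat.extend(widest["children"])
    return [sorted(node["members"]) for node in flat]


# Made-up 2D points: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
root = agglomerate(points)
three_clusters = flatten(root, 3)
```

The hierarchy is built once, and `flatten` can then be called with different values of `k` without re-running the clustering, which mirrors how the number of clusters parameter is applied to an existing hierarchical model.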