This section presents the results and analysis for our project. For descriptive analytics, we do not compare the results of K-Means Clustering and Agglomerative Clustering; the performance comparison applies only to predictive analytics, between the Decision Tree and Naive Bayes algorithms.
This section focuses on the results and analysis produced by the Descriptive Analysis using Rapidminer and Python. We have chosen two clustering algorithms: K-Means Clustering and Agglomerative Clustering.
The K-Means algorithm is an iterative algorithm that tries to partition the dataset into k pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while keeping the clusters as different (far apart) as possible. It assigns data points to clusters such that the sum of the squared distances between the data points and the cluster's centroid (the arithmetic mean of all the data points belonging to that cluster) is at a minimum. The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster.
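As an illustration only, below is a minimal K-Means sketch with scikit-learn. The event-count matrix X is synthetic stand-in data, not our actual course export; the shape (40 students, 7 event types) is a hypothetical example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the per-student event-count matrix:
# one row per student, one column per event type (Assignment, Forum, ...).
rng = np.random.default_rng(42)
X = rng.integers(0, 50, size=(40, 7)).astype(float)

# K-Means with k=6 minimizes the within-cluster sum of squared distances.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)   # one cluster index per student
print(kmeans.inertia_)           # total squared distance to the centroids
```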
In this project, we determined the k-value using the Elbow Method and the Silhouette Method with the yellowbrick library in Python. Based on the results, we set our k-value to 6.
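A minimal sketch of how both methods can be run with yellowbrick, reusing the hypothetical X from the sketch above (the search range of k = 2 to 10 is an assumption):

```python
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# Elbow Method: plots distortion for k = 2..10 and marks the elbow point.
elbow = KElbowVisualizer(KMeans(n_init=10, random_state=42), k=(2, 11))
elbow.fit(X)
elbow.show()

# Same visualizer using the silhouette score instead of distortion.
sil = KElbowVisualizer(KMeans(n_init=10, random_state=42),
                       k=(2, 11), metric="silhouette")
sil.fit(X)
sil.show()
```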
This is the overview for K-Means Clustering in Rapidminer. We set the k-value to 6, as suggested by the Elbow Method and the Silhouette Method in Python. The biggest cluster is Cluster 0 with 22 members, followed by Cluster 2 with 8 members. From this overview, we can see that Forum, Quiz, and LectureNote are the events that students access most, as shown in Cluster 0, while Assignment, LectureNote, and Activity are the most accessed events for Cluster 2.
Rapidminer gives us a good solution for identifying the features and members of each cluster. As we can see, Forum, Quiz, and LectureNote are the events that most students access in this online course. Students access Quiz because it usually contributes marks, while the Forum is a suitable place for students to share their knowledge or experience, which can serve as a reference for other students.
This is the K-Means Clustering visualization using a scatter plot in Python. As we can see, Cluster 0 is the largest cluster. Clusters 3, 4, and 5 each have only 1 member, Cluster 1 has 2 members, Cluster 2 has 3 members, and the rest of the members belong to Cluster 0.
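For reference, a minimal matplotlib sketch of such a scatter plot, reusing X and labels from the first sketch. Which two features (or projection) the actual plot used is not stated, so the first two columns are an assumption:

```python
import matplotlib.pyplot as plt

# Colour each point by its K-Means cluster label.
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis")
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.title("K-Means clusters (k = 6)")
plt.show()
```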
Below is our output for Agglomerative Clustering in Rapidminer. The left side is the Cluster Description, which shows the number of clusters, and the right side shows the members of each cluster. As we can see, Cluster 1 is the biggest cluster, with 29 members. From the table, we can see the members of each cluster and the value of each feature.
Below is the output generated in Python. On the left side is the visualization of Agglomerative Clustering using a dendrogram, while on the right side we use a scatter plot to visualize the same clustering.
A dendrogram is a diagram that shows the hierarchical relationship between objects. It is most commonly created as the output of hierarchical clustering, and its main use is to work out the best way to allocate objects to clusters. Based on our research and findings, unlike K-Means Clustering, where we can use the Elbow and Silhouette Methods to determine the k-value, here the number of clusters is read from the dendrogram itself. The key to interpreting a dendrogram is to focus on the height at which any two objects are joined together.
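A minimal SciPy sketch of building and drawing such a dendrogram from the hypothetical X above; Ward linkage is an assumption about the linkage criterion used:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Ward linkage merges the pair of clusters that least increases the
# within-cluster variance; the y-axis shows the distance (height) at
# which each pair of clusters is joined.
Z = linkage(X, method="ward")
dendrogram(Z)
plt.ylabel("merge distance")
plt.show()
```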
We chose to draw a line that yields 6 clusters, the same as K-Means, to see whether Agglomerative Clustering produces the same or similar results as K-Means Clustering. The dendrogram below shows the hierarchical clustering of seven features, namely Assignment, Forum, Activity, LectureNote, Tutorial, Questionnaire, and Quiz, based on 6 clusters. The x-axis is the predicted value for the MarksBin feature (1-11). From the scatter plot, we can see that Cluster 2 is the biggest cluster. Clusters 3, 4, and 6 each have only 1 member, Cluster 1 has 3 members, and Cluster 5 has 2 members, which means that Cluster 2 has a total of 26 members.
Dendrogram Visualization
Scatter Plot Visualization
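For reference, a minimal scikit-learn sketch of producing the six-cluster Agglomerative result and its scatter plot, reusing the hypothetical X (Ward linkage again assumed):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

# Cut the hierarchy at 6 clusters, mirroring the k chosen for K-Means.
agg = AgglomerativeClustering(n_clusters=6, linkage="ward")
agg_labels = agg.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=agg_labels, cmap="viridis")
plt.title("Agglomerative clusters (6 clusters)")
plt.show()
```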
As for Agglomerative Clustering, even though we use the same number of clusters in Rapidminer and Python, it did not produce the same results as we expected.
This section discusses the results of the Predictive Analysis (Decision Tree and Naive Bayes algorithms) in Rapidminer and Python.
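Before the comparisons, here is a minimal sketch of how both models can be trained and scored in scikit-learn. The features and grade labels are synthetic stand-ins, and reading "30:70" as 30% train / 70% test is our assumption about the split direction:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Hypothetical data: event counts plus an encoded grade label
# (5 classes here; the real report uses more grade classes).
rng = np.random.default_rng(0)
X_clf = rng.integers(0, 50, size=(200, 7)).astype(float)
y = rng.integers(0, 5, size=200)

# "30:70" interpreted as 30% train / 70% test (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X_clf, y, train_size=0.3, random_state=42, stratify=y)

for model in (DecisionTreeClassifier(random_state=42), GaussianNB()):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          accuracy_score(y_test, model.predict(X_test)))
```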
Based on these two results, we can see that the accuracy of the Decision Tree model in Rapidminer is higher than that of the Decision Tree model in Python. In terms of class precision percentage, the model in Rapidminer predicts precisely for Grades A, B+, and C, while in Python the model is precise in predicting Grades B+ and C. The predicted number of students per grade also differs.
From the results below, the model accuracy in Rapidminer is still the best at 84%, while in Python it is only 68%, a noticeable difference. In terms of precision percentage, the Decision Tree in Rapidminer is precise in predicting Grades B, C, and F, while the Python model is precise in predicting Grades A and B.
Even though we use the same parameters in the hyperparameter tuning, the model in Rapidminer still achieves the highest accuracy, 100%, and therefore all of its precision percentages are also 100%. This means that every prediction in Rapidminer is correct. The accuracy of the model in Python only increases to 93.33%, where the precision percentage for predicting 2 (Grade A-) is only 82%.
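A minimal sketch of such tuning with scikit-learn's GridSearchCV, reusing the split above; the report does not list the exact parameters searched, so this grid is hypothetical:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search space -- not the report's actual grid.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 2, 5],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_)
print(search.best_estimator_.score(X_test, y_test))  # held-out accuracy
```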
As with the 30:70 ratio, the performance accuracy of the Rapidminer model is higher than in Python, at 96%. Its precision is less accurate in predicting Grades B and B+.
Rapidminer still leads for Naive Bayes, where the accuracy differs by only 0.97%. With only 70.97% accuracy, the precision percentage for Naive Bayes in Rapidminer is 100% in predicting Grades A-, B-, B+, and C. For the Naive Bayes model in Python, with an accuracy of 70%, it is able to predict precisely for 5 and 7 (Grades C and C-).
This is the result of the Naive Bayes performance before tuning. From the results, we can see that the accuracy of the model in Rapidminer is only 66%, while for Python it is 72%. This is the first time Python leads.
After tuning the Naive Bayes parameters, the performance accuracy in Python is the highest at 83.33%, while the Rapidminer model's accuracy is 80.65%. The only classes for which the Python model's precision is below 100% are 2 and 3 (Grades A- and B+).
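A minimal sketch of tuning Gaussian Naive Bayes the same way; in scikit-learn, var_smoothing is essentially its only tunable parameter, and the grid below is an assumption rather than the report's actual search space:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Search var_smoothing over a log-spaced range (a common convention).
nb_search = GridSearchCV(GaussianNB(),
                         {"var_smoothing": np.logspace(0, -9, 10)},
                         cv=3, scoring="accuracy")
nb_search.fit(X_train, y_train)
print(nb_search.best_params_)
print(nb_search.best_estimator_.score(X_test, y_test))
```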
Unlike the result at the 30:70 ratio, the model in Rapidminer improves by only 4%, which still leaves it below 80%. It seems the tuning is not effective for this model at the 50:50 ratio, whereas the model accuracy in Python increases to 84%.
From the above comparison, we can conclude that the best ratio is 30:70 and the best model is the Decision Tree in Rapidminer, with a performance accuracy of 100%.