Module 5: Clustering
Guru is Brahma, Guru is Vishnu, Guru is the god Maheshvara (Shiva); the Guru is verily the supreme Brahman; salutations to that revered Guru!
Question 1 The objective of k-means clustering is:
Yield the highest out-of-sample accuracy
Separate dissimilar samples and group similar ones
Minimize the cost function via gradient descent
Maximize the number of correctly classified data points
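K-means is usually formalized as choosing centroids and assignments that minimize the within-cluster sum of squared distances (often called inertia). The standard algorithm reduces this cost by alternating assignment and update steps, not by gradient descent. A minimal pure-Python sketch of that cost follows. It assumes points and centroids are tuples of floats and that `labels[i]` is the cluster index of `points[i]`. The helper name `within_cluster_sse` and the data are hypothetical.

```python
import math


def within_cluster_sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        # math.dist returns the Euclidean distance (Python 3.8+)
        total += math.dist(point, centroids[label]) ** 2
    return total


# Made-up data: two tight pairs of points, each centroid at its pair's midpoint
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]
print(within_cluster_sse(points, labels, centroids))  # 4.0
```

Grouping similar points together gives a small cost. Swapping the labels so that each cluster mixes far-apart points makes the cost much larger.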
Question 2 Which option correctly orders the steps of k-means clustering?
1. Re-cluster the data points
2. Choose k random observations as the initial cluster means (centroids)
3. Update each centroid to the mean of its cluster
4. Repeat until the centroids no longer change
5. Calculate each data point’s distance to the centroids
2, 5, 3, 1, 4
2, 1, 4, 5, 3
3, 5, 1, 4, 2
2, 3, 4, 5, 1
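The steps above can be seen in code in one common (Lloyd-style) formulation of k-means. The sketch below uses only the Python standard library. The function name `kmeans`, its `max_iter` and `seed` parameters, and the choice to leave an empty cluster's centroid where it was are illustrative assumptions, not part of the course material. Comments mark where each listed step happens.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal Lloyd-style k-means on a list of equal-length coordinate tuples."""
    rng = random.Random(seed)
    # Initialization: pick k random observations as the starting centroids
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Distance step and re-clustering: assign each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Hypothetical policy: an empty cluster keeps its previous centroid
                new_centroids.append(centroids[c])
        # Stopping rule: repeat until the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Made-up data: two well-separated groups of three points each
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(data, k=2)
print(labels, centroids)
```

The cluster indices themselves are arbitrary and depend on the random start. Only the grouping is meaningful.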
Question 3 How can we gauge the performance of a k-means clustering model when ground truth is not available?
Calculate the R-squared value to measure model fit.
Take the average of the distance between data points and their cluster centroids.
Calculate the number of incorrectly classified observations in the training set.
Determine the prediction accuracy on the test set.
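When no ground-truth labels exist, one common internal measure is how tightly points sit around their own centroids, for example the average point-to-centroid distance. A minimal sketch follows, assuming points and centroids are tuples of floats. The helper name `mean_distance_to_centroid` and the data are hypothetical.

```python
import math


def mean_distance_to_centroid(points, labels, centroids):
    """Average Euclidean distance between each point and its own cluster centroid.

    Uses no ground-truth labels; lower values mean tighter clusters.
    """
    distances = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    return sum(distances) / len(distances)


# Made-up data: distances to the assigned centroids are 1, 1, 2 and 2
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 2.0)]
print(mean_distance_to_centroid(points, labels, centroids))  # 1.5
```

This value shrinks as K grows. It is therefore more informative for comparing runs with the same K than for choosing K on its own.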
Question 4 When the parameter K for k-means clustering increases, what happens to the error?
It will decrease because the data points are less likely to be in the wrong cluster.
It will increase because incorrectly classified points are further from the correct centroid.
It might increase or decrease depending on whether data points are closer to the centroid.
It will decrease because distance between data points and centroid will decrease.
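The error in this question is typically the within-cluster sum of squared distances. For the best possible clustering, this error never goes up when K increases, because splitting a cluster cannot increase it. It reaches zero when K equals the number of distinct points. A single run from a random start can still land in a worse local optimum at a larger K. The brute-force sketch below illustrates the trend on made-up 1-D data and assumes the known property that optimal 1-D k-means clusters are contiguous runs of the sorted values. The helper names `sse` and `best_sse_1d` are hypothetical. The search is exponential in the data size, so it is for illustration only.

```python
from itertools import combinations


def sse(values):
    """Sum of squared deviations of one cluster from its mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_sse_1d(data, k):
    """Lowest achievable k-means error for 1-D data, found by brute force.

    Tries every way of cutting the sorted values into k contiguous runs.
    """
    xs = sorted(data)
    best = float("inf")
    for cuts in combinations(range(1, len(xs)), k - 1):
        bounds = (0, *cuts, len(xs))
        cost = sum(sse(xs[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, cost)
    return best


data = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
errors = [best_sse_1d(data, k) for k in range(1, len(data) + 1)]
print(errors)  # non-increasing, ending at 0.0 when k == len(data)
```

Because the error keeps falling as K grows, a lower error alone is not a reason to prefer a larger K.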
Question 5 Which of the following is true for partition-based clustering but not for hierarchical or density-based clustering algorithms?
Partition-based clustering produces sphere-like clusters.
Partition-based clustering can handle spatial clusters and noisy data.
Partition-based clustering is a type of unsupervised learning algorithm.
Partition-based clustering produces arbitrary shaped clusters.
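Partition-based methods such as k-means assign every point to its nearest centroid. Each cluster is therefore a convex region (with two centroids, the two sides of a straight line), which is why these methods tend to find compact, sphere-like groups. Density-based methods such as DBSCAN are commonly suggested for arbitrarily shaped clusters. The sketch below redefines a compact version of the k-means loop so that it runs on its own, and applies it to made-up concentric-ring data. The function name and ring sizes are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Compact Lloyd-style k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (ties go to the lower index)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the centroid to its cluster mean; keep it if the cluster is empty
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Made-up data: 12 points on a ring of radius 1 inside 12 points on a ring of radius 5
angles = [2 * math.pi * i / 12 for i in range(12)]
inner = [(math.cos(a), math.sin(a)) for a in angles]
outer = [(5 * math.cos(a), 5 * math.sin(a)) for a in angles]
points = inner + outer

labels, _ = kmeans(points, k=2)
# Which rings ended up in each k-means cluster?
rings_per_cluster = [
    {"inner" if i < len(inner) else "outer" for i, lab in enumerate(labels) if lab == c}
    for c in range(2)
]
print(rings_per_cluster)
```

Whatever the random start, at least one cluster mixes points from both rings. The inner ring lies inside the convex hull of the outer ring, so no straight line can separate the two.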