K-Means Clustering (Python)

K-Means Clustering In Python

For our project, we use k=6.

Introduction To Dataset

In this step, we use the StudentEvent dataset. Its values have already been standardized in RapidMiner.
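For readers who would like to reproduce a standardization step in Python instead of RapidMiner, the sketch below applies a z-score transform with scikit-learn's `StandardScaler`. It assumes the RapidMiner step was a z-transformation (this page does not say which method was used), and the small `demo` frame and its values are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw activity counts; these are not taken from StudentEvent
demo = pd.DataFrame({
    "Assignment": [2, 4, 6, 8],
    "Forum": [10, 20, 30, 40],
})

# z-score: subtract each column's mean and divide by its (population) standard deviation
scaler = StandardScaler()
demo_std = pd.DataFrame(scaler.fit_transform(demo), columns=demo.columns)

print(demo_std.round(3))
```

After this transform every column has mean 0 and unit variance, so no single feature dominates the Euclidean distances that K-Means relies on.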

Import Dataset

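import pandas as pd

# Path to the Excel file on the mounted Google Drive
path1 = "/content/drive/My Drive/Colab Notebooks/StudentEvent.xlsx"
dataf1 = pd.read_excel(path1)

# Preview the first three rows
dataf1.head(3)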

Select Data From Dataset

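Select the columns to be analyzed in this activity.

# new_df is not defined on this page; it is assumed to hold the loaded data (for example, new_df = dataf1)
data_std = new_df[['Assignment','Forum','Activity','LectureNote',
                   'Tutorial','Questionnaire','Quiz','MarksBin']].copy()

# The values were already standardized, so they are used as-is
scaled_data = data_std
scaled_data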

K-Means Algorithm

Import Library

import matplotlib.pyplot as plt

from sklearn.cluster import KMeans

Create New Dataset For K-Means

data_km = new_df[['Assignment','Forum','Activity','LectureNote',
                  'Tutorial','Questionnaire','Quiz','MarksBin']].copy()
data_km.info()

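Initialize X and Y Values

The first seven columns are the features (x), and the last column, MarksBin, is the target (y).

# Feature values (columns 0 to 6) and target values (column 7) as NumPy arrays
xkm = data_km.iloc[:, [0, 1, 2, 3, 4, 5, 6]].values
ykm = data_km.iloc[:, [7]].values

# Matching column names
xkm_label = data_km.iloc[:, [0, 1, 2, 3, 4, 5, 6]].columns
ykm_label = data_km.iloc[:, [7]].columns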
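To make the shapes produced by this split concrete, here is a small sketch on a made-up frame with the same eight column names. The frame `demo_km` and its integer values are purely illustrative. It also shows that the slice `:7` selects the same columns as the explicit list `[0, 1, 2, 3, 4, 5, 6]`.

```python
import numpy as np
import pandas as pd

cols = ['Assignment', 'Forum', 'Activity', 'LectureNote',
        'Tutorial', 'Questionnaire', 'Quiz', 'MarksBin']

# Hypothetical 3-row frame filled with 0..23, only to show the shapes
demo_km = pd.DataFrame(np.arange(24).reshape(3, 8), columns=cols)

x_demo = demo_km.iloc[:, :7].values    # features: shape (3, 7)
y_demo = demo_km.iloc[:, [7]].values   # target as a column vector: shape (3, 1)
y_flat = y_demo.ravel()                # 1-D target: shape (3,)

print(x_demo.shape, y_demo.shape, y_flat.shape)
```

Selecting with `[7]` keeps the target two-dimensional; `ravel()` gives the one-dimensional form that most scikit-learn metrics expect.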

Visualize The K-Means Clusters

The clusters are plotted using the first two feature columns of x (Assignment and Forum), with one color per cluster.

# y_kmeans is not defined on this page; it is assumed to be the K-Means cluster index of each row
kmeans = KMeans(n_clusters=6, random_state=0)
y_kmeans = kmeans.fit_predict(xkm)

# Visualising the clusters
plt.scatter(xkm[y_kmeans == 0, 0], xkm[y_kmeans == 0, 1], s = 100, c = 'yellow', label = 'Cluster 0')
plt.scatter(xkm[y_kmeans == 1, 0], xkm[y_kmeans == 1, 1], s = 100, c = 'green', label = 'Cluster 1')
plt.scatter(xkm[y_kmeans == 2, 0], xkm[y_kmeans == 2, 1], s = 100, c = 'cyan', label = 'Cluster 2')
plt.scatter(xkm[y_kmeans == 3, 0], xkm[y_kmeans == 3, 1], s = 100, c = 'grey', label = 'Cluster 3')
plt.scatter(xkm[y_kmeans == 4, 0], xkm[y_kmeans == 4, 1], s = 100, c = 'black', label = 'Cluster 4')
plt.scatter(xkm[y_kmeans == 5, 0], xkm[y_kmeans == 5, 1], s = 100, c = 'blue', label = 'Cluster 5')
plt.legend()
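The six scatter calls above differ only in the cluster index and the color, so they can also be written as a loop. The sketch below is one possible way to do that and additionally marks the fitted centroids. It runs on synthetic data from `make_blobs` rather than on StudentEvent, and the names `X_demo`, `km_demo` and `cluster_ids` are hypothetical.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for xkm: 300 rows, 7 features, 6 groups
X_demo, _ = make_blobs(n_samples=300, n_features=7, centers=6, random_state=0)

km_demo = KMeans(n_clusters=6, n_init=10, random_state=0)
cluster_ids = km_demo.fit_predict(X_demo)

colors = ['yellow', 'green', 'cyan', 'grey', 'black', 'blue']
fig, ax = plt.subplots()
for k, color in enumerate(colors):
    members = cluster_ids == k
    # Plot feature 0 against feature 1 for the rows assigned to cluster k
    ax.scatter(X_demo[members, 0], X_demo[members, 1],
               s=100, c=color, label=f'Cluster {k}')

# Mark the fitted centroids in the same two dimensions
ax.scatter(km_demo.cluster_centers_[:, 0], km_demo.cluster_centers_[:, 1],
           s=200, c='red', marker='X', label='Centroids')
ax.legend()
```

Only two of the seven features are drawn, so clusters that look overlapping in this view may still be well separated in the full feature space.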

K-Means Accuracy

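import numpy as np
from scipy.stats import mode
from sklearn.manifold import TSNE
from sklearn.metrics import accuracy_score

# Project the data to 2-D: this step will take several seconds
# (digits_proj is not used in the accuracy computation below)
tsne = TSNE(n_components=2, init='random', random_state=0)
digits_proj = tsne.fit_transform(xkm)

# Compute the clusters
# Note: as written, K-Means is fitted on ykm (the MarksBin column), not on the features xkm
kmeans = KMeans(n_clusters=6, random_state=0)
clusters = kmeans.fit_predict(ykm)

# Permute the labels: give every cluster the most frequent true label among its members
newy = ykm.ravel()  # assumption: newy is the 1-D form of ykm; it is not defined on this page
labels = np.zeros_like(clusters)
for i in range(6):
    mask = (clusters == i)
    if mask.any():
        labels[mask] = mode(newy[mask])[0]

# Compute the accuracy
accuracy_score(ykm, labels)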

0.9428571428571428
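The label permutation step is easier to follow on a tiny example. The sketch below uses made-up cluster indices and true labels (`clusters_demo` and `true_demo` are hypothetical) and relies only on NumPy: each cluster is relabeled with the most frequent true label among its members, and the accuracy is the share of rows whose relabeled value matches the true one.

```python
import numpy as np

# Hypothetical cluster indices from K-Means and the true class of each row
clusters_demo = np.array([0, 0, 0, 1, 1, 2, 2, 2])
true_demo = np.array([3, 3, 1, 1, 1, 2, 2, 3])

mapped = np.zeros_like(clusters_demo)
for c in np.unique(clusters_demo):
    mask = clusters_demo == c
    # Most frequent true label inside cluster c
    values, counts = np.unique(true_demo[mask], return_counts=True)
    mapped[mask] = values[np.argmax(counts)]

accuracy = np.mean(mapped == true_demo)
print(mapped)    # [3 3 3 1 1 2 2 2]
print(accuracy)  # 0.75
```

A score computed this way is often called purity. Because every cluster takes its own majority label, it tends to rise as the number of clusters grows, so it is best read alongside other measures.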

Summary

This is how we visualized the silhouette using the same x values. The visualization suggests the same k as the elbow method, which is 6.
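As a rough illustration of how the elbow method and the silhouette score can be compared, the sketch below fits K-Means for several values of k and records both measures. It uses synthetic data with three well-separated groups, not StudentEvent, so the k it suggests differs from the 6 found for this project. All names (`X_demo`, `ks`, `inertias`, `silhouettes`, `best_k`) are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups
centers = [[0, 0], [10, 10], [-10, 10]]
X_demo, _ = make_blobs(n_samples=300, centers=centers,
                       cluster_std=0.5, random_state=0)

ks = list(range(2, 9))
inertias, silhouettes = [], []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_demo)
    # Elbow method: within-cluster sum of squared distances
    inertias.append(km.inertia_)
    # Mean silhouette over all points, between -1 and 1
    silhouettes.append(silhouette_score(X_demo, km.labels_))

# k with the highest mean silhouette
best_k = ks[int(np.argmax(silhouettes))]
print(best_k)
```

Plotting `inertias` against `ks` gives the elbow curve, while the k with the highest mean silhouette is the silhouette suggestion. When both point to the same k, as reported here for StudentEvent, the choice is better supported.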

Next Topic: Agglomerative Clustering In Rapidminer
