Unsupervised learning deals with unlabeled data.
The model tries to discover patterns or structure in the data without being given the correct outputs.
K-Means Clustering
Definition:
A method that partitions data into K clusters by grouping similar points together.
Steps:
Choose K centroids.
Assign points to the nearest centroid.
Recompute centroids and repeat.
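The three steps above can be sketched directly in NumPy. This is a minimal illustration, not sklearn's implementation; the function name kmeans_sketch and the fixed seed are ours, and it omits the smarter k-means++ initialization that sklearn uses by default.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Minimal K-Means: the three steps above, repeated until centroids settle."""
    rng = np.random.default_rng(seed)
    # Step 1: choose K centroids (here: K distinct random points from the data).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        # (keeping the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, labels
```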
Code Example:
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
X, _ = load_iris(return_X_y=True)
kmeans = KMeans(n_clusters=3, n_init=10)
kmeans.fit(X)
print("Cluster Centers:\n", kmeans.cluster_centers_)
Output:
Cluster Centers:
[[5.9 2.7 4.2 1.3]
[6.8 3.0 5.6 2.1]
[5.0 3.4 1.4 0.2]]
Hierarchical Clustering
Definition:
Builds a hierarchy of nested clusters, either agglomeratively (bottom-up, merging clusters) or divisively (top-down, splitting them).
Diagram (Dendrogram): a tree whose leaves are individual samples and whose branches show the order in which clusters merge (produced by the code below).
Code Example:
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
X, _ = load_iris(return_X_y=True)
Z = linkage(X, method='ward')
dendrogram(Z)
plt.show()
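The dendrogram shows the full hierarchy; to get flat cluster labels from the same linkage matrix, you can cut the tree. A small sketch using SciPy's fcluster (choosing 3 clusters here simply mirrors the K-Means example):

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
Z = linkage(X, method='ward')
# Cut the dendrogram so that at most 3 flat clusters remain.
labels = fcluster(Z, t=3, criterion='maxclust')
print("Cluster sizes:", [list(labels).count(c) for c in sorted(set(labels))])
```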
Evaluation Metrics
Inertia:
The sum of squared distances from each point to its assigned centroid; lower values mean more internally coherent clusters.
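Inertia is exposed by sklearn as the fitted model's inertia_ attribute. A short sketch (random_state=0 is an arbitrary choice for reproducibility); since inertia always drops as K grows, it is typically used to look for an "elbow" rather than minimized outright:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
# Inertia keeps shrinking as K grows, so compare several values of K
# and look for the point where the improvement levels off.
inertias = {}
for k in (2, 3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    print(f"K={k}: inertia={km.inertia_:.1f}")
```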
Silhouette Score:
Measures how similar a point is to its own cluster vs other clusters (higher = better).
Code Example:
from sklearn.metrics import silhouette_score
score = silhouette_score(X, kmeans.labels_)
print("Silhouette Score:", score)
Output:
Silhouette Score: 0.56
Davies-Bouldin Index
Definition:
Evaluates clustering by the ratio of intra-cluster scatter to inter-cluster separation.
(Lower value = better clustering.)
Code Example:
from sklearn.metrics import davies_bouldin_score
db_score = davies_bouldin_score(X, kmeans.labels_)
print("Davies-Bouldin Index:", db_score)
Output:
Davies-Bouldin Index: 0.65
Artificial Neural Network (ANN)
Definition:
A model inspired by the human brain, consisting of neurons (nodes) arranged in layers connected by weighted links.
Diagram:
Input --> [Hidden Layer] --> Output
x1 ----> O ----\
x2 ----> O -----> O ----> y
x3 ----> O ----/
Code Example:
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
ann = MLPClassifier(hidden_layer_sizes=(5,), max_iter=1000)
ann.fit(X, y)
print("Accuracy:", ann.score(X, y))
Output:
Accuracy: 0.97
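Note that the accuracy above is measured on the same data the network was trained on. A fairer sketch holds out a test set; the split ratio and random_state values here are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
# Hold out 30% of the samples so accuracy reflects unseen data.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
ann = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=42)
ann.fit(X_tr, y_tr)
acc = ann.score(X_te, y_te)
print("Test accuracy:", acc)
```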
Perceptron
Definition:
A simple neural network with a single neuron that performs binary classification.
Equation:
y = f(∑ wᵢxᵢ + b)
where xᵢ are the inputs, wᵢ the weights, b the bias, and f a step activation function.
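The equation y = f(∑ wᵢxᵢ + b), with f a step function, can be turned into a tiny from-scratch sketch of the perceptron learning rule. The function names are ours, and learning the AND function is just a minimal linearly separable example:

```python
import numpy as np

def step(z):
    # Activation f: outputs 1 if the weighted sum is non-negative, else 0.
    return (z >= 0).astype(int)

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Learn weights w and bias b with the classic perceptron update rule."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = step(np.dot(w, xi) + b)   # y = f(sum(w_i * x_i) + b)
            w += lr * (yi - pred) * xi       # update only on a wrong prediction
            b += lr * (yi - pred)
    return w, b

# Usage: learn logical AND, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
preds = [int(step(np.dot(w, xi) + b)) for xi in X]
print("Predictions for AND:", preds)  # → [0, 0, 0, 1]
```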
Code Example:
from sklearn.linear_model import Perceptron
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
# Iris has 3 classes; sklearn's Perceptron handles this via one-vs-rest.
p = Perceptron()
p.fit(X, y)
print("Accuracy:", p.score(X, y))
Output:
Accuracy: 0.90
Limitations of Unsupervised Learning:
No clear accuracy measure (no labeled data).
Hard to interpret results.
Sensitive to scaling and initialization.
Requires domain knowledge to validate clusters.
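The sensitivity to scaling, for instance, can be addressed by standardizing features before clustering. A sketch with StandardScaler; the silhouette score is one way to check the effect, and the parameter choices are arbitrary:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
# Standardize each feature to zero mean and unit variance so that no single
# feature dominates the Euclidean distances K-Means relies on.
X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
score = silhouette_score(X_scaled, km.labels_)
print("Silhouette on scaled data:", score)
```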
Conclusion:
Supervised Learning helps when you have labeled data; it is great for prediction and classification.
Unsupervised Learning is useful when you want to explore and understand hidden patterns in unlabeled data.
Both are fundamental to understanding modern AI and data science applications.