Density-Based Spatial Clustering of Applications with Noise (DBScan)

Background

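While K-means clustering assigns data points to clusters based on their proximity to a centroid, DBScan examines the density of data points. A cluster in DBScan is a region with a high density of data points, separated from other clusters by regions of low density; points that fall in low-density regions are labeled as noise (outliers). The main parameter of DBScan is epsilon, the maximum distance between two points for one to be considered part of the other's neighborhood. A second parameter, the minimum number of points a neighborhood must contain for its center to count as a core point of a cluster (min_samples in scikit-learn), sets how dense a region must be to form a cluster.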
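To make the two parameters concrete, here is a minimal sketch using scikit-learn's DBSCAN class on a handful of made-up 2-D points. The data and the values eps=0.2 and min_samples=3 are illustrative assumptions, not values from this project. The five grouped points all lie within 0.2 of each other and form one cluster, while the isolated point has no neighbors within epsilon and receives the label -1, scikit-learn's marker for noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative 2-D data: a tight group of five points and one isolated point.
X = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1], [0.05, 0.05],
    [5.0, 5.0],
])

# eps: neighborhood radius.
# min_samples: points (counting the point itself) needed within eps
# for a point to be a core point of a cluster.
db = DBSCAN(eps=0.2, min_samples=3).fit(X)

# One label per point: the five grouped points share cluster 0,
# the isolated point is marked as noise (-1).
print(db.labels_)
```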

Implementation

The workflow for this portion of the project draws heavily from the following:

https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py

https://www.machinecurve.com/index.php/2020/12/09/performing-dbscan-clustering-with-python-and-scikit-learn/

https://www.coryjmaklin.com/machine-learning-clustering-dbscan-determine-the-optimal-value-for-epsilon-eps-python-example

Finding the optimal value of epsilon

Import libraries

Setup inputs and lists

Data preparation and application of the elbow method
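A minimal sketch of this step, assuming the common k-distance approach: for every point, find the distance to its nearest neighbor with scikit-learn's NearestNeighbors, then sort those distances so they can be plotted against their index. The helper name k_distance_curve, the choice k=2, and the random stand-in data are assumptions for illustration only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distance_curve(X, k=2):
    """Return each point's distance to its k-th nearest neighbor, sorted ascending.

    Because X itself is passed to kneighbors, every point is returned as its
    own first neighbor (distance 0). With k=2 the curve therefore holds each
    point's distance to its nearest *other* point.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    distances, _ = nn.kneighbors(X)       # shape (n_samples, k), ascending per row
    return np.sort(distances[:, k - 1])   # the "distance versus index" curve

# Hypothetical stand-in data: 200 random 2-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
curve = k_distance_curve(X, k=2)
```

Plotting `curve` against its index (for example with matplotlib's `plt.plot(curve)`) gives a distance-versus-index graph whose elbow suggests a value for epsilon.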

DBScan clustering

Import libraries

Setup inputs and lists

Data preparation and application of the elbow method
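A minimal sketch of the clustering step with scikit-learn's DBSCAN, plugging in an epsilon read off the distance graph. The helper name run_dbscan, the stand-in data, and eps=1.0 are hypothetical. The min_samples value used for the results below is not stated, so the sketch keeps scikit-learn's default of 5.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def run_dbscan(X, eps, min_samples=5):
    """Cluster X with DBSCAN and return one label per point (-1 = noise)."""
    model = DBSCAN(eps=eps, min_samples=min_samples)
    return model.fit_predict(X)

# Hypothetical stand-in data: one dense blob plus two far-away points.
rng = np.random.default_rng(42)
blob = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
outliers = np.array([[50.0, 50.0], [-50.0, 40.0]])
X = np.vstack([blob, outliers])

# eps would normally come from the elbow of the sorted distance curve.
labels = run_dbscan(X, eps=1.0)
```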

Results

Finding the optimal value of epsilon

The following values of epsilon were determined for each dataset by locating the point of greatest curvature (the elbow) on that dataset's graph of distances.
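One way to locate the point of greatest curvature, sketched here as an assumption about the method rather than the code actually used, is to rescale both axes of the sorted distance curve to [0, 1] and evaluate the curvature of a plane curve y(x), kappa = |y''| / (1 + y'^2)^(3/2), with finite differences from np.gradient. The helper name max_curvature_index and the synthetic curve are hypothetical.

```python
import numpy as np

def max_curvature_index(y):
    """Index of the point of greatest curvature on a sorted distance curve.

    Both axes are rescaled to [0, 1] first, so the result does not depend on
    the units of the distances or on the number of points.
    Assumes y is not constant (otherwise the rescaling divides by zero).
    """
    y = np.asarray(y, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))
    y_norm = (y - y.min()) / (y.max() - y.min())
    dy = np.gradient(y_norm, x)    # first derivative (finite differences)
    d2y = np.gradient(dy, x)       # second derivative
    curvature = np.abs(d2y) / (1.0 + dy**2) ** 1.5
    return int(np.argmax(curvature))

# Hypothetical sorted distance curve: gentle rise for 20 points, then a steep climb.
y = np.concatenate([0.1 * np.arange(20), 1.9 + 5.0 * np.arange(1, 11)])
knee = max_curvature_index(y)
eps = y[knee]   # candidate epsilon: the distance at the elbow
```

On this synthetic curve the maximum falls just before the steep rise. Second differences amplify noise, so real distance curves may need smoothing first; the third-party kneed package (KneeLocator) offers a related knee-detection method.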

Figure: Distance versus index value for each dataset (1Shape in blue, 2Shape in orange, 1306 expression data in green, 88069 expression data in red).

Computed epsilons

1Shape: 29
2Shape: 29
1306 expression: 4
88069 expression: 8

DBScan clustering

Based on DBScan, each of the datasets contains only one cluster plus a few outliers (noise points).
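The number of clusters and outliers can be read directly from the DBSCAN labels, since scikit-learn assigns noise points the label -1. A minimal sketch; the helper summarize_labels and the label array are hypothetical.

```python
import numpy as np

def summarize_labels(labels):
    """Count clusters and noise points in a DBSCAN label array (-1 = noise)."""
    labels = np.asarray(labels)
    n_clusters = len(set(labels.tolist()) - {-1})   # distinct labels, ignoring noise
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

# Hypothetical labels: one cluster (label 0) and three outliers.
labels = np.array([0, 0, 0, 0, -1, 0, 0, -1, 0, -1])
n_clusters, n_noise = summarize_labels(labels)
print(f"clusters: {n_clusters}, noise points: {n_noise}")  # clusters: 1, noise points: 3
```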

1306 expression data

silhouette coefficient: 0.448

88069 expression data

silhouette coefficient: 0.489

1Shape data

silhouette coefficient: 0.222

2Shape data

silhouette coefficient: 0.266
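The reported coefficients may have been computed as in the scikit-learn DBSCAN example listed above, which passes all labels, including the noise label -1, to silhouette_score. Under that assumption, and with only one real cluster per dataset, the score mainly reflects how well the few outliers separate from the single cluster; dropping the noise points would leave a single label, for which silhouette_score raises an error. A minimal sketch with hypothetical data and labels:

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Hypothetical data: a dense blob (label 0) and three distant outliers (label -1).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    np.array([[20.0, 20.0], [-20.0, 20.0], [20.0, -20.0]]),
])
labels = np.array([0] * 50 + [-1] * 3)

# Noise points keep the label -1, so they are scored as one more group.
score = silhouette_score(X, labels)
print(round(score, 3))
```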
