While K-means clustering assigns data points based on their proximity to a centroid, DBSCAN examines the density of data points. A cluster in DBSCAN is a region with a high concentration of data points surrounded by a region of low concentration. The main parameter supplied to DBSCAN is epsilon, the maximum distance a data point may lie from a point already in the cluster and still be considered part of that cluster.
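To make the density idea concrete, here is a minimal sketch of the DBSCAN procedure using only the Python standard library. It is an illustration, not the code used in this project. The names `dbscan`, `region_query`, and `NOISE` are hypothetical, and `min_samples` (the number of neighbors needed before a point counts as lying in a dense region) is an assumed second parameter that the text above does not discuss.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster_id += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point, keep expanding
    return labels


# Toy data (made up for illustration): two dense squares and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

On this toy data the two squares come out as separate clusters and the isolated point is marked as noise. A larger epsilon would eventually merge everything into one cluster, which is why the choice of epsilon matters.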
The workflow for this portion of the project draws heavily from the following:
Import libraries
Setup inputs and lists
Data preparation and application of the elbow method
The values of epsilon below were determined by computing, for each dataset, the point of greatest curvature (the elbow) on its graph of distances.
Legend
1Shape (blue)
2Shape (orange)
1306 expression data (green)
88069 expression data (red)
Computed epsilons
1Shape: 29
2Shape: 29
1306 expression: 4
88069 expression: 8
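The elbow computation behind epsilon values like these can be sketched as follows. This is a hedged illustration in standard-library Python, not the project's own code. It assumes the "graph of distances" is the sorted distance from each point to its k-th nearest neighbor, and it stands in for "greatest curvature" with a simple rule: after rescaling both axes to [0, 1], pick the point that lies farthest below the straight line joining the curve's endpoints. The function names and the choice of `k` are hypothetical.

```python
import math


def k_distances(points, k):
    """Ascending list of each point's distance to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def knee_index(values):
    """Index of the elbow of an ascending curve.

    Both axes are rescaled to [0, 1]; the elbow is taken to be the point
    with the largest gap below the diagonal joining the two endpoints.
    """
    n = len(values)
    lo, hi = values[0], values[-1]
    if n < 3 or hi == lo:
        return n - 1  # flat or too-short curve: no meaningful elbow

    def gap(i):
        x = i / (n - 1)
        y = (values[i] - lo) / (hi - lo)
        return x - y

    return max(range(n), key=gap)


# Toy data (made up for illustration): two dense squares and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
curve = k_distances(points, k=3)
eps = curve[knee_index(curve)]
```

Here the curve is flat for the eight clustered points and jumps for the outlier, so the chosen epsilon is the distance at the end of the flat stretch. Other knee rules (for example, a discrete second derivative) can give slightly different values on noisy curves.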
According to DBSCAN, each of the datasets contains only one cluster with a few outliers.
silhouette coefficient: 0.448
silhouette coefficient: 0.489
silhouette coefficient: 0.222
silhouette coefficient: 0.266
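For readers unfamiliar with the metric, the sketch below shows how a silhouette coefficient is computed from points and cluster labels, using only the Python standard library. It is illustrative only: the function name is hypothetical, Euclidean distance is assumed, and it treats every distinct label (including a noise label such as -1) as its own group. How noise points were handled in the scores above is not stated.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points (ranges from -1 to 1)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two distinct labels")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member groups
            continue
        # a: mean distance to the other members of the point's own group
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up for illustration): two tight, well-separated pairs.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
toy_labels = [0, 0, 1, 1]
score = silhouette_coefficient(toy_points, toy_labels)
```

Values near 1 indicate tight, well-separated groups, values near 0 indicate overlapping groups, and negative values suggest points assigned to the wrong group.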