Clustering
https://en.wikipedia.org/wiki/Cluster_analysis
http://haifengl.github.io/smile/index.html#clustering
http://www.learndatasci.com/k-means-clustering-algorithms-python-intro/
http://johnloeber.com/docs/kmeans.html
https://habrahabr.ru/post/321216/ Affinity propagation
K-means clustering:
https://saravananthirumuruganathan.wordpress.com/2010/01/27/k-means-clustering-algorithm/
initial selection of centroids
loop:
    assign each point to its nearest centroid
    create new centroids as the mean of each cluster's points
    if the difference between the new and current centroids < delta: break
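A minimal NumPy sketch of this loop (random initial centroids, Euclidean distance; k, delta, and max_iter are illustrative parameters, not taken from the linked posts):

```python
import numpy as np

def kmeans(points, k, delta=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # initial selection of centroids: k distinct random data points
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # create new centroids as the mean of each cluster's points
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            points[labels == c].mean(axis=0) if (labels == c).any() else centroids[c]
            for c in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < delta:  # centroids have barely moved: converged
            break
    return centroids, labels
```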
As one MapReduce iteration:
map():
    input: all ClusterIDs (current centroids), all Points
    output: (ClusterID, Point), keyed by the nearest centroid
combine():
    output: (ClusterID, partial sum of the assigned Points + their count)
reduce():
    generate new centroids: (ClusterID, newCenter = sum / count)
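One way to express that split in plain Python, as a sketch of a single iteration (the function names and the [partial sum, count] accumulator are illustrative, not a real MapReduce framework API):

```python
import numpy as np
from collections import defaultdict

def map_phase(centroids, points):
    # map(): emit (ClusterID, Point) pairs keyed by the nearest centroid
    for p in points:
        cid = int(np.argmin([np.linalg.norm(p - c) for c in centroids]))
        yield cid, p

def combine_phase(pairs):
    # combine(): per ClusterID, a partial sum of point vectors and a count
    acc = defaultdict(lambda: [0.0, 0])
    for cid, p in pairs:
        acc[cid][0] += p   # 0.0 + ndarray broadcasts to a vector sum
        acc[cid][1] += 1
    return acc

def reduce_phase(acc):
    # reduce(): new centroid = sum of assigned points / their count
    return {cid: s / n for cid, (s, n) in acc.items()}
```

One pass of map_phase → combine_phase → reduce_phase is one k-means iteration; repeat until the new centroids stop moving, as in the loop above.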
k-NN classification: a nonparametric method, also usable as a regression estimator
http://andrew.gibiansky.com/blog/machine-learning/k-nearest-neighbors-simplest-machine-learning/
http://stackabuse.com/k-nearest-neighbors-algorithm-in-python-and-scikit-learn/
The output is a class membership: in the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by a majority vote of its neighbors, i.e. assigned the label most frequent among the k training samples nearest to it.
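A short scikit-learn sketch of this majority vote (the iris dataset and k=5 are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k is the user-defined constant; each test point receives the majority
# label among its k nearest training samples
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```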
Hierarchical (agglomerative) clustering:
WHILE it is not time to stop DO
    pick the best (closest) two clusters;
    merge them into one;
END
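SciPy's linkage() performs exactly this merge loop; a minimal sketch (Ward linkage, the random data, and the distance cut-off are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
points = rng.normal(size=(30, 2))

# linkage() repeatedly merges the closest pair of clusters,
# recording each merge in the matrix Z
Z = linkage(points, method='ward')

# "time to stop": cut the merge tree at a distance threshold
labels = fcluster(Z, t=2.5, criterion='distance')
print(labels)
```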
http://varianceexplained.org/r/kmeans-free-lunch/
https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
http://www.bigdatanews.com/profiles/blogs/fast-clustering-algorithms-for-massive-datasets
http://grigory.us/blog/mapreduce-clustering/
http://www.galvanize.com/blog/introduction-k-means-cluster-analysis/#.Vk_C0xFViko
https://www.analyticsvidhya.com/blog/2017/02/test-data-scientist-clustering/
https://www.youtube.com/watch?v=aiJ8II94qck K-means clustering