Clustering
https://en.wikipedia.org/wiki/Cluster_analysis
http://haifengl.github.io/smile/index.html#clustering
http://www.learndatasci.com/k-means-clustering-algorithms-python-intro/
http://johnloeber.com/docs/kmeans.html
https://habrahabr.ru/post/321216/ Affinity propagation
K-means clustering:
https://saravananthirumuruganathan.wordpress.com/2010/01/27/k-means-clustering-algorithm/
initial selection of centroids
loop:
    assign each point to its nearest centroid
    create new centroids as the mean of each cluster's points
    if the difference between the new and current centroids < delta: break
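A minimal NumPy sketch of this loop (random initial centroids, Euclidean distance; k, delta, and max_iter are illustrative parameters, not taken from the linked posts):

```python
import numpy as np

def kmeans(points, k, delta=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # initial selection of centroids: k distinct random data points
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # create new centroids as the mean of each cluster's points
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            points[labels == c].mean(axis=0) if (labels == c).any() else centroids[c]
            for c in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < delta:  # centroids have barely moved: converged
            break
    return centroids, labels
```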
As one MapReduce iteration:
map():
    input: all ClusterIDs (current centroids), all Points
    output: (ClusterID, Point), keyed by the nearest centroid
combine():
    output: (ClusterID, partial sum of the assigned Points + their count)
reduce():
    generate new centroids: (ClusterID, newCenter = sum / count)
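One way to express that split in plain Python, as a sketch of a single iteration (the function names and the [partial sum, count] accumulator are illustrative, not a real MapReduce framework API):

```python
import numpy as np
from collections import defaultdict

def map_phase(centroids, points):
    # map(): emit (ClusterID, Point) pairs keyed by the nearest centroid
    for p in points:
        cid = int(np.argmin([np.linalg.norm(p - c) for c in centroids]))
        yield cid, p

def combine_phase(pairs):
    # combine(): per ClusterID, a partial sum of point vectors and a count
    acc = defaultdict(lambda: [0.0, 0])
    for cid, p in pairs:
        acc[cid][0] += p   # 0.0 + ndarray broadcasts to a vector sum
        acc[cid][1] += 1
    return acc

def reduce_phase(acc):
    # reduce(): new centroid = sum of assigned points / their count
    return {cid: s / n for cid, (s, n) in acc.items()}
```

One pass of map_phase → combine_phase → reduce_phase is one k-means iteration; repeat until the new centroids stop moving, as in the loop above.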
k-NN classification: a nonparametric method, also usable as a regression estimator
http://andrew.gibiansky.com/blog/machine-learning/k-nearest-neighbors-simplest-machine-learning/
http://stackabuse.com/k-nearest-neighbors-algorithm-in-python-and-scikit-learn/
The output is a class membership: in the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by a majority vote of its neighbors, i.e. assigned the label most frequent among the k training samples nearest to it.
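A short scikit-learn sketch of this majority vote (the iris dataset and k=5 are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k is the user-defined constant; each test point receives the majority
# label among its k nearest training samples
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```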
Hierarchical (agglomerative) clustering:
WHILE it is not time to stop DO
    pick the best (closest) two clusters;
    merge them into one;
END
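SciPy's linkage() performs exactly this merge loop; a minimal sketch (Ward linkage, the random data, and the distance cut-off are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
points = rng.normal(size=(30, 2))

# linkage() repeatedly merges the closest pair of clusters,
# recording each merge in the matrix Z
Z = linkage(points, method='ward')

# "time to stop": cut the merge tree at a distance threshold
labels = fcluster(Z, t=2.5, criterion='distance')
print(labels)
```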
http://varianceexplained.org/r/kmeans-free-lunch/
https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
http://www.bigdatanews.com/profiles/blogs/fast-clustering-algorithms-for-massive-datasets
http://grigory.us/blog/mapreduce-clustering/
http://www.galvanize.com/blog/introduction-k-means-cluster-analysis/#.Vk_C0xFViko
https://www.analyticsvidhya.com/blog/2017/02/test-data-scientist-clustering/
https://www.youtube.com/watch?v=aiJ8II94qck K-means clustering