different runs of k-means clustering result in different outputs #9

ghost · 2015-02-25T14:39:36Z

var clusterfck = require("clusterfck");

var colors = [
   [97],
   [1],
   [53],
   [79],
   [3],
   [351],
   [16]
];

var clusters = clusterfck.kmeans(colors, 3);

Result A: [1, 3, 16], [53, 79, 97], [351]
Result B: [1, 3, 16, 53], [79, 97], [351]

bbroeksema · 2015-02-27T19:50:32Z

That's normal: k-means places the initial seeds (cluster centers) randomly, so each run starts from a different set of seed locations and can end up with (slightly) different outcomes. For a nice introduction to k-means and clustering, see: http://web.cs.sunyit.edu/~mike/cs542/Jain50YearsBeyondKmeans.pdf

ghost · 2015-03-02T14:32:00Z

Thanks for the literature. However, this behaviour should be explicitly mentioned somewhere, because in other tools (e.g., R, Weka) the default k-means implementation can handle such cases.

Ouwen · 2015-03-02T19:50:38Z

How do R and Weka handle it? Do they use the same random seed for each run?

bbroeksema · 2015-03-03T08:42:39Z

In R you can pass "centers", which is either the number of clusters (which results in similar nondeterministic behavior) or the actual initial, distinct cluster centers (in which case, I believe but have not actually checked, it behaves deterministically). I don't know about Weka.

user24 · 2015-06-10T06:54:35Z

You could modify the kmeans function so that, instead of assigning this.centroids = this.randomCentroids(...), it takes the centroids as an argument. That should allow different runs to produce the same results.
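To illustrate the idea, here is a minimal, self-contained sketch of k-means (Lloyd's algorithm) that takes the initial centroids as an argument. It is not clusterfck's code: the helper names (`squaredDistance`, `nearestIndex`, `kmeansWithCentroids`) and the seed values are assumptions made up for this example. It uses the data from the original report.

```javascript
// Minimal Lloyd's-algorithm sketch; NOT clusterfck's implementation.
// All function names here are hypothetical helpers for illustration.
function squaredDistance(a, b) {
  var sum = 0;
  for (var i = 0; i < a.length; i++) {
    var d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the centroid closest to `point`.
function nearestIndex(point, centroids) {
  var best = 0;
  var bestDist = Infinity;
  for (var i = 0; i < centroids.length; i++) {
    var d = squaredDistance(point, centroids[i]);
    if (d < bestDist) { bestDist = d; best = i; }
  }
  return best;
}

function kmeansWithCentroids(points, initialCentroids, maxIterations) {
  // Copy the seeds so the caller's array is not modified.
  var centroids = initialCentroids.map(function (c) { return c.slice(); });
  var assignment = [];
  for (var iter = 0; iter < (maxIterations || 100); iter++) {
    var changed = false;
    // Assignment step: attach every point to its nearest centroid.
    points.forEach(function (p, i) {
      var idx = nearestIndex(p, centroids);
      if (assignment[i] !== idx) { assignment[i] = idx; changed = true; }
    });
    if (!changed) break; // converged
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map(function (c, k) {
      var members = points.filter(function (_, i) { return assignment[i] === k; });
      if (members.length === 0) return c; // leave an empty cluster's centroid in place
      return c.map(function (_, dim) {
        return members.reduce(function (s, m) { return s + m[dim]; }, 0) / members.length;
      });
    });
  }
  var clusters = centroids.map(function () { return []; });
  points.forEach(function (p, i) { clusters[assignment[i]].push(p); });
  return { centroids: centroids, clusters: clusters };
}

var colors = [[97], [1], [53], [79], [3], [351], [16]];
var seeds = [[1], [53], [351]]; // assumed seeds, chosen by hand for the example
var runA = kmeansWithCentroids(colors, seeds);
var runB = kmeansWithCentroids(colors, seeds);
console.log(JSON.stringify(runA.clusters) === JSON.stringify(runB.clusters)); // true
```

With these seeds the sketch groups the values as {1, 3, 16}, {97, 53, 79}, {351}, which matches Result A from the report. Because nothing in it is random, every run with the same seeds gives the same clusters.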

tayden · 2016-02-11T00:19:07Z

Often k-means is run multiple times, and for each run an error is calculated as the mean squared distance from each point to the centroid of the cluster it belongs to. You can then keep the clustering result whose centroids minimize this error.
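A rough sketch of that restart strategy, again self-contained rather than built on clusterfck: the names `kmeansOnce` and `bestOfRuns`, the choice of 20 runs, and the optional `random` parameter (a function returning a number in [0, 1), defaulting to `Math.random`) are all assumptions for illustration.

```javascript
// Sketch of "run k-means several times, keep the lowest-error result".
// NOT clusterfck's API; every name below is a hypothetical helper.
function sqDist(a, b) {
  return a.reduce(function (s, v, i) { return s + (v - b[i]) * (v - b[i]); }, 0);
}

// Index of the centroid closest to point p.
function nearest(p, centroids) {
  var best = 0;
  centroids.forEach(function (c, i) {
    if (sqDist(p, c) < sqDist(p, centroids[best])) best = i;
  });
  return best;
}

// Pick k of the input points as initial centroids (partial Fisher-Yates shuffle).
function randomCentroids(points, k, random) {
  var pool = points.slice();
  for (var i = 0; i < k; i++) {
    var j = i + Math.floor(random() * (pool.length - i));
    var tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp;
  }
  return pool.slice(0, k).map(function (p) { return p.slice(); });
}

function kmeansOnce(points, k, random) {
  var centroids = randomCentroids(points, k, random);
  var labels = [];
  for (var iter = 0; iter < 100; iter++) {
    var next = points.map(function (p) { return nearest(p, centroids); });
    if (next.join() === labels.join()) break; // assignments stable: converged
    labels = next;
    centroids = centroids.map(function (c, ci) {
      var members = points.filter(function (_, i) { return labels[i] === ci; });
      if (members.length === 0) return c; // leave an empty cluster's centroid in place
      return c.map(function (_, d) {
        return members.reduce(function (s, m) { return s + m[d]; }, 0) / members.length;
      });
    });
  }
  // Mean squared distance from each point to its assigned centroid.
  var error = points.reduce(function (s, p, i) {
    return s + sqDist(p, centroids[labels[i]]);
  }, 0) / points.length;
  return { centroids: centroids, labels: labels, error: error };
}

function bestOfRuns(points, k, runs, random) {
  var best = null;
  for (var r = 0; r < runs; r++) {
    var result = kmeansOnce(points, k, random || Math.random);
    if (best === null || result.error < best.error) best = result;
  }
  return best;
}

var colors = [[97], [1], [53], [79], [3], [351], [16]];
var best = bestOfRuns(colors, 3, 20);
console.log(best.centroids, best.error);
```

More restarts make it more likely, though not guaranteed, that the lowest-error clustering is found. For the data in this issue, Result A has a lower error than Result B, so it is the one this approach would prefer whenever both turn up.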
