I've implemented a k-means clustering algorithm as described at http://faculty.uscupstate.edu/atzacheva/SHIM450/KMeansExample.doc
For some datasets I find that the number of clusters generated does not always equal the initial k. Is this to be expected?
I think it is to be expected: in each iteration every data point is assigned to its closest cluster center, but nothing guarantees that every cluster receives a point. Cluster membership is rebuilt from scratch in each iteration, so a center that is not the nearest one for any point ends up with an empty cluster.
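To make this concrete, here is a minimal sketch of the assignment step in plain Python. It is not the code from the linked document; the function name `assign`, the sample points and the deliberately far-away third center are all made up for illustration. With k = 3 it leaves one cluster empty.

```python
def assign(points, centers):
    """Assign each point to its nearest center (squared Euclidean distance)."""
    clusters = [[] for _ in centers]  # membership is rebuilt from scratch
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        clusters[dists.index(min(dists))].append(p)
    return clusters

# Hypothetical data: two tight groups of points and k = 3 centers,
# where the third center is far from every point.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (6.0, 8.0)]
centers = [(1.0, 1.0), (5.0, 7.0), (50.0, 50.0)]

clusters = assign(points, centers)
non_empty = sum(1 for c in clusters if c)
print(non_empty, "of", len(centers), "clusters are populated")
```

No point has the third center as its nearest one, so that cluster stays empty and only two of the three clusters are populated.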
This paper confirms that empty clusters can occur in k-means: http://www.academypublisher.com/ijrte/vol01/no01/ijrte0101220226.pdf
One of the major problems of the k-means algorithm is that it may produce empty clusters depending on initial center vectors. For static execution of the k-means, this problem is considered insignificant and can be solved by executing the algorithm for a number of times.
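The "execute it a number of times" remedy from the quote could look roughly like the sketch below. This is only one possible reading, not the paper's method: the names `kmeans` and `kmeans_with_restarts`, the random initialisation inside the bounding box of the data, the fixed iteration count and the `max_restarts` limit are all assumptions of mine.

```python
import random

def nearest(p, centers):
    """Index of the center closest to p (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
    return dists.index(min(dists))

def kmeans(points, k, rng, iterations=20):
    """Plain k-means; initial centers are random positions in the data's bounding box."""
    dims = list(zip(*points))
    lo = [min(d) for d in dims]
    hi = [max(d) for d in dims]
    centers = [tuple(rng.uniform(l, h) for l, h in zip(lo, hi)) for _ in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Mean of each cluster; an empty cluster keeps its old center.
        centers = [
            tuple(sum(x) / len(pts) for x in zip(*pts)) if pts else old
            for pts, old in zip(clusters, centers)
        ]
    return centers, clusters

def kmeans_with_restarts(points, k, max_restarts=50, seed=0):
    """Rerun k-means with fresh random centers until no cluster is empty."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        centers, clusters = kmeans(points, k, rng)
        if all(clusters):  # every cluster holds at least one point
            return centers, clusters
    return None  # no run produced k non-empty clusters

points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (6.0, 8.0), (9.0, 1.0), (9.5, 1.5)]
result = kmeans_with_restarts(points, 3)
```

If no run yields k populated clusters within `max_restarts` attempts, the function returns `None`, so the caller has to handle that case. Another common option is to re-seed an empty cluster's center instead of restarting the whole run.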