Sometimes you may be looking for a k-means stopping criterion based on the “Number of Reassigned Observations Within Cluster”.

The H2O K-means implementation has the following two stopping criteria:

- Outer loop for estimate_k – stops when the relative reduction of the sum of within-centroid sums of squares is small enough
- Lloyd's iteration – stops when the relative fraction of reassigned points is small enough
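
To make the second criterion concrete, here is a minimal sketch of a Lloyd's iteration that stops once the fraction of reassigned points is small enough. It is a toy 1-D version in plain Python, not H2O's actual code: the function name `lloyd_1d`, the `reassign_tol` parameter and its default of 0.01 are assumptions made for illustration.

```python
def lloyd_1d(points, centers, max_iterations=100, reassign_tol=0.01):
    """Toy 1-D Lloyd's iteration (illustrative only, not H2O's implementation).

    Stops when the fraction of points that changed cluster in one pass
    is at most reassign_tol, or when max_iterations is reached.
    """
    centers = list(centers)
    assignments = [None] * len(points)
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        # Assignment step: attach each point to its nearest center.
        new_assignments = [
            min(range(len(centers)), key=lambda c: (p - centers[c]) ** 2)
            for p in points
        ]
        # Count how many points switched cluster compared to the last pass.
        changed = sum(old != new for old, new in zip(assignments, new_assignments))
        assignments = new_assignments
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
        # Stopping criterion: relative fraction of reassigned points.
        if changed / len(points) <= reassign_tol:
            break
    return centers, assignments, iteration


points = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
centers, assignments, iterations = lloyd_1d(points, centers=[0.0, 5.0])
print(centers, assignments, iterations)
```

Note that `max_iterations` only acts as a safety cap here: on well-separated data the reassignment criterion ends the loop long before the cap is reached.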

In the H2O machine learning library, you just need to set _estimate_k to True and set _max_iterations to a high number, e.g. 100.

With this combination, the algorithm searches for the most suitable number of clusters, going no higher than the maximum you pass as k. There are no other fine-tuning parameters available.
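
The idea behind the outer loop can be sketched as follows. This is only an illustration of "stop when the relative reduction is small enough", not H2O's actual procedure: the function name `pick_k`, the input list of within-cluster sums of squares, and the 0.05 threshold are all assumptions.

```python
def pick_k(wss_by_k, min_relative_reduction=0.05):
    """Pick k from within-cluster sums of squares (illustrative only).

    wss_by_k[0] is the value for k = 1, wss_by_k[1] for k = 2, and so on.
    The length of the list plays the role of the maximum k.
    """
    best_k = 1
    for k in range(2, len(wss_by_k) + 1):
        previous, current = wss_by_k[k - 2], wss_by_k[k - 1]
        if previous == 0:
            # Already a perfect fit; more clusters cannot help.
            break
        # Relative reduction gained by going from k - 1 to k clusters.
        reduction = (previous - current) / previous
        if reduction < min_relative_reduction:
            break
        best_k = k
    return best_k


# Hypothetical values: big gains up to k = 3, almost none afterwards.
chosen = pick_k([100.0, 40.0, 12.0, 11.8, 11.7])
print(chosen)
```

If every step still brings a large reduction, the loop simply runs until the maximum k, which is why a generous maximum is passed.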

In R here is what you can do:

h2o.kmeans(x = predictors, k = 100, estimate_k = TRUE, standardize = FALSE, max_iterations = 100, training_frame = train, validation_frame = valid, seed = 1234)

In Python here is what you can do:

from h2o.estimators import H2OKMeansEstimator
iris_kmeans = H2OKMeansEstimator(k = 100, estimate_k = True, standardize = False, max_iterations = 100, seed = 1234)
iris_kmeans.train(x = predictors, training_frame = train, validation_frame = valid)

In Java/Scala, set these parameters:

_estimate_k = true
_max_iterations = 100 (or a larger number)

That’s it, enjoy!!
