Setting stopping criteria in H2O K-means


Sometimes you may want a k-means stopping criterion based on the “Number of Reassigned Observations Within Cluster”.

The H2O K-means implementation has the following two stopping criteria:

  1. Outer loop for estimate_k – stops when the relative reduction of the sum of within-centroid sums of squares is small enough
  2. Lloyd's iteration – stops when the relative fraction of reassigned points is small enough

In the H2O machine learning library you just need to set _estimate_k to true and set _max_iterations to a high number, e.g. 100.
With this combination, the algorithm searches for the most suitable number of clusters, using the k you pass in as the upper limit, and each run of Lloyd's iterations ends once few enough points are reassigned or the maximum number of iterations is reached. There are no other fine-tuning parameters available.
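To make the two criteria concrete, here is a minimal plain-Python sketch of how they fit together. It is not H2O's code: the function names, the tolerance values (reassign_tol, min_rel_improvement) and the rule for seeding a new center are all assumptions chosen for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def lloyd(points, centers, max_iterations=100, reassign_tol=0.0):
    """Lloyd's iterations. Stops when the fraction of reassigned points
    is <= reassign_tol (hypothetical tolerance) or at max_iterations."""
    centers = list(centers)
    assignments = [None] * len(points)
    for iteration in range(1, max_iterations + 1):
        new_assignments = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        reassigned = sum(a != b for a, b in zip(assignments, new_assignments))
        assignments = new_assignments
        # Move each center to the mean of its members (empty clusters stay put).
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centers[j] = mean(members)
        if reassigned / len(points) <= reassign_tol:
            break
    return centers, assignments, iteration

def within_ss(points, centers, assignments):
    """Total within-cluster sum of squares."""
    return sum(sq_dist(p, centers[a]) for p, a in zip(points, assignments))

def estimate_k(points, max_k, min_rel_improvement=0.2, **lloyd_args):
    """Outer loop: grow k from 1 and stop when the relative reduction of
    the within-cluster sum of squares drops below min_rel_improvement
    (hypothetical threshold) or k reaches max_k."""
    centers, assignments, _ = lloyd(points, [mean(points)], **lloyd_args)
    prev_ss = within_ss(points, centers, assignments)
    for _k in range(2, max_k + 1):
        # Assumed seeding rule: add the point farthest from its own center.
        far = max(range(len(points)),
                  key=lambda i: sq_dist(points[i], centers[assignments[i]]))
        trial_centers, trial_assign, _ = lloyd(
            points, centers + [points[far]], **lloyd_args)
        ss = within_ss(points, trial_centers, trial_assign)
        if prev_ss == 0 or (prev_ss - ss) / prev_ss < min_rel_improvement:
            break  # the extra cluster did not help enough
        centers, assignments, prev_ss = trial_centers, trial_assign, ss
    return centers, assignments

# Toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
centers, assignments = estimate_k(points, max_k=10, max_iterations=100)
print(len(centers), sorted(centers))
```

On this toy data the outer loop settles on three clusters even though max_k is 10, which mirrors the idea of passing a generous upper limit for k and letting the stopping criteria decide.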

In R here is what you can do:

h2o.kmeans(x = predictors, k = 100, estimate_k = TRUE, max_iterations = 100, standardize = FALSE,
           training_frame = train, validation_frame = valid, seed = 1234)

In Python here is what you can do:

from h2o.estimators import H2OKMeansEstimator

iris_kmeans = H2OKMeansEstimator(k=100, estimate_k=True, max_iterations=100, standardize=False, seed=1234)
iris_kmeans.train(x=predictors, training_frame=train, validation_frame=valid)

In Java/Scala:

_estimate_k = true
_max_iterations = 100  // or a larger number

That’s it, enjoy!!
