Setting stopping criteria in H2O K-means

Sometimes you may be looking for a k-means stopping criterion based on the “Number of Reassigned Observations Within Cluster”. The H2O K-means implementation has the following two stopping criteria: the outer loop for estimate_k stops when the relative reduction of the sum of within-centroid sums of squares is small enough, and the Lloyd's iteration stops when the relative fraction of reassigned points is small enough. In […]
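
The Lloyd's-iteration rule can be illustrated with a small, self-contained Java sketch. It is not H2O's implementation: the class name LloydSketch and the reassignTolerance parameter are hypothetical, and the sketch starts from caller-supplied centers. Each pass assigns every point to its nearest center, counts how many points changed cluster, recomputes the centers, and stops once the reassigned fraction is at or below the tolerance.

```java
import java.util.Arrays;

// Sketch of Lloyd's k-means with a "fraction of reassigned points" stopping rule.
// Not H2O's code; LloydSketch and reassignTolerance are hypothetical names.
final class LloydSketch {

    /** Returns each point's cluster index once the reassigned fraction is small enough. */
    static int[] cluster(double[][] points, double[][] initialCenters,
                         double reassignTolerance, int maxIterations) {
        int n = points.length, k = initialCenters.length, dim = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = initialCenters[c].clone();
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1); // -1 = unassigned, so the first pass counts every point

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move each point to its nearest center and count the moves.
            int reassigned = 0;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[i][j] - centers[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; reassigned++; }
            }

            // Update step: each center becomes the mean of the points assigned to it.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its previous center
                for (int j = 0; j < dim; j++) centers[c][j] = sums[c][j] / counts[c];
            }

            // Stopping rule: relative fraction of reassigned points is small enough.
            if ((double) reassigned / n <= reassignTolerance) break;
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        double[][] init = {{0, 0}, {0, 1}};
        // prints [0, 0, 0, 1, 1, 1]
        System.out.println(Arrays.toString(cluster(pts, init, 0.0, 100)));
    }
}
```

With a tolerance of 0.0 the loop runs until no point moves; a larger tolerance such as 0.05 accepts a few remaining moves in exchange for fewer iterations.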


Handling a YARN ResourceManager issue with decommissioned nodes

If you hit the following exception with your YARN ResourceManager:
ERROR/Exception:
17/07/31 15:06:13 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes over rm1. Not retrying because try once and fail.
java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
Troubleshooting: Run the following command and you will see exactly the same exception:
$ yarn node -list […]


Scoring H2O models with TIBCO StreamBase

If you are using H2O models with StreamBase for scoring, this is what you have to do:
1. Get the model as Java code (POJO model).
2. Get the h2o-genmodel.jar (download it from the H2O cluster). Alternatively, you can use the REST API (works in every H2O version) to download h2o-genmodel.jar:
   curl http://localhost:54321/3/h2o-genmodel.jar > h2o-genmodel.jar
3. Create […]


Adding a hyperparameter to the Deep Learning algorithm in H2O with Scala

The hidden layer configuration is a hyperparameter of the Deep Learning algorithm in H2O. To search over hidden layer settings in H2O-based deep learning, use the "_hidden" parameter to specify the hidden layers as a hyperparameter, as below: val hyperParms = collection.immutable.HashMap("_hidden" -> hidden_layers) Here is the code snippet in Scala to add hidden layers […]
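
To illustrate what such a map describes, here is a Java sketch (it does not call any H2O API) that holds candidate values per parameter name, with each "_hidden" candidate being an int[] of layer sizes, and expands the map into every Cartesian combination, as an exhaustive grid search would. The class HiddenLayerGrid, the extra "_epochs" entry and all candidate values are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: a hyperparameter map keyed by H2O-style field names, expanded into all
// Cartesian combinations. HiddenLayerGrid and the candidate values are hypothetical.
final class HiddenLayerGrid {

    /** Expands {name -> candidate values} into a list of {name -> chosen value} maps. */
    static List<Map<String, Object>> cartesian(Map<String, Object[]> hyperParams) {
        List<Map<String, Object>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<>()); // start from a single empty combination
        for (Map.Entry<String, Object[]> e : hyperParams.entrySet()) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> partial : combos) {
                for (Object value : e.getValue()) {
                    Map<String, Object> extended = new LinkedHashMap<>(partial);
                    extended.put(e.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;
    }

    public static void main(String[] args) {
        Map<String, Object[]> hyperParams = new LinkedHashMap<>();
        // Each "_hidden" candidate is an int[]: one entry per hidden layer, giving its size.
        hyperParams.put("_hidden", new Object[] {new int[] {10, 10}, new int[] {50, 50, 50}});
        hyperParams.put("_epochs", new Object[] {5.0, 10.0});
        for (Map<String, Object> combo : cartesian(hyperParams)) {
            System.out.println(Arrays.toString((int[]) combo.get("_hidden"))
                    + " epochs=" + combo.get("_epochs"));
        }
    }
}
```

Two hidden-layer layouts times two epoch values give four combinations; adding a candidate to either list grows the grid multiplicatively, which is why hidden-layer grids are usually kept small.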


Using cross-validation in Scala with H2O and getting each cross-validated model

Here is Scala code for binomial classification with GLM: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/ To add cross-validation, you can do the following: def buildGLMModel(train: Frame, valid: Frame, response: String) (implicit h2oContext: H2OContext): GLMModel = { import _root_.hex.glm.GLMModel.GLMParameters.Family import _root_.hex.glm.GLM import _root_.hex.glm.GLMModel.GLMParameters val glmParams = new GLMParameters(Family.binomial) glmParams._train = train glmParams._valid = valid glmParams._nfolds = 3 ###### Here is […]
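
Setting glmParams._nfolds = 3 builds one model per fold, each trained on the rows outside that fold and scored on the rows inside it. The Java sketch below shows that split with a deterministic modulo rule (row i goes to fold i % nfolds). H2O does offer a Modulo fold assignment, but the class FoldSplitSketch and its methods are hypothetical and do not call H2O.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an nfolds-style split: row i belongs to fold (i % nfolds); the model for
// fold f trains on all other rows and is scored on fold f. Not H2O's code.
final class FoldSplitSketch {

    /** For each fold, the row indices held out to score that fold's model. */
    static List<List<Integer>> holdoutRows(int numRows, int nfolds) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < nfolds; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < numRows; i++) folds.get(i % nfolds).add(i);
        return folds;
    }

    /** The row indices used to train the model for the given fold. */
    static List<Integer> trainingRows(int numRows, int nfolds, int fold) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            if (i % nfolds != fold) rows.add(i);
        }
        return rows;
    }

    public static void main(String[] args) {
        int numRows = 7, nfolds = 3; // nfolds = 3 as in the GLM parameters above
        List<List<Integer>> holdouts = holdoutRows(numRows, nfolds);
        for (int f = 0; f < nfolds; f++) {
            System.out.println("model " + f + ": train " + trainingRows(numRows, nfolds, f)
                    + ", holdout " + holdouts.get(f));
        }
    }
}
```

Every row is held out exactly once, so combining the holdout predictions of all fold models yields one out-of-sample prediction per row.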


Generating an ROC curve in Scala from H2O binary classification models

You can use the following blog post to build a binomial classification GLM model: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/ To collect model metrics for training, use the following: val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsBinomial](glmModel, train) Now you can access the model AUC (_auc object) as below: Note: the _auc object has an array of thresholds, and for each threshold it has fps and tps (use […]
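
Given such per-threshold counts, the ROC curve is the sequence of points (FPR, TPR) = (fps / N, tps / P), where P and N are the numbers of actual positives and negatives. The Java sketch below computes those points and a trapezoidal AUC from plain arrays. It assumes the counts are cumulative and ordered from the highest threshold to the lowest; it does not read H2O's AUC object directly, and the class name RocSketch and the sample counts are hypothetical.

```java
// Sketch: turn cumulative per-threshold true/false positive counts into ROC points
// and a trapezoidal AUC. Assumes thresholds run from highest to lowest, so tps and
// fps never decrease. RocSketch and the sample data are hypothetical.
final class RocSketch {

    /** Returns {fpr[], tpr[]}, with (0,0) prepended and (1,1) appended as end points. */
    static double[][] rocPoints(double[] tps, double[] fps, double totalPos, double totalNeg) {
        int m = tps.length;
        double[] fpr = new double[m + 2];
        double[] tpr = new double[m + 2];
        for (int i = 0; i < m; i++) {
            fpr[i + 1] = fps[i] / totalNeg;
            tpr[i + 1] = tps[i] / totalPos;
        }
        fpr[m + 1] = 1.0; // index 0 stays at (0, 0)
        tpr[m + 1] = 1.0;
        return new double[][] {fpr, tpr};
    }

    /** Area under the ROC curve by the trapezoidal rule. */
    static double auc(double[] fpr, double[] tpr) {
        double area = 0;
        for (int i = 1; i < fpr.length; i++) {
            area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        // Hypothetical counts at 3 thresholds for data with 4 positives and 4 negatives.
        double[] tps = {2, 3, 4};
        double[] fps = {0, 1, 2};
        double[][] roc = rocPoints(tps, fps, 4, 4);
        for (int i = 0; i < roc[0].length; i++) {
            System.out.printf("FPR=%.2f TPR=%.2f%n", roc[0][i], roc[1][i]);
        }
        System.out.printf("AUC=%.4f%n", auc(roc[0], roc[1]));
    }
}
```

For the sample counts, the points are (0, 0), (0, 0.5), (0.25, 0.75), (0.5, 1), (1, 1) and the AUC is 0.875; plotting the fpr array against the tpr array draws the curve.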


Saving H2O models from the R/Python API in a Hadoop environment

When you use H2O in a clustered environment such as Hadoop, the machine where h2o.saveModel() tries to write the model may not be the machine you are working on, and that is why you see the error “No such file or directory”. If you give just a path such as /tmp and then look on the machine where the H2O connection from R was initiated, you […]


Using H2O models in Java for scoring or prediction

This sample generates a GBM model with the H2O R library and then consumes the model in Java for prediction. Here is the R script to generate a sample model using H2O: setwd("/tmp/resources/") library(h2o) h2o.init() df = iris h2o_df = as.h2o(df) y = "Species" x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") model = h2o.gbm(y = y, x = x, […]


JSON

JSON (/ˈdʒeɪsɒn/ JAY-sawn, /ˈdʒeɪsən/ JAY-sun), or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange. It is derived from the JavaScript scripting language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, it is language-independent, with parsers available for many languages. The JSON format was […]
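
To make the data model concrete, here is a minimal Java sketch that writes nested maps (JSON objects), lists (arrays), strings, numbers, booleans and null as JSON text. The class name JsonSketch is hypothetical; it escapes quotes, backslashes and control characters but does not reject NaN or Infinity, which JSON cannot represent, so a maintained JSON library is the better choice in practice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal JSON writer for maps, lists, strings, numbers, booleans and null.
// Illustrative only; JsonSketch is a hypothetical name, not a library class.
final class JsonSketch {

    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof String) {
            writeString((String) value, sb);
        } else if (value instanceof Number || value instanceof Boolean) {
            sb.append(value); // NaN and Infinity are not valid JSON and are not checked here
        } else if (value instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                writeString(String.valueOf(e.getKey()), sb); // object keys are always strings
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (value instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) sb.append(',');
                first = false;
                write(item, sb);
            }
            sb.append(']');
        } else {
            throw new IllegalArgumentException("unsupported type: " + value.getClass());
        }
    }

    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c)); // other control chars
                    else sb.append(c);
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "Ada");
        person.put("age", 36);
        person.put("languages", Arrays.asList("Java", "Scala"));
        person.put("manager", null);
        // prints {"name":"Ada","age":36,"languages":["Java","Scala"],"manager":null}
        System.out.println(toJson(person));
    }
}
```

A LinkedHashMap keeps keys in insertion order, which makes the output predictable; JSON itself does not require object members to be ordered.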


Eclipse in Ubuntu

Eclipse is a multi-language integrated development environment (IDE) comprising a base workspace and an extensible plug-in system for customizing the environment. It is written mostly in Java. It can be used to develop applications in Java and, by means of various plug-ins, other programming languages including Ada, C, C++, COBOL, Fortran, Haskell, JavaScript, Perl, PHP, […]
