Renaming H2O data frame column name in R

Following is the code snippet showing how you can rename a column in an H2O data frame in R:

> train.hex <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
  |======================================================| 100%
> train.hex
  sepal_len sepal_wid petal_len petal_wid       class
1       5.1       3.5       1.4       0.2 Iris-setosa
2       4.9       3.0       1.4       0.2 Iris-setosa
3       4.7       3.2       1.3       0.2 Iris-setosa
4       4.6       3.1       1.5 […]
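The renaming step itself is just swapping one name in the frame's column list. A minimal plain-Python sketch of that idea, using a dict of columns as a stand-in for the H2O frame (no H2O cluster needed; the helper name is illustrative):

```python
# Plain-Python sketch: rename one column of a dict-of-lists "frame".

def rename_column(frame, old, new):
    """Return a new frame with column `old` renamed to `new`."""
    if old not in frame:
        raise KeyError(f"no such column: {old}")
    return {new if name == old else name: values
            for name, values in frame.items()}

train = {"sepal_len": [5.1, 4.9], "sepal_wid": [3.5, 3.0]}
renamed = rename_column(train, "sepal_len", "sepal_length")
print(list(renamed))  # ['sepal_length', 'sepal_wid']
```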

Continue reading


Setting stopping criteria into H2O K-means

Sometimes you may be looking for a k-means stopping criterion based on the number of reassigned observations within a cluster. The H2O K-means implementation has the following two stopping criteria:
- Outer loop for estimate_k – stop when the relative reduction of the sum-of-within-centroid-sum-of-squares is small enough
- Lloyd's iteration – stops when the relative fraction of reassigned points is small enough
In […]
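Both criteria described above boil down to simple relative-change tests. A hedged plain-Python sketch of that logic (function and tolerance names are illustrative, not H2O's actual parameter names):

```python
def should_stop_wss(prev_wss, curr_wss, rel_tol=1e-3):
    """Outer-loop test: stop when the relative reduction of the
    within-centroid sum-of-squares between passes is small enough."""
    if prev_wss == 0:
        return True
    return (prev_wss - curr_wss) / prev_wss < rel_tol

def should_stop_lloyd(n_reassigned, n_points, frac_tol=1e-3):
    """Lloyd's-iteration test: stop when the fraction of points that
    changed cluster on this pass is small enough."""
    return n_reassigned / n_points < frac_tol

print(should_stop_wss(100.0, 99.99))  # True: only a 0.01% reduction
print(should_stop_lloyd(5, 150))      # False: 3.3% of points moved
```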

Continue reading


Handling YARN resources manager issue with decommissioned nodes

If you hit the following exception with your YARN resource manager:

ERROR/Exception:
17/07/31 15:06:13 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes over rm1. Not retrying because try once and fail.
java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl

Troubleshooting: Please try running the following command and you will see the exact same exception:

$ yarn node -list […]

Continue reading


Scoring H2O model with TIBCO StreamBase

If you are using H2O models with StreamBase for scoring, this is what you have to do:
1. Get the model as Java code (POJO model)
2. Get the h2o-genmodel.jar (download it from the H2O cluster)

Alternatively, you can use the REST API (works in every H2O version) to download h2o-genmodel.jar:

curl http://localhost:54321/3/h2o-genmodel.jar > h2o-genmodel.jar

Create […]

Continue reading


Calculating Standard Deviation using custom UDF and group by in H2O

Here is the full code to calculate standard deviation using H2O's group-by method as well as using a custom UDF:

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")
# Calculating Standard Deviation using h2o group by
SdValue <- h2o.group_by(data = iris.hex, by = "class", sd("sepal_len"))
# […]
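The group-by computation above can be mirrored in plain Python with no H2O cluster: bucket rows by class, then compute the sample standard deviation (n − 1 denominator, as R's sd() uses) of sepal_len per group. The row values below are illustrative, not the full iris data:

```python
import math
from collections import defaultdict

def sd(values):
    """Sample standard deviation, matching R's sd()."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def group_by_sd(rows, by, col):
    """Group `rows` (list of dicts) on key `by`; compute sd of `col` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[col])
    return {key: sd(vals) for key, vals in groups.items()}

rows = [
    {"class": "Iris-setosa",     "sepal_len": 5.1},
    {"class": "Iris-setosa",     "sepal_len": 4.9},
    {"class": "Iris-setosa",     "sepal_len": 4.7},
    {"class": "Iris-versicolor", "sepal_len": 7.0},
    {"class": "Iris-versicolor", "sepal_len": 6.4},
]
print(group_by_sd(rows, "class", "sepal_len"))
```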

Continue reading


Calculate mean using UDF in H2O

Here is the full code to write a UDF that calculates the mean of a given data frame using the H2O machine learning platform:

library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
# Writing the UDF
myMeanUDF = function(Fr) { mean(Fr[, 1]) }
# Applying UDF using ddply
MeanValue = h2o.ddply(australia.hex[, c("premax", […]
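The UDF pattern above (a small user function applied over frame columns) can be sketched in plain Python. The column names come from the snippet; the numeric values are made up for illustration:

```python
def my_mean_udf(column):
    """UDF: mean of a list of numbers (mirrors mean(Fr[, 1]) in the R snippet)."""
    return sum(column) / len(column)

def apply_udf(frame, udf, columns):
    """Apply `udf` to each named column of a dict-of-lists frame."""
    return {name: udf(frame[name]) for name in columns}

# Made-up values; only the column names match the R example.
australia = {"premax": [370.0, 380.0, 390.0], "salmax": [35.5, 35.7, 35.9]}
print(apply_udf(australia, my_mean_udf, ["premax", "salmax"]))
```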

Continue reading


Getting individual metrics from H2O model in Python

You can get some of the individual model metrics for your model based on training and/or validation data. Here is the code snippet. Note: I am creating a test data frame to run the H2O Deep Learning algorithm, and then showing below how to collect individual model metrics based on training and/or validation data.

import h2o […]
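Metrics such as MSE, RMSE, and R² that H2O reports per training/validation set reduce to simple formulas. A plain-Python sketch computing them from actuals and predictions (the sample numbers are illustrative):

```python
import math

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))

def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot

actual    = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5,  0.0, 2.0, 8.0]
print(round(mse(actual, predicted), 4))  # 0.375
print(round(r2(actual, predicted), 4))   # 0.9486
```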

Continue reading


Adding hyper parameter to Deep Learning algorithm in H2O with Scala

The hidden layer configuration is a hyperparameter for the Deep Learning algorithm in H2O. To tune it in H2O-based deep learning, you should use the "_hidden" parameter to specify the hidden layers as a hyperparameter, as below:

val hyperParms = collection.immutable.HashMap("_hidden" -> hidden_layers)

Here is the code snippet in Scala to add hidden layers […]
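The Scala line above just builds a map from parameter name to candidate values. The same grid idea sketched in Python, expanding the map into every parameter combination (the hidden-layer lists and the extra "_epochs" entry are illustrative, not from the original post):

```python
from itertools import product

# Candidate values per hyperparameter, mirroring the "_hidden" map above.
hyper_params = {
    "_hidden": [[20, 20], [50, 50, 50], [100]],
    "_epochs": [10, 50],  # illustrative second hyperparameter
}

# Expand into the full cartesian grid of parameter combinations.
keys = list(hyper_params)
grid = [dict(zip(keys, combo)) for combo in product(*hyper_params.values())]

print(len(grid))  # 6 combinations (3 architectures x 2 epoch settings)
print(grid[0])    # {'_hidden': [20, 20], '_epochs': 10}
```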

Continue reading


Anomaly Detection with Deep Learning in R with H2O

The following R script downloads the ECG dataset (training and validation) from the internet and performs deep-learning-based anomaly detection on it:

library(h2o)
h2o.init()
# Import ECG train and test data into the H2O cluster
train_ecg <- h2o.importFile(
  path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_train.csv",
  header = FALSE,
  sep = ",")
test_ecg <- h2o.importFile(
  path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_test.csv",
  header = FALSE, […]
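Autoencoder-based anomaly detection of this kind typically ends by thresholding the per-row reconstruction error: rows the model reconstructs poorly are flagged as anomalous. A plain-Python sketch of that final step, with made-up error values (no H2O cluster or model needed):

```python
def flag_anomalies(reconstruction_mse, quantile=0.9):
    """Flag rows whose reconstruction error exceeds the given quantile."""
    ranked = sorted(reconstruction_mse)
    # Simple quantile: the value below which `quantile` of the errors fall.
    threshold = ranked[int(quantile * (len(ranked) - 1))]
    return [i for i, err in enumerate(reconstruction_mse) if err > threshold]

# Made-up per-row reconstruction MSEs; row 4 reconstructs poorly.
errors = [0.01, 0.012, 0.009, 0.011, 0.25, 0.010, 0.013, 0.008, 0.012, 0.011]
print(flag_anomalies(errors))  # [4]
```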

Continue reading


Spark with H2O using rsparkling and sparklyr in R

You must have installed: sparklyr and rsparkling. Here is the working script:

> library(sparklyr)
> options(rsparkling.sparklingwater.version = "2.1.6")
> Sys.setenv(SPARK_HOME='/Users/avkashchauhan/tools/spark-2.1.0-bin-hadoop2.6')
> library(rsparkling)
> spark_disconnect(sc)
> sc <- spark_connect(master = "local", version = "2.1.0")

Testing the Spark context:

sc
$master
[1] "local[8]"
$method
[1] "shell"
$app_name
[1] "sparklyr"
$config
$config$sparklyr.cores.local
[1] 8
$config$spark.sql.shuffle.partitions.local
[1] 8
$config$spark.env.SPARK_LOCAL_IP.local
[1] […]

Continue reading