Renaming an H2O data frame column in R

The following code snippet shows how to rename a column in an H2O data frame in R:

> train.hex <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
  |======================================================| 100%
> train.hex
  sepal_len sepal_wid petal_len petal_wid       class
1       5.1       3.5       1.4       0.2 Iris-setosa
2       4.9       3.0       1.4       0.2 Iris-setosa
3       4.7       3.2       1.3       0.2 Iris-setosa
4       4.6       3.1       1.5 […]
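As a self-contained illustration of the rename step, here is a minimal sketch. It assumes a local H2O cluster started by h2o.init() and uploads R's built-in iris data with as.h2o() instead of importing iris_wheader.csv, so the columns are named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species; the replacement name "class" is chosen for illustration. Because h2o's colnames<- expects one name per column, the sketch edits the full name vector and assigns it back.

```r
library(h2o)
h2o.init()

# Upload R's built-in iris data frame (instead of importing iris_wheader.csv)
iris.hex <- as.h2o(iris, destination_frame = "iris.hex")

# Take the full vector of column names, change the one to rename,
# then assign the whole vector back (h2o's colnames<- wants every name)
new_names <- colnames(iris.hex)
new_names[new_names == "Species"] <- "class"
colnames(iris.hex) <- new_names

print(colnames(iris.hex))
```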



Setting stopping criteria in H2O K-means

Sometimes you may be looking for a k-means stopping criterion based on the "Number of Reassigned Observations Within Cluster". The H2O K-means implementation uses the following two stopping criteria: (1) the outer loop for estimate_k stops when the relative reduction of the sum of within-centroid sums of squares is small enough; (2) Lloyd's iterations stop when the relative fraction of reassigned points is small enough. In […]
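To make the second criterion concrete, here is a minimal base-R sketch of Lloyd's iterations that stops once the fraction of reassigned points falls to or below a threshold. It is not H2O's implementation: the function name lloyd_kmeans and the parameter reassign_tol are hypothetical, and initialization simply picks k random rows.

```r
# Illustrative Lloyd's k-means (not H2O's code): stop when the fraction of
# points that changed cluster in an iteration is <= reassign_tol.
lloyd_kmeans <- function(X, k, reassign_tol = 0.01, max_iter = 100, seed = 42) {
  X <- as.matrix(X)
  n <- nrow(X)
  set.seed(seed)
  # Initial centers: k randomly chosen rows
  centers <- X[sample.int(n, k), , drop = FALSE]
  assignment <- rep(0L, n)
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every point to every center (n x k)
    d <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, centers[j, ])^2))
    new_assignment <- max.col(-d, ties.method = "first")
    # Relative fraction of points whose cluster changed in this iteration
    frac_reassigned <- mean(new_assignment != assignment)
    assignment <- new_assignment
    # Recompute centers; keep the old center if a cluster ends up empty
    for (j in seq_len(k)) {
      members <- X[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
    if (frac_reassigned <= reassign_tol) break
  }
  list(cluster = assignment, centers = centers,
       iterations = iter, frac_reassigned = frac_reassigned)
}
```

For example, fit <- lloyd_kmeans(iris[, 1:4], k = 3) followed by table(fit$cluster, iris$Species) compares the resulting clusters with the species labels. Setting reassign_tol = 0 runs until no point moves (or max_iter is reached).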



Calculating Standard Deviation using a custom UDF and group by in H2O

Here is the full code to calculate the standard deviation using the H2O group-by method as well as a custom UDF:

library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris_wheader.csv", package = "h2o")
iris.hex <- h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")

# Calculating Standard Deviation using h2o group by
SdValue <- h2o.group_by(data = iris.hex, by = "class", sd("sepal_len"))

# […]
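As a local cross-check of the group-by result, the same per-class standard deviation can be computed in base R on R's built-in iris data (there the columns are Sepal.Length and Species rather than sepal_len and class). This sketch computes it once with sd() and once with a hand-written UDF; my_sd_udf is a hypothetical name, and both use the n - 1 (sample) denominator.

```r
# Built-in sd(): sample standard deviation (n - 1 denominator), per species
sd_builtin <- tapply(iris$Sepal.Length, iris$Species, sd)

# Hand-written UDF computing the same quantity
my_sd_udf <- function(x) {
  n <- length(x)
  sqrt(sum((x - mean(x))^2) / (n - 1))
}
sd_udf <- tapply(iris$Sepal.Length, iris$Species, my_sd_udf)

print(round(sd_builtin, 4))
print(round(sd_udf, 4))
```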



Calculating the mean using a UDF in H2O

Here is the full code for writing a UDF that calculates the mean of a given data frame using the H2O machine learning platform:

library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)

# Writing the UDF
myMeanUDF = function(Fr) { mean(Fr[, 1]) }

# Applying UDF using ddply
MeanValue = h2o.ddply(australia.hex[, c("premax", […]
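The UDF shape above (a function that receives a data frame and returns the mean of its first column) can be tried locally with base R's split-apply-combine. This sketch is not h2o.ddply; it uses R's built-in mtcars data, grouping by cyl and passing the mpg and hp columns, all chosen for illustration.

```r
# Same UDF shape: takes a data frame, returns the mean of its first column
myMeanUDF <- function(Fr) { mean(Fr[, 1]) }

# Split the rows by group, apply the UDF to each piece, combine the results
pieces <- split(mtcars[, c("mpg", "hp")], mtcars$cyl)
MeanValue <- sapply(pieces, myMeanUDF)
print(MeanValue)
```

Because mpg is the first column of each piece, MeanValue holds the mean mpg per cylinder count.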



Anomaly Detection with Deep Learning in R with H2O

The following R script downloads the ECG dataset (training and validation) from the internet and performs deep-learning-based anomaly detection on it.

library(h2o)
h2o.init()

# Import ECG train and test data into the H2O cluster
train_ecg <- h2o.importFile(
  path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_train.csv",
  header = FALSE,
  sep = ",")
test_ecg <- h2o.importFile(
  path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_test.csv",
  header = FALSE,
  […]
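A common way to do deep-learning anomaly detection in H2O is to train an autoencoder and rank rows by reconstruction error. The sketch below assumes that approach; it is not the post's elided continuation. It re-imports the same two public ECG files so it runs on its own, and the activation, layer sizes, epochs and seed are illustrative, not tuned.

```r
library(h2o)
h2o.init()

# Same public ECG files as in the script above
train_ecg <- h2o.importFile(
  path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_train.csv",
  header = FALSE, sep = ",")
test_ecg <- h2o.importFile(
  path = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_test.csv",
  header = FALSE, sep = ",")

# Autoencoder: the network learns to reconstruct its own input (no response column)
ae_model <- h2o.deeplearning(
  x = names(train_ecg),
  training_frame = train_ecg,
  autoencoder = TRUE,
  activation = "Tanh",
  hidden = c(50, 20, 50),
  epochs = 100,
  seed = 1)

# Per-row reconstruction MSE on the test data; larger values suggest anomalies
recon_error <- as.data.frame(h2o.anomaly(ae_model, test_ecg, per_feature = FALSE))

# Test-row indices ordered from most to least anomalous
ranked_rows <- order(recon_error$Reconstruction.MSE, decreasing = TRUE)
print(head(ranked_rows))
```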



Spark with H2O using rsparkling and sparklyr in R

You must have the following packages installed: sparklyr and rsparkling.

Here is the working script:

library(sparklyr)
options(rsparkling.sparklingwater.version = "2.1.6")
Sys.setenv(SPARK_HOME = '/Users/avkashchauhan/tools/spark-2.1.0-bin-hadoop2.6')
library(rsparkling)
# Close an earlier connection if one is still open
if (exists("sc")) spark_disconnect(sc)
sc <- spark_connect(master = "local", version = "2.1.0")

Testing the Spark context:

> sc
$master
[1] "local[8]"

$method
[1] "shell"

$app_name
[1] "sparklyr"

$config
$config$sparklyr.cores.local
[1] 8

$config$spark.sql.shuffle.partitions.local
[1] 8

$config$spark.env.SPARK_LOCAL_IP.local
[1] […]



Building a GBM model in R and exporting POJO and MOJO models

Get the dataset:

Training: http://h2o-training.s3.amazonaws.com/pums2013/adult_2013_train.csv.gz
Test: http://h2o-training.s3.amazonaws.com/pums2013/adult_2013_test.csv.gz

Here is the script to build a GBM grid model and export the MOJO model:

library(h2o)
h2o.init()

# Importing Dataset
trainfile <- file.path("/Users/avkashchauhan/learn/adult_2013_train.csv.gz")
adult_2013_train <- h2o.importFile(trainfile, destination_frame = "adult_2013_train")
testfile <- file.path("/Users/avkashchauhan/learn/adult_2013_test.csv.gz")
adult_2013_test <- h2o.importFile(testfile, destination_frame = "adult_2013_test")

# Display Dataset
adult_2013_train
adult_2013_test

# Feature Engineering
actual_log_wagp <- h2o.assign(adult_2013_test[, "LOG_WAGP"], […]
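The export step itself can be sketched on a smaller problem. This sketch assumes a local H2O cluster, trains a single GBM on R's built-in iris data instead of the adult_2013 grid from the post, and writes both artifacts to tempdir(): h2o.download_mojo() writes the MOJO zip (get_genmodel_jar = TRUE also fetches the h2o-genmodel jar that Java scoring needs), and h2o.download_pojo() writes the generated Java source file. ntrees and seed are illustrative.

```r
library(h2o)
h2o.init()

# Small stand-in model (iris instead of the adult_2013 data)
iris.hex <- as.h2o(iris, destination_frame = "iris.hex")
gbm_model <- h2o.gbm(
  x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  y = "Species",
  training_frame = iris.hex,
  ntrees = 20,
  seed = 1)

export_dir <- tempdir()

# MOJO: a zip archive, plus the genmodel jar for scoring from Java
h2o.download_mojo(gbm_model, path = export_dir, get_genmodel_jar = TRUE)

# POJO: a generated .java source file named after the model id
h2o.download_pojo(gbm_model, path = export_dir)

print(list.files(export_dir))
```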



Saving H2O models from the R/Python API in a Hadoop Environment

When you use H2O in a clustered environment such as Hadoop, the machine where h2o.saveModel() tries to write the model can be different from the machine running your R session, and that is why you see the error "No such file or directory". If you just give a path such as /tmp and visit the machine ID where the H2O connection is initiated from R, you […]
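One way to avoid the client/cluster path mismatch is to pass a directory that the H2O nodes themselves can write to and to reload from the path that h2o.saveModel() returns. This is a sketch under assumptions: the environment variable H2O_MODEL_DIR is a hypothetical name (on Hadoop you would point it at a location every node can reach, such as an hdfs:// URI), and it falls back to tempdir() so the sketch also runs against a local cluster, where the H2O node and R share a machine.

```r
library(h2o)
h2o.init()

# Small illustrative model; any H2O model is saved the same way
iris.hex <- as.h2o(iris)
model <- h2o.gbm(x = 1:4, y = "Species", training_frame = iris.hex,
                 ntrees = 10, seed = 1)

# The directory is resolved by the H2O node, not by the R client.
# H2O_MODEL_DIR is a hypothetical variable; locally it falls back to tempdir().
model_dir <- Sys.getenv("H2O_MODEL_DIR", unset = tempdir())
saved_path <- h2o.saveModel(object = model, path = model_dir, force = TRUE)
print(saved_path)

# Reload from the returned path (again interpreted on the H2O node)
reloaded <- h2o.loadModel(saved_path)
```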



RNN in TensorFlow in Python and R, with MNIST

Though it is more convenient to use the TensorFlow framework in Python, we have also discussed how to use TensorFlow in R here: https://charleshsliao.wordpress.com/tag/tensorflow/

Here we will apply a recurrent neural network (RNN) in TensorFlow in both Python and R.

In R:

#1. We load the data
library(tensorflow)
mnist <- tf$contrib$learn$datasets$mnist$load_mnist(train_dir = "MNIST-data")

#2. Identify Essential Parameters
Input <- 28L
[…]



Using H2O models in Java for scoring or prediction

This sample generates a GBM model with the H2O R library and then consumes the model in Java for prediction. Here is the R script to generate the sample model using H2O:

setwd("/tmp/resources/")
library(h2o)
h2o.init()
df = iris
h2o_df = as.h2o(df)
y = "Species"
x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
model = h2o.gbm(y = y, x = x, […]
