Spark with H2O using rsparkling and sparklyr in R

You must have installed: sparklyr rsparkling   Here is the working script: library(sparklyr) > options(rsparkling.sparklingwater.version = “2.1.6”) > Sys.setenv(SPARK_HOME=’/Users/avkashchauhan/tools/spark-2.1.0-bin-hadoop2.6′) > library(rsparkling) > spark_disconnect(sc) > sc <- spark_connect(master = “local”, version = “2.1.0”) Testing the spark context: sc $master [1] “local[8]” $method [1] “shell” $app_name [1] “sparklyr” $config $config$sparklyr.cores.local [1] 8 $config$spark.sql.shuffle.partitions.local [1] 8 $config$spark.env.SPARK_LOCAL_IP.local [1] […]

Continue reading


Building GBM model in R and exporting POJO and MOJO model

Get the dataset: Training: http://h2o-training.s3.amazonaws.com/pums2013/adult_2013_train.csv.gz Test: http://h2o-training.s3.amazonaws.com/pums2013/adult_2013_test.csv.gz Here is the script to build GBM grid model and export MOJO model: library(h2o) h2o.init() # Importing Dataset trainfile <- file.path(“/Users/avkashchauhan/learn/adult_2013_train.csv.gz”) adult_2013_train <- h2o.importFile(trainfile, destination_frame = “adult_2013_train”) testfile <- file.path(“/Users/avkashchauhan/learn/adult_2013_test.csv.gz”) adult_2013_test <- h2o.importFile(testfile, destination_frame = “adult_2013_test”) # Display Dataset adult_2013_train adult_2013_test # Feature Engineering actual_log_wagp <- h2o.assign(adult_2013_test[, “LOG_WAGP”], […]

Continue reading


Experimental Plotting in H2O FLOW (limited support)

H2O FLOW comes with experimental plot option which can be used as below: What you need is : X – Column name Y – Column name Data: Data Frame name Here is what experimental script look like: plot (g) -> g(    g.point(    g.position “X_Column”, “Y_Column” ) g.from inspect “data”, getFrameData “DATA_SET_KEY” ) You […]

Continue reading


Using Cross-validation in Scala with H2O and getting each cross-validated model

Here is Scala code for binomial classification with GLM: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/ To add cross validation you can do the following: def buildGLMModel(train: Frame, valid: Frame, response: String) (implicit h2oContext: H2OContext): GLMModel = { import _root_.hex.glm.GLMModel.GLMParameters.Family import _root_.hex.glm.GLM import _root_.hex.glm.GLMModel.GLMParameters val glmParams = new GLMParameters(Family.binomial) glmParams._train = train glmParams._valid = valid glmParams._nfolds = 3 ###### Here is […]

Continue reading


Generating ROC curve in SCALA from H2O binary classification models

You can use the following blog to built a binomial classification  GLM model: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/ To collect model metrics  for training use the following: val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsBinomial](glmModel, train) Now you can access model AUC (_auc object) as below: Note: _auc object has array of thresholds, and then for each threshold it has fps and tps (use […]

Continue reading


Saving H2O models from R/Python API in Hadoop Environment

When you are using H2O in clustered environment i.e. Hadoop the machine could be different where h2o.savemodel() is trying to write the model and thats why you see the error “No such file or directory”. If you just give the path i.e. /tmp and visit the machine ID where H2O connection is initiated from R, you […]

Continue reading


Using H2O models into Java for scoring or prediction

This sample generate a GBM model from R H2O library and then consume the model into Java for prediction. Here is R Script to generate sample model using H2O setwd(“/tmp/resources/”) library(h2o) h2o.init() df = iris h2o_df = as.h2o(df) y = “Species” x = c(“Sepal.Length”, “Sepal.Width”, “Petal.Length”, “Petal.Width”) model = h2o.gbm(y = y, x = x, […]

Continue reading


Using RESTful API to get POJO and MOJO models in H2O

  CURL API for Listing Models: http://<hostname>:<port>/3/Models/ CURL API for Listing specific POJO Model: http://<hostname>:<port>/3/Models/model_name List Specific MOJO Model: http://<hostname>:<port>/3/Models/glm_model/mojo Here is an example: curl -X GET “http://localhost:54323/3/Models” curl -X GET “http://localhost:54323/3/Models/deeplearning_model” >> NAME_IT curl -X GET “http://localhost:54323/3/Models/deeplearning_model” >> dl_model.java curl -X GET “http://localhost:54323/3/Models/glm_model/mojo” > myglm_mojo.zip Thats it, enjoy!! Advertisements

Continue reading


Starter script for rsparkling (H2O on Spark with R)

The rsparkling R package is an extension package for sparklyr that creates an R front-end for the Sparkling WaterSpark package from H2O. This provides an interface to H2O’s high performance, distributed machine learning algorithms on Spark, using R. Visit github project: https://github.com/h2oai/rsparkling You must have the following package installed in your R environment: sparklyr, h2o, rsparkling […]

Continue reading