Spark with H2O using rsparkling and sparklyr in R

You must have the following packages installed: sparklyr, rsparkling

Here is the working script:

    > library(sparklyr)
    > options(rsparkling.sparklingwater.version = "2.1.6")
    > Sys.setenv(SPARK_HOME='/Users/avkashchauhan/tools/spark-2.1.0-bin-hadoop2.6')
    > library(rsparkling)
    > spark_disconnect(sc)
    > sc <- spark_connect(master = "local", version = "2.1.0")

Testing the Spark context:

    > sc
    $master
    [1] "local[8]"

    $method
    [1] "shell"

    $app_name
    [1] "sparklyr"

    $config
    $config$sparklyr.cores.local
    [1] 8

    $config$spark.sql.shuffle.partitions.local
    [1] 8

    $config$spark.env.SPARK_LOCAL_IP.local
    [1] […]



Generating ROC curve in Scala from H2O binary classification models

You can use the following blog post to build a binomial classification GLM model: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/

To collect model metrics for training, use the following:

    val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsBinomial](glmModel, train)

Now you can access the model AUC (the _auc object) as below:

Note: the _auc object has an array of thresholds, and for each threshold it has fps and tps (use […]



Using RESTful API to get POJO and MOJO models in H2O

curl API for listing all models: http://<hostname>:<port>/3/Models/
curl API for listing a specific POJO model: http://<hostname>:<port>/3/Models/model_name
curl API for listing a specific MOJO model: http://<hostname>:<port>/3/Models/glm_model/mojo

Here is an example:

    curl -X GET "http://localhost:54323/3/Models"
    curl -X GET "http://localhost:54323/3/Models/deeplearning_model" >> NAME_IT
    curl -X GET "http://localhost:54323/3/Models/deeplearning_model" >> dl_model.java
    curl -X GET "http://localhost:54323/3/Models/glm_model/mojo" > myglm_mojo.zip

That's it, enjoy!
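If you call these endpoints often, the URL patterns above can be wrapped in a few shell helpers. This is only a minimal sketch: the variable names H2O_HOST and H2O_PORT and the functions models_url, mojo_url and fetch are made up for illustration and are not part of H2O. The defaults simply reuse the localhost:54323 address from the example.

```shell
#!/bin/sh
# Sketch: helper functions around the H2O REST URL patterns shown above.
# H2O_HOST, H2O_PORT and all function names are illustrative assumptions.
set -eu

H2O_HOST="${H2O_HOST:-localhost}"
H2O_PORT="${H2O_PORT:-54323}"

# Print the URL of the model list, or of one model when a name is given.
models_url() {
  if [ -z "${1:-}" ]; then
    printf 'http://%s:%s/3/Models\n' "$H2O_HOST" "$H2O_PORT"
  else
    printf 'http://%s:%s/3/Models/%s\n' "$H2O_HOST" "$H2O_PORT" "$1"
  fi
}

# Print the URL that serves the MOJO zip of the named model.
mojo_url() {
  printf '%s/mojo\n' "$(models_url "$1")"
}

# Download a URL ($1) into a file ($2). --fail makes curl exit non-zero
# on an HTTP error instead of saving the error page as the model file.
fetch() {
  curl --fail --silent --show-error -X GET "$1" -o "$2"
}

# Example usage (needs a running H2O instance, so it is left commented out):
# curl -X GET "$(models_url)"
# fetch "$(mojo_url glm_model)" myglm_mojo.zip
```

Writing with -o to a fresh file, rather than appending with >>, avoids ending up with two responses concatenated in the same file when the command is run twice.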



Starter script for rsparkling (H2O on Spark with R)

The rsparkling R package is an extension package for sparklyr that creates an R front-end for the Sparkling Water Spark package from H2O. This provides an interface to H2O's high-performance, distributed machine learning algorithms on Spark, using R. Visit the GitHub project: https://github.com/h2oai/rsparkling

You must have the following packages installed in your R environment: sparklyr, h2o, rsparkling […]



Spark Cluster on Google Compute Engine

What is Spark and Why? Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write. To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster […]
