Spark with H2O using rsparkling and sparklyr in R

You must have the following packages installed: sparklyr and rsparkling.

Here is the working script:

library(sparklyr)
options(rsparkling.sparklingwater.version = "2.1.6")
Sys.setenv(SPARK_HOME = '/Users/avkashchauhan/tools/spark-2.1.0-bin-hadoop2.6')
library(rsparkling)
# spark_disconnect(sc)  # only needed if a connection named sc is already open
sc <- spark_connect(master = "local", version = "2.1.0")

Testing the Spark context:

sc
$master
[1] "local[8]"
$method
[1] "shell"
$app_name
[1] "sparklyr"
$config
$config$sparklyr.cores.local
[1] 8
$config$spark.sql.shuffle.partitions.local
[1] 8
$config$spark.env.SPARK_LOCAL_IP.local
[1] […]


Building a GBM model in R and exporting POJO and MOJO models

Get the dataset:
Training: http://h2o-training.s3.amazonaws.com/pums2013/adult_2013_train.csv.gz
Test: http://h2o-training.s3.amazonaws.com/pums2013/adult_2013_test.csv.gz

Here is the script to build a GBM grid model and export a MOJO model:

library(h2o)
h2o.init()

# Importing Dataset
trainfile <- file.path("/Users/avkashchauhan/learn/adult_2013_train.csv.gz")
adult_2013_train <- h2o.importFile(trainfile, destination_frame = "adult_2013_train")
testfile <- file.path("/Users/avkashchauhan/learn/adult_2013_test.csv.gz")
adult_2013_test <- h2o.importFile(testfile, destination_frame = "adult_2013_test")

# Display Dataset
adult_2013_train
adult_2013_test

# Feature Engineering
actual_log_wagp <- h2o.assign(adult_2013_test[, "LOG_WAGP"], […]
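The grid definition itself is cut off in this excerpt. As a rough illustration of what h2o.grid() does with its default Cartesian search strategy (one model per combination of the supplied hyperparameter values), here is a minimal plain-Python sketch. max_depth, learn_rate and ntrees are real H2O GBM parameter names, but the values are made up, and train_and_score is a hypothetical stand-in for "train a GBM with these parameters and return its validation error". This is not H2O's implementation.

```python
from itertools import product


def cartesian_grid(hyper_params):
    """Expand {name: [values, ...]} into one dict per combination.

    Names are sorted so the enumeration order is deterministic.
    """
    names = sorted(hyper_params)
    value_lists = [hyper_params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def grid_search(hyper_params, train_and_score):
    """Evaluate every combination and return (best_params, best_score).

    train_and_score is a hypothetical callback (e.g. train a GBM and return
    its validation RMSE); lower scores are treated as better.
    """
    best_params, best_score = None, float("inf")
    for params in cartesian_grid(hyper_params):
        score = train_and_score(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: real H2O GBM parameter names, made-up values.
hyper_params = {"max_depth": [3, 5, 7], "learn_rate": [0.05, 0.1], "ntrees": [50]}
grid = cartesian_grid(hyper_params)
print(len(grid))  # 3 * 2 * 1 = 6 models would be trained
```

The number of models grows multiplicatively with each list you add, which is why larger grids are often switched to a random search strategy instead.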


Saving H2O models from R/Python API in Hadoop Environment

When you use H2O in a clustered environment such as Hadoop, the machine where h2o.saveModel() writes the model can be different from the machine you are working on, and that is why you see the error "No such file or directory". If you just give a path such as /tmp and visit the machine ID where the H2O connection was initiated from R, you […]


RNN in TensorFlow in Python & R, with MNIST

Though it is more convenient to use the TensorFlow framework in Python, we have also talked about how to use TensorFlow in R here: https://charleshsliao.wordpress.com/tag/tensorflow/

We will talk about how to apply a recurrent neural network (RNN) in TensorFlow in both Python and R.

In R:

#1. We load the data
library(tensorflow)
mnist <- tf$contrib$learn$datasets$mnist$load_mnist(train_dir = "MNIST-data")
#2. Identify Essential Parameters
Input <- 28L […]
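In an MNIST RNN of this kind, each 28×28 image is usually fed to the network as a sequence of 28 rows of 28 pixels, which is presumably what the post's Input <- 28L refers to. The plain-Python sketch below shows that idea with a vanilla (tanh) RNN forward pass. It is not TensorFlow's API and not the post's code; the hidden size of 8 and the random weights are arbitrary assumptions. In a real model the final hidden state would feed a dense softmax layer over the 10 digit classes.

```python
import math
import random

N_STEPS = 28   # one time step per image row
N_INPUT = 28   # pixels per row (presumably the post's Input <- 28L)
N_HIDDEN = 8   # hypothetical; the post's hidden size is not shown in the excerpt


def image_to_sequence(pixels, n_steps=N_STEPS, n_input=N_INPUT):
    """Split a flattened 784-pixel MNIST image into n_steps rows of n_input pixels."""
    if len(pixels) != n_steps * n_input:
        raise ValueError(f"expected {n_steps * n_input} pixels, got {len(pixels)}")
    return [pixels[t * n_input:(t + 1) * n_input] for t in range(n_steps)]


def init_weights(n_input=N_INPUT, n_hidden=N_HIDDEN, seed=0):
    """Small random input->hidden and hidden->hidden weights, zero biases."""
    rng = random.Random(seed)
    w_xh = [[rng.uniform(-0.1, 0.1) for _ in range(n_input)] for _ in range(n_hidden)]
    w_hh = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    return w_xh, w_hh, b_h


def rnn_forward(sequence, w_xh, w_hh, b_h):
    """Run a vanilla RNN over the sequence and return the final hidden state.

    At each step: h_t = tanh(W_xh . x_t + W_hh . h_{t-1} + b_h), with h_0 = 0.
    """
    h = [0.0] * len(b_h)
    for x_t in sequence:
        # The whole new state is built from the previous h before h is rebound.
        h = [
            math.tanh(
                b_h[i]
                + sum(w * x for w, x in zip(w_xh[i], x_t))
                + sum(w * h_prev for w, h_prev in zip(w_hh[i], h))
            )
            for i in range(len(b_h))
        ]
    return h


w_xh, w_hh, b_h = init_weights()
fake_image = [0.5] * (N_STEPS * N_INPUT)  # stand-in for a real MNIST image
final_state = rnn_forward(image_to_sequence(fake_image), w_xh, w_hh, b_h)
print(len(final_state))  # one value per hidden unit: 8
```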


Using H2O models in Java for scoring or prediction

This sample generates a GBM model with the H2O R library and then consumes the model in Java for prediction. Here is the R script to generate a sample model using H2O:

setwd("/tmp/resources/")
library(h2o)
h2o.init()
df = iris
h2o_df = as.h2o(df)
y = "Species"
x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
model = h2o.gbm(y = y, x = x, […]


Using RESTful API to get POJO and MOJO models in H2O

cURL API for listing models: http://<hostname>:<port>/3/Models/
cURL API for listing a specific POJO model: http://<hostname>:<port>/3/Models/model_name
cURL API for listing a specific MOJO model: http://<hostname>:<port>/3/Models/glm_model/mojo

Here is an example:

curl -X GET "http://localhost:54323/3/Models"
curl -X GET "http://localhost:54323/3/Models/deeplearning_model" > NAME_IT
curl -X GET "http://localhost:54323/3/Models/deeplearning_model" > dl_model.java
curl -X GET "http://localhost:54323/3/Models/glm_model/mojo" > myglm_mojo.zip

That's it, enjoy!
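The same requests can be scripted. Below is a minimal Python sketch using only the standard library, assuming an H2O instance reachable over plain HTTP. The endpoint paths are copied from the URLs listed in the post, and the host, port and model names are the post's example values; verify the paths against the REST API reference for your H2O version before relying on them. The helper names are hypothetical, not part of any H2O client.

```python
from urllib.parse import quote
from urllib.request import urlopen


def models_url(host, port):
    """Endpoint that lists all models (GET /3/Models), as given in the post."""
    return f"http://{host}:{port}/3/Models"


def model_url(host, port, model_name):
    """Endpoint for one model, which the post uses for the POJO download."""
    return f"{models_url(host, port)}/{quote(model_name, safe='')}"


def mojo_url(host, port, model_name):
    """Endpoint for a model's MOJO archive (GET /3/Models/<name>/mojo)."""
    return f"{model_url(host, port, model_name)}/mojo"


def download(url, out_path, timeout=60):
    """HTTP GET url and write the raw body to out_path.

    The file is overwritten, like `curl ... > out_path` (not `>>`, which appends).
    """
    with urlopen(url, timeout=timeout) as resp, open(out_path, "wb") as fh:
        fh.write(resp.read())


# Example (needs a running H2O instance on localhost:54323):
# download(mojo_url("localhost", 54323, "glm_model"), "myglm_mojo.zip")
```

Keeping URL construction separate from the network call makes the paths easy to check, and easy to change if your H2O version exposes them differently.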


Starter script for rsparkling (H2O on Spark with R)

The rsparkling R package is an extension package for sparklyr that creates an R front-end for the Sparkling Water Spark package from H2O. This provides an interface to H2O's high-performance, distributed machine learning algorithms on Spark, using R. Visit the GitHub project: https://github.com/h2oai/rsparkling

You must have the following packages installed in your R environment: sparklyr, h2o, rsparkling […]


Recommenders in R, Comparing Multiple Algorithms

We know several essential recommender methods. If we want to recommend a book to ourselves, we can do it:
1. Based on our own experience
2. Based on our friends' experience
3. Based on the library's catalog
4. Based on a search engine's results

We already talked a little about the first method […]
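Method 2 (our friends' experience) is essentially user-based collaborative filtering. Here is a minimal plain-Python sketch, assuming a small in-memory dictionary of ratings. The users, books and ratings are made up, and this is not the post's R code or one of the specific algorithms it compares.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings, computed over co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[item] * b[item] for item in common)
    norm_a = math.sqrt(sum(a[item] ** 2 for item in common))
    norm_b = math.sqrt(sum(b[item] ** 2 for item in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, k=2, n=3):
    """User-based collaborative filtering ("based on our friends' experience").

    Finds the k users most similar to `user`, then scores each item `user` has
    not rated by the similarity-weighted average of those neighbours' ratings.
    Returns up to n (item, predicted_rating) pairs, best first.
    """
    target = ratings[user]
    neighbours = sorted(
        ((cosine_similarity(target, other_ratings), other)
         for other, other_ratings in ratings.items() if other != user),
        reverse=True,
    )[:k]
    weighted_sum, weight_total = {}, {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore neighbours with no positive similarity
        for item, rating in ratings[other].items():
            if item in target:
                continue  # only recommend items the user has not rated yet
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    return sorted(predicted.items(), key=lambda pair: (-pair[1], pair[0]))[:n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"Dune": 5, "Emma": 1, "Ulysses": 4},
    "bob": {"Dune": 5, "Emma": 1, "Neuromancer": 5},
    "carol": {"Dune": 1, "Emma": 5, "Persuasion": 2},
}
print(recommend(ratings, "alice"))
```

Bob shares Alice's taste, so his favourite ranks first. Practical recommenders usually mean-centre each user's ratings first (a Pearson-style similarity), which this sketch skips for brevity.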


Installing R on Redhat 7 (EC2 RHEL 7)

Check your machine version:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Now let's update the RPM repository details:

$ sudo su -c 'rpm -Uvh http://mirror.sfo12.us.leaseweb.net/epel/7/x86_64/e/epel-release-7-9.noarch.rpm'
$ sudo yum update

Make sure all dependencies are installed individually:

$ wget http://mirror.centos.org/centos/7/os/x86_64/Packages/blas-devel-3.4.2-5.el7.x86_64.rpm
$ sudo yum localinstall blas-devel-3.4.2-5.el7.x86_64.rpm
$ wget http://mirror.centos.org/centos/7/os/x86_64/Packages/blas-3.4.2-5.el7.x86_64.rpm
$ sudo yum localinstall […]


Multicore Data Science with R and Python

This article is an excerpt from the full video on Multicore Data Science in R and Python. Watch the full video to learn how to leverage multicore architectures using R and Python packages. Time is precious. Data science involves increasingly demanding processing requirements. From training ever larger models, […]
