Saving H2O models from R/Python API in Hadoop Environment

When you use H2O in a clustered environment such as Hadoop, h2o.saveModel() may be trying to write the model on a different machine than the one you expect, which is why you see the error “No such file or directory”. If you just give a path such as /tmp and visit the machine ID where the H2O connection is initiated from R, you […]

Continue reading


RNN in TensorFlow in Python & R, with MNIST

Though it is more convenient to use the TensorFlow framework in Python, we also talked about how to use TensorFlow in R here: https://charleshsliao.wordpress.com/tag/tensorflow/ We will talk about how to apply a recurrent neural network in TensorFlow in both Python and R.

In R:
#1. We load the data
library(tensorflow)
mnist <- tf$contrib$learn$datasets$mnist$load_mnist(train_dir = "MNIST-data")
#2. Identify Essential Parameters
Input <- 28L […]

Continue reading


Using H2O models in Java for scoring or prediction

This sample generates a GBM model with the H2O library in R and then consumes the model in Java for prediction. Here is the R script to generate a sample model using H2O:
setwd("/tmp/resources/")
library(h2o)
h2o.init()
df = iris
h2o_df = as.h2o(df)
y = "Species"
x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
model = h2o.gbm(y = y, x = x, […]

Continue reading


Using RESTful API to get POJO and MOJO models in H2O

CURL API for listing models:
http://<hostname>:<port>/3/Models/
CURL API for fetching a specific POJO model:
http://<hostname>:<port>/3/Models.java/model_name
CURL API for fetching a specific MOJO model:
http://<hostname>:<port>/3/Models/glm_model/mojo

Here is an example:
curl -X GET "http://localhost:54323/3/Models"
curl -X GET "http://localhost:54323/3/Models.java/deeplearning_model" >> NAME_IT
curl -X GET "http://localhost:54323/3/Models.java/deeplearning_model" >> dl_model.java
curl -X GET "http://localhost:54323/3/Models/glm_model/mojo" > myglm_mojo.zip

That's it, enjoy!!
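The same requests can be prepared from Python. The sketch below only builds the URLs and offers an optional download helper. The function names `h2o_model_urls` and `download_mojo` are hypothetical, and it assumes the POJO source is served under `/3/Models.java/<model_id>` (which, as far as I know, is the path the H2O Python client uses), while `/3/Models/<model_id>` returns the model's JSON description.

```python
from urllib.parse import quote
from urllib.request import urlretrieve


def h2o_model_urls(host, port, model_id):
    """Build the REST URLs for one model on a running H2O node (hypothetical helper)."""
    base = f"http://{host}:{port}/3"
    safe_id = quote(model_id, safe="")  # escape characters that are unsafe in a URL path
    return {
        "list": f"{base}/Models",
        # Assumption: POJO source is exposed under Models.java, not Models.
        "pojo": f"{base}/Models.java/{safe_id}",
        "mojo": f"{base}/Models/{safe_id}/mojo",
    }


def download_mojo(host, port, model_id, target_zip):
    """Save the MOJO archive to target_zip; requires a reachable H2O node."""
    urlretrieve(h2o_model_urls(host, port, model_id)["mojo"], target_zip)
    return target_zip


# Building URLs needs no running cluster; host, port, and model id mirror the curl example.
urls = h2o_model_urls("localhost", 54323, "glm_model")
print(urls["mojo"])
```

Calling `download_mojo("localhost", 54323, "glm_model", "myglm_mojo.zip")` would correspond to the last curl command, provided an H2O instance is listening on that port.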

Continue reading


Starter script for rsparkling (H2O on Spark with R)

The rsparkling R package is an extension package for sparklyr that creates an R front-end for the Sparkling Water Spark package from H2O. This provides an interface to H2O’s high-performance, distributed machine learning algorithms on Spark, using R. Visit the GitHub project: https://github.com/h2oai/rsparkling You must have the following packages installed in your R environment: sparklyr, h2o, rsparkling […]

Continue reading


Recommenders in R, Comparing Multiple Algorithms

We know several essential recommender methods. If we want to recommend ourselves a book, we can do it 1. based on our own experience, 2. based on our friends’ experience, 3. based on the catalog of the library, or 4. based on a search engine’s results. We already talked a little about the first method […]

Continue reading


Installing R on Redhat 7 (EC2 RHEL 7)

Check your machine version:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Now let's update the RPM repo details:
$ sudo su -c 'rpm -Uvh http://mirror.sfo12.us.leaseweb.net/epel/7/x86_64/e/epel-release-7-9.noarch.rpm'
$ sudo yum update
Make sure all dependencies are installed individually:
$ wget http://mirror.centos.org/centos/7/os/x86_64/Packages/blas-devel-3.4.2-5.el7.x86_64.rpm
$ sudo yum localinstall blas-devel-3.4.2-5.el7.x86_64.rpm
$ wget http://mirror.centos.org/centos/7/os/x86_64/Packages/blas-3.4.2-5.el7.x86_64.rpm
$ sudo yum localinstall […]

Continue reading


Multicore Data Science with R and Python

This article is an excerpt from the full video on Multicore Data Science in R and Python. Watch the full video to learn how to leverage multicore architectures using R and Python packages.

Multicore Data Science in R and Python

Time is precious. Data science involves increasingly demanding processing requirements. From training ever larger models, […]

Continue reading


Using R’s ggplot within IPython Notebook

As a data scientist I often use Python to write quick scripts to transform/massage data. But for data visualization I love using R’s ggplot. Although there is a version of ggplot written in Python, I found it to be lacking a lot of features compared to its R counterpart. Luckily, using IPython Notebook you can have […]

Continue reading


DIVIDING DATA INTO TRAINING AND TESTING IN R

During machine learning one often needs to divide the data into two different sets, namely training and testing datasets. While you can’t directly use the “sample” command in R, there is a simple workaround for this. Essentially, use the “sample” command to randomly select certain index numbers and then use the selected index numbers to divide […]
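The index-sampling idea can be sketched as follows, in Python rather than R, using only the standard library. The function name `train_test_split_indices`, the 70/30 ratio, the seed, and the toy `rows` data are all assumptions made for illustration; the original post's R code is not reproduced here.

```python
import random


def train_test_split_indices(n_rows, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx) that together cover range(n_rows) without overlap."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_train = round(n_rows * train_fraction)
    # Randomly pick row indices for training, as R's sample() would.
    train_idx = sorted(rng.sample(range(n_rows), n_train))
    train_set = set(train_idx)
    # Every index that was not sampled goes to the test set.
    test_idx = [i for i in range(n_rows) if i not in train_set]
    return train_idx, test_idx


# Toy data set of ten rows (hypothetical).
rows = [{"id": i, "value": i * i} for i in range(10)]
train_idx, test_idx = train_test_split_indices(len(rows), train_fraction=0.7)
train = [rows[i] for i in train_idx]
test = [rows[i] for i in test_idx]
print(len(train), len(test))
```

Sampling indices instead of rows keeps the two sets disjoint by construction, because the test set is simply whatever was not selected.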

Continue reading