Launching H2O cluster on different port in pysparkling

In this example we will launch an H2O machine learning cluster using the pysparkling package. You can visit my GitHub and this article to learn more about the code execution explained here. First you need to install pysparkling in a Python 2.7 setup as below: > pip install -U h2o_pysparkling_2.1 Now we can launch the […]



Reading nested parquet file in Scala and exporting to CSV

Recently we were working on a problem where a compressed Parquet file contained many nested tables, some of which had array-type columns, and our objective was to read the file and save it as CSV. We wrote a script in Scala which does the following: Handles nested parquet compressed content Look […]
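To make the idea of flattening nested content concrete, here is a minimal sketch in plain Python. It is not the Scala script described above, and it does not read Parquet: it assumes the records are already loaded as nested dictionaries. The names `flatten` and `to_csv`, the dotted column naming, and the choice to store arrays as JSON strings are all assumptions made for illustration.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names (hypothetical helper).

    Lists (array-type columns) are serialized as JSON strings so that
    each record still maps to a single CSV row.
    """
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def to_csv(records):
    """Return CSV text for a list of nested records (hypothetical helper)."""
    rows = [flatten(r) for r in records]
    # Collect the union of column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data, standing in for rows read from a nested file.
sample = [
    {"id": 1, "user": {"name": "a", "address": {"city": "X"}}, "tags": ["p", "q"]},
    {"id": 2, "user": {"name": "b"}, "tags": []},
]
csv_text = to_csv(sample)
print(csv_text)
```

Records that lack a nested field simply get an empty cell for that column. Serializing arrays as strings keeps one row per record; exploding each array element into its own row is the other common choice and would change the row count.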



Binomial classification example in Scala and GBM with H2O

Here is a sample for a binomial classification problem using the H2O GBM algorithm with the Credit Card data set in Scala. This sample was created using Spark 2.1.0 with Sparkling Water 2.1.4.

import org.apache.spark.h2o._
import water.support.SparkContextSupport.addFiles
import org.apache.spark.SparkFiles
import java.io.File
import water.support.{H2OFrameSupport, SparkContextSupport, ModelMetricsSupport}
import water.Key
import _root_.hex.glm.GLMModel
import _root_.hex.ModelMetricsBinomial
val hc […]



Multinomial classification example in Scala and Deep Learning with H2O

Here is a sample for a multinomial classification problem using the H2O Deep Learning algorithm with the iris data set in Scala. This sample was created using Spark 2.1.0 with Sparkling Water 2.1.4.

import org.apache.spark.h2o._
import water.support.SparkContextSupport.addFiles
import org.apache.spark.SparkFiles
import java.io.File
import water.support.{H2OFrameSupport, SparkContextSupport, ModelMetricsSupport}
import water.Key
import _root_.hex.deeplearning.DeepLearningModel
import […]



Tips for building H2O and Deep Water source code

Get source code:
H2O-3: $ git clone https://github.com/h2oai/h2o-3.git
DeepWater: $ git clone https://github.com/h2oai/deepwater

Building source code without tests:
Build the source code without tests (for both H2O-3 and DeepWater source)
$ ./gradlew build -x test
Build the Java developer version of source code without tests (for both H2O-3 and DeepWater source) […]



Running python and pysparkling with Zeppelin and YARN on Hadoop

Apache Zeppelin provides cell-based notebooks (similar to Jupyter) for working with various applications, e.g. Spark, Python, Hive, and HBase, through its interpreters. With H2O and Sparkling Water you can use Zeppelin on a Hadoop cluster with YARN, and then use Python or pysparkling to submit jobs. Here are the […]



Using Kryo library with Sparkling Water

Start Sparkling Water from the Spark shell:
$ bin/sparkling-shell

Start Sparkling Water from the Spark shell and add a 3rd party jar:
$ bin/sparkling-shell --jars kryo-4.0.0.jar
Note: Make sure the jar file is accessible from this path

Verify that the jar is added:
scala> sc.listJars
res1: Seq[String] = ArrayBuffer(spark://172.16.2.123:53523/jars/kryo-4.0.0.jar)

Access the Spark web interface: Verify on Spark UI that environment […]



Sparkling Water 2.0 Walkthrough with pysparkling

My ENV:
SPARK_HOME=/Users/avkashchauhan/tools/spark-2.0.1-bin-hadoop2.6
H2O_HOME=/Users/avkashchauhan/src/github.com/h2oai/h2o-3
MASTER=local-cluster[3,2,1024]

Pysparkling Command:
$$> bin/pysparkling --num-executors 2 --executor-memory 2g --driver-memory 2g --conf spark.dynamicAllocation.enabled=false
Python 2.7.10 (default, Jul 30 2016, 18:31:42)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To […]
