Using Sparkling Water and PySpark to log console output


Here is the command for Option #1:

./pyspark --deploy-mode client --conf spark.dynamicAllocation.enabled=false --packages com.databricks:spark-csv_2.11:1.4.0 --py-files ../../sparkling-water-1.6.7/py/dist/h2o_pysparkling_1.6-1.6.7-py2.7.egg

Here is the command for Option #2:

./pyspark --deploy-mode client --conf spark.dynamicAllocation.enabled=false --packages com.databricks:spark-csv_2.11:1.4.0,ai.h2o:sparkling-water-core_2.10:1.6.7 --py-files ../../sparkling-water-1.6.7/py/dist/h2o_pysparkling_1.6-1.6.7-py2.7.egg

We must make sure that the H2O backend and the Python egg are calling the same version of the API.

This parameter selects version 1.6.7 of the H2O backend API:
ai.h2o:sparkling-water-core_2.10:1.6.7

This parameter selects version 1.6.7 of the Python API:
--py-files /mnt/app/sparkling-water-1.6.7/py/dist/h2o_pysparkling_1.6-1.6.7-py2.7.egg
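A quick way to confirm which Python API was actually picked up is to print the version string of the h2o package shipped inside the egg. This is a minimal sketch; it assumes the bundled h2o module exposes the standard __version__ attribute:

>>> import h2o
>>> print(h2o.__version__)   # should report the H2O build bundled with Sparkling Water 1.6.7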

Here is the script to test the overall scenario:

>>> from pysparkling import *
>>> from pyspark import SparkContext
>>> from pyspark.sql import SQLContext
>>> import h2o
>>> sqlContext = SQLContext(sc)
>>> hc = H2OContext.getOrCreate(sc)
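Once H2OContext is up, the spark-csv package loaded above can be used to read a CSV into a Spark DataFrame and hand it over to H2O. Below is a minimal sketch, not part of the original session: the input path is hypothetical, and it assumes the pysparkling H2OContext exposes the usual as_h2o_frame conversion:

>>> df = sqlContext.read.format("com.databricks.spark.csv") \
...     .option("header", "true") \
...     .option("inferSchema", "true") \
...     .load("/tmp/sample.csv")          # hypothetical input file
>>> h2o_frame = hc.as_h2o_frame(df)       # convert the Spark DataFrame into an H2OFrame
>>> h2o_frame.summary()                   # quick look at the imported data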