Spark with H2O using rsparkling and sparklyr in R

You must have the following R packages installed:

  • sparklyr
  • rsparkling
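A quick way to confirm both packages are present before running the script is sketched below. The helper name `missing_packages` is hypothetical (it is not part of sparklyr or rsparkling), and the install command in the message assumes the packages are available from the repository your R session is configured to use.

```r
# Hypothetical helper: returns the names of packages that cannot be loaded.
missing_packages <- function(required) {
  installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
  required[!installed]
}

to_install <- missing_packages(c("sparklyr", "rsparkling"))

if (length(to_install) > 0) {
  # Assumption: the packages can be installed with install.packages()
  message("Missing packages: ", paste(to_install, collapse = ", "),
          ". Try install.packages() for each of them.")
} else {
  message("sparklyr and rsparkling are both installed.")
}
```

If a local Spark distribution is not yet present, `sparklyr::spark_install(version = "2.1.0")` can download one.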

 

Here is the working script:

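library(sparklyr)
# Select the Sparkling Water version before loading rsparkling
options(rsparkling.sparklingwater.version = "2.1.6")
# Point to the local Spark installation (adjust the path for your machine)
Sys.setenv(SPARK_HOME = "/Users/avkashchauhan/tools/spark-2.1.0-bin-hadoop2.6")
library(rsparkling)
# Close a previous connection only if one exists in this session
if (exists("sc")) spark_disconnect(sc)
sc <- spark_connect(master = "local", version = "2.1.0")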
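Note that the Sparkling Water version (2.1.6) and the Spark version (2.1.0) in the script share the same major.minor prefix, 2.1. Sparkling Water releases are built against a specific Spark line, so the two should agree. A minimal sketch of that check follows; the helper names `major_minor` and `versions_match` are hypothetical and exist only for illustration.

```r
# Hypothetical helper: extract "major.minor" from a version string like "2.1.6"
major_minor <- function(version) {
  parts <- strsplit(version, ".", fixed = TRUE)[[1]]
  paste(parts[1:2], collapse = ".")
}

# Hypothetical helper: TRUE when Spark and Sparkling Water share major.minor
versions_match <- function(spark_version, sparkling_water_version) {
  identical(major_minor(spark_version), major_minor(sparkling_water_version))
}

spark_version <- "2.1.0"            # value passed to spark_connect()
sparkling_water_version <- "2.1.6"  # value set in options()

if (!versions_match(spark_version, sparkling_water_version)) {
  warning("Spark and Sparkling Water versions do not share the same major.minor prefix")
}
```

This only compares the version strings from the script; it does not inspect the running Spark installation.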

Testing the Spark context:

sc
$master
[1] "local[8]"

$method
[1] "shell"

$app_name
[1] "sparklyr"

$config
$config$sparklyr.cores.local
[1] 8

$config$spark.sql.shuffle.partitions.local
[1] 8

$config$spark.env.SPARK_LOCAL_IP.local
[1] "127.0.0.1"

$config$sparklyr.csv.embedded
[1] "^1.*"

$config$`sparklyr.shell.driver-class-path`
[1] ""

attr(,"config")
[1] "default"
attr(,"file")
[1] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library/sparklyr/conf/config-template.yml"

$spark_home
[1] "/Volumes/OSxexT/tools/spark-2.1.0-bin-hadoop2.6"

$backend
A connection with
description "->localhost:53374"
class "sockconn"
mode "wb"
text "binary"
opened "opened"
can read "yes"
can write "yes"

$monitor
A connection with
description "->localhost:8880"
class "sockconn"
mode "rb"
text "binary"
opened "opened"
can read "yes"
can write "yes"

$output_file
[1] "/var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//RtmpIIVL8I/file6ba7b454325_spark.log"

$spark_context
<jobj[5]>
class org.apache.spark.SparkContext
org.apache.spark.SparkContext@159ba51c

$java_context
<jobj[6]>
class org.apache.spark.api.java.JavaSparkContext
org.apache.spark.api.java.JavaSparkContext@6b114a2d

$hive_context
<jobj[9]>
class org.apache.spark.sql.SparkSession
org.apache.spark.sql.SparkSession@2cd7fdf8

attr(,"class")
[1] "spark_connection" "spark_shell_connection" "DBIConnection"
> h2o_context(sc, st)
Error in is.H2OFrame(x) : object 'st' not found
> h2o_context(sc, strict_version_check = FALSE)
<jobj[15]>
class org.apache.spark.h2o.H2OContext

Sparkling Water Context:
* H2O name: sparkling-water-avkashchauhan_1672148412
* cluster size: 1
* list of used nodes:
(executorId, host, port)
------------------------
(driver,127.0.0.1,54321)
------------------------

Open H2O Flow in browser: http://127.0.0.1:54321 (CMD + click in Mac OSX)

You can use the following command to launch H2O Flow:

h2o_flow(sc, strict_version_check = FALSE)

That's it, enjoy!

 
