Python example of building GLM, GBM and Random Forest Binomial Model with H2O

Here is an example of using the H2O machine learning library to build GLM, GBM and Distributed Random Forest models for a categorical response variable. Let's import the h2o library and initialize the H2O machine learning cluster: import h2o h2o.init() Importing the dataset and getting familiar with it: df = h2o.import_file("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate.csv") df.summary() df.col_names Let's configure our predictors and […]



Visualizing H2O GBM and Random Forest MOJO Model Trees in Python

In this example we will first build a tree-based model using the H2O machine learning library and then save that model as a MOJO. Using the GraphViz/Dot library, we will extract individual trees/cross-validated model trees from the MOJO and visualize them. If you are new to the H2O MOJO model, learn here. You can also get full […]



Building Regression and Classification GBM models in Scala with H2O

In the full code below you will learn to build H2O GBM models (regression and binomial classification) in Scala. Let's first import all the classes we need for this project: import org.apache.spark.SparkFiles import org.apache.spark.h2o._ import org.apache.spark.examples.h2o._ import org.apache.spark.sql.{DataFrame, SQLContext} import water.Key import java.io.File import water.support.SparkContextSupport.addFiles import water.support.H2OFrameSupport._ // Create SQL support implicit val sqlContext = […]



Ranking GBM tree based on scoring metrics

Here is the full Python code: import h2o import pandas as pd h2o.init() ## Import data df = h2o.import_file('/Users/avkashchauhan/airlines_train.csv') df.shape df.col_names y = "IsDepDelayed" x = df.col_names x.remove(y) print(x) ## Building GBM model from h2o.estimators.gbm import H2OGradientBoostingEstimator gbm_model = H2OGradientBoostingEstimator() gbm_model.train(x = x, y = y, training_frame=df) ## Understanding model print(gbm_model) print("Total trees in the […]



Building GBM model in R and exporting POJO and MOJO model

Get the dataset: Training: http://h2o-training.s3.amazonaws.com/pums2013/adult_2013_train.csv.gz Test: http://h2o-training.s3.amazonaws.com/pums2013/adult_2013_test.csv.gz Here is the script to build a GBM grid model and export the MOJO model: library(h2o) h2o.init() # Importing Dataset trainfile <- file.path("/Users/avkashchauhan/learn/adult_2013_train.csv.gz") adult_2013_train <- h2o.importFile(trainfile, destination_frame = "adult_2013_train") testfile <- file.path("/Users/avkashchauhan/learn/adult_2013_test.csv.gz") adult_2013_test <- h2o.importFile(testfile, destination_frame = "adult_2013_test") # Display Dataset adult_2013_train adult_2013_test # Feature Engineering actual_log_wagp <- h2o.assign(adult_2013_test[, "LOG_WAGP"], […]



Using Cross-validation in Scala with H2O and getting each cross-validated model

Here is Scala code for binomial classification with GLM: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/ To add cross validation you can do the following: def buildGLMModel(train: Frame, valid: Frame, response: String) (implicit h2oContext: H2OContext): GLMModel = { import _root_.hex.glm.GLMModel.GLMParameters.Family import _root_.hex.glm.GLM import _root_.hex.glm.GLMModel.GLMParameters val glmParams = new GLMParameters(Family.binomial) glmParams._train = train glmParams._valid = valid glmParams._nfolds = 3 ###### Here is […]
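The effect of a setting such as `_nfolds = 3` can be illustrated without H2O. The following is a minimal sketch in plain Python, not H2O's implementation: it assumes a simple modulo rule for assigning rows to folds, and the function name `modulo_folds` is hypothetical. Each fold is held out once while a model would be trained on the remaining rows, which is why cross-validation yields one model per fold.

```python
def modulo_folds(n_rows, nfolds):
    """Return (train_indices, holdout_indices) for each of the nfolds folds.

    Assumption: row i belongs to fold i % nfolds. Real libraries may use
    random or stratified assignment instead.
    """
    folds = []
    for k in range(nfolds):
        holdout = [i for i in range(n_rows) if i % nfolds == k]
        train = [i for i in range(n_rows) if i % nfolds != k]
        folds.append((train, holdout))
    return folds


# Seven rows split into three folds; each row is held out exactly once.
folds = modulo_folds(7, 3)
for k, (train, holdout) in enumerate(folds):
    print(f"fold {k}: train={train} holdout={holdout}")
```

Every row appears in exactly one holdout set, so the holdout predictions taken together cover the whole training frame.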



Generating ROC curve in Scala from H2O binary classification models

You can use the following blog to build a binomial classification GLM model: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/ To collect model metrics for training use the following: val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsBinomial](glmModel, train) Now you can access the model AUC (_auc object) as below: Note: the _auc object has an array of thresholds, and for each threshold it has fps and tps (use […]
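How per-threshold tps and fps turn into a ROC curve can be sketched in plain Python. This is an illustration under stated assumptions, not the H2O API: the counts below are made up, the function names are hypothetical, and the lists are assumed to hold cumulative true-positive and false-positive counts ordered from the strictest threshold to the loosest, so the last entry covers every row.

```python
def roc_points(tps, fps):
    """Convert cumulative TP/FP counts per threshold into (FPR, TPR) points."""
    total_pos = tps[-1]  # assumption: loosest threshold predicts all rows positive
    total_neg = fps[-1]
    points = [(0.0, 0.0)]  # start of the curve: nothing predicted positive
    for tp, fp in zip(tps, fps):
        points.append((fp / total_neg, tp / total_pos))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up counts for four thresholds (4 positives and 4 negatives in total).
tps = [1, 2, 3, 4]
fps = [0, 1, 1, 4]

points = roc_points(tps, fps)
auc = trapezoid_auc(points)
print(points)
print(auc)
```

Plotting the FPR values on the x-axis against the TPR values on the y-axis gives the ROC curve; the trapezoidal area is one common way to approximate AUC from those points.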



Using H2O models in Java for scoring or prediction

This sample generates a GBM model with the H2O R library and then consumes the model in Java for prediction. Here is the R script to generate a sample model using H2O: setwd("/tmp/resources/") library(h2o) h2o.init() df = iris h2o_df = as.h2o(df) y = "Species" x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") model = h2o.gbm(y = y, x = x, […]



Q&A with Bryan & Miroslaw, 2nd Place in the See Click Predict Fix Competition

What was your background prior to entering this challenge? My professional background is in business intelligence and analytics/reporting and Miroslaw’s background is in mathematics, so neither of us has a formal background in machine learning. However, we have both taken multiple online classes in machine learning topics, including Andrew Ng’s excellent StanfordX Machine Learning course. […]



Binomial classification example in Scala and GBM with H2O

Here is a sample for a binomial classification problem using the H2O GBM algorithm on the Credit Card data set in Scala. The following sample is for a multinomial classification problem. This sample was created using Spark 2.1.0 with Sparkling Water 2.1.4. import org.apache.spark.h2o._ import water.support.SparkContextSupport.addFiles import org.apache.spark.SparkFiles import java.io.File import water.support.{H2OFrameSupport, SparkContextSupport, ModelMetricsSupport} import water.Key import _root_.hex.glm.GLMModel import _root_.hex.ModelMetricsBinomial val hc […]
