Here is an example of using the H2O machine learning library to build GLM, GBM and Distributed Random Forest models for a categorical response variable. Let's import the h2o library and initialize the H2O machine learning cluster: `import h2o`, `h2o.init()`. Next, import the dataset and get familiar with it: `df = h2o.import_file("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate.csv")`, `df.summary()`, `df.col_names`. Let's configure our predictors and […]

# Using Cross-validation in Scala with H2O and getting each cross-validated model

Here is Scala code for binomial classification with GLM: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/ To add cross-validation, you can do the following: `def buildGLMModel(train: Frame, valid: Frame, response: String)(implicit h2oContext: H2OContext): GLMModel = {`, then inside the function `import _root_.hex.glm.GLMModel.GLMParameters.Family`, `import _root_.hex.glm.GLM`, `import _root_.hex.glm.GLMModel.GLMParameters`, `val glmParams = new GLMParameters(Family.binomial)`, `glmParams._train = train`, `glmParams._valid = valid`, `glmParams._nfolds = 3`. Setting `_nfolds = 3` is what enables 3-fold cross-validation. Here is […]
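To make the idea behind `_nfolds` concrete, here is a small Python sketch of what "one model per fold" means. It is not H2O code: ordinary least squares stands in for GLM, and `kfold_models` is a hypothetical helper. Each fold model is trained on the other `nfolds - 1` folds and scored on its own held-out fold; H2O additionally trains a main model on the full training frame.

```python
import numpy as np

def kfold_models(X, y, nfolds=3, seed=42):
    """Fit one least-squares model per fold and return all of them.

    Hypothetical illustration: OLS stands in for GLM. Each fold model is
    trained on nfolds - 1 folds and predicts only its own held-out fold.
    """
    rng = np.random.default_rng(seed)
    fold_ids = rng.permutation(len(y)) % nfolds   # random, near-equal folds
    X1 = np.column_stack([np.ones(len(y)), X])    # add an intercept column
    models, holdout_pred = [], np.empty(len(y))
    for k in range(nfolds):
        train, hold = fold_ids != k, fold_ids == k
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        models.append(beta)                        # keep every fold model
        holdout_pred[hold] = X1[hold] @ beta       # predictions on unseen rows
    return models, holdout_pred
```

The returned list corresponds to the per-fold models, and `holdout_pred` plays the role of the combined holdout predictions used to compute cross-validation metrics.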

# How to regularize intercept in GLM

Sometimes you may want to emulate hierarchical modeling. To achieve this, you can use `beta_constraints` as below: `iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")`, `bc = h2o.H2OFrame([("Intercept", -1000, 1000, 3, 30)], column_names=["names", "lower_bounds", "upper_bounds", "beta_given", "rho"])`, `glm = H2OGeneralizedLinearEstimator(family="gaussian", beta_constraints=bc,` […]
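The constraint row above bounds the intercept to [-1000, 1000] and supplies `beta_given=3` with `rho=30`; H2O documents `beta_given` and `rho` as the target value and penalty constant of a proximal term. The sketch below shows the general idea of shrinking only the intercept toward a given value with a quadratic penalty. The exact penalty form and scaling are assumptions, `fit_with_intercept_prior` is a hypothetical helper, and the bounds are not enforced.

```python
import numpy as np

def fit_with_intercept_prior(X, y, beta_given, rho):
    """Least squares where only the intercept is pulled toward beta_given.

    Minimizes ||y - X1 b||^2 + rho * (b[0] - beta_given)^2 with X1 = [1, X].
    Assumed penalty form for illustration; not H2O's internal objective.
    Lower/upper bounds from a constraint frame are NOT enforced here.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    p = X1.shape[1]
    P = np.zeros((p, p))
    P[0, 0] = rho                      # penalize the intercept only
    target = np.zeros(p)
    target[0] = beta_given             # value the intercept is shrunk toward
    # Normal equations of the penalized objective
    return np.linalg.solve(X1.T @ X1 + P, X1.T @ y + P @ target)
```

With `rho=0` this reduces to ordinary least squares; as `rho` grows, the intercept moves from the data-driven value toward `beta_given` while the other coefficients adjust.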

# Building high order polynomials with GLM for higher accuracy

Sometimes when building GLM models, you may want GLM to consider higher-order polynomials of the features. The reason is that when a model has strong predictors, adding higher-order polynomial terms of those predictors can yield higher accuracy. With H2O, you can […]
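The H2O-specific approach is cut off above, so here is only a generic sketch of the underlying idea: expand each feature into its powers before fitting a linear model. `polynomial_features` and `fit_mse` are hypothetical helpers, and ordinary least squares stands in for a Gaussian GLM.

```python
import numpy as np

def polynomial_features(X, degree):
    """Return [X, X**2, ..., X**degree] column-wise (no cross terms)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X ** d for d in range(1, degree + 1)])

def fit_mse(X, y):
    """Least squares with an intercept; returns the training MSE."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return float(np.mean((y - X1 @ beta) ** 2))
```

On data generated from a quadratic, a degree-1 fit leaves a large error while a degree-2 expansion fits it essentially exactly, which is the accuracy gain the paragraph describes. Higher degrees also raise the risk of overfitting, so validate on held-out data.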

# Binomial classification example in Scala and GLM with H2O

Here is a sample for a binomial classification problem using the H2O GLM algorithm on the Credit Card dataset in Scala. This sample was created using Spark 2.1.0 with Sparkling Water 2.1.4. `import org.apache.spark.h2o._`, `import water.support.SparkContextSupport.addFiles`, `import org.apache.spark.SparkFiles`, `import java.io.File`, `import water.support.{H2OFrameSupport, SparkContextSupport, ModelMetricsSupport}`, `import water.Key`, `import _root_.hex.glm.GLMModel`, `import _root_.hex.ModelMetricsBinomial`, `val hc` […]
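A binomial GLM is logistic regression, and IRLSM (iteratively reweighted least squares) is one of the solvers H2O's GLM offers. As a rough illustration of what such a solver does, here is an unregularized IRLS fit in Python with NumPy. It is a hypothetical sketch, not H2O's implementation, and it ignores regularization and the failure case of perfectly separable data.

```python
import numpy as np

def logistic_irls(X, y, max_iter=25, tol=1e-10):
    """Fit a binomial GLM (logit link) by iteratively reweighted least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])   # intercept + predictors
    beta = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        eta = X1 @ beta
        p = 1.0 / (1.0 + np.exp(-eta))           # inverse logit
        w = p * (1.0 - p)                        # IRLS weights
        z = eta + (y - p) / w                    # working response
        # Weighted least-squares step: (X' W X) beta = X' W z
        new_beta = np.linalg.solve(X1.T @ (w[:, None] * X1), X1.T @ (w * z))
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

At convergence the score equations `X1.T @ (y - p) = 0` hold, which is a quick way to check any logistic fit.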

# Cross-validation example with time-series data in R and H2O

What is cross-validation? In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The process is repeated k times, so that each subsample is used exactly once for validation. Learn more on Wikipedia. When you have time-series data […]
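Random k-fold partitioning lets a model train on rows from the future of its validation rows, which leaks information for time-series data. One common alternative, shown here as a hedged sketch and not necessarily the scheme the full post uses, is forward chaining (an expanding window). `forward_chaining_splits` is a hypothetical helper and assumes the rows are already sorted by time.

```python
def forward_chaining_splits(n_rows, n_splits):
    """Yield (train_idx, valid_idx) pairs for rows sorted by time.

    Each validation block comes strictly after its training window, and the
    window grows by one block per split. Trailing rows that do not fill a
    whole block are left unused.
    """
    block = n_rows // (n_splits + 1)
    if block == 0:
        raise ValueError("need at least n_splits + 1 rows")
    for i in range(1, n_splits + 1):
        train_idx = list(range(0, i * block))
        valid_idx = list(range(i * block, (i + 1) * block))
        yield train_idx, valid_idx
```

For 10 rows and 4 splits, the first split trains on rows 0 to 1 and validates on rows 2 to 3, and the last trains on rows 0 to 7 and validates on rows 8 to 9.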

# Building H2O GLM model using Postgresql database and JDBC driver

Note: Before we begin, make sure PostgreSQL is up and running and the database is ready to respond to your queries. Check that your queries return records and not null results. Download JDBC driver 42.0.0 (JDBC 4): Download page: https://jdbc.postgresql.org/download.html Driver download: https://jdbc.postgresql.org/download/postgresql-42.0.0.jre6.jar Note: I have tested H2O 3.10.4.2 with the above JDBC driver 4.0 […]
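H2O needs the JDBC driver jar on its Java classpath before it can read from PostgreSQL. The sketch below only builds the PostgreSQL JDBC connection URL and a `java -cp ... water.H2OApp` launch command; the host, port, database name and jar paths are placeholders, and both helpers are hypothetical.

```python
import os

def postgres_jdbc_url(host, port, database):
    """Build a PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/db."""
    return f"jdbc:postgresql://{host}:{port}/{database}"

def h2o_launch_command(h2o_jar, jdbc_jar):
    """Command list that starts H2O with the JDBC driver on the classpath."""
    classpath = os.pathsep.join([h2o_jar, jdbc_jar])  # ':' on Unix, ';' on Windows
    return ["java", "-cp", classpath, "water.H2OApp"]
```

Once H2O is running with the driver available, the H2O Python API function `h2o.import_sql_table(connection_url, table, username, password)` can read a table into an H2OFrame.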

# Getting p-values from GLM model in Python

Currently there is no way to get p-values from a fitted GLM model in Python, although it does work in R. `>>> import numpy as np` `>>> df1 = h2o.H2OFrame.from_python(np.random.randn(100, 4).tolist(), column_names=list('ABCD'))` Now try the following: `>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator` `>>> glmfitter3 = H2OGeneralizedLinearEstimator(family="gaussian", solver="IRLSM", alpha=0, lambda_=0,` `... compute_p_values=True)` `>>> glmfitter3.train(x=['A', 'B'], y="C", training_frame=df1)`, which prints `glm Model Build progress:` […]
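P-values are only meaningful for an unpenalized fit, which is why the call above sets `alpha=0` and `lambda_=0`. As a cross-check outside H2O, here is a NumPy sketch of coefficient standard errors and two-sided p-values for a Gaussian GLM with an intercept (that is, ordinary least squares). `ols_p_values` is a hypothetical helper, and it uses a normal approximation via `math.erfc`; exact t-distribution p-values differ slightly for small samples, and H2O's exact computation is not reproduced here.

```python
import math
import numpy as np

def ols_p_values(X, y):
    """Coefficients, standard errors and two-sided p-values (normal approx.)."""
    X1 = np.column_stack([np.ones(len(y)), X])      # intercept + predictors
    n, k = X1.shape
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (n - k)                # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    z = beta / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, se, p
```

Fitting `C` on `A` and `B` from random data like `df1` should give large p-values, since the columns are independent; a predictor that really drives the response gets a p-value near zero.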