Deep Learning OCR using TensorFlow and Python

In this post, deep learning neural networks are applied to the problem of optical character recognition (OCR) using Python and TensorFlow. This post makes use of TensorFlow and the convolutional neural network class available in the TFANN module. The full source code from this post is available here. Introduction to OCR OCR is the transformation […]

Continue reading


Using H2O AutoML for Kaggle Porto Seguro Safe Driver Prediction Competition

If you are into competitive machine learning, you probably visit Kaggle routinely. Currently you can compete for cash and recognition in Porto Seguro’s Safe Driver Prediction competition as well. I tried the given training dataset (as is) with H2O AutoML, which ran for about 5 hours, and I was able to get into the top […]

Continue reading


Flatten complex nested parquet files on Hadoop with Herringbone

Herringbone is a suite of tools for working with Parquet files on HDFS, and with Impala and Hive (https://github.com/stripe/herringbone). Please visit my GitHub and this specific page for more details. Installation: Note: You must be on a Hadoop machine, because Herringbone needs a Hadoop environment. Prerequisite: Thrift 0.9.1 (MUST have 0.9.1 as 0.9.3 and […]

Continue reading


Handling exception “Argument python_obj should be a …”

Recently I hit the following exception when running Python code with H2O functions on a new machine; however, this exception does not happen on my main machine. The exception was as follows: H2OTypeError: Argument `python_obj` should be a None | list | tuple | dict | numpy.ndarray | pandas.DataFrame | scipy.sparse.issparse, got H2OTwoDimTable Error in […]

Continue reading


Exploring & transforming H2O Data Frame in R and Python

Sometimes you need to ingest a dataset for building models, and your first task is to explore all the features and their types. Once that is done, you may want to change the feature types to the ones you want. Here is the code snippet in Python: df = h2o.import_file('https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate.csv') df.types […]

Continue reading


Full working example of connecting to Netezza from Java and Python

Before you start connecting, make sure you can access the Netezza database and table from the machine where you are trying to run the Java and/or Python samples. Connecting to the Netezza server from Python: check out my IPython Jupyter Notebook with the Python sample. Step 1: Importing the Python jaydebeapi library: import jaydebeapi Step 2: Setting Database […]

Continue reading


Python example of building GLM, GBM and Random Forest Binomial Model with H2O

Here is an example of using the H2O machine learning library to build GLM, GBM and Distributed Random Forest models for a categorical response variable. Let's import the h2o library and initialize the H2O machine learning cluster: import h2o h2o.init() Importing the dataset and getting familiar with it: df = h2o.import_file("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate.csv") df.summary() df.col_names Let's configure our predictors and […]

Continue reading


RIP Theano

Before TensorFlow, PyTorch and Caffe, Theano was the major library for deep learning development. However, the library’s development and support will end after the upcoming Theano 1.0 release. The news came in an email from Theano’s main developer Pascal Lamblin and Yoshua Bengio, a notable expert on artificial neural networks and deep learning. “We will continue […]

Continue reading


Visualizing H2O GBM and Random Forest MOJO Model Trees in Python

In this example we will first build a tree-based model using the H2O machine learning library and then save that model as a MOJO. Using the GraphViz/Dot library, we will extract individual trees/cross-validated model trees from the MOJO and visualize them. If you are new to H2O MOJO models, learn here. You can also get full […]

Continue reading


H2O AutoML examples in Python and Scala

AutoML is included in H2O version 3.14.0.1 and above. You can learn more about AutoML in the H2O blog here. H2O’s AutoML can be used for automating a large part of the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time limit. The user can also use a performance […]

Continue reading