Building Regression and Classification GBM models in Scala with H2O

In the full code below, you will learn to build H2O GBM models (regression and binomial classification) in Scala. Let's first import all the classes we need for this project:

import org.apache.spark.SparkFiles
import org.apache.spark.h2o._
import org.apache.spark.examples.h2o._
import org.apache.spark.sql.{DataFrame, SQLContext}
import water.Key
import java.io.File
import water.support.SparkContextSupport.addFiles
import water.support.H2OFrameSupport._
// Create SQL support
implicit val sqlContext = […]
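The H2O API itself is not reproduced here. As a hedged illustration of what a GBM does under the hood, the following minimal pure-Python sketch (not H2O's implementation, and in Python rather than Scala) fits gradient boosting for regression with squared loss on a single numeric feature. Each round fits a one-split tree (a stump) to the current residuals, which are the negative gradient of the squared loss, and adds it scaled by a learning rate. The names `fit_stump` and `fit_gbm` and the default values are hypothetical. Binomial classification uses the same loop but fits each tree to the negative gradient of the log loss (y minus the predicted probability).

```python
def fit_stump(xs, residuals):
    """Return the one-split tree on a 1-D feature with the lowest squared error."""
    best = None  # (sse, threshold, left_value, right_value)
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All feature values are identical: fall back to a constant leaf.
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lv, rv = best

    def stump(x):
        return lv if x <= t else rv

    return stump


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    """Hypothetical helper: gradient boosting with squared loss and stumps."""
    base = sum(ys) / len(ys)  # initial model: the mean of the target
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_trees):
        # For squared loss the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


predict = fit_gbm([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(predict(1), predict(6))  # close to 1 and 5 respectively
```

A smaller learning rate needs more trees but usually generalizes better; this is the same trade-off that the learning-rate and number-of-trees settings control in library GBM implementations.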



NBA Winning Estimator with Decision Tree in Python

It would be interesting to make predictions to understand the trend of NBA winning teams. We will use data from http://www.basketball-reference.com/leagues/NBA_2017_games-june.html and follow this workflow. More details can be found in Robert Layton’s book: https://www.goodreads.com/book/show/26019855-learning-data-mining-with-python?from_search=true

###1. Load data from http://www.basketball-reference.com/leagues/NBA_2017_games-june.html
import pandas as pd
file = "NBA2017.csv"
NBA2017 = pd.read_csv(file, sep=",", parse_dates=["Date"])  # convert the "Date" strings to date values
NBA2017.columns = ["Date", "Start […]
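As a hedged illustration of the feature-engineering step in such a workflow, the sketch below uses only the standard library to label each game with whether the home team won and to add, for each team, whether it won its previous game, using only games played earlier so no future information leaks into the features. The tuple layout, the team names "A", "B", "C", and the scores are made up for illustration; with pandas the same logic would run over the `NBA2017` DataFrame. The share of home wins is a baseline accuracy that a classifier such as a decision tree should beat.

```python
from collections import defaultdict


def add_features(games):
    """games: (date, visitor, visitor_pts, home, home_pts) tuples sorted by date."""
    last_win = defaultdict(bool)  # team -> won its previous game? (False before its first game)
    rows = []
    for date, visitor, v_pts, home, h_pts in games:
        home_win = h_pts > v_pts
        rows.append({
            "date": date,
            "home_win": home_win,  # the class label
            "home_last_win": last_win[home],
            "visitor_last_win": last_win[visitor],
        })
        # Update only after recording, so features describe earlier games only.
        last_win[home] = home_win
        last_win[visitor] = not home_win
    return rows


# Hypothetical toy schedule, not real NBA results.
games = [
    ("2017-01-01", "A", 90, "B", 100),
    ("2017-01-02", "B", 95, "C", 99),
    ("2017-01-03", "C", 101, "A", 98),
]
rows = add_features(games)
baseline = sum(r["home_win"] for r in rows) / len(rows)  # accuracy of "home team always wins"
print(baseline)
```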



ROC and Confusion Matrix for Classifier in Python

We use data from the sklearn library (the face datasets need to be downloaded separately), and the IDE is Sublime Text 3. Most of the code comes from the book: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true

###################always keep the below code#####################
import os
import sys
sys.path.append('//anaconda/lib/python3.6/site-packages')
###################always keep the above code#####################
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from […]
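scikit-learn provides `confusion_matrix` and `roc_curve` for this. To show what they compute, here is a minimal pure-Python sketch for binary labels 0/1. The confusion matrix follows scikit-learn's layout (rows are the true class, columns the predicted class), i.e. [[TN, FP], [FN, TP]]. The ROC curve is traced by sweeping a threshold over the classifier's scores, and the area under it (AUC) is computed with the trapezoidal rule. The function names and the toy labels and scores are illustrative assumptions.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold over each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for th in sorted(set(scores), reverse=True):
        preds = [1 if s >= th else 0 for s in scores]
        (tn, fp), (fn, tp) = confusion_matrix(y_true, preds)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical classifier scores
cm = confusion_matrix(y_true, [1 if s >= 0.5 else 0 for s in scores])
print(cm)                                # [[2, 0], [1, 1]]
print(auc(roc_points(y_true, scores)))   # 0.75
```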



How Certain is This Classifier? Uncertainty Estimates in Python

We are interested not only in which class a classifier predicts for a certain test point, but also in how certain it is that this is the right class. There are two different functions that reveal the certainty of the classifier. We use data from the sklearn library, and the IDE is Sublime Text 3. Most of the code […]
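In scikit-learn the two functions are `decision_function` and `predict_proba`. For a binary linear model, `decision_function` returns a signed score whose sign gives the predicted class and whose magnitude reflects confidence, while `predict_proba` returns class probabilities that sum to 1. For binary logistic regression the positive-class probability is the logistic sigmoid of the decision score. The minimal pure-Python sketch below shows that relationship; the weights `w` and bias `b` are hypothetical, not a fitted model.

```python
import math


def decision_function(w, b, x):
    """Signed distance-like score of a linear model: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_proba(w, b, x):
    """[P(class 0), P(class 1)] for binary logistic regression."""
    p1 = 1.0 / (1.0 + math.exp(-decision_function(w, b, x)))
    return [1.0 - p1, p1]


def predict(w, b, x):
    """Class 1 when the score is positive, i.e. when P(class 1) > 0.5."""
    return 1 if decision_function(w, b, x) > 0 else 0


w, b = [2.0, -1.0], 0.5  # hypothetical weights
x = [1.0, 1.0]
print(decision_function(w, b, x), predict_proba(w, b, x), predict(w, b, x))
```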



Logistic Regression in Python to Tune Parameter C

The trade-off parameter of logistic regression that determines the strength of the regularization is called C, and higher values of C correspond to less regularization (the form of the regularization, such as L1 or L2, is chosen separately). C is the inverse of the regularization strength (lambda). We use data from the sklearn library, and the IDE is Sublime Text 3. Most of the […]
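scikit-learn's documentation writes the L2-penalized logistic regression objective as follows, with labels $y_i \in \{-1, +1\}$. Dividing by $C > 0$ does not change the minimizer and shows why $C$ plays the role of $1/\lambda$: a large $C$ means a small penalty weight $\lambda$.

```latex
\min_{w,b}\; \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i\,(x_i^\top w + b)\right)\right)

\iff \min_{w,b}\; \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i\,(x_i^\top w + b)\right)\right) + \frac{\lambda}{2} w^\top w,
\qquad \lambda = \frac{1}{C}
```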



Quick KNN Examples in Python

We walk through two basic KNN models in Python to become more familiar with modeling and machine learning in Python, using Sublime Text 3 as the IDE. The first KNN example uses the iris data from the sklearn library.

###1. import data
from sklearn.datasets import load_iris
iris = load_iris()
print(iris.keys())
print('\nx:', iris['feature_names'])
print('\ny:', iris['target_names'])
print('\ntype of data:', type(iris['data']))
[…]
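To see what a KNN classifier such as scikit-learn's `KNeighborsClassifier` does at prediction time, here is a minimal pure-Python sketch: compute the Euclidean distance from the query point to every training point, keep the k nearest, and return the majority label. The function name `knn_predict` and the toy 2-D points are hypothetical (they are not the iris data), and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    dists = sorted(
        ((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)),
        key=lambda pair: pair[0],  # sort by distance only
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.1, 1.0)))  # a
print(knn_predict(train_X, train_y, (5.0, 4.8)))  # b
```

An odd k avoids ties between two classes; with more classes, ties are still possible and `Counter.most_common` then favors the label first encountered among the nearest points.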



Image Classification Using Convolutional Neural Networks in TensorFlow

This blog post introduces a type of neural network called a convolutional neural network (CNN), using Python and TensorFlow. A brief introduction to CNNs is given, and a helper class for building CNNs in Python and TensorFlow is provided. The source code from this post is available on GitHub.

Motivation for CNNs

Past blog […]
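The core operation of a convolutional layer is sliding a small kernel over the input and taking a weighted sum at each position. The sketch below computes this in pure Python for a single-channel 2-D input with stride 1 and no padding. This corresponds to what TensorFlow's `tf.nn.conv2d` computes with `padding="VALID"`, which, like most deep learning libraries, is technically cross-correlation (the kernel is not flipped). The function name and toy values are illustrative and not taken from the post's helper class.

```python
def conv2d_valid(image, kernel):
    """Stride-1, no-padding 2-D cross-correlation of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (i, j).
            s = 0.0
            for u in range(kh):
                for v in range(kw):
                    s += image[i + u][j + v] * kernel[u][v]
            row.append(s)
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # responds to differences along the diagonal
print(conv2d_valid(image, kernel))  # [[-4.0, -4.0], [-4.0, -4.0]]
```

A 3x3 input and a 2x2 kernel give a 2x2 output; in a CNN the kernel values are learned, and many kernels are applied in parallel to produce multiple feature maps.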



Summary of the Whale Detection Competition

I am posting a summary on behalf of the Cornell researchers. From my side, I would like to add that Marinexplore has partnered with Cornell University to develop the acoustics-related capabilities of our spatio-temporal data platform. Improved analytics of acoustic data is relevant not only to the shipping industry, but also to other businesses, such as the offshore industry. Globally there […]



Auto Encoder to Detect Anomalous Cases in Smartphone Actimetry Data

We use a deep autoencoder model to analyze actimetry data from smartphones. You can find the data here: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones. Why should we do this? An autoencoder can be useful for excluding unknown or unusual activities rather than incorrectly classifying them, by examining whether any of the activities tend to have more or less anomalous values. We […]
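One common way to turn an autoencoder into an anomaly detector is to score each row by its reconstruction error and flag rows whose error exceeds a threshold chosen from errors on data assumed to be normal. The minimal pure-Python sketch below shows only that scoring step; it assumes the reconstructions have already been produced by a trained autoencoder, and the helper names, the 0.99 quantile, and all numbers are hypothetical rather than the post's settings.

```python
def reconstruction_errors(originals, reconstructions):
    """Per-row mean squared error between inputs and their reconstructions."""
    return [sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)
            for x, r in zip(originals, reconstructions)]


def pick_threshold(train_errors, quantile=0.99):
    """Empirical quantile of errors on (presumed normal) training rows."""
    ranked = sorted(train_errors)
    idx = min(int(quantile * len(ranked)), len(ranked) - 1)
    return ranked[idx]


# Hypothetical inputs and autoencoder outputs; the third row is reconstructed poorly.
originals = [[1.0, 2.0], [2.0, 3.0], [10.0, -5.0]]
reconstructions = [[1.1, 1.9], [2.0, 3.2], [2.0, 2.0]]
errors = reconstruction_errors(originals, reconstructions)

threshold = pick_threshold([0.01, 0.02, 0.015, 0.03])  # hypothetical training errors
flags = [e > threshold for e in errors]
print(flags)  # [False, False, True]
```

Rows flagged this way can be set aside as unknown or unusual activities instead of being forced into one of the known classes.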
