RANSAC and Nonlinear Regression in Python

We use Python 3. More details can be found in Sebastian Raschka’s book: https://www.goodreads.com/book/show/25545994-python-machine-learning?ac=1&from_search=true The data is available here: https://archive.ics.uci.edu/ml/datasets/Housing. Linear regression models can be heavily impacted by the presence of outliers. As an alternative to throwing out outliers, we will look at a robust method of regression using the RANdom SAmple Consensus (RANSAC) algorithm, which is […]
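As a rough illustration of the idea in this excerpt, here is a minimal pure-Python sketch of RANSAC for a straight-line fit. It is not the code from the post or the book; the function names `fit_line` and `ransac_line`, the toy data, the trial count, and the residual threshold are all assumptions chosen for the example.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_trials=100, min_samples=2, threshold=1.0, seed=0):
    """Keep the largest consensus set found over random minimal samples."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_trials):
        sample = rng.sample(points, min_samples)
        if len({x for x, _ in sample}) < 2:
            continue  # degenerate sample: all x equal, no line can be fitted
        slope, intercept = fit_line(sample)
        # Inliers are points whose vertical residual is within the threshold.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if not best_inliers:
        raise ValueError("no valid sample found")
    # Refit on the whole consensus set for the final estimate.
    slope, intercept = fit_line(best_inliers)
    return slope, intercept, best_inliers


# Hypothetical toy data: 20 points on y = 2x + 1 plus three gross outliers.
clean = [(float(x), 2.0 * x + 1.0) for x in range(20)]
outliers = [(3.0, 40.0), (10.0, -30.0), (15.0, 90.0)]
data = clean + outliers

ols_slope, ols_intercept = fit_line(data)        # pulled off by the outliers
slope, intercept, inliers = ransac_line(data)    # fitted on the inliers only
```

The plain least-squares fit on `data` is distorted by the three outliers, while the RANSAC fit uses only the consensus set. The threshold controls how far a point may lie from the candidate line and still count as an inlier, so it has to be chosen for the data at hand.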


Preprocess: LDA and Kernel PCA in Python

Principal component analysis (PCA) is an unsupervised linear transformation technique that is widely used across different fields, most prominently for dimensionality reduction. We talked about it here: https://charleshsliao.wordpress.com/2017/05/28/preprocess-pca-application-in-python/ We use data from the sklearn library and work in Python 3. Most of the code comes from Sebastian Raschka’s book: https://www.goodreads.com/book/show/25545994-python-machine-learning?ac=1&from_search=true ###1. import the data ###pls […]


Scaling Machine Learning to Modern Demands

This is a Data Science Popup session by Hristo Spassimirov Paskov, Founder & CEO of ThinkFast. Summary: Machine learning has revolutionized the technological landscape, and its success has inspired the collection of vast amounts of data aimed at answering ever deeper questions and solving increasingly hard problems. Continuing this success critically relies on the […]


Setting up jupyter notebook server as service in Ubuntu 16.04

Step 1: Verify the jupyter notebook location:
$ ll /home/avkash/.local/bin/jupyter-notebook
-rwxrwxr-x 1 avkash avkash 222 Jun 4 10:00 /home/avkash/.local/bin/jupyter-notebook*
Step 2: Configure your jupyter notebook with a password and IP address as needed, and confirm where the configuration file is located. We will use this file as the configuration for jupyter as a service.
jupyter config: /home/avkash/.jupyter/jupyter_notebook_config.py
Step 3: Create […]


Using Cross-validation in Scala with H2O and getting each cross-validated model

Here is Scala code for binomial classification with GLM: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/
To add cross-validation, you can do the following:
def buildGLMModel(train: Frame, valid: Frame, response: String) (implicit h2oContext: H2OContext): GLMModel = {
  import _root_.hex.glm.GLMModel.GLMParameters.Family
  import _root_.hex.glm.GLM
  import _root_.hex.glm.GLMModel.GLMParameters
  val glmParams = new GLMParameters(Family.binomial)
  glmParams._train = train
  glmParams._valid = valid
  glmParams._nfolds = 3
  ###### Here is […]
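To make concrete what a setting such as `_nfolds = 3` asks for, here is a small library-free Python sketch of k-fold cross-validation that keeps one model per fold. It does not use H2O and does not reproduce how H2O assigns rows to folds; the contiguous fold assignment, the helper names `k_fold_indices` and `cross_validate`, and the toy "model" (a plain mean) are assumptions made for illustration.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, holdout_idx) pairs using contiguous folds."""
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        holdout = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, holdout
        start += size


def cross_validate(rows, n_folds, fit, score):
    """Fit one model per fold and score it on that fold's held-out rows."""
    models, scores = [], []
    for train_idx, holdout_idx in k_fold_indices(len(rows), n_folds):
        model = fit([rows[i] for i in train_idx])
        models.append(model)
        scores.append(score(model, [rows[i] for i in holdout_idx]))
    return models, scores


# Hypothetical stand-ins for a real learner: the "model" is the training mean,
# and the score is the mean absolute error on the held-out rows.
def fit_mean(values):
    return sum(values) / len(values)


def mean_abs_error(model, values):
    return sum(abs(v - model) for v in values) / len(values)


rows = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
models, scores = cross_validate(rows, 3, fit_mean, mean_abs_error)
```

Each entry of `models` was trained without seeing its own hold-out fold, which is what makes the per-fold scores usable as an estimate of out-of-sample error.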


CNN Model of Image Detection in Keras (TensorFlow) in Python3

This article covers the basic application of Keras and CNN in Python 3, with Sublime Text 3 and IPython Notebook as the IDE. More details of the following code can be found in Robert Layton’s book here: https://www.goodreads.com/book/show/26019855-learning-data-mining-with-python?from_search=true ###The book above said that we will build a system that will take an image as an input ###and give […]


Keras in Python, Backend TensorFlow, with Iris data to Build Deep Learning Model

We talked about Deep Learning modeling in TensorFlow in Python & R: https://charleshsliao.wordpress.com/2017/06/06/rnn-in-tensorflow-in-pythonr-with-mnist/ We also mentioned the Keras application in R: https://charleshsliao.wordpress.com/2017/04/24/cnndnn-of-keras-in-r-backend-tensorflow-for-mnist/ This article covers the basic application of Keras and TensorFlow in Python 3, with Sublime Text 3 and IPython Notebook as the IDE. More details of the following code can be found in Robert Layton’s book here: https://www.goodreads.com/book/show/26019855-learning-data-mining-with-python?from_search=true ###This […]


Movie Recommender -Affinity Analysis of Apriori in Python

“Affinity analysis can be applied to many processes that do not use transactions in this sense: Fraud detection, Customer segmentation, Software optimization, Product recommendations. The classic algorithm for affinity analysis is called the Apriori algorithm.” More details can be found in Robert Layton’s book here: https://www.goodreads.com/book/show/26019855-learning-data-mining-with-python?from_search=true We explored a similar “Market Basket” method here: https://charleshsliao.wordpress.com/2017/03/06/an-quick-association-rules-example-within-r/ […]
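The Apriori algorithm named in the quote can be sketched in a few lines of plain Python. This is a simplified illustration, not the code from the post or the book: the function names, the movie titles, and the minimum support of 3 are assumptions, and support is counted as a raw number of transactions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # size-1 candidates
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        k += 1
        # Join step: merge surviving itemsets into candidates of size k.
        candidates = {a | b for a in survivors for b in survivors
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in survivors
                          for s in combinations(c, k - 1))}
    return frequent


def rule_confidence(frequent, premise, conclusion):
    """Confidence of the rule premise -> conclusion from support counts."""
    premise, conclusion = frozenset(premise), frozenset(conclusion)
    return frequent[premise | conclusion] / frequent[premise]


# Hypothetical data: the sets of movies each of five users rated favourably.
favourites = [
    {"Alien", "Brazil", "Casablanca"},
    {"Alien", "Brazil"},
    {"Alien", "Casablanca"},
    {"Brazil", "Casablanca"},
    {"Alien", "Brazil", "Casablanca"},
]
frequent = frequent_itemsets(favourites, min_support=3)
confidence = rule_confidence(frequent, {"Alien"}, {"Brazil"})
```

The prune step is what gives Apriori its name: an itemset can only be frequent if all of its subsets are, so larger candidates are built only from itemsets already known to be frequent.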


Generating ROC curve in SCALA from H2O binary classification models

You can follow this blog post to build a binomial classification GLM model: https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/
To collect model metrics for training, use the following:
val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsBinomial](glmModel, train)
Now you can access the model AUC (the _auc object) as shown below. Note: the _auc object has an array of thresholds, and for each threshold it has fps and tps (use […]
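Once per-threshold true-positive and false-positive counts are available, turning them into ROC points is simple arithmetic. The Python sketch below shows that step only; it does not call H2O, and the array layout (counts listed per threshold, thresholds in descending order), the function names, and the numbers are assumptions for illustration rather than actual model output.

```python
def roc_curve_points(tps, fps, n_pos, n_neg):
    """Convert per-threshold TP/FP counts into sorted (FPR, TPR) points."""
    points = {(0.0, 0.0), (1.0, 1.0)}  # anchor both ends of the curve
    for tp, fp in zip(tps, fps):
        points.add((fp / n_neg, tp / n_pos))
    return sorted(points)


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the sorted points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical counts for 4 positive and 4 negative rows.
thresholds = [0.9, 0.7, 0.5, 0.3]
tps = [1, 2, 3, 4]   # true positives predicted at or above each threshold
fps = [0, 1, 1, 3]   # false positives predicted at or above each threshold

points = roc_curve_points(tps, fps, n_pos=4, n_neg=4)
auc = trapezoid_auc(points)
```

Plotting the first element of each point (false positive rate) against the second (true positive rate) gives the ROC curve. The trapezoid area is an approximation and may differ slightly from the AUC a library reports, depending on how it bins thresholds.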
