The state of machine learning in Apache Spark

[A version of this post appears on the O’Reilly Radar.] The O’Reilly Data Show Podcast: Ion Stoica and Matei Zaharia explore the rich ecosystem of analytic tools around Apache Spark. In this episode of the Data Show, we look back to a recent conversation I had at the Spark Summit in San Francisco with Ion […]



A Neural Network in 10 lines of CUDA C++ Code

Purpose: For educational purposes only. The code demonstrates a supervised learning task using a very simple neural network. Reference: inspired by Andrew Trask’s post. The core component of the code, the learning algorithm, is only 10 lines: The loop above runs for 50 iterations (epochs) and fits the vector of attributes X to the vector of […]



Domino now supports JupyterLab — and so much more

You can now run JupyterLab in Domino, using a new Domino feature that lets data scientists specify any web-based tools they want to run on top of the Domino platform. Introduction Domino is a data science platform that supports the entire data science lifecycle, from exploratory analysis through experimentation and all the way to deployment. […]



PoE AI Part 5: Real-Time Obstacle and Enemy Detection using CNNs in TensorFlow

This post is the fifth part of a series on creating an AI for the game Path of Exile © (PoE). Posts in the series: A Deep Learning Based AI for Path of Exile: A Series; Calibrating a Projection Matrix for Path of Exile; PoE AI Part 3: Movement and Navigation; PoE AI Part 4: Real-Time Screen Capture and […]



Using Random Forests and Wordclouds to Visualize Feature Importance in Document Classification

What characteristics do the works of famous authors have that make them unique? This post uses ensemble methods and wordclouds to explore just that. Project Gutenberg offers a large number of freely available works from many famous authors. The dataset for this post consists of books, taken from Project Gutenberg, written by each of the […]



Humans in the Loop

This guest blog post from Paco Nathan dives into how people and machines collaborating to perform work is a reality, not science fiction. Paco Nathan is the Director, Learning Group at O’Reilly Media and an advisor for Amplify Partners. His expertise includes machine learning, distributed systems, and cloud computing. He was cited in 2015 as […]



Softmax Regression using TensorFlow

Note: This article has also been featured on geeksforgeeks.org. This article discusses the basics of Softmax Regression and its implementation in Python using the TensorFlow library. What is Softmax Regression? Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes. A gentle introduction to […]
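
To make the "generalization of logistic regression" idea concrete, here is a minimal sketch in plain Python (not the TensorFlow implementation the article goes on to describe). The function names `softmax` and `predict_proba`, and the weight layout of one row per class, are assumptions chosen for illustration only.

```python
import math

def softmax(logits):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; it does not
    # change the result because softmax is invariant to a constant shift.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, biases, x):
    """Class probabilities for one input vector x.

    weights: one row of coefficients per class (hypothetical layout)
    biases:  one bias term per class
    """
    logits = [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)
```

With two classes and the second class's score fixed at 0, `softmax([z, 0.0])[0]` equals the logistic sigmoid of `z`, which is the sense in which softmax regression reduces to ordinary logistic regression.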



Introduction to TensorFlow

Note: This article has also been featured on geeksforgeeks.org. This article is a brief introduction to the TensorFlow library using the Python programming language. Introduction TensorFlow is an open-source software library. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine […]
