Using Random Forests and Wordclouds to Visualize Feature Importance in Document Classification

What characteristics do the works of famous authors have that make them unique? This post uses ensemble methods and wordclouds to explore just that. Project Gutenberg offers a large number of freely available works from many famous authors. The dataset for this post consists of books, taken from Project Gutenberg, written by each of the […]


Humans in the Loop

This guest blog post from Paco Nathan explores how collaboration between people and machines to perform work is real, not science fiction. Paco Nathan is the Director of the Learning Group at O’Reilly Media and an advisor for Amplify Partners. His expertise includes machine learning, distributed systems, and cloud computing. He was cited in 2015 as […]


Softmax Regression using TensorFlow

Note: This article has also been featured on geeksforgeeks.org. This article discusses the basics of Softmax Regression and its implementation in Python using the TensorFlow library. What is Softmax Regression? Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes. A gentle introduction to […]


Introduction to TensorFlow

Note: This article has also been featured on geeksforgeeks.org. This article is a brief introduction to the TensorFlow library using the Python programming language. Introduction: TensorFlow is an open-source software library. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine […]


Analysis of Historical Weather Data for Los Angeles, CA

This post explores historical weather data for Los Angeles, California, from 1906 to the present using Pandas and Matplotlib. The data in the post was collected from the National Centers for Environmental Information website and is available for download here. Organizing the data by year, an animation of the max temperatures throughout […]


Measuring Data Science Business Value

This blog post covers metrics that help data science leaders ensure their team’s work is aligned with business value. Data science managers and executives, whether they came up through the technical side or the management side, all struggle to provide visibility into their team and how the team’s work is aligned with business value. It is […]


Quick Tips for Getting A Data Science Team Off the Ground

Should you start a data science team? Or not? It isn’t an easy decision. This blog post provides tips to help leaders at startups and early-stage companies decide whether it is the right time to start building a data science team. Why Data Science? An increasing number of startups and early-stage companies are realizing they […]


Applying Data Science to Robotics

Author: Ammar A. Raja
Source: http://www.datasciencecentral.com/profiles/blogs/how-data-science-apply-to-robotics
1. SHORT BIO OF THE AUTHOR
Dr. Ammar A. Raja is an assistant professor at COMSATS Institute of Information Technology, Pakistan. He received his PhD degree in Finance from The London School of Economics and Political Science (LSE) in 2012. Apart from conducting research in data analytics, he also […]


Recommender Systems through Collaborative Filtering

This is a technical deep dive into the collaborative filtering algorithm and how to use it in practice. From Amazon recommending products you may be interested in based on your recent purchases to Netflix recommending shows and movies you may want to watch, recommender systems have become popular across many applications of data science. Like […]


Downloading more than 20 years of The New York Times

Articles for the period from 1987 to the present are available without a subscription. Their copyright notice is friendly to web scraping: “… you may download material from The New York Times on the Web (one machine readable copy and one print copy per page) for your personal, noncommercial use only.” Why waste the opportunity to download these […]
