[Weekend Hack] Sending an iPhone notification from IPython Notebook when the cell execution is done.

IPython Notebook has a useful notification feature that tells you when the kernel is sitting idle, i.e. when all cell execution has finished. Recently I was working on some long-running Python code, and most likely I won’t be near my computer when the cell execution finishes. Hence I thought of writing a magic function that can send […]



DRY HiveQL

DRY (don’t repeat yourself) is one of the fundamental principles of software engineering. The main idea is to avoid duplicating business/processing logic throughout the code. However, I rarely see it applied when writing SQL queries, which makes them difficult to understand and maintain. Below are a few tips on making HiveQL DRY. Quick Summary: Use […]



Intuition behind R2 and other regression evaluation metrics

There are many metrics for evaluating a regression model, but they often seem cryptic. Below is an attempt to explain the intuition behind two commonly used metrics: mean/median absolute error and R2 (or coefficient of determination). Average Accuracy of the Model (Mean/Median Absolute Error): Let’s assume you have a model that can predict house […]
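As a rough illustration of the two metrics named above, here is a minimal sketch using only the Python standard library. The function names and the house prices (in thousands) are made up for this example and do not come from the original post. R2 is computed with the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares around the mean.

```python
from statistics import mean, median


def mean_absolute_error(y_true, y_pred):
    # Average size of the prediction error, in the same units as the target.
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def median_absolute_error(y_true, y_pred):
    # Typical error size; less sensitive to a few very bad predictions.
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    # 1 - (squared error of the model) / (squared error of always predicting the mean).
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical house prices in thousands (illustrative values only).
actual = [200, 300, 400, 500]
predicted = [210, 290, 380, 540]

print(mean_absolute_error(actual, predicted))    # 20
print(median_absolute_error(actual, predicted))  # 15.0
print(round(r2_score(actual, predicted), 3))     # 0.956
```

With these numbers the model is off by 20 (thousand) on average, and it removes roughly 96% of the squared error that a "always predict the mean price" baseline would make. A model that only ever predicts the mean scores an R2 of exactly 0.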



Using R’s ggplot within IPython Notebook

As a Data Scientist I often use Python to write quick scripts to transform/massage data. But for data visualization I love using R’s ggplot. Although there is a version of ggplot written in Python, I found it to be lacking a lot of features compared to its R counterpart. Luckily, using IPython Notebook you can have […]



If Independent Features Then { Multivariate Normal Distributions == Product of Univariate Normal Distributions }

Below is a simple mathematical proof showing that, in a multivariate Gaussian distribution, if the features are independent then the probability density of a point can be computed as the product of the probability densities of the individual features, each modeled as a univariate Gaussian distribution. where K = Number of Features, μ = Mean vector of size K, Σ = Covariance […]
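The claim can also be checked numerically. The sketch below is not the post's proof; it assumes that independent features correspond to a diagonal covariance matrix, so the determinant is the product of the per-feature variances and the inverse is their reciprocals. The function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist


def diagonal_mvn_pdf(x, mu, variances):
    # Multivariate normal density specialised to a diagonal covariance matrix:
    # |Sigma| is the product of the variances, Sigma^-1 holds their reciprocals.
    k = len(x)
    det = math.prod(variances)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, variances))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** k * det)


def product_of_univariate_pdfs(x, mu, variances):
    # Product of K independent univariate normal densities, one per feature.
    return math.prod(
        NormalDist(mi, math.sqrt(vi)).pdf(xi)
        for xi, mi, vi in zip(x, mu, variances)
    )


# Hypothetical point, mean vector and per-feature variances (K = 3).
x = [1.0, -0.5, 2.0]
mu = [0.0, 0.0, 1.5]
variances = [1.0, 4.0, 0.25]

joint = diagonal_mvn_pdf(x, mu, variances)
product = product_of_univariate_pdfs(x, mu, variances)
print(math.isclose(joint, product, rel_tol=1e-12))  # True
```

The two values agree up to floating-point rounding, which is what the factorisation predicts when all off-diagonal covariance terms are zero.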
