[Weekend Hack] Sending iPhone notification from IPython Notebook when the cell execution is done.

IPython Notebook has a useful notification feature that tells you when the kernel is sitting idle, i.e. when all cell execution has finished. Recently I was working on some long-running Python code. Most likely, when the cell execution finishes I won’t be near my computer. Hence I thought of writing a magic function that can send […]


Notes: Functional Programming Principles In Scala

Week 1: Functions & Evaluations. What are the different paradigms of programming? Imperative programming: based on von Neumann’s idea, this programming style closely maps a program’s operations to operations on memory. For instance, a variable dereference corresponds to a load instruction, a variable assignment to a store instruction, and a control structure to a jump. […]


Dividing Data into Training and Testing in R

In machine learning, one often needs to divide the data into two different sets, namely training and testing datasets. While you can’t use the “sample” command in R to do this directly, there is a simple workaround. Essentially, use the “sample” command to randomly select certain index numbers and then use the selected index numbers to divide […]
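The index-sampling idea can be sketched roughly as follows. This is a Python illustration rather than the R code from the post; the function name `split_by_sampled_index`, the 75/25 ratio, and the seed are all assumptions made up for the example.

```python
import random


def split_by_sampled_index(rows, train_fraction=0.75, seed=42):
    """Split rows into (train, test) by randomly sampling row indices.

    Hypothetical helper: the name, the default fraction and the seed
    are illustrative choices, not taken from the original post.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    n_train = round(len(rows) * train_fraction)
    # Sample index numbers without replacement, then use them to divide the data.
    train_idx = set(rng.sample(range(len(rows)), n_train))
    train = [row for i, row in enumerate(rows) if i in train_idx]
    test = [row for i, row in enumerate(rows) if i not in train_idx]
    return train, test


rows = list(range(20))
train, test = split_by_sampled_index(rows)
print(len(train), len(test))
```

Sampling the indices rather than the rows themselves makes it easy to get the complement: whatever was not selected becomes the test set.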


Optimizing Jaccard Similarity Computation for Big Data

Computing Jaccard similarity across all pairs of entries is a herculean task. It’s on the order of O(n^2). However, there are many proposed optimization techniques that can significantly reduce the number of pairs that one needs to consider. I spent almost a week studying the various techniques and googling useful resources related to this topic. Below is […]
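To see why the brute-force version gets expensive, here is a minimal Python sketch of the naive all-pairs computation (not one of the optimized techniques). The function names and the sample entries are assumptions for illustration, as is the convention that two empty sets have similarity 1.0.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(union)


def all_pairs_jaccard(entries):
    """Naive baseline: score every unordered pair, n * (n - 1) / 2 comparisons."""
    return {
        (i, j): jaccard(entries[i], entries[j])
        for i, j in combinations(range(len(entries)), 2)
    }


# Made-up sample data.
entries = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}]
scores = all_pairs_jaccard(entries)
print(scores)
```

With n entries the dictionary holds n(n-1)/2 scores, which is what makes the unoptimized approach impractical for large n.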


Writing Hive Custom Aggregate Functions (UDAF): Part I – Setting Eclipse

Writing your first user-defined aggregate function (UDAF) for Hive can be a daunting task. In particular, I found these three challenges while working on my first UDAF: (1) no instructions on how to set up Eclipse for UDAF development; (2) often complicated instructions on how to write your first UDAF; (3) no clear instructions on how to debug […]


Generating Bayesian Graph Using Ruby and GraphViz

A lot of folks, especially those taking a probabilistic graphical models class, might have realized the need for a simple tool that can quickly generate a directed acyclic graph (DAG) and the associated conditional probability distribution (CPD) table for each node of the graph. Below is a quick hack that I pulled together using Ruby and AT&T’s GraphViz library to […]
