Setting up a quick API with F# and Azure Functions

As mentioned in last week’s Elixir blog post, I put together a quick fake API based on Azure Functions. I thought it would take a couple of minutes, but it turned out to be a whole adventure in itself. Creating a function is a breeze. Go to the Portal, click the big green “+” sign and […]



Constructing blocks and file system relationship in HDFS

I am using a 3-node Hadoop cluster running on Windows Azure HDInsight for testing. In Hadoop we can use the fsck utility to diagnose the health of the HDFS file system, to find missing files or blocks, or to check them for integrity. Let’s run fsck on the root file system: c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop fsck / FSCK started […]



A Menu-based Windows Azure PowerShell script for PaaS and IaaS Operations

When run, the PowerShell menu looks like the following: Get the script from here: https://github.com/Avkash/AzurePowershellmenu/blob/master/PowershellMenuPub.ps1 For those who would like to fork it and then add more functionality, use the commands below: $ ls -l total 5 -rw-r--r-- 1 avkashc Administ 8350 Feb 12 12:02 PowershellMenuPub.ps1 -rw-r--r-- 1 avkashc Administ […]



Resource Allocation Model in MapReduce 2.0

In the previous version of MapReduce, each node in the cluster was statically assigned a predefined number of Map slots and a predefined number of Reduce slots. The slots could not be shared between Maps and Reduces. This static allocation of slots wasn’t optimal since slot requirements vary during the MR […]
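
To make the cost of static slots concrete, here is a minimal sketch. It is not Hadoop code: the `Node` class, the slot counts, and the "pooled" comparison are all hypothetical, chosen only to show how capacity sits idle when Map and Reduce slots cannot be shared.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical per-node configuration; real values depend on the cluster.
    map_slots: int      # fixed number of Map slots
    reduce_slots: int   # fixed number of Reduce slots


def runnable_static(node: Node, pending_maps: int, pending_reduces: int) -> int:
    """Tasks that can start when Map and Reduce slots cannot be shared."""
    return (min(pending_maps, node.map_slots)
            + min(pending_reduces, node.reduce_slots))


def runnable_pooled(node: Node, pending_maps: int, pending_reduces: int) -> int:
    """Tasks that could start if the same capacity were one shared pool.

    This is an illustrative comparison only, not a model of any real scheduler.
    """
    total_capacity = node.map_slots + node.reduce_slots
    return min(pending_maps + pending_reduces, total_capacity)


node = Node(map_slots=4, reduce_slots=2)

# Map-heavy phase of a job: six Map tasks waiting, no Reduce tasks yet.
static_started = runnable_static(node, pending_maps=6, pending_reduces=0)
pooled_started = runnable_pooled(node, pending_maps=6, pending_reduces=0)

print(f"static slots: {static_started} tasks start, "
      f"{node.map_slots + node.reduce_slots - static_started} slots idle")
print(f"shared pool:  {pooled_started} tasks start")
```

In the map-heavy phase the two Reduce slots stay empty under static allocation even though Map tasks are waiting, which is the inefficiency described above.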



Processing already sorted data with Hadoop Map/Reduce jobs without performance overhead

While working with Map/Reduce jobs in Hadoop, you may well have already sorted data stored in HDFS. As you may know, sorting happens not only after the map process in the map task but also with the merge process during the reduce task, so sorting already sorted data again would be a […]



How to submit Hadoop Map/Reduce jobs in multiple command shells to run in parallel

Sometimes you need to run multiple Map/Reduce jobs in the same Hadoop cluster; however, opening several Hadoop command shells (or Hadoop terminals) can be troublesome. Note that, depending on your Hadoop cluster size and configuration, you can run only a limited number of Map/Reduce jobs in parallel; however, if you need to do so, here is […]



Listing currently running Hadoop Jobs and Killing running Jobs

When you have jobs running in Hadoop, you can use the map/reduce web view to list the currently running jobs. But what if you need to kill a running job because the submitted job started malfunctioning or, in the worst-case scenario, is stuck in an infinite loop? I have seen several scenarios […]



How to troubleshoot MapReduce jobs in Hadoop

When writing MapReduce programs you are definitely going to hit problems such as infinite loops, crashes in MapReduce, incomplete jobs, etc. Here are a few things that will help you isolate these problems: Map/Reduce log files: all MapReduce job activity is logged by default in Hadoop. By default, log files are stored […]
