Handling a YARN resource manager issue with decommissioned nodes

If you hit the following exception with your YARN resource manager:

ERROR/Exception:
17/07/31 15:06:13 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes over rm1. Not retrying because try once and fail.
java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl

Troubleshooting: Please try running the following command and you will see the exact same exception:

$ yarn node -list […]



Saving H2O models from R/Python API in Hadoop Environment

When you are using H2O in a clustered environment such as Hadoop, the machine where h2o.saveModel() tries to write the model can be different from the one you are working on, and that is why you see the error “No such file or directory”. If you just give a path such as /tmp and visit the machine from which the H2O connection was initiated in R, you […]



DRY HiveQL

DRY (don’t repeat yourself) is one of the fundamental principles of software engineering. The main idea is to avoid duplicating business/processing logic throughout the code. However, I rarely see it applied when writing SQL queries, which makes them difficult to understand and maintain. Below are a few tips on making HiveQL DRY.

Quick Summary: Use […]



Optimizing Jaccard Similarity Computation for Big Data

Computing Jaccard similarity across all entries is a Herculean task. It is on the order of O(n^2) in the number of entries, since every pair has to be compared. However, there are many proposed optimization techniques that can significantly reduce the number of pairs that one needs to consider. I spent almost a week studying the various techniques and googling useful resources related to this topic. Below is […]
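For reference, here is a minimal sketch of the brute-force baseline that those optimization techniques try to avoid. It is not taken from the post: the function names `jaccard` and `all_pairs_jaccard` and the sample entries are hypothetical, and treating two empty sets as identical is an assumed convention. Each entry is modeled as a set of tokens, and every unordered pair is scored, which means n(n-1)/2 comparisons for n entries.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


def all_pairs_jaccard(entries):
    """Brute-force baseline: score every unordered pair of entries."""
    return {
        (i, j): jaccard(entries[i], entries[j])
        for i, j in combinations(range(len(entries)), 2)
    }


# Illustrative data only: three entries, hence 3 * 2 / 2 = 3 pairs.
entries = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}]
scores = all_pairs_jaccard(entries)
for pair, score in sorted(scores.items()):
    print(pair, score)
```

Even at a modest 100,000 entries this loop would have to score roughly 5 billion pairs, which is why reducing the set of candidate pairs matters more than speeding up the similarity function itself.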



Writing Hive Custom Aggregate Functions (UDAF): Part II

Now that we have Eclipse configured (see Part I) for UDAF development, it's time to write our first UDAF. Searching for custom UDAFs, most people will already have come across the following page: GenericUDAFCaseStudy. This page has a good explanation of how to write a good generic UDAF, but as a newbie it can be daunting, […]
