Flatten complex nested parquet files on Hadoop with Herringbone

Herringbone is a suite of tools for working with Parquet files on HDFS, and with Impala and Hive: https://github.com/stripe/herringbone

Please visit my GitHub and this specific page for more details.

Installation:
Note: Herringbone needs a Hadoop environment, so you must be working on a Hadoop machine.
Prerequisite: Thrift 0.9.1 (MUST have 0.9.1 as 0.9.3 and […]
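To make "flattening" concrete, here is a minimal Python sketch of the general idea: nested fields are lifted to the top level and their names are joined with a separator. This is only an illustration and not Herringbone's implementation; the `flatten` helper, the `_` separator, and the sample record are all hypothetical, and repeated (list) fields are left untouched.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining key names with `sep`.

    Hypothetical helper for illustration only; lists are kept as-is.
    """
    flat = {}
    for key, value in record.items():
        # Prefix the key with its parent's name, if there is one.
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested structs and merge their flattened fields.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Sample nested record (made up for this example).
row = {"id": 1, "user": {"name": "ann", "address": {"city": "Oslo"}}}
print(flatten(row))
# → {'id': 1, 'user_name': 'ann', 'user_address_city': 'Oslo'}
```

A flat layout like this is easier to query from engines that handle nested columns poorly, at the cost of longer column names.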



Enterprise Hadoop Solution distributed by key Hadoop vendors

Let's start with the Cloudera Enterprise Data Hub. Here is the offering from Hortonworks. This is how MapR packages Enterprise Hadoop. And finally, the Pivotal Enterprise Hadoop offering.

Keywords: Apache Hadoop, Cloudera, Hortonworks, Pivotal, MapR, Big Data



Setting up Pivotal Hadoop (PivotalHD 1.1 Community Edition) Cluster in CentOS 6.5

Download the Pivotal HD package: http://bitcast-a.v1.o1.sjc1.bitgravity.com/greenplum/pivotal-sw/pivotalhd_community_1.1.tar.gz

The package consists of three tarballs:
PHD-1.1.0.0-76.tar.gz
PCC-2.1.0-460.x86_64.tar.gz
PHDTools-1.1.0.0-97.tar.gz

Untar the package above and start with PCC (Pivotal Command Center).

Install Pivotal Command Center:
$ tar -zxvf PCC-2.1.0-460.x86_64.tar.gz
$ PHDCE1.1/PCC-2.1.0-460/install

Log in as the newly created user gpadmin:
$ su - gpadmin
$ sudo cp /root/.bashrc .
$ sudo cp /root/.bash_profile .
$ sudo […]
