HOWTO: Run Apache Hive on Windows in 6 easy steps

Note: You need Cygwin installed to follow this tutorial, as Hadoop (which Hive needs) requires Cygwin to run on Windows. At a minimum, the Basic, Net (the OpenSSH and tcp_wrapper packages) and Security related Cygwin packages need to be present on the system.
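If you haven't set up the Cygwin OpenSSH server before, something along these lines (run from a Cygwin terminal with administrator rights) usually does it; the exact prompts and the service name vary between Cygwin versions, so treat this as a sketch rather than a recipe:

# configure sshd as a Windows service, answering yes to the prompts
ssh-host-config -y
# start the service (named "sshd" on Cygwin installs of this era)
net start sshd
# check that SSH to localhost works, as Hadoop expects
ssh localhost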

Here are the 6 steps:

1. Download WSO2 BAM 2.0.0. It’s free and open source.

2. Extract it to a preferred location. Let’s call it $BAM_HOME.

3. Start the server by executing the wso2server.bat file in $BAM_HOME/bin. The server will start up on the default port 9443 on the machine's IP.
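From a Windows command prompt, that amounts to something like the following (assuming the default configuration, which serves the management console over HTTPS on port 9443; %BAM_HOME% here is just a stand-in for wherever you extracted the pack):

rem assumes BAM_HOME points at the extracted location
cd %BAM_HOME%\bin
wso2server.bat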

4. Log in to the web console at https://localhost:9443 with the default credentials (username: admin, password: admin) and click “Sign-In”.

WSO2 BAM login screen

5. Navigate to the “Add Analytics” option by clicking its menu item in the left-hand menu.

WSO2 BAM left-hand menu – Add Analytics option

6. Now execute your Hive script, by entering it in the editor and clicking “Execute”! (A sketch of what such a script can look like follows below the screenshot.)

Note: Follow this KPI sample document to get a sample working for you in no time, with results appearing on a dashboard. Also, note that you can schedule the Hive script to run periodically as well.

Execute Apache Hive script
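To give a taste of what goes in the editor, here is a minimal, purely illustrative HiveQL sketch; the table and column names (PageHits, PageHitSummary) are made up, and the real BAM samples, like the KPI one above, typically define external tables over the data BAM has collected instead:

-- illustrative only: summarise hits per page into a summary table
CREATE TABLE IF NOT EXISTS PageHitSummary (page STRING, hits INT);
INSERT OVERWRITE TABLE PageHitSummary
SELECT page, COUNT(*) FROM PageHits GROUP BY page;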

I have to thank my colleague Buddhika Chamith, as all this was possible because of some grueling work done by him. Also, I hate the fact that Hadoop and Hive make it so hard to run stuff on Windows, especially since they are Java applications. Read about those concerns here.


Building Apache Hadoop from source

Apache Hadoop is a great framework for scalable computing.
This post lists the steps to build Hadoop from a checkout of the trunk. I’m working on Mac OS X, so most of the steps apply to *nix users as well, with minor differences.
1. Check out the source
You can check out the source using the following command:
svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk
2. Use Maven 3
I was using Maven 2.2.1. If you start a build with a version earlier than 3, Hadoop stops the build through the Maven enforcer plugin (kudos to that), saying you need Maven 3.
Running “mvn -version” will tell you which Maven version you are using.
3. Install protobuf
Now, if you build, everything will build except for the map-reduce module. For that, or specifically for YARN (inside map-reduce), you need to get protobuf.
For this you need gcc. On Mac OS X, the easiest way to get it is to install Xcode (available on your Mac OS X install disk, under optional).
Now run a ‘configure’, then a ‘make’, followed by a ‘make install’, and you should have protobuf on your system, as sketched below.
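For instance, building from a protobuf source tarball looks something like this; the version number here is illustrative, so use whichever release the Hadoop build asks for:

# unpack and build protobuf from source (version illustrative)
tar xzf protobuf-2.4.1.tar.gz
cd protobuf-2.4.1
./configure
make
sudo make install    # installs under /usr/local by default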
4. Complete the build
Now you should be able to do a complete build of the whole Hadoop source, including the map-reduce project.
Use the following command: mvn clean install -P-cbuild
The additional option here stops the native code from being compiled. The native code refers to, quoting Arun C. Murthy, ‘The native code, in this context, is the C executable used to launch the containers (tasks) by the NodeManager. The short summary of the executable is that it’s a setuid executable used to ensure that the unix process runs as the actual user who submitted the job, not as the unix user of the NodeManager.’
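As an aside, if you only want the build artifacts, a variant like the following speeds things up considerably; -DskipTests is a standard Maven flag and entirely optional:

# skip the native code and the test run (both flags optional)
mvn clean install -P-cbuild -DskipTests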

Thanks go out to Ravi Theja, Praveen Sripathi and Arun Murthy for the help on the map-reduce mailing list. I hope this post helps you build Hadoop without running into any roadblocks.