Apache Hadoop is a great framework for scalable computing.
This blog lists the steps to build Hadoop from a checkout of trunk. I’m working on Mac OS X, so most of the steps would apply to other *nix users as well, with minor differences.
- Check out the source
You can check out the source using the following command:

```
svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk
```
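If you want to confirm the checkout worked, or refresh it later, the standard Subversion commands apply; a quick sketch:

```shell
cd hadoop-trunk
svn info     # shows the repository URL and the revision you checked out
svn update   # later on, pulls in the latest changes from trunk
```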
- Use Maven 3
I was using Maven 2.2.1. If you try to build with a version earlier than 3, Hadoop’s enforcer plugin stops the build (kudos to that), telling you that Maven 3 is required.
Running `mvn -version` will tell you which Maven version you are using.
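If you do need to upgrade, one convenient route on Mac OS X is Homebrew; this is just a sketch assuming you have Homebrew installed (any other install method that puts Maven 3 on your PATH works just as well):

```shell
brew install maven   # installs a Maven 3.x release via Homebrew
mvn -version         # confirm it now reports a 3.x version
```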
- Install protobuf
Now, if you build, everything will compile except the map-reduce module. For that module, or specifically for YARN (inside map-reduce), you need to get protobuf.
It’s available here: http://code.google.com/p/protobuf/downloads/list
Building protobuf requires gcc. On Mac OS X, the easiest way to get it is to install Xcode (available on your Mac OS X install disk, under the optional installs).
Now run `./configure`, then `make`, followed by `make install`, and you should have protobuf on your system.
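Put together, the protobuf build looks roughly like this; the version number below is only an example, so substitute whichever release you downloaded:

```shell
tar xzf protobuf-2.4.1.tar.gz   # example version; use the one you downloaded
cd protobuf-2.4.1
./configure
make
sudo make install               # installs into /usr/local by default
protoc --version                # sanity check that the compiler is on your PATH
```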
- Complete the build
Now you should be able to do a complete build of the whole Hadoop source, including the map-reduce project. Use the following command:

```
mvn clean install -P-cbuild
```
The `-P-cbuild` option skips compiling the native code. To explain what that native code is, quoting Arun C. Murthy: ‘The native code, in this context, is the C executable used to launch the containers (tasks) by the NodeManager. The short summary of the executable is that it’s a setuid executable used to ensure that the unix process runs as the actual user who submitted the job, not as the unix user of the NodeManager.’
Thanks go out to Ravi Theja, Praveen Sripathi and Arun Murthy for their help on the map-reduce mailing list. I hope this post helped you build Hadoop without running into any roadblocks.