Building Apache Hadoop from source

Apache Hadoop is a great framework for scalable, distributed computing.
This post lists the steps to build Hadoop from source, starting from a checkout of the trunk. I’m working on Mac OS X, so most of the steps apply to other *nix users as well, with minor differences.
  1. Check out the source
You can check out the source using the following command:
svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk
  1. Use Maven 3
I was using Maven 2.2.1. If you try to build with a version earlier than 3, Hadoop's use of the enforcer plugin stops the build (kudos for that) and tells you that you need Maven 3.
Running “mvn -version” shows which Maven version you are using.
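The version check can also be scripted before kicking off a long build. What follows is a minimal sketch, not part of the Hadoop build itself: the helper names maven_major and maven_ok are made up for illustration, and it assumes the first line of the "mvn -version" output looks like "Apache Maven 3.0.4 (...)".

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from `mvn -version` output.
# Assumes a line of the form "Apache Maven <major>.<minor>..." is present.
maven_major() {
    printf '%s\n' "$1" | sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Hypothetical helper: succeeds only when the major version is 3 or later.
maven_ok() {
    [ -n "$1" ] && [ "$1" -ge 3 ]
}

if command -v mvn >/dev/null 2>&1; then
    major=$(maven_major "$(mvn -version 2>/dev/null)")
    if maven_ok "$major"; then
        echo "Maven $major found, OK to build"
    else
        echo "Maven 3 or later is required" >&2
    fi
else
    echo "mvn not found on PATH" >&2
fi
```

The enforcer plugin performs the real check during the build; this only saves you from finding out a few minutes in.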
  1. Install protobuf
Now, if you build, everything will build except for the map-reduce module. For that, or more specifically for YARN (inside map-reduce), you need protobuf.
Building protobuf from source requires gcc. On Mac OS X, the easiest way to get it is to install Xcode (available on your Mac OS X install disk, under optional installs).
Now, in the unpacked protobuf source directory, run ‘configure’, then ‘make’, followed by ‘make install’, and you should have protobuf on your system.
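The three protobuf steps can be wrapped in a small script. This is only a sketch under a few assumptions: the function names run and build_protobuf and the DRY_RUN variable are invented for illustration, the source directory is wherever you unpacked protobuf, and the use of sudo for the install step assumes the default install prefix is not writable by your user.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: configure, compile and install protobuf from an
# unpacked source tree. The body runs in a subshell so that the caller's
# working directory is left unchanged.
build_protobuf() (
    cd "$1" || exit 1
    run ./configure &&
    run make &&
    run sudo make install   # sudo is an assumption; drop it if not needed
)

# Example (the path is a placeholder for your unpacked protobuf source):
#   DRY_RUN=1 build_protobuf /path/to/protobuf-source
```

Once the install finishes, running "protoc --version" is a quick way to confirm that the compiler is on your PATH.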
  1. Complete the build
Now you should be able to do a complete build of the whole Hadoop source, including the map-reduce project.
Use the following command: mvn clean install -P-cbuild
The additional -P-cbuild option deactivates the Maven profile named cbuild, which stops the build from compiling the native code. As for what the native code is, to quote Arun C. Murthy: ‘The native code, in this context, is the C executable used to launch the containers (tasks) by the NodeManager. The short summary of the executable is that it’s a setuid executable used to ensure that the unix process runs as the actual user who submitted the job, not as the unix user of the NodeManager.’
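For repeated builds, the final step can be wrapped so that it fails early when run against a checkout that has no pom.xml. This is a rough sketch: the function name build_hadoop and the MVN override variable are made up for illustration, and the directory argument is assumed to be the hadoop-trunk checkout from step 1.

```shell
#!/bin/sh
# Hypothetical helper: run the full build inside the given checkout directory.
# MVN is an illustrative override for the Maven executable (defaults to mvn).
# The body runs in a subshell, so the caller's directory is left unchanged.
build_hadoop() (
    cd "$1" || exit 1
    if [ ! -f pom.xml ]; then
        echo "no pom.xml in $1; is this the trunk checkout?" >&2
        exit 1
    fi
    # -P-cbuild deactivates the cbuild profile, skipping the native code.
    ${MVN:-mvn} clean install -P-cbuild
)

# Example:
#   build_hadoop hadoop-trunk
```

Setting MVN=echo prints the Maven arguments instead of running the build, which is handy for checking the wrapper itself.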

Thanks go out to Ravi Theja, Praveen Sripathi and Arun Murthy for their help on the map-reduce mailing list. I hope this post helps you build Hadoop without running into any roadblocks.

8 thoughts on “Building Apache Hadoop from source”

    • Though I could not figure out why, only the main trunk has the pom.xml file. I checked it out exactly the way you mentioned in your blog and it seems to be working.

      • Hi Srinivas,

        Glad you got it working. I can’t recall the structure exactly, but it seems the pom files are only there to build the trunk. The branches use ant. I don’t know why this is, though.

  1. I faced the same problem too. The svn checkout for the specific version doesn’t have a pom.xml! The git repository has one, but when I do a git download I always get the latest version, not the specific one. It would be great if you could help me out with this. Thanks.

  2. I found out that none of the 1.x releases have a pom.xml. They all have a build.xml, which can be built with ant! Any idea how to build with ant? Thanks.

  3. Hi Pavan,

    If you follow my instructions exactly it should work, just like it did for Srinivas. For ant, you just have to type “ant” (without the quotes) at the directory location of the build.xml file. Hope it works for you. Cheers.
