Why write unit tests?

It wasn’t until I had been a dev for a few years that I understood why you need good unit tests. Unit tests are usually a pain, or so I thought. Why would you need to test code that you have already verified as working?

The problem comes at maintenance time, and all code goes through maintenance, either by you or by someone else. Unit tests are a superhero when it comes to making sure a change does not break existing functionality. I learned this the hard way; I hope you don’t have to.

Here are some more advantages that I personally like about unit testing.

  • You don’t have to build other components to figure out that basic functionality has broken.
  • The code naturally improves, because accommodating unit tests pushes you toward proper interfaces.
  • If basic functionality is broken, you know immediately.
  • You do not need to write whole features or copy/paste jars or DLLs around to know whether your code works properly.

If you’re a Java dev, here’s a great 60-second tutorial to start you off in JUnit 4.
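
To make the point about interfaces a bit more concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical and made up for illustration: `PriceSource`, `Basket` and `BasketCheck` do not come from any real project. The idea is that `Basket` only knows about an interface, so the check can hand it a fixed in-memory price list instead of a real database or service. It deliberately avoids the JUnit dependency so it runs on its own; in JUnit 4 the body of `main` would typically become a method annotated with `@Test` that calls `assertEquals`.

```java
import java.util.HashMap;
import java.util.Map;

// Runs the check without any test framework (assumption: plain JDK only).
class BasketCheck {
    // Builds a basket against a fixed in-memory price list (the test double).
    static int totalWithFixedPrices() {
        PriceSource fixed = item -> item.equals("pen") ? 150 : 1000;
        Basket basket = new Basket(fixed);
        basket.add("pen", 2);
        basket.add("book", 1);
        return basket.totalInCents();
    }

    public static void main(String[] args) {
        int total = totalWithFixedPrices();
        if (total != 1300) {
            throw new AssertionError("expected 1300 but got " + total);
        }
        System.out.println("basket total ok: " + total);
    }
}

// Hypothetical interface: the only thing the code under test knows about pricing.
interface PriceSource {
    int priceInCents(String item);
}

// The code under test depends on the interface, not on a real database or service.
class Basket {
    private final PriceSource prices;
    private final Map<String, Integer> quantities = new HashMap<>();

    Basket(PriceSource prices) {
        this.prices = prices;
    }

    void add(String item, int quantity) {
        quantities.merge(item, quantity, Integer::sum);
    }

    int totalInCents() {
        int total = 0;
        for (Map.Entry<String, Integer> entry : quantities.entrySet()) {
            total += prices.priceInCents(entry.getKey()) * entry.getValue();
        }
        return total;
    }
}
```

Because the price list is injected, nothing else has to be built or deployed to find out that the total calculation has broken.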

The problems in Hadoop – When does it fail to deliver?

Hadoop is a great piece of software. It is not original, but that certainly does not take away its glory. It builds on parallel processing, a concept that’s been around for decades. Although conceptually unoriginal, Hadoop shows the power of being free (as in beer!) and open, and most of all it shows what usability is all about. It succeeded where most other parallel processing frameworks failed. So now you know that I’m not a hater. On the contrary, I think Hadoop is amazing. But that does not excuse some blatant failures on Hadoop’s part, be they architectural, conceptual or in the documentation. Hadoop’s popularity should not shield it from the need to re-engineer and rework problems in its implementation. The points below are based on months of exploring and hacking around Hadoop. Do dig in.

  1. Did I hear someone say “Data Locality”?
  2. Hadoop harps on data locality over and over again. In some workshops conducted by Hadoop milkers, they just went on and on about it. They say that, whenever possible, Hadoop will attempt to start a task on a block of data that is stored locally on that node via HDFS. This sounds like a super feature, doesn’t it? It saves so much bandwidth by not having to transfer TBs of data, right?

    Hellll, no. It does not. What this means is that first you have to figure out a way of getting your data into HDFS, the Hadoop Distributed File System. This is non-trivial, unless you live in the last decade and all your data exists as files. Assuming that you do, let’s transfer the TBs of data over to HDFS. Now it will start doing its whole “data locality” thing.

    Ermm, OK. Am I hit by a wave of brilliance, or isn’t that what it’s supposed to do anyway? Let’s get our facts straight. To use Hadoop, our problem should be able to execute in parallel. If the problem, or at least a sub-problem, can’t be parallelized, it won’t gain much from Hadoop. This means the task algorithm is independent of any specific part of the data it processes. Put more simply, any task can process any section of the data. So doesn’t that make “data locality” the obvious thing to do? Why would the Hadoop developers even write code that makes a task process data on another node, unless something goes horribly wrong? The real feature would be doing otherwise! If a task that had finished with its node’s local data then pulled data from another node and processed that too, that would be a feature worth talking about. At least that would be worthy of the noise.

  3. Would you please put everything back into files
  4. Do you have nicely structured data in databases? Maybe you became a bit fancy and used the latest and greatest NoSQL data store? Now let me write down what you are thinking. “OK, let’s get some Hadoop jobs to run on this, ’cause I want to find all these hidden gold mines in my data that will get me on the front page of Forbes.” I hear you. Let’s get some Hadoop jobs rolling. But wait! What the …..? Why are all the samples in text files? A plethora of examples using CSV files, tab-delimited files, space-delimited files, and all other kinds of neat files. Why is everyone going back a few decades and using files again? Haven’t all these guys heard of DBs and all that fancy stuff? It seems that you were too early an adopter of data stores.

    Files are the heroes of the Hadoop world. If you want to use Hadoop quickly and easily, the best path for you right now is to export your data neatly into files and run all those snazzy word count samples (pun intended!). Because without files, Hadoop can’t do all that cool “data locality” shit. Everything has to be in HDFS first.

    So, what would you do to analyze your data in the hypothetical FUHadoopDB? First of all, implement the 10+ classes necessary to split and transfer data to the Hadoop nodes and run your tasks. Hadoop needs to know how to get data from FUHadoopDB, so let’s assume this is acceptable. Now, if you don’t store the data in HDFS, you won’t get the data locality shit. In that case, when the tasks run, they themselves will have to pull the data from FUHadoopDB in order to process it. But if you want the snazzy data locality shit, you need to pull the data from FUHadoopDB and store it in HDFS. You will not incur the penalty of pulling data while the tasks are running, but you pay it at the preparation stage of the job, in the form of transferring the data into HDFS. Oh, and did I mention the additional disk space you would need to store the same data in HDFS? I wanted to save that disk space, so I chose to make my tasks pull data while they run. The choice is yours.

  5. Java is OS independent, isn’t it?
  6. Java has its flaws, but for the most part it runs smoothly on most OSs. Even if there are some OS issues, they can be ironed out easily. The Hadoop folks have issued documentation based mostly on Linux environments. They say Windows is supported, but they ignore those ignorant people by not providing adequate documentation. Windows didn’t even make it onto the list of recommended production environments. It can be used as a development platform, but then you will have to deploy on Linux.

    I’m certainly not a Windows fan. But if I write a Java program, I’d bother to make it run on Windows. If not, why the hell are you using Java? Why go to the trouble of coming up with freaking bytecode? Oh, the sleepless nights of all those good people who came up with bytecode and JVMs and whatnot have gone to waste.

  7. CS 201: Object Oriented Programming
  8. If you are trying to integrate Hadoop into your platform, think again. Let me take the liberty of typing your thoughts. “Let’s just extend a few interfaces and plug in my authentication mechanism. It should be easy enough. I mean, these guys designed the world’s greatest software that will end world hunger.” I hear you again. If you are planning to do this, don’t. It’s like OOP anti-patterns 101 in there. So many places say “if (kerberos)” and then execute some security-specific function. One of my colleagues went through this pain, and finally decided that it’s easier to write Kerberos-based authentication for his own software and then make it work with Hadoop. With great power comes great responsibility. Hadoop fails to fulfil this responsibility.
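
For what it’s worth, here is a minimal sketch of the kind of extension point I was hoping to find instead of the scattered “if (kerberos)” checks. To be loud about it: this is not Hadoop’s code or API. `Authenticator`, `AuthRegistry` and the “simple” and “token” mechanisms are all hypothetical names invented for illustration, and the checks themselves are toy stand-ins for real authentication.

```java
import java.util.HashMap;
import java.util.Map;

class AuthDemo {
    public static void main(String[] args) {
        AuthRegistry registry = AuthRegistry.withDefaults();
        // Plugging in a custom mechanism needs no change to the calling code.
        registry.register("token", credentials -> credentials.startsWith("tok-"));

        System.out.println(registry.authenticate("simple", "alice"));
        System.out.println(registry.authenticate("token", "tok-123"));
        System.out.println(registry.authenticate("token", "bad"));
    }
}

// Hypothetical extension point: one implementation per authentication mechanism.
interface Authenticator {
    boolean authenticate(String credentials);
}

// Call sites ask the registry instead of branching on "if (kerberos)".
class AuthRegistry {
    private final Map<String, Authenticator> mechanisms = new HashMap<>();

    static AuthRegistry withDefaults() {
        AuthRegistry registry = new AuthRegistry();
        // Toy stand-in for a "simple" mode: accepts any non-empty user name.
        registry.register("simple", credentials -> !credentials.isEmpty());
        return registry;
    }

    void register(String name, Authenticator authenticator) {
        mechanisms.put(name, authenticator);
    }

    boolean authenticate(String mechanism, String credentials) {
        Authenticator authenticator = mechanisms.get(mechanism);
        if (authenticator == null) {
            throw new IllegalArgumentException("unknown mechanism: " + mechanism);
        }
        return authenticator.authenticate(credentials);
    }
}
```

With a shape like this, a Kerberos implementation would be just one more `Authenticator`, and adding your own mechanism would mean registering a class rather than hunting down every security-specific branch.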

Even with these issues, Hadoop is attracting significant attention, and it’s rightfully deserved. Its ability to commoditize big data analytics should be exalted. But it’s my opinion that it got way too popular way too fast. The Hadoop community needs to have another go at revamping this great piece of software.

Why you should not buy the Kindle Fire if you live outside the US

The Kindle Fire is said to ‘disrupt’ the tablet market. I certainly agree after looking at the very attractive price tag of $199. It seems like a super deal for someone who wants a tablet experience but hates to spend too much.

But this is all so attractive only if you live in the US. Yes, it’s only sold in the US, you may say, but I wanted to get a Fire and use it in my home country (i.e. Sri Lanka). I thought all would be well, until I came across some Amazon forum posts.

Here’s what you can and cannot do when you use the Fire outside the US:

  1. Using the Fire outside the US? Yes, after purchasing the Fire you can use it outside the US. No legal restrictions there.
  2. Purchasing apps outside the US? No, you cannot download apps outside the US. Of course, you can use apps that you have already downloaded while in the US. (This was the killer for me.)
  3. Purchasing books outside the US? Yes, you can purchase and use books outside the US.
  4. Using movies or TV features? No, you cannot download movies or use the TV features provided by Amazon.
  5. Using the web browser or email? Yes, you can use web and email as long as you are connected to a wifi access point.

So, I’ve changed my mind about buying a Kindle Fire, and I’m going to pay some more and buy an Android tablet or an iPad.

Revolution over Evolution

I have always believed it is better to revolt and change fast than to evolve slowly and painfully. It is something I see my idols preach and practice. Be passionate and smart.

Everything is very simple on paper, but how many people actually change, either themselves or those around them? If you are not happy doing what you do, don’t do it. Find what you love to do. I’m glad to be exactly where I want to be.

Do what you’re passionate about. If you do this, there will be few people competing with you or running faster than you, a lesson from Warren Buffett.