How Can the Sri Lankan Police Make Use of Big Data Analysis?

I thought of writing this post after reading that the Sri Lankan police had decided to offer 1 million rupees (~9,500 USD) to anyone who could offer a clue about the infamous robbery of the Colombo museum.

It made me think how awesome it would be if we could use big data analysis to solve crimes in this country. Let’s map out a simple scenario to see how this could be done.

Let’s say a burglary similar to the one at the museum takes place. Once the burglary is detected, the police are notified and they come in to begin investigations. After a while, it becomes clear that the thieves involved are very good. No traces at all. Definitely not an amateur job. But there is one small scrap of evidence: a single fingerprint is uncovered. And that’s all we have to go on.

Let’s assume that the police have all criminal records in a database, including each criminal’s fingerprints. We run the fingerprint from the crime scene against the criminal database. As bad luck would have it, the fingerprint does not match. At this point, it seems like we have hit a dead end. Have we?

Not necessarily. This is where the power of big data analysis comes in. It is the technology that lets you wire together a set of inexpensive computers and analyze data sets in parallel.

So, with this enhanced data analysis ability, our fingerprint can still help us. Now we get to compare it against the entire nation’s fingerprints. Can’t we do this without all this “big” data analysis? You can, but it would take far too long to get an answer. There is a limit to how much data a single computer can process at a time, and fingerprint matching requires a lot of computing time. Matching a fingerprint against a criminal database of about 100,000 records of known, living criminals would be feasible, perhaps after a day or two of computing time. But comparing it against 20 million records on the same machine would take many months or even years.
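To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes that matching time grows linearly with the number of records, that the 100,000-record search takes exactly one day (the low end of the estimate above), and that the work splits evenly across machines. The 1,000-machine cluster is a made-up figure for illustration, not something any police force is known to have.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_days(records, seconds_per_comparison, machines=1):
    """Days needed to compare one print against `records` prints.

    Assumes linear scaling and a perfectly even split across machines.
    """
    return records * seconds_per_comparison / machines / SECONDS_PER_DAY


# Assumption: 100,000 records take one day on a single machine,
# which works out to a little under a second per comparison.
seconds_per_comparison = SECONDS_PER_DAY / 100_000

# Whole population on one machine vs. a hypothetical 1,000-machine cluster.
single_machine_days = estimate_days(20_000_000, seconds_per_comparison)
cluster_days = estimate_days(20_000_000, seconds_per_comparison, machines=1000)

print(f"Single machine: about {single_machine_days:.0f} days")
print(f"1,000 machines: about {cluster_days * 24:.1f} hours")
```

Under these assumptions a single machine needs roughly 200 days, while the cluster finishes in under five hours. Real matchers and real clusters will not scale this neatly, so treat the figures as orders of magnitude only.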

Let’s get back to our crime. We can now start our fingerprint matching against the general population’s database, beginning with the vicinity of the crime and expanding outward. With a sizable network of computers, all this can be computed in a matter of hours, even if we have to match against the fingerprints of Sri Lanka’s whole population of 20 million. At the end of it, we would probably have at least one definite match to aid the investigation.
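Here is a minimal sketch of that search strategy in Python: sort the records by distance from the crime scene, split them into chunks, and hand the chunks to parallel workers. Everything in it is an assumption made for illustration. The `similarity` function is a toy stand-in for a real fingerprint matcher, the record fields (`id`, `x`, `y`, `print`) are hypothetical, and threads on one machine stand in for the separate computers of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from math import hypot


def similarity(a, b):
    """Toy stand-in for a real matcher: fraction of positions that agree."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def search_chunk(chunk, probe, threshold):
    """Return the ids of records in this chunk that match the probe print."""
    return [r["id"] for r in chunk if similarity(r["print"], probe) >= threshold]


def find_matches(records, probe, scene, workers=4, chunk_size=2, threshold=0.9):
    """Search records nearest the crime scene first, one chunk per worker task."""
    # Order by straight-line distance from the scene so nearby people come first.
    ordered = sorted(
        records, key=lambda r: hypot(r["x"] - scene[0], r["y"] - scene[1])
    )
    chunks = [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so matches stay nearest-first.
        results = pool.map(lambda c: search_chunk(c, probe, threshold), chunks)
        return [match for chunk_matches in results for match in chunk_matches]


# Hypothetical records: an id, a location, and a toy "fingerprint" string.
records = [
    {"id": "A", "x": 10, "y": 10, "print": "0110100110"},
    {"id": "B", "x": 1, "y": 1, "print": "1111000011"},
    {"id": "C", "x": 2, "y": 0, "print": "1010101010"},
    {"id": "D", "x": 5, "y": 5, "print": "1111000011"},
]
matches = find_matches(records, probe="1111000011", scene=(0, 0))
print(matches)
```

On a real cluster, a framework such as Hadoop would take care of distributing the chunks, but the shape of the job is the same: split, match in parallel, merge.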

Of course, this whole exercise assumes that the details of all citizens are computerized (I have heard that such an initiative is already under way). So, hopefully, this kind of crime solving will become practical in the not too distant future, creating crime-fighting big data analysts in Sri Lanka.

Do you trust Google BigQuery with your Big Data?

Google has come up with a fantastic service for analyzing large amounts of data. It’s called BigQuery, and it allows you to run analysis on big data in the cloud. As expected, the tool has a superb, intuitive web UI. The data analysis language uses SQL-like queries. (Hive, anyone 😉 ). Have a look at the BigQuery tutorial, it looks pretty neat. So, all you need to do to run queries is upload your data to Google using its upload form. It allows you to upload a file or point to one in Google’s cloud storage.

Now, the interesting question is: to analyze with BigQuery, how much of your data are you willing to give Google? And how long will that take? The answer won’t be “Let me quickly upload a 500 GB file and run some queries”. That amount of data would definitely take some time to upload. So, effectively, this SaaS becomes less and less useful as the volume of data to upload grows.
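To get a feel for the upload time, here is a rough Python estimate. The uplink speeds are assumptions picked for illustration (your connection will differ), and the calculation ignores protocol overhead, retries, and compression.

```python
def upload_hours(size_gb, uplink_mbps):
    """Hours to upload `size_gb` gigabytes over an `uplink_mbps` megabit/s link.

    Uses decimal units: 1 GB = 1,000 MB = 8,000 megabits.
    """
    megabits = size_gb * 8 * 1000
    return megabits / uplink_mbps / 3600


# Assumed uplink speeds, in megabits per second.
for mbps in (10, 100):
    print(f"500 GB at {mbps} Mbit/s: about {upload_hours(500, mbps):.0f} hours")
```

At an assumed 10 Mbit/s that is over four and a half days of continuous uploading before a single query can run, and still about eleven hours at 100 Mbit/s.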

Everyone trusts Google ( 🙂 ), so the first concern might be easily ignored. But another potential problem I see is the privacy policies you may end up violating. Usually, the data you want to analyze contains sensitive information such as user behavior patterns. How comfortable will your customers be if you hand that data over to Google? Even anonymizing the data might not save you from a potential legal breach.

I still believe setting up your own data analysis and monitoring platform is the best way to go. Thoughts? I’d love to hear them.