Monday, June 10, 2013

Big Data, are we there yet?

A couple of years back, I wrote my first blog on Big Data: Getting in Shape with Big Data. Almost two years have gone by (nearly an eternity by Information Age standards), and I keep hearing the same question on my customer visits: are we there yet? The reality is that Big Data is just starting to deliver. Yes, the value is real and tangible, but the technology is in its early stages, and it takes a lot of vision, expertise and sheer hard work to make a Big Data solution work.

Hadoop has become the least expensive file system in the world. Many organizations are starting to use it as a cheaper data archival alternative, with the promise that if the data ever needs to be retrieved, the cost of storage and access will be lower than tape itself. While data archiving may sound like a modest use case, it has created quite a revolution in the BI ecosystem. If you think about it, many large organizations kept far more data in the data warehouse than they needed because of the perceived loss of that data once it was moved to tape. Now that the data sits on hard drives and can be pulled back with a simple Java MapReduce program, many organizations are literally cleaning house and moving old data out of the data warehouse, delaying infrastructure upgrades and effectively putting some projects on hold.
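To make "a simple Java MapReduce program" concrete, here is a minimal sketch of what retrieving a slice of archived data from HDFS can look like. It is a map-only job that scans the archive and keeps only the lines mentioning a given year; the paths, the "archive.year" property and the filter logic are illustrative assumptions, not something from a specific customer setup.

```java
// Minimal, hypothetical sketch: scan archived records in HDFS and emit only
// the lines containing a requested year. Input/output paths and the year are
// passed on the command line; everything here is illustrative.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ArchiveScan {

  // Map-only job: pass through every record that mentions the requested year.
  public static class YearFilterMapper
      extends Mapper<Object, Text, Text, NullWritable> {

    private String year;

    @Override
    protected void setup(Context context) {
      year = context.getConfiguration().get("archive.year", "2009");
    }

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      if (value.toString().contains(year)) {
        context.write(value, NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("archive.year", args[2]);           // e.g. "2009"

    Job job = Job.getInstance(conf, "archive scan");
    job.setJarByClass(ArchiveScan.class);
    job.setMapperClass(YearFilterMapper.class);
    job.setNumReduceTasks(0);                    // map-only: no aggregation needed
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));   // archived data in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // retrieved subset
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Trivial as it is, this is exactly the point of the next paragraph: even this small amount of Java is more than most analysts are willing or able to write.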
From an analytics perspective, adoption has been slower. While open source brings some good tools into the ecosystem, including R (for advanced analytics) and Graphite (for graphing and plotting), we have yet to figure out how to put the power of the data stored in Hadoop into the hands of the masses. It is true that anybody with data and Java skills can write MapReduce programs, but let's face it: the majority of analysts are Excel and Access gurus, the more advanced ones are proficient in SQL, and only a very small percentage actually know how to code in Java. This lack of Java knowledge among analysts has become the real bottleneck for Hadoop as an effective replacement for the data warehouse. However, the innovation continues, and open source projects like Hive and Pig keep evolving to improve the data consumption experience (see the sketch below). While still early, the technology is very promising and will eventually mature to the point where Java MapReduce is the exception rather than the norm for accessing data stored in Hadoop. In fact, there are technologies today, albeit commercial ones such as Teradata Aster, that insulate analysts from writing MapReduce and simplify access to data stored in HDFS (the Hadoop Distributed File System).
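As a hedged illustration of what "insulating the analyst from MapReduce" means in practice, here is the same "give me the records for one year" question expressed as SQL and submitted to Hive over JDBC. The table and column names (sales_archive, sale_year) and the HiveServer2 URL are assumptions for the sake of the example; the point is that Hive compiles the SQL into MapReduce jobs behind the scenes, so the analyst only writes the query.

```java
// Hypothetical sketch: query archived data in Hadoop through Hive's JDBC
// interface instead of hand-written MapReduce. Table, columns and the
// connection URL are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveArchiveQuery {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         // Hive turns this SQL into MapReduce jobs under the covers.
         ResultSet rs = stmt.executeQuery(
             "SELECT customer_id, amount FROM sales_archive WHERE sale_year = 2009")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
      }
    }
  }
}
```

An Excel or SQL-savvy analyst never needs to see the mapper and reducer; whether this happens through Hive, Pig or a commercial layer such as Teradata Aster, the direction is the same.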
So are we there yet? Maybe not, but the pace of change is accelerating. Most of the Fortune 500 are probably defining a Big Data strategy right now, which in turn will push software, hardware and services vendors to innovate faster to close the gap between what the technology delivers today and the expectations (or hype) that Big Data has created with business stakeholders. My prediction is that in two more years (mid-2015) we will be able to access Hadoop in ways that put the technology much closer, from a data perspective, to the relational SQL databases we know today.