A couple of years back, I wrote my first blog post on Big Data, "Getting in
Shape with Big Data". Almost two years have gone by since then, an eternity by
Information Age standards, yet I keep hearing the same question on my customer
visits: are we there yet? The reality is that Big Data is just starting to
deliver. Yes, the value is real and tangible, but the technology is in its
early stages, and it takes a lot of vision, expertise and sheer hard work to
make a Big Data solution work.

Hadoop has arguably become the least expensive file system in the world. Many
organizations are starting to use it as
a cheaper data archival alternative, with the promise that if the data needs to
be retrieved, the cost of storage and access will be lower than that of tape.
While data archiving might sound like a modest use case, it has created quite a
revolution in the BI ecosystem. If you think about it, many large organizations
kept a lot more data in the data warehouse because data was perceived as lost
once it was moved to tape. Now that the data can be kept on hard drives and
remains accessible through simple Java MapReduce programs, many organizations
are literally cleaning house and moving old data out of the data warehouse, thus
delaying infrastructure upgrades and effectively putting some projects on hold.

From an analytics perspective, adoption has been slower. While open source
brings some good tools into the ecosystem, including R (for advanced analytics)
and Graphite (for advanced graphics and plotting), we have yet to work out how
to put the power of the data stored in Hadoop in the hands of the masses. While
it is true that anybody with data and Java skills can write MapReduce programs,
let us face it: the majority of analysts are Excel and Access gurus, and the
more advanced among them are proficient in SQL. Only a very small percentage
actually know how to code in Java.

This lack of Java knowledge among analysts has truly become the bottleneck
that keeps Hadoop from being an effective replacement for the data warehouse.
However, the innovation continues, and open source projects like Hive and Pig
continue to evolve to improve the data consumption experience. While still in
its early stages, the technology is very promising and will eventually mature to
the point that Java MapReduce will be the exception rather than the norm for
accessing data stored in Hadoop. In fact, there are technologies available
today, albeit commercial ones such as Teradata Aster, that can insulate
analysts from writing MapReduce and simplify access to data stored in HDFS
(the Hadoop Distributed File System).
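
To make the gap between Java and SQL concrete, here is a minimal sketch of the
map-reduce style of thinking, written in plain Java with no Hadoop dependencies.
Everything in it is an assumption made for illustration: the class name
SalesByRegion, the "region,amount" line format and the sample data are all
hypothetical, and a real Hadoop job would be built on Hadoop's own Mapper and
Reducer classes rather than on these helper methods.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example: total sales per region, written in the map-reduce style.
// Plain Java only; this is an illustration, not the Hadoop API.
public class SalesByRegion {

    // Map phase: turn one raw line ("region,amount") into a (key, value) pair.
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.split(",");
        return new SimpleEntry<>(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // Reduce phase: collapse all values that share a key into a single total.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group the pairs by key (the "shuffle"),
    // then reduce each group to one number.
    static Map<String, Integer> mapReduce(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            totals.put(e.getKey(), reduce(e.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for files stored in Hadoop.
        List<String> lines = List.of("east,10", "west,5", "east,7");
        System.out.println(mapReduce(lines));
        // A SQL-style layer expresses the same intent in one statement, e.g.:
        //   SELECT region, SUM(amount) FROM sales GROUP BY region;
    }
}
```

An analyst who is comfortable with SQL would likely recognize the one-line
query in the final comment immediately, while the Java above asks them to think
in terms of map, shuffle and reduce. That difference is the bottleneck
described above.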

So are we there yet? Maybe not, but the speed of change is accelerating. By
now, most of the Fortune 500 companies are probably defining a Big Data
strategy, which in turn will push software, hardware and services vendors to
innovate faster to close the gap with the expectations (or hype) that Big Data
has created among business stakeholders. My prediction is that in two more years
(mid-2015) we will be able to access Hadoop in ways that put the technology
much closer, from a data perspective, to the relational SQL databases that we
know today.