Big Data is a big term these days; everybody in the information management industry is talking about it. In fact, along with cloud and mobile, it has become one of the biggest topics of 2011 for information professionals. But what makes Big Data so big? Are the numbers getting larger, or the hard drives getting heavier? If you are thinking about getting in shape to execute Big Data projects, you are not far from the truth. I am not talking about building muscular strength to lift heavier weights, but about rethinking how we look at the availability of data for our Business Intelligence implementations.

Let me explain: Big Data is a label applied to the collection of structured and unstructured data available on a particular topic, situation or subject. The implications of this are indeed big. Think about a well-known brand such as Pepsi or Coke and picture the amount of data these companies create on a daily basis within their premises: thousands of sales, inventory, manufacturing and logistics transactions take place each day. This “internal” data might well amount to hundreds of megabytes, not a small amount by any means, but only a fraction of what you could find in blogs, review pages, discussion forums or Facebook. In all likelihood, the volume of data generated by their customers, distributors and consumers will be several orders of magnitude larger than the data generated within the company itself.

Big Data is all about tapping into this data through all possible channels and means, but more importantly it is about making sense of it. To give another example, if you are about to buy a product on Amazon, you will probably be tempted to read the reviews left by previous buyers. While the primary purpose of reading these reviews is to make a buy / no-buy decision, in reading them you will also start getting a better understanding of the product, gaining visibility into its strengths and weaknesses. This new understanding can (and surely will) shape your expectations of the item you are about to buy and help you make better use of some features while avoiding others altogether, if you decide to buy it at all.
There have been multiple technological efforts to harness Big Data; one of the most prominent, from the open source community, is Hadoop. Hadoop brings a map/reduce approach to the table, through which one can explore and analyze multiple streams of data and bring them together to present a summarized result. Think about the US census, where the objective is to capture the country’s demographics. With a population of over 300 million, it would take a single person a very long time to visit everyone’s dwelling. The optimized approach is to break the problem into multiple units that can attack it in parallel, so every city or county in the country has a team processing the results for its local area, and those local results are then brought together to understand the entire population of the US.
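To make the census analogy more concrete, here is a minimal map/reduce sketch in plain Python (not Hadoop itself, and not any particular distribution); the records, the number of chunks and the per-state tally are all invented for illustration.

from collections import Counter
from multiprocessing import Pool

# Hypothetical census records: (name, state).
RECORDS = [
    ("Alice", "TX"), ("Bob", "CA"), ("Carol", "TX"),
    ("Dave", "NY"), ("Eve", "CA"), ("Frank", "TX"),
]

def map_count(chunk):
    """Map step: each worker tallies the people in its own chunk of records."""
    return Counter(state for _name, state in chunk)

def reduce_counts(partials):
    """Reduce step: merge the per-chunk tallies into one national count."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    # Split the records into three chunks, one per "team" working in parallel.
    chunks = [RECORDS[i::3] for i in range(3)]
    with Pool(processes=3) as pool:
        partial_counts = pool.map(map_count, chunks)
    print(reduce_counts(partial_counts))  # e.g. Counter({'TX': 3, 'CA': 2, 'NY': 1})

In a real Hadoop cluster the mappers would run on many machines against data stored in HDFS, but the shape of the computation is the same: independent local counts first, a merge step second.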
Hadoop has evolved into commercial distributions from different vendors that promise a more user-friendly approach to installing and running the technology. As valuable as these commercial distributions are, the real value for an organization will come from making sense of this data, not in isolation but in combination with the data already within the organization. Getting back to our discussion of Pepsi and Coke, imagine the value these two companies could derive (and potentially already are deriving) from Big Data by linking what is happening inside the organization with data from the outside world to generate real-time insights.
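As a toy illustration of that linkage (the numbers, product names and field layout below are purely hypothetical, not any real Pepsi or Coke system), a few lines of Python can flag products where the internal and external views disagree:

# Internal view: units sold today, from the company's own systems.
internal_sales = {
    "cola_12pack": 1800,
    "diet_cola": 950,
}

# External view: average review/social-media sentiment, from -1.0 to 1.0.
external_sentiment = {
    "cola_12pack": 0.62,
    "diet_cola": -0.35,
}

for product, units in internal_sales.items():
    sentiment = external_sentiment.get(product)
    if sentiment is None:
        continue
    # Flag products where the outside conversation is negative even though
    # internal sales still look healthy; a prompt for a closer look.
    if sentiment < 0 and units > 500:
        print(f"{product}: selling well ({units} units) but sentiment is {sentiment:+.2f}")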
While this might sound like a straightforward value proposition, there are plenty of challenges along the way, and in order to overcome them successfully we will need a fit, trained mind that keeps us going without collapsing under the sheer weight of Big Data.