Monday, November 14, 2011

Getting in shape to deal with Big Data


Big Data is a big term these days. Everybody in the information management industry is talking about it; in fact, along with cloud and mobile, it has become one of the biggest items in 2011 for information professionals. But what makes Big Data so big? Are the numbers getting larger or the hard drives getting heavier? If you are thinking about getting in shape to be able to execute Big Data projects, you are not far from the truth. I am not talking about improving your muscular strength to lift heavier weights, but about rethinking how we look at the availability of data for our Business Intelligence implementations.

Let me explain. Big Data is a label applied to the collection of structured and unstructured data available on a particular topic, situation or subject. The implications of this are indeed big. Think about a well-known company brand, such as Pepsi or Coke, and picture the amount of data these companies create on a daily basis within their premises: thousands of sales, inventory, manufacturing and logistics transactions take place each day. This “internal” data might well amount to hundreds of megabytes, not a small amount by any means, but only a fraction of what you could find in blogs, review pages, discussion forums, or Facebook. In all likelihood, the volume of data generated by their customers, distributors and consumers will be several orders of magnitude larger than the data generated within the company itself. Big Data is all about tapping into this data through all possible channels and means, but more importantly it is about making sense out of it.

To give another example, if you are about to buy a product on Amazon, you will probably be tempted to read the reviews provided by previous buyers. While the primary purpose of reading these reviews is to make a buy / no-buy decision, in reading them you will also start getting a better understanding of the product and visibility into its strengths and weaknesses.
This new understanding can (and surely will) shape your expectations of the item that you are about to buy and help you make better use of some features while avoiding others altogether, if you decide to buy it at all.

There have been multiple technological efforts to harness Big Data. One of the most prominent, from the open source community, is Hadoop. Hadoop brings a map/reduce approach to the table, through which one can explore and analyze multiple streams of data and bring them together to present a summarized result. Think about the US census, where the objective is to get the country's demographics. With a population of over 300 million, it would take a single person a very long time to visit everyone’s dwelling. The optimized approach is to break the problem into multiple units that can be attacked in parallel: every city or county in the country has a team that processes the results for its local area, and the local results are then brought together to understand the entire population of the US.

Hadoop has evolved into commercial distributions from different vendors that promise to bring a more user-friendly approach to the installation and execution of the technology. As valuable as these commercial distributions are, the real value for an organization will come from making sense out of this data, not in isolation but in combination with the data already within the organization. Getting back to our discussion of Pepsi and Coke, imagine the value that these two companies could derive (and potentially already are deriving) from Big Data by linking what is happening in the organization with the data from the outside world to generate real-time insights.

While this might sound like a straightforward value proposition, there are plenty of challenges on the way, and in order to overcome them we will need a fit, trained mind that keeps us going without collapsing under the sheer weight of Big Data.
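
For readers who want to see the census analogy in code, here is a minimal sketch of the map/reduce idea in plain Python. It does not use Hadoop or any Hadoop API; the sample records, the county names and all function names (`split_by_county`, `map_county`, `reduce_counts`, `national_demographics`) are made up for illustration. Each "local team" tallies its own county (the map step), and the partial tallies are then merged into one national result (the reduce step).

```python
from collections import Counter
from functools import reduce

# Hypothetical census records: (county, age_group) for each resident.
RECORDS = [
    ("Kings", "adult"),
    ("Kings", "child"),
    ("Cook", "adult"),
    ("Cook", "adult"),
    ("Harris", "senior"),
    ("Kings", "adult"),
]


def split_by_county(records):
    """Partition the records so each local team gets its own county."""
    partitions = {}
    for county, age_group in records:
        partitions.setdefault(county, []).append(age_group)
    return partitions


def map_county(age_groups):
    """Map step: one team tallies the age groups in its own county."""
    return Counter(age_groups)


def reduce_counts(left, right):
    """Reduce step: merge two partial tallies into one."""
    return left + right


def national_demographics(records):
    """Combine the per-county tallies into a single national tally."""
    partitions = split_by_county(records)
    # In this sketch the teams run one after another; a framework such as
    # Hadoop would run the map step for each partition on a different machine.
    partial_tallies = [map_county(groups) for groups in partitions.values()]
    return reduce(reduce_counts, partial_tallies, Counter())


if __name__ == "__main__":
    print(dict(national_demographics(RECORDS)))
```

The point of the design is that `map_county` only ever sees one county's data, so the teams never need to coordinate with each other; only the small partial tallies have to travel to the place where they are merged.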