Analytics for Human Information: The New Top Ten Myths of Big Data - Myth #4

In this installment of our “New Myths” series (check out Big Data Myth #3 if you missed it), I’m going to break ranks and point out that the emperor’s new clothes aren’t providing much cover. In direct conversations with literally hundreds of colleagues, customers, analysts and pundits over the last six months, I’ve noticed that this myth is among the strongest out there, and it is one that needs to be put into proper perspective before the trough of disillusionment becomes too deep for any of us to climb out of.

 

Big Data Myth #4: If You Are Doing Hadoop, You’re Doing Big Data

 

Every technical revolution has its starting point, its catalyst, and for Big Data that has certainly been Hadoop. This brilliant technical platform was created just a few short years ago in order to address a very specific, very focused “Big Data” issue: how to distribute (then) terabytes of data across a network of computers in such a way that this large amount of information could be processed and the answer to a particular problem determined.

 

At its heart, Hadoop is a MapReduce engine, and this name is properly descriptive. Hadoop takes a huge chunk of data and distributes it across dozens, hundreds or thousands of computers, keeping track of where each piece resides. Then, each of these computers, or nodes, is given the same analysis to perform on its own piece. This is the “Map” operation: every node works through its local data and emits a set of intermediate results, typically as key-value pairs. Next comes the “Reduce” operation, where the intermediate results that share a key are brought together and reduced, or collapsed, into a simple output. The outputs of the reducers are then combined to come up with a final solution.
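
To make the idea concrete, here is a toy, single-process sketch of a MapReduce-style word count. It is only an illustration of the pattern, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample data are hypothetical, and each inner list stands in for the data held by one node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one node's chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(pairs):
    """Group every emitted value under its key, as a cluster would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single number."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(chunks):
    """Run map on every chunk, then group and reduce the combined output."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))

# Each inner list plays the role of the data stored on one node (made-up data).
chunks = [
    ["big data is big"],
    ["hadoop is a batch engine"],
]
counts = word_count(chunks)
print(counts)
```

On a real cluster the map calls would run in parallel on separate machines; here they run one after another, which is enough to show how the work is split and recombined.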

 

Hadoop is powerful, scalable, and generally extremely fast, as long as the question that you are looking to have answered is fairly simple, is linear in nature (rather than iterative), and can be processed as a batch.  If you hold yourself within these limits, you’ll get a great deal of value from Hadoop.

 

These constraints, however, are the source of potentially significant limitations to Hadoop’s usefulness for many business needs. Say that you are looking to ask fairly sophisticated, complex questions: something beyond “yes” or “no”, or “what’s the biggest, smallest or average?” “Reduce” works great, as long as what you are asking is itself a simple reduction of the data. More complex questions require much more sophisticated coding, and a lot more computational time.
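
A small sketch, under made-up data, of why some questions reduce cleanly and others do not. An average can be rebuilt from tiny per-node summaries (a sum and a count), whereas a median cannot be rebuilt from per-node medians. The node contents and helper names below are hypothetical.

```python
from statistics import median

# Hypothetical values held on three separate nodes.
nodes = [[1, 2, 3], [10], [4, 5]]

def partial(node):
    """Per-node summary that is enough to rebuild the global average."""
    return (sum(node), len(node))

def combine(partials):
    """Merge the per-node (sum, count) summaries into one average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# The average is a simple reduction: small summaries combine exactly.
mean = combine([partial(n) for n in nodes])

# The median is not: combining per-node medians gives a different number
# than the median of all the values taken together.
true_median = median(v for n in nodes for v in n)
median_of_medians = median(median(n) for n in nodes)

print(mean, true_median, median_of_medians)
```

Getting the true median out of a distributed data set takes extra machinery (sorting, sampling, or several passes), which is the "more sophisticated coding" and extra compute time in practice.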

 

Perhaps you need to consume large amounts of real-time data and create real-time results. Hadoop would respond “Sorry, I’m a batching technology,” and you can batch only so frequently and only so fast. So, real time is a real issue. You might notice this with websites such as LinkedIn or Klout, where updates come every day or so. If you want to know who looked up your profile a second ago (and might still be on your page) you’ll have to wait for the next batch to finish, oh, sometime tonight!
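
The difference can be sketched in a few lines. This is a toy illustration with made-up events and hypothetical names (batch_counts, StreamingCounts), not a description of how any particular site is built: the batch view only sees events up to the last completed batch, while an incremental counter reflects each event the moment it arrives.

```python
from collections import Counter

# Hypothetical (profile_viewed, timestamp) events.
events = [("alice", 1), ("bob", 2), ("alice", 3)]

def batch_counts(events, cutoff):
    """Batch view: only events up to the last completed batch are visible."""
    return Counter(profile for profile, ts in events if ts <= cutoff)

class StreamingCounts:
    """Incremental view: every event updates the answer as it arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, profile):
        self.counts[profile] += 1

# The last batch finished at timestamp 2, so the view at timestamp 3 is missing.
stale = batch_counts(events, cutoff=2)

live = StreamingCounts()
for profile, _ in events:
    live.on_event(profile)

print(stale["alice"], live.counts["alice"])
```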

 

Finally, let’s say that you need to do an iterative analysis, taking multiple cuts across a data set in order to ferret out some more subtle relationships. Sorry again, but MapReduce does what it does one “reduction” at a time, and so iterative or recursive analyses, which would need a fresh batch job for every pass, are effectively out, at least for the time being.
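
To see why iteration is awkward, consider a toy one-dimensional clustering problem in which every refinement step is one full map-and-reduce pass over the data. The data, starting centers and function names are hypothetical; the point is only that the answer emerges after several passes, and in a batch system each pass would typically be a separate job that reads the whole data set again.

```python
def one_pass(points, centers):
    """One MapReduce-style pass: assign each point to its nearest center
    (the map step), then average each group (the reduce step)."""
    groups = {i: [] for i in range(len(centers))}
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        groups[nearest].append(p)
    # An empty group keeps its previous center.
    return [sum(g) / len(g) if g else centers[i] for i, g in groups.items()]

def iterate(points, centers, max_passes=20):
    """Repeat full passes until the centers stop moving; count the passes."""
    passes = 0
    for _ in range(max_passes):
        new_centers = one_pass(points, centers)
        passes += 1
        if new_centers == centers:
            break
        centers = new_centers
    return centers, passes

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, passes = iterate(points, [0.0, 5.0])
print(centers, passes)
```

Even this six-point example needs three full passes before the centers settle at 2.0 and 11.0. Multiply that by terabytes of data and a job start-up cost per pass, and the appeal of a one-shot batch engine fades quickly.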

 

None of this is to slam Hadoop; rather, it is an acknowledgement of what it was built to do, what it is good for, and what it is not designed to do… yet. This last point is important, because there is an army of developers out there expanding upon Hadoop’s capabilities, and they are working hard to deal with some of these inherent shortcomings. This new functionality is on the way, but in the interim, businesses need to look beyond Hadoop in addressing certain “Big Data” projects.

 

And so they are. Many of our customers are using the technologies in HP’s HAVEn platform to extend Hadoop’s capabilities, and in so doing they are able to ask, and answer, questions that they otherwise could not. They recognized early on that Hadoop, like any technology, has both inherent benefits and limitations, and that as a result they needed to embrace additional technologies that complement Hadoop. In this, they are realizing significant business capabilities and outcomes that resonate with their customers.

 

In closing, if the depth and breadth of your “Big Data” efforts start and end with Hadoop, you very likely are not doing “Big Data”, or at least you’re doing it with one hand tied behind your back. To improve your results, look at Hadoop’s inherent strengths and weaknesses, map those to the business problems that you’re trying to address, and build out a technology platform that fills the functional gaps in what Hadoop is providing today. I’ve written more about Hadoop in a previous post.

 

Gotta run.  I have to batch-process a bunch of tweets and Facebook posts out to my friends and colleagues…Not!

 

In Myth #5 we will explore why the members of Motley Crue have signed up for ComSci courses online! (Just kidding, I think). 

 

About the Author
Chris Surdak is a Subject Matter Expert on Information Governance, analytics and eDiscovery for HP Autonomy. He has over 20 years of consul...