What is HP's Big Data platform?

Let's start by recapping the data the platform can collect. Then we'll look at the analyses that it can perform on that data. And finally, I'll explain where the name "HAVEn" comes from.

 

What data do we collect?

In a previous blog post, I talked about how we can collect

  • Normal System of Record transactions. We can now collect these over a longer period of time because our Big Data structured data analysis engine doesn't slow down when presented with lots of data
  • The micro-transactions that lie in between those Systems of Record
  • Machine to machine data
  • Human interaction data. Voice, video, pictures, texts, emails, and documents

360 Degree Data Collection

We in HP talk about how we have a "360 degree analysis platform". Before you can do "360 degree analysis", you need 360 degree data collection. Key to this collection, of course, are data collectors. Across the HAVEn platform, we have well over 700 different data collectors.

 

Where do we put all this Big Data?

Where do we put all this Big Data? Interesting question. Traditionally, we think about each analysis engine having its own storage system. However, we are seeing a blurring of this model. For example:

  • HP Labs have created a Big Data solution prototype in which they store tweets in HP Vertica, but use the HP Autonomy engine to get meaning out of the tweets' text
  • People are staging data in a sea of Hadoop servers, then bringing parts of it into HP Vertica for high speed structured analysis
  • The new HP Operations Analytics product uses ArcSight to collect log data but then stores and analyses it using Vertica

What analyses can we do on the data?

Typical structured record analysis - but faster

We can do high speed analysis on structured data using the usual tools like SQL and R.

 

I've noticed that many of our HP Vertica customers are doing data correlation:

  • HP's Gen8 servers send machine-to-machine messages to HP that tell our support organization that there is a problem with a server. Using the Vertica platform, HP correlates the problem with the environment on the server - "this combination of hardware and infrastructure software causes problems"
  • The US retailer Guess correlates sales of one product with sales of other products, looking for product pairings that they should promote together
  • A Latin American credit card provider correlates cards that are reported as skimmed. Skimming is where someone takes your card for a payment and secretly makes a copy of it. The skimmed cards are correlated against the vendors that handled them, to find the vendors who are doing the skimming
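The card skimming correlation can be sketched in a few lines. This is a minimal illustration in plain Python, not the provider's actual method and not Vertica code: the function name, the `min_cards` threshold, and the sample data are all assumptions made up for this example. The idea is simply that, for each vendor, we work out what share of the cards it handled were later reported as skimmed.

```python
from collections import defaultdict

def rank_vendors_by_skimmed_share(transactions, skimmed_cards, min_cards=2):
    """Return (vendor, skimmed_share, cards_seen) tuples, highest share first.

    transactions: iterable of (card_id, vendor) pairs.
    skimmed_cards: set of card ids reported as skimmed.
    min_cards: hypothetical cut-off; vendors that handled fewer distinct
               cards are skipped because the share would be meaningless.
    """
    cards_by_vendor = defaultdict(set)
    for card, vendor in transactions:
        cards_by_vendor[vendor].add(card)

    ranked = []
    for vendor, cards in cards_by_vendor.items():
        if len(cards) < min_cards:
            continue
        share = len(cards & skimmed_cards) / len(cards)
        ranked.append((vendor, share, len(cards)))

    # Highest skimmed share first; vendor name breaks ties deterministically.
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked

# Made-up sample data, purely for illustration.
transactions = [
    ("card-1", "cafe"), ("card-2", "cafe"), ("card-3", "cafe"),
    ("card-1", "grocer"), ("card-4", "grocer"), ("card-5", "grocer"),
    ("card-2", "kiosk"),
]
skimmed = {"card-1", "card-2"}

ranking = rank_vendors_by_skimmed_share(transactions, skimmed)
for vendor, share, seen in ranking:
    print(f"{vendor}: {share:.0%} of {seen} cards later reported skimmed")
```

A vendor near the top of this list is only a lead to investigate. A busy, honest vendor will also see plenty of cards that were skimmed somewhere else.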

Other customers are looking for patterns in the structured data:

  • Fraud and cheating - e.g. GSN (www.gsn.com) can recognize the click-stream pattern of online gaming cheats
  • Churn - a number of online gaming companies are able to look at click-streams to recognize when customers are bored with their games and are about to churn

Human meaning in text, voice, videos and pictures

Humans are amazing. We are able to take in huge amounts of data and derive meaning (or, probable meaning) from it. Our computer-based analysis technologies are just starting to be able to copy what humans can do. The HP Big Data platform can:

  • Listen to calls to our help desk and watch out for "I hate your product" or "I love your product" or "I'm about to churn to a competitor"
  • Look at pictures and say "This is a face and I've compared it with a list of faces of terrorists" (this is what we did for the London Olympics)
  • Look for "I hit the brakes and the car accelerated" in three truck loads of legal discovery documents
  • Look for "what this person is doing is out of compliance" in the emails and phone conversations of financial traders
  • Look through social media for "sentiment" - "I love product X", "I'm going to go from product X to vendor A's product"

We look for behaviours

Traditionally, ArcSight is used to look for IT security problems, looking for behaviours that it believes are attacks on data objects and/or applications.

 

But, a recent post on the HP Security blog site talked about how the ArcSight analysis engine was being used to look for fraudulent business behaviours (i.e. not IT security behaviours) such as online banking fraud, compromised accounts, payments fraud, internal fraud, and debit card transactions cheating.

 

HAVEn

HP's Big Data platform is called HAVEn. HAVEn is a name composed of its components' initials - Hadoop, Autonomy, Vertica, Enterprise Security (ArcSight) and "n Apps" (all the apps that run on top of the HAVEn platform).

Let's quickly look at each of these in turn.

 

Hadoop

Hadoop is a way to cost-effectively store massive amounts of data from virtually any source.  It is open source, and we support a number of Hadoop distributions.

 

The Hadoop framework, which includes the Hadoop Distributed File System (HDFS), allows for the distributed processing of large data sets across clusters of computers using new programming models. It is designed to scale up from single servers to thousands of machines, each offering local processing and storage. Rather than rely on hardware to deliver high availability, Hadoop is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers.
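The best-known programming model associated with Hadoop is MapReduce. To give a feel for it, here is a minimal sketch that imitates the map, shuffle, and reduce steps inside a single Python process. It is not Hadoop code and uses no Hadoop API; the function names are my own. On a real cluster, the map and reduce steps would run in parallel on the machines that hold the data.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data", "big clusters", "data data"])
print(counts)
```

Because each map call only looks at one line and each reduce call only looks at one key, the work can be spread over many machines without them needing to talk to each other much.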

All of the HP HAVEn engines (Autonomy IDOL, Vertica, and ArcSight) are able to interact with Hadoop for data collection and analysis.

 

Hadoop is well-suited to storing and cataloging large amounts of semi-structured data (like logs) and unstructured data (like audio, video, and email). A number of customers move part of the data stored in the Hadoop data sea to the HP HAVEn engines for high speed analysis. Vertica can process data up to 100 times faster than the batch-oriented data processing of Hadoop.

 

IDOL from Autonomy

The "A" (Autonomy) engine is called IDOL - Intelligent Data Operating Layer (yes, probably best to stick with IDOL). IDOL is able to recognise and categorise human interaction data and, from this, glean concepts and meaning.

 

Vertica

The Vertica engine is all about masses and masses of structured data like those micro-transactions that we talked about above. It is used in situations where traditional relational databases simply take too long to search.

Vertica also uses specialised compression techniques to increase storage density.
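This post doesn't go into which techniques are used, so treat the following as a generic illustration rather than a description of Vertica's internals. One technique commonly used by column-oriented stores is run-length encoding: when a column is sorted, long runs of the same value can be stored as a single (value, count) pair. The function names and sample column below are my own.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

# A made-up "country" column. Sorting it first creates long runs.
column = sorted(["UK", "US", "UK", "US", "US", "DE", "UK", "US"])
encoded = rle_encode(column)

print(encoded)
print(f"{len(column)} values stored as {len(encoded)} runs")
```

The fewer distinct values a column has, and the better it is sorted, the more this kind of encoding saves.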

 

ArcSight

ArcSight is one of HP's security products, so you would guess that the ArcSight engine is used to store and analyse security events. You would guess correctly.

 

But, ArcSight also contains something called a "Logger". The Logger can suck log entries from over 300 different sources. It can then enrich, search and analyse the information it collects. This is very useful for security analysis, but also very useful for other things too. For example, HP's Operations Analytics product uses the ArcSight Logger to collect and analyse the log files of anything in the IT domain.
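To make "enrich, search and analyse" a little more concrete, here is a minimal sketch in plain Python. It does not use ArcSight or any of its APIs. The log format, the `HOST_OWNER` lookup table, and the sample lines are all assumptions invented for this example.

```python
import re
from collections import Counter

# Assumed log format: "<timestamp> <host> <LEVEL> <message>"
LINE = re.compile(r"(?P<ts>\S+) (?P<host>\S+) (?P<level>[A-Z]+) (?P<msg>.*)")

# Hypothetical lookup used to enrich each event with the owning service.
HOST_OWNER = {"web01": "storefront", "db01": "billing"}

def parse(line):
    """Collect: turn one raw line into a dict, or None if it doesn't match."""
    match = LINE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Enrich: add a field that was not in the raw log entry.
    event["service"] = HOST_OWNER.get(event["host"], "unknown")
    return event

def search(events, **criteria):
    """Search: keep events whose fields equal every given criterion."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]

raw = [
    "2013-06-11T10:00:00Z web01 ERROR checkout timed out",
    "2013-06-11T10:00:05Z db01 WARN slow query",
    "2013-06-11T10:00:09Z web01 ERROR checkout timed out",
    "not a log line",
]

events = [e for e in map(parse, raw) if e is not None]
errors = search(events, level="ERROR")

# Analyse: count errors per enriched service name.
errors_by_service = Counter(e["service"] for e in errors)
print(errors_by_service)
```

The enrichment step is what makes the later analysis useful to non-security teams: once an event carries a service name, an operations person can ask questions in their own terms rather than in host names.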

 

Author : Mike Shaw

About the Author
Mike has been with HP for 30 years. Half of that time was in R&D, mainly as an architect. The other 15 years has been spent in product manag...

