What kinds of data are there in Big Data?

Big Data is often described by using "the three Vs" - lots of volume, increased velocity (i.e. we need answers from it more quickly) and variety (i.e. not just structured records). 

 

I think that Big Data can be broken into two different types: transactions (and micro-transactions) and unstructured data.

 

Let's look at each of these in turn, discussing why the term Big Data can be applied to each type.  We'll also look at how you can get a 360 degree view of your customers when you combine structured and unstructured data analysis.

 

Transactions and Micro-transactions

We all know what transactions are. When you have your car serviced, this will be recorded - "Mike had a service on this date and this time. He had three items fixed and they were A, B and C".

 

Micro-transactions are the smaller transactions that occurred during my car service. There may be hundreds or thousands or even millions of them. And today, we throw them all away once a major transaction point is reached.

  • Once the maintenance is done on my car, all the diagnostic information is thrown away
  • Once the plane lands successfully, most of the engine diagnostics are thrown away
  • Once I buy the item on the web, the route I took to get to the purchase point is thrown away
  • Once I've got my new phone set up and working, the route I took to get to that point is thrown away

micro-transactions.001.png

 

(In the diagram above, we are throwing away useful information about the problems the mechanic faced in maintaining Mike's car because we just keep the "Mike's car has had its 50,000 mile service" information. If we kept the micro-transactions and cross-referenced them against other maintenance sessions, we might more quickly and accurately find problems with our new car.)

 

There is good reason why we typically throw away micro-transactions. There are lots and lots of them, and keeping all of them to no particular end would quickly overload all of our storage systems.  However, people are now realising that there is valuable information in micro-transactions. They can help us to create better products and services, faster. They can help us better target our customers with marketing offers. They can help us catch fraudsters and cheats. And they can help us optimise our systems better.  Some examples:

 

Better products, faster

We release a new electric bike that we believe is a world beater. If we keep the micro-transactions that are recorded when every bike goes through maintenance, we may be able to discover those inevitable teething problems that all new products face.
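
As a rough illustration of how kept micro-transactions might surface teething problems, the sketch below counts how many distinct bikes reported each fault. The record layout (a bike id plus a fault code), the fault names and the threshold are all assumptions made up for this example, not a real maintenance schema.

```python
from collections import Counter

# Hypothetical micro-transactions: one (bike_id, fault_code) pair per diagnostic event.
events = [
    ("bike-1", "BATTERY_OVERHEAT"),
    ("bike-1", "LOOSE_CHAIN"),
    ("bike-2", "BATTERY_OVERHEAT"),
    ("bike-3", "BATTERY_OVERHEAT"),
    ("bike-3", "BRAKE_WEAR"),
]


def recurring_faults(events, min_bikes=2):
    """Return the fault codes seen on at least min_bikes distinct bikes."""
    # De-duplicate first so a fault repeated on one bike is counted once.
    distinct = set(events)
    bikes_per_fault = Counter(fault for _bike, fault in distinct)
    return {fault: n for fault, n in bikes_per_fault.items() if n >= min_bikes}


print(recurring_faults(events))
```

A fault that shows up on many different bikes is more likely to be a design problem than a one-off, which is the kind of signal that is lost if only the "service completed" record is kept.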


More focussed marketing
We record the micro-transactions of every click of every customer as they use our online game. We use this to notice when a customer is getting bored (their click rate slows down when they get bored) and we offer them a well-targeted incentive to get re-engaged with the game.
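
One simple way to picture the "click rate slows down" check is to compare the latest minute of clicks with the minute before it. This is only a sketch: the window length, the 50% drop ratio and the function names are assumptions chosen for illustration, and a real game would tune them.

```python
def clicks_per_minute(timestamps, window_start, window_end):
    """Clicks per minute within [window_start, window_end), times in seconds."""
    count = sum(1 for t in timestamps if window_start <= t < window_end)
    minutes = (window_end - window_start) / 60
    return count / minutes


def looks_bored(timestamps, now, window=60, drop_ratio=0.5):
    """Flag a player whose click rate in the latest window has fallen below
    drop_ratio times the rate in the window before it."""
    recent = clicks_per_minute(timestamps, now - window, now)
    previous = clicks_per_minute(timestamps, now - 2 * window, now - window)
    if previous == 0:
        return False  # no baseline to compare against
    return recent < drop_ratio * previous


# Hypothetical click times in seconds: a busy first minute, a quiet second minute.
clicks = [1, 5, 9, 14, 20, 26, 31, 40, 47, 55, 70, 110]
print(looks_bored(clicks, now=120))
```

Comparing a player against their own recent rate, rather than against a fixed number, avoids flagging players who simply click slowly all the time.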


Optimizing complex systems
We record every video streaming of all our millions of customers and use this to create a geographic map inferring the age demographics of each geographical area (e.g. one area might have young kids because lots of cartoons get streamed. Another may have a lot of retirees because Downton Abbey gets streamed [but only when it's on special offer]). When the new Harry Potter movie comes out, we know to optimise our bandwidth so that the area with the young kids has enough capacity.

 

The explosion of Machine Data

The amount of data automatically generated by machines is set to explode. Cisco estimates that by 2020, there will be in excess of 50 billion connected smart devices and sensors (http://share.cisco.com/internet-of-things.html). This phenomenon is also known as "the internet of things". To put "the internet of things" into perspective, there will "only" be 6 billion smart phones and tablets by 2020.

 

The data traffic from these smart devices and sensors is a "Big Data" problem because the volume of data pouring off smart devices is high. 

 

In our view of the world in 2020 (the Technology 2020 section of the introduction to Enterprise 2020) we talk about how data from smart devices and sensor arrays will allow us to control our "mega-systems" in ways we just can't do today. For example, we could optimise the transportation systems of one of our mega-cities for minimised travel time, while maintaining safety for all travellers. 

 

Unstructured Data

When I first heard that unstructured data included social media interactions, I asked myself, "why would anyone want to analyse the seemingly mindless interactions of teenagers?" 

 

But unstructured data is more than just social media interactions, and social media interactions are becoming an increasingly important source of information.

 

Unstructured data includes:

 

Voice

You can analyse the calls made to your call centres looking for products your customers do and don't like, for opportunities to up-sell and cross-sell, and for those calls where the customer is about to "churn".

 

Pictures and video

Pictures: During the London Olympics, for example, British security services used HP technology to compare the photograph of every visitor to the games against a list of known terror suspects.
Video: For number plate and car type recognition, scene recognition and facial recognition.

Emails
For example, financial institutions' compliance departments can analyse company emails looking for non-compliant behaviour.

Social media interactions
In among the "rubbish" (of course, one person's "rubbish" is another's fascinating information), there is very useful sentiment information. Do people like your products? How are your competitors doing? Have people found a way of cheating when using your products (people like to post about how clever they are)?

 

social sentiment.png

 

(Sentiment data can tell us all sorts of things. It can tell us about our products; about our competitors; about the likelihood of customers "churning" from us; and about cheating and fraud.)


Documents in SharePoint and other document stores
For example, legal discovery can be tortuous, involving searches through truckloads of information. Being able to automatically extract meaning from legal documentation allows us to do legal discovery much more quickly and cheaply. In the UK there have recently been a number of cases where child abuse has not been picked up on quickly enough by care services. Often this is because different services didn't share the "meaning" of their information with each other. Extracting meaning from case notes and then sharing this meaning between agencies might help to reduce those inter-departmental failures of care.

 

Combining Big Data analysis to get a 360 degree customer view

Data types can be combined to good effect too. By combining structured and unstructured data, we can build up a more complete, "360 degree" view of our customers.

 

360 degree view.png

 

Some examples:

 

Cheating

You can use micro-transaction data analysis to catch those who cheat at online gambling games. But cheats also have a propensity to tell others on social media about how clever they are. With fraud, cheating and security, it's all about detecting and taking action as quickly as possible. Combining micro-transactions with social media would allow us to find (and fix) cheating faster than any one data type alone would allow.
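
To make the idea of combining the two data types concrete, here is a minimal sketch that adds a small boost to an account's play-based suspicion score for each boasting post linked to it. Everything here is assumed for illustration: the account ids, the 0 to 1 anomaly scores, the weights, and above all the premise that social media posts have already been matched to game accounts.

```python
# Hypothetical inputs, both keyed by game account id.
suspicious_play = {"acct-17": 0.6, "acct-42": 0.3}  # anomaly score from micro-transactions, 0..1
bragging_posts = {"acct-42": 2, "acct-99": 1}       # posts flagged as boasting about cheating


def review_queue(play_scores, post_counts, threshold=0.5, post_weight=0.2):
    """Return the accounts whose combined score reaches the review threshold."""
    combined = {}
    for account in set(play_scores) | set(post_counts):
        score = play_scores.get(account, 0.0) + post_weight * post_counts.get(account, 0)
        combined[account] = min(score, 1.0)  # cap the combined score at 1.0
    return sorted(a for a, s in combined.items() if s >= threshold)


print(review_queue(suspicious_play, bragging_posts))
```

In this made-up data, acct-42 would not reach the threshold on play data alone or on posts alone; it is only the combination that pushes it into the review queue.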

 

Focussed marketing

Record every sale in every one of your retail stores and every transaction on your web site. This will tell you what items are trending and what items are being purchased together. Use unstructured sentiment analysis to tell you about "cool stuff" that maybe you don't yet stock - but should, and quickly.
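
Finding "items being purchased together" can start with something as simple as counting how often each pair of items shares a basket. The baskets below are invented sample data, and pair counting is just one basic approach rather than a full market-basket analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the items bought in each recorded sale.
baskets = [
    ["tent", "sleeping bag", "torch"],
    ["tent", "sleeping bag"],
    ["torch", "batteries"],
    ["tent", "torch"],
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) counts each pair once per basket, in a stable order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are candidates for bundling or for placing near each other in the store or on the web site.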

 

Financial compliance

Record and analyse transactions to look for fraud and non-compliance by traders. And analyse your company's emails to get an unstructured view of non-compliance.

 

Support for complex environments

The new Gen8 servers from HP send support "micro-transactions" back to HP. HP puts this information into a Vertica database and uses it to look for combinations of hardware and software, or software and software, that repeatedly cause problems.
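
The kind of question being asked of that data can be sketched in a few lines. To be clear, this is not HP's implementation or the real Vertica schema: the record fields, model names and driver versions are hypothetical, and the code simply computes the share of support records that reported a problem for each hardware and software combination.

```python
from collections import defaultdict

# Hypothetical support records: (hardware_model, software_version, had_problem).
records = [
    ("model-A", "driver-1.0", True),
    ("model-A", "driver-1.0", True),
    ("model-A", "driver-1.1", False),
    ("model-B", "driver-1.0", False),
    ("model-B", "driver-1.0", True),
    ("model-B", "driver-1.1", False),
]


def problem_rates(records):
    """Return the fraction of records with a problem for each combination."""
    totals = defaultdict(int)
    problems = defaultdict(int)
    for hardware, software, had_problem in records:
        key = (hardware, software)
        totals[key] += 1
        if had_problem:
            problems[key] += 1
    return {key: problems[key] / totals[key] for key in totals}


rates = problem_rates(records)
worst = max(rates, key=rates.get)
print(worst, rates[worst])
```

Looking at rates per combination, rather than per component, is what lets a problem show up that only occurs when a particular model and a particular driver meet.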

 

Maybe HP could use sentiment analysis to look at the unstructured data in the support forums to get another angle on Gen8 quality and support issues.

 

HP Software has just released a new product called HP Operations Analytics (HP OA). HP OA records metric, event and log information and, from this, allows support staff to fix complex problems. You could also analyze the voice and email interactions with the support center and correlate this with the structured information. And you could analyze the application developers' unstructured information to look for solutions to application problems.

 

More Information on Big Data

You can, of course, go to the HP.com web site and search for HAVEn, Vertica and Autonomy.

 

However, I find the blogs from the Autonomy and Vertica teams to be very informative. Also, www.vertica.com has a lot of great information. I find the Vertica reference writeups especially useful. 

 

In this post, I mention the new HP Operations Analytics product. The web site is here. I like the various videos (Chris Tracey / Paul Muller's two videos and the Ian Bromehead video).  I think that HP Operations Analytics is one of those things that is better seen than described.

 

Author: Mike Shaw

 

Tags: big data
About the Author
Mike has been with HP for 30 years. Half of that time was in R&D, mainly as an architect. The other 15 years has been spent in product manag...

