Forecasting Issues in Your Data Center

If you could forecast potential issues in your data center, what would that mean to you? Would early-warning technology be useful? If you could be notified of an impending problem before it struck, would it benefit your business?

 

Announced this week at HP Discover Vienna, HP Service Health Analyzer gives IT managers exactly this capability: the ability to anticipate, prevent, and remediate IT incidents before they affect the business.

 

Service Health Analyzer (SHA) is a predictive analytics tool within HP’s Business Service Management Portfolio that will change the way IT manages data and operates the data center. Much as weather forecasters monitor and predict major hurricanes and cyclones, SHA uses advanced analytics to forecast IT storms, helping you keep problems from affecting your business.

 

Hurricane and Storm Tracking Analogy

Years ago, weather forecasters relied on static models and sparse observations to predict tropical storms and hurricanes. They would take their temperature and barometric pressure readings, look out the window at the sky and water, and then consult a set of charts and almanacs. A day might begin innocently enough with bright blue skies, then suddenly turn overcast and windy. Forecasters had no way to know whether the looming storm would be a severe hurricane or a small tempest that would soon pass. These limited forecasts left little time to prepare before a hurricane struck, and without advance notice, commerce and daily life suffered heavily.

 

Identifying looming storms of performance degradation or outages in a data center used to work the same way. IT managers would look at a snapshot of their environment, consult a static model, and make their best guess about the future. Sometimes the predictions were accurate; other times they were blindsided by a problem that crippled their service.

 

Times have changed.

 

Today, weather forecasters can provide advance warning of tropical storms and even predict their path and intensity, so communities and businesses are not caught unprepared. Meteorologists now use a vast network of ground- and ocean-based sensors, satellites, and radar to collect data such as air pressure, humidity, ocean temperature and height, ocean currents, and wind speed in real time. These networked data collectors constantly take the pulse of the planet and feed forecasters critical data.

 

This data feeds computer forecast models, which analyze it and calculate likely future weather behavior. These models also factor in seasonality, tides, and trends to gauge the likelihood of hurricanes. When an anomaly (a change in the weather pattern that doesn’t match “normal”) appears over the ocean, the models correlate the information and alert meteorologists.

 

Advanced data collection tools are available in IT today as well. Organizations often collect millions of data points per hour, but all these metrics accumulate into an enormous sea of data. IT managers are often overwhelmed by the resulting wave of metrics, log files, event consoles, manually managed thresholds, and red gauges. What has been lacking is the analytic tool set and automated intelligence to correlate these disparate metrics, from both an application and a topology perspective, so managers can predict, or forecast, problems on the horizon. Such a warning would let them take the necessary action to remediate an issue before it cripples a mission-critical service.

 

 

“Run-time” Predictive Analytics

The same kind of predictive analytics is essential to managing IT.

 

Having complete visibility into the health of your business services, so that you can adapt to, and even survive in, today’s cloud and virtualized IT environments, isn’t just a “nice-to-have.” It is mandatory. Managing dynamic infrastructures and applications takes more than reacting to business service problems when they occur. You need to anticipate issues and act to prevent service degradation or outages. You need better visibility into how your applications and business services relate to your dynamic infrastructure, so you can trace irregular behavior to topology changes. And you need an easier way to determine acceptable thresholds and recognize real anomalies.

 

Much like the advanced analytics used by today’s hurricane forecasters, HP Service Health Analyzer offers a smarter way to manage IT so you can anticipate problems before they occur. SHA is powered by the Run-time Service Model, which lets it correlate metric abnormalities with topology.

 

How Service Health Analyzer Works

Let’s discuss how SHA anticipates issues, prevents business impact, and helps you remediate issues quickly.

 

Collects Data 

SHA gathers data from application, infrastructure, database, network, and middleware monitors (its sensors), as well as topology information from the Run-time Service Model.

 

Automatically Learns “Normal” Behavior

SHA uses advanced statistics and sophisticated algorithms developed by HP Labs to sift through that sea of metrics and make sense of it. It looks at historical trends and seasonal patterns over time to establish a baseline of normal behavior. Historically, monitoring solutions relied on static thresholds and manual models that were difficult to set accurately and hard to maintain. SHA automatically sets thresholds suited to your environment based on time of day and day of week, and it filters out noise so that you are alerted only to real issues. That way, you can focus your effort on the issues that matter most and start building workflow automation to remediate them.
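To make the idea of a time-of-day, day-of-week baseline concrete, here is a minimal sketch. It is not SHA’s algorithm, which HP has not published; it assumes a simple band of mean plus or minus k standard deviations per (weekday, hour) slot, and the function names `build_baseline` and `is_breach` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical sketch: SHA's real algorithms (from HP Labs) are not public.
# We assume a simple "mean +/- k standard deviations" band per weekly slot.


def build_baseline(samples, k=3.0):
    """Learn a (low, high) band for each (weekday, hour) slot.

    samples: iterable of (datetime, float) readings for one metric.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    baseline = {}
    for slot, values in buckets.items():
        mu, sigma = mean(values), pstdev(values)
        baseline[slot] = (mu - k * sigma, mu + k * sigma)
    return baseline


def is_breach(baseline, ts, value):
    """True if value falls outside the learned band for its time slot."""
    slot = (ts.weekday(), ts.hour)
    if slot not in baseline:
        return False  # no history for this slot yet, so stay quiet
    low, high = baseline[slot]
    return value < low or value > high


# Five Monday 09:00 readings of, say, response time in milliseconds.
history = [(datetime(2024, 1, day, 9), v)
           for day, v in zip([1, 8, 15, 22, 29], [100, 110, 90, 105, 95])]
baseline = build_baseline(history)
print(is_breach(baseline, datetime(2024, 2, 5, 9), 150))  # a later Monday 09:00
```

Because each slot gets its own band, a value that is normal during a Monday-morning peak can still be flagged at 3 a.m. on a Sunday, which is the kind of context a single static threshold cannot capture.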

 

Identifies Abnormal Behavior

SHA is powered by an advanced anomaly detection system known as the Run-time Anomaly Detection Engine, or RAD Engine. To define an anomaly, the RAD Engine takes the abnormal readings gathered across all monitored metrics and couples them with topology information from the Run-time Service Model. It then determines whether multiple breaches, from different metrics, are affecting the same service. Using multivariate analysis, SHA defines an anomaly as multiple metrics that behave abnormally within a given time window and are associated with a single service. The RAD Engine was developed by HP Labs and has multiple patents pending.
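The rule described above (several distinct metric breaches, on CIs belonging to one service, within a time window) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent-pending RAD Engine: the 15-minute window, the two-metric minimum, and the names `detect_anomalies` and `ci_to_service` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sketch of multi-metric anomaly detection. The real RAD Engine is
# patent-pending HP Labs work; here an "anomaly" is simply at least `min_metrics`
# distinct metric breaches on one service inside a sliding time window.


def detect_anomalies(breaches, ci_to_service,
                     window=timedelta(minutes=15), min_metrics=2):
    """breaches: iterable of (timestamp, ci, metric) already flagged as abnormal.
    ci_to_service: topology lookup from CI name to the business service it supports.
    Returns a list of (service, first_breach_time, {(ci, metric), ...}).
    """
    by_service = defaultdict(list)
    for ts, ci, metric in breaches:
        service = ci_to_service.get(ci)
        if service is not None:  # ignore CIs with no known service
            by_service[service].append((ts, ci, metric))

    anomalies = []
    for service, events in by_service.items():
        events.sort()
        start = 0
        for end in range(len(events)):
            # Drop breaches too old to be correlated with events[end].
            while events[end][0] - events[start][0] > window:
                start += 1
            distinct = {(ci, metric) for _, ci, metric in events[start:end + 1]}
            if len(distinct) >= min_metrics:
                anomalies.append((service, events[start][0], distinct))
                start = end + 1  # begin a fresh window after reporting
    return anomalies


topology = {"web01": "checkout", "db01": "checkout", "app07": "billing"}
t0 = datetime(2024, 1, 1, 2, 0)
breaches = [
    (t0, "web01", "response_time"),
    (t0 + timedelta(minutes=3), "app07", "heap_used"),
    (t0 + timedelta(minutes=5), "db01", "cpu"),
]
for service, when, metrics in detect_anomalies(breaches, topology):
    print(service, when, sorted(metrics))
```

In this example only the checkout service is reported: its web and database tiers both breach within five minutes, while billing has a single isolated breach that is treated as noise rather than an anomaly.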

 

Correlates Topology Information

When an anomaly is detected, SHA automatically captures the current topology of the Configuration Items, or CIs, involved. This preserves the topology as it was at the time of the anomaly, which is especially valuable when reviewing anomalies that occurred overnight or when no on-call operators were available to address them. SHA also collects and presents discovered changes for the relevant CIs so this information can be used in root cause analysis. Faulty or poorly planned changes are the most common service disruptor in IT, so it’s imperative that a learning engine understand the change “climate” and capture a snapshot of the topology at the time of change. Correlating anomalies with topology and changes in this way speeds troubleshooting and reduces mean time to repair (MTTR).
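A rough sketch of the snapshot step might look like this. The data model is an assumption for illustration only (the real Run-time Service Model and change records are far richer), and `Change`, `AnomalySnapshot`, `capture_snapshot`, and the 24-hour lookback are hypothetical names and values.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical data model: SHA's actual Run-time Service Model and change
# records are richer than this. Names here are illustrative only.


@dataclass
class Change:
    ci: str
    timestamp: datetime
    description: str


@dataclass
class AnomalySnapshot:
    detected_at: datetime
    topology: dict  # CI -> related CIs, frozen at detection time
    recent_changes: list = field(default_factory=list)


def capture_snapshot(detected_at, cis, live_topology, change_log,
                     lookback=timedelta(hours=24)):
    """Freeze topology for the anomaly's CIs and attach their recent changes."""
    # Deep copy so later topology updates do not rewrite history.
    frozen = {ci: copy.deepcopy(live_topology.get(ci, [])) for ci in cis}
    changes = [c for c in change_log
               if c.ci in cis and detected_at - lookback <= c.timestamp <= detected_at]
    changes.sort(key=lambda c: c.timestamp, reverse=True)  # newest first
    return AnomalySnapshot(detected_at, frozen, changes)


now = datetime(2024, 1, 1, 2, 5)
live = {"web01": ["lb01"], "db01": ["san02"]}
log = [
    Change("db01", now - timedelta(hours=1), "storage path reconfigured"),
    Change("web01", now - timedelta(days=3), "patch applied"),
    Change("app07", now - timedelta(minutes=30), "JVM flags changed"),
]
snap = capture_snapshot(now, {"web01", "db01"}, live, log)
live["web01"].append("lb02")  # topology keeps evolving after the anomaly
print(snap.topology["web01"], [c.description for c in snap.recent_changes])
```

The deep copy is the key design choice: an operator reviewing the overnight anomaly the next morning sees the topology as it was at 2:05 a.m., not as it looks after later changes, along with only the changes that touched the affected CIs shortly beforehand.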

 

Compares the Current Anomaly with Past Anomalies

SHA analyzes the “DNA” makeup of an anomaly and compares it with past anomalies for a complete historical analysis, using a unique tool known as Anomaly DNA Technology. When this technology finds a similar anomaly, SHA presents any previously captured remediation for that event to the operator for immediate action. If the matched anomaly was marked as noise, SHA suppresses further actions, sparing operators the effort of reacting to false alerts so they can focus on the real service-impacting issues.
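How Anomaly DNA Technology fingerprints and compares anomalies is not described here, so the following is a deliberately simple stand-in: it treats an anomaly’s “DNA” as a set of (CI type, metric) features and uses Jaccard similarity to find the closest past match. `PastAnomaly`, `triage`, and the 0.6 similarity cutoff are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for Anomaly DNA Technology, whose internals are not
# public. Here an anomaly's "DNA" is a set of (CI type, metric) features and
# similarity is Jaccard overlap, a deliberately simple substitute.


@dataclass
class PastAnomaly:
    signature: frozenset
    is_noise: bool
    remediation: Optional[str] = None


def jaccard(a, b):
    """Share of features two signatures have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def triage(signature, history, min_similarity=0.6):
    """Return ("suppress", None), ("suggest", remediation) or ("new", None)."""
    best, best_score = None, 0.0
    for past in history:
        score = jaccard(signature, past.signature)
        if score > best_score:
            best, best_score = past, score
    if best is None or best_score < min_similarity:
        return ("new", None)  # nothing similar: treat as a fresh issue
    if best.is_noise:
        return ("suppress", None)  # matched known noise: no further action
    return ("suggest", best.remediation)


history = [
    PastAnomaly(frozenset({("database", "cpu"), ("web", "response_time")}),
                is_noise=False, remediation="Recycle the DB connection pool"),
    PastAnomaly(frozenset({("batch", "disk_io")}), is_noise=True),
]
current = frozenset({("database", "cpu"), ("web", "response_time"),
                     ("web", "error_rate")})
print(triage(current, history))
```

Here the current anomaly shares two of its three features with the first historical anomaly, so the stored remediation is surfaced to the operator; an anomaly matching the batch-job pattern would be suppressed as known noise.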

 

Generates an Event and Sends It to the Event Correlation Tool

Once an anomaly is defined, SHA generates an event that lists the CIs involved in the anomaly and sends it to your event subsystem. SHA also sets the affected service’s KPI to critical, so application support personnel can react to the updated status.
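The hand-off to the event subsystem can be pictured as building a structured event and flipping the service KPI. This sketch assumes a JSON payload; the field names and the `raise_anomaly_event` function are illustrative only and are not the schema of any specific HP event subsystem.

```python
import json
from datetime import datetime, timezone

# Hypothetical event shape: the field names below are illustrative, not the
# schema of any specific HP event subsystem.


def raise_anomaly_event(service, cis, detected_at, kpi_status):
    """Mark the service KPI critical and return a JSON event listing the CIs."""
    kpi_status[service] = "CRITICAL"
    event = {
        "source": "SHA",
        "severity": "critical",
        "title": f"Anomaly detected on service '{service}'",
        "detected_at": detected_at.isoformat(),
        "related_cis": sorted(cis),  # CIs involved in the anomaly
    }
    return json.dumps(event)


kpis = {"checkout": "OK"}
payload = raise_anomaly_event("checkout", {"web01", "db01"},
                              datetime(2024, 1, 1, 2, 5, tzinfo=timezone.utc), kpis)
print(kpis["checkout"], json.loads(payload)["related_cis"])
```

Including the CI list in the event itself means downstream correlation and ticketing can act on the anomaly without querying SHA again.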

 

Automated Event-to-Ticket Closure Remediation

You can also combine analytics and automation to remediate issues quickly. When SHA alerts you to a potential issue, the event subsystem can automatically open a ticket based on the SHA event, and automation can then remediate the issue before your business is affected. This fast remediation path simplifies the complexities of virtualized and cloud computing environments.
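The event-to-ticket-to-closure flow can be sketched as follows, assuming a registry of automated “runbooks” keyed by service. The `Ticket` fields, the runbook registry, and `handle_event` are hypothetical and do not represent a specific service-desk or automation API.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

# Hypothetical sketch of event-to-ticket automation. Ticket fields and the
# "runbook" registry are illustrative, not a specific service-desk API.

_ticket_ids = count(1)


@dataclass
class Ticket:
    ticket_id: int
    summary: str
    status: str = "OPEN"
    notes: List[str] = field(default_factory=list)


def handle_event(event, runbooks):
    """Open a ticket for an event, try automated remediation, close on success."""
    ticket = Ticket(next(_ticket_ids), event["title"])
    runbook = runbooks.get(event["service"])
    if runbook is None:
        ticket.notes.append("No runbook registered; routed to an operator")
    elif runbook(event):
        ticket.status = "CLOSED"
        ticket.notes.append("Remediated automatically")
    else:
        ticket.notes.append("Runbook failed; escalated to an operator")
    return ticket


runbooks = {
    "checkout": lambda evt: True,  # pretend the automated fix succeeded
    "billing": lambda evt: False,  # pretend the automated fix failed
}
t = handle_event({"title": "Anomaly on checkout", "service": "checkout"}, runbooks)
print(t.ticket_id, t.status, t.notes)
```

The fallback branches matter as much as the happy path: when no automation exists or it fails, the ticket stays open and lands with a person instead of silently disappearing.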

 

Summary

Both meteorologists and IT teams can now use dynamic models fed by sensors that report in real time. The collected data flows into a dynamic model, producing an intelligent forecast based on up-to-date information and the latest environmental conditions. As the environment changes, so do the forecasts. This is the elegance of a dynamic model.

 

For more information on HP Service Health Analyzer, visit:

www.hp.com/go/sha

 

 

About the Author
Product Marketing Manager for HP Application Performance Management suite of software products. Before this role, I worked in the HP Storag...
The opinions expressed above are the personal opinions of the authors, not of HP.