Fight fires with focus: how to automate incident resolution with IT process orchestration

By Nimish Shelat, Product Marketing Manager, HP Automation and Cloud Management

 

Inside many IT organizations, incident resolution is often a surprisingly manual and complex process. Even when an organization implements event consoles like HP Operations Manager i (OMi) to compile events across multiple domains and weed out irrelevant or duplicate data, Tier 1 and Tier 2 operations still spend much of their workdays responding to alarms and putting out fires.

(Image: firemen. Source: Flickr/NY National Guard)

 

But what if most of those firefighting exercises could be eliminated with IT Process Automation? Let’s take a look at how incident remediation works when HP Operations Manager (OM) operates in concert with IT Process Automation and IT Process Orchestration.

 


 

How incident resolution works with OM

 

Let’s say that your IT environment experiences 68 million raw events per day (as one HP customer did). HP OM automates the collection, correlation and deduplication of these events, prioritizes them based on their business impact, and then applies automatic actions to fix common problems. This is an excellent start: as that HP customer found, it can cut the number of alerts you need to address to 5,000.
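To make the deduplication and prioritization steps described above more concrete, here is a minimal Python sketch. It is not HP OM's implementation or API. The `Event` fields, the `business_impact` score and the function name `deduplicate_and_prioritize` are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event shape; a real console carries many more attributes.
    source: str           # node or application that raised the event
    kind: str             # e.g. "disk_full", "service_down"
    business_impact: int  # assumed score: higher means more critical


def deduplicate_and_prioritize(events):
    """Collapse events sharing (source, kind), then sort by impact, highest first.

    Returns a list of (first_event, duplicate_count) tuples.
    """
    grouped = OrderedDict()
    for event in events:
        key = (event.source, event.kind)
        if key in grouped:
            first, count = grouped[key]
            grouped[key] = (first, count + 1)
        else:
            grouped[key] = (event, 1)
    # sorted() is stable, so events with equal impact keep their arrival order.
    return sorted(
        grouped.values(),
        key=lambda pair: pair[0].business_impact,
        reverse=True,
    )


raw = [
    Event("web01", "disk_full", 3),
    Event("db01", "service_down", 9),
    Event("web01", "disk_full", 3),
    Event("web01", "disk_full", 3),
]
alerts = deduplicate_and_prioritize(raw)
for event, count in alerts:
    print(f"{event.source} {event.kind} impact={event.business_impact} seen={count}")
```

In this toy run, four raw events collapse into two alerts, with the higher-impact database outage listed first. The same idea, applied at scale and combined with correlation rules, is what turns millions of raw events into a few thousand alerts.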

 

However, working through 5,000 alerts still adds up to a heavy workload. Here’s why: when the OM enterprise console presents an alert to a Tier 1 Operations team, the operators manually turn to reference documentation such as runbooks and knowledge bases, or to their own tribal knowledge (or maybe just a note tacked up on a cubicle wall; don’t kid yourself, it happens).

 


Fig. 1: How manual incident remediation processes work with HP Operations Management.

 

But what if first responders can’t resolve the event? Then Tier 1 must escalate to Tier 2 subject matter experts for manual troubleshooting, triage and (ideally) repair (Figure 1, above). Even then, some alerts will not get resolved, at which point Tier 2 administrators create an incident that is routed to an Infrastructure or Applications team to investigate further.

 

Clearly, this can be a long, manual process of investigations, trial-and-error fixes and hand-offs involving one or several IT staff members.

 

How OM and Operations Orchestration fully automate incident resolution

 

Operations Orchestration (OO) can replace many of the most repetitive processes that Tier 1 and Tier 2 administrators use for investigation and repair (Figure 2).

 


Fig. 2: How process automation remediates incidents with HP Operations Management and HP Operations Orchestration

 

When OM registers an event, it uses policies, based on criteria you set, to trigger automated OO processes for incident resolution. Depending on the event and the policies, OO launches step-by-step logical flows for diagnosis and self-healing repair, then acknowledges or annotates the alert with detailed information that operators can review (Figure 3). OO records all flow execution activity for auditing and reporting and, when necessary, automatically creates enriched incident tickets in the Service Desk.
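The policy-to-flow hand-off described above can be sketched in a few lines of Python. This is an illustrative model only, not the OM or OO API: the `Policy`, `Flow` and `Orchestrator` classes, the step functions and the event fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Flow:
    # Hypothetical stand-in for an orchestration flow: ordered, named steps.
    name: str
    steps: List[Callable[[Dict], bool]]  # each step returns True on success


@dataclass
class Policy:
    # Hypothetical policy: a match criterion plus the flow it triggers.
    matches: Callable[[Dict], bool]
    flow: Flow


@dataclass
class Orchestrator:
    policies: List[Policy]
    audit_log: List[str] = field(default_factory=list)
    tickets: List[Dict] = field(default_factory=list)

    def handle(self, event: Dict) -> str:
        for policy in self.policies:
            if not policy.matches(event):
                continue
            history = []
            succeeded = True
            for step in policy.flow.steps:
                succeeded = step(event)
                status = "ok" if succeeded else "failed"
                history.append(f"{policy.flow.name}/{step.__name__}: {status}")
                if not succeeded:
                    break
            self.audit_log.extend(history)  # every executed step is recorded
            if succeeded:
                event["annotation"] = f"resolved by {policy.flow.name}"
                return "resolved"
            # Enriched ticket: the event plus what the flow already tried.
            self.tickets.append({"event": event, "history": history})
            return "escalated"
        return "unmatched"


def check_disk_usage(event: Dict) -> bool:
    # Diagnosis step (assumed): confirm the reported symptom.
    return event["kind"] == "disk_full"


def clean_temp_files(event: Dict) -> bool:
    # Repair step (assumed): succeeds only if there is something to clean.
    return event.get("cleanable", False)


orchestrator = Orchestrator([
    Policy(
        matches=lambda e: e["kind"] == "disk_full",
        flow=Flow("free_disk_space", [check_disk_usage, clean_temp_files]),
    )
])

r1 = orchestrator.handle({"source": "web01", "kind": "disk_full", "cleanable": True})
r2 = orchestrator.handle({"source": "web02", "kind": "disk_full"})
r3 = orchestrator.handle({"source": "db01", "kind": "service_down"})
print(r1, r2, r3)
```

The design choice worth noting is that a failed flow still adds value: the ticket it raises carries the step history, so whoever picks it up does not repeat the diagnosis.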

 


 

Fig. 3: Example of an HP Operations Orchestration flow 

 

Operator-Assisted Incident Resolution

 

One variation on this fully automated model is to incorporate operator assistance. In this scenario, the OM event alert goes to Tier 1 Operations, which may choose to launch “guided” HP OO flows from the enterprise console menu and make decisions interactively.

 

Of course, not every event will be resolved through OO incident remediation flows, but the flows can address the vast majority of events in a consistent, standardized way. For example, the HP customer I mentioned above was able to reduce its 5,000 alerts to a much more manageable 1,500. Integrating OM and OO allows Tier 1 and Tier 2 personnel to focus their efforts on the alerts that remain.
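The figures quoted in this post can be put side by side with a short calculation. The numbers are the ones cited for the HP customer; the variable names are just labels for this sketch.

```python
raw_events = 68_000_000   # raw events per day, as cited above
alerts_after_om = 5_000   # alerts left after OM correlation and deduplication
alerts_after_oo = 1_500   # alerts left after OO remediation flows

# Share of raw events filtered out before anyone sees an alert.
om_reduction = 1 - alerts_after_om / raw_events

# Share of the remaining alerts handled by automated flows.
oo_reduction = 1 - alerts_after_oo / alerts_after_om
alerts_automated = alerts_after_om - alerts_after_oo

print(f"Filtered by OM: {om_reduction:.4%}")
print(f"Resolved by OO flows: {oo_reduction:.0%} ({alerts_automated} alerts)")
```

In other words, automated flows took 3,500 of the 5,000 alerts, or 70 percent, off the operators' plates in this example.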

 

Experience HP Operations Orchestration for free

The new HP Operations Orchestration Community Edition is a free download of the OO platform with out-of-the-box content packs for automating incident remediation. It is designed for easy self-installation, so you can begin experiencing the power of IT process automation and IT operations orchestration within two hours.

 

