Infrastructure Management Software Blog

Automated Event Correlation for the 21st Century – optimized for dynamic environments

Event Correlation technologies have been around for years so there is nothing new to say, right?

Nope... STATIC event correlation technologies have been around for years, and they are a poor fit for flexible environments such as virtualized clusters. So how do you implement DYNAMIC event correlation that reacts to changes in your IT infrastructure? OMi TOPOLOGY Based Event Correlation could be the answer.

OH, IL, WI, IN, MI Operations Center Technical Roadshow - April 20th to April 29th - Don't miss it!

Ever wish you could talk face-to-face with more technical people about Operations Center and Network Management Center products? Don’t really have the time or budget to travel very far to do so? Well, here is a great opportunity to meet and talk with technical experts on products like Operations Manager and NNMi, right in your backyard.


Vivit will be hosting a series of six (6) one-day sessions, where there will be a nice mix between presentations and Q&A sessions around these products.  The sessions will be held in the following states on the following days:


- (Columbus) Ohio – April 20, 2010
- (Orrville) Ohio – April 21, 2010
- (Dearborn) Michigan – April 22, 2010
- Wisconsin – April 27, 2010
- (Chicago) Illinois – April 28, 2010
- (Fishers) Indiana – April 29, 2010


Feel free to contact me if you have any further questions about this roadshow at asksonja@hp.com.


The full stack (OMW, SiteScope, OMi, NNM, Service Desk, CMDB)

As I was getting ready to leave yesterday, a colleague stopped by my desk and asked “do you want to be a hero?” That certainly piqued my interest. It turned out we had a customer downstairs in our executive briefing center who wanted some clarification about how all the pieces of our stack fit together.


Background
The customer was the CTO of a major IT firm in the Asia-Pacific region. They manage approximately 4,000 servers using OMW 8.1. They use both agents and SPIs, as well as SiteScope agentless monitoring. They also monitor the faults and performance of their network using NNM, and roll those events into their Operations Manager console. In addition, they use Service Desk 4.5 along with a CMDB (configuration management database) that tracks all the configuration items across their enterprise and the relationships among them. A *very* rough schematic of what they have appears in the diagram below in red.



 Our discussions were divided into two main areas:
1. What they are doing today and what they should be doing?
2. What can they do in the future?


Current Situation
The first question was about best practices. Were they using the software correctly to manage their infrastructure? The answer is a resounding yes. They use OMW as the central event management console, collecting data from agents, SiteScope (agentless monitoring), and NNM for network events.


And, they integrate their service desk with OMW, opening and closing tickets, and tracking changes to the IT infrastructure in their CMDB. They implemented the CMDB about two years ago, in conjunction with their Service Desk implementation.


Next Steps
The next questions focused on what they should be doing, or what they can do next, to improve their IT management.


We started with a discussion about OMi. The customer was confused about how OMi fits with OMW - the first question was whether it replaces OMW, whether they receive OMi as part of an upgrade (entitlement), and finally, what specific value OMi provides since they currently use OMW as the centralized event consolidation tool.


As readers of this blog know, OMi is a separate product that adds on to Operations Manager. (See the green box at the top of the diagram above.) Its main value is that it leverages the system topology information in the CMDB to greatly speed the time to repair IT problems, especially in complex environments. We have many resources to learn more about OMi, including:
Product overview
High-level webinar on OMi
Deep-dive technical webinar on OMi
Answers to technical questions on OMi


The next topic was automation. We talked about how companies use Operations Orchestration (OO) to automate their IT processes (runbooks). OO uses events in OM to trigger its process flows. The good news was that this customer has spent the past two years documenting and improving their IT processes. They already know what processes occur frequently and how much manual effort they require. This may be the next logical step for them as it leverages their existing IT infrastructure and processes. EMA recently wrote a white paper on how process automation augments event consolidation.
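
To illustrate the idea of events triggering runbooks, here is a minimal Python sketch. It is not the Operations Orchestration API or flow format; the event fields, the `RUNBOOKS` table, and the two runbook functions are hypothetical stand-ins. It only shows the pattern: a known event type starts a documented process, and anything without a runbook is escalated to a person.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Runbook = Callable[[Event], str]


def restart_service(event: Event) -> str:
    # Hypothetical runbook: a real one would call out to the managed node.
    return f"restart requested for {event['service']} on {event['node']}"


def clear_temp_files(event: Event) -> str:
    # Hypothetical runbook for a disk-space event.
    return f"temp-file cleanup requested on {event['node']}"


# Assumed mapping from event type to the documented process that handles it.
RUNBOOKS: Dict[str, Runbook] = {
    "service_down": restart_service,
    "disk_full": clear_temp_files,
}


def dispatch(events: List[Event]) -> List[str]:
    """Trigger a runbook for each event that has one; escalate the rest."""
    actions = []
    for event in events:
        runbook = RUNBOOKS.get(event["type"])
        if runbook is None:
            actions.append(f"no runbook for '{event['type']}', escalate to operator")
        else:
            actions.append(runbook(event))
    return actions


if __name__ == "__main__":
    incoming = [
        {"type": "service_down", "service": "httpd", "node": "web03"},
        {"type": "fan_failure", "node": "db01"},
    ]
    for action in dispatch(incoming):
        print(action)
```

The processes this customer has already documented are exactly what would populate a table like `RUNBOOKS`, which is why the documentation work pays off when automation starts.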


Migration Challenges
One issue that arose was that the CMDB connected to their service desk is not the latest UCMDB that OMi uses for its topology-based event correlation (TBEC). The customer has two options here.
1. Leave the existing CMDB in place and let OMi create an operational data store that contains the configuration information it needs. The advantage of this approach is that it leaves the current management infrastructure intact and just adds OMi on top. OMi uses the SPIs to auto-discover the IT infrastructure and relationships among the elements. OMi’s data store is self-contained and requires minimal external input.
2. Migrate the existing CMDB associated with Service Desk to the latest version of UCMDB. The advantage of this approach is that the customer ends up with a single CMDB. They can migrate their existing data using a tool such as ICM (information consolidation manager) from Netscope.


Conclusions
If your organization already integrates its events into a single Operations Manager console, you are on the right track. If you already use a CMDB to track your IT infrastructure, you are very far along the IT management maturity curve, even more so if you use some means of automatic discovery to keep it current.


To take things to the next level, you have two options: focus on further event correlation and reduction with OMi, or automate your existing IT processes with Operations Orchestration. You can pursue these in series or in parallel, depending on your priorities. Both will deliver a tangible return on investment and a fast payback period.


For HP Operations Center, Peter Spielvogel.


Get the latest updates on our Twitter feed @HPITOps http://twitter.com/HPITOps


Join the HP OpenView & Operations Management group on LinkedIn.

Event Correlation: OMi TBEC and Problem Isolation - What's the difference (part 3 of 3)

If you have not done so already, you may want to start with part 1 in this series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-1-of-3.aspx


Read part 2 in the series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-2-of-3.aspx



This is the final part of my three-post discussion of event correlation technologies: OMi Topology Based Event Correlation (TBEC) and Problem Isolation (PI). I've been focusing on how TBEC is used and how it helps IT Operations Management staff be more effective and efficient.


In my last post I started to mention why End User Monitoring (EUM) technologies are important - because they are able to monitor business applications from an end user perspective. EUM technologies can detect issues which Infrastructure monitoring might miss.


In the example we worked through in the last post I mentioned how EUM can detect a response time issue and alert staff that they need to expedite the investigation of an ongoing incident. This is also where Problem Isolation helps. PI provides the most effective means to gather all of the information that we have regarding possible causes of the response time issue and analyze the most likely cause.


For example: Our web-based ordering system has eight load-balanced web servers connected to the internet. These are where our customers connect. The web server farm communicates back to application, database, and email servers on the intranet, and the overall system allows customers to search and browse available products, place an order, and receive email notifications of order confirmation and shipping status.


The event monitoring system includes monitoring of all of the components. We also have EUM probes in place running test transactions and evaluating response time and availability. The systems are all busy but not overloaded - so we are not seeing any performance alerts from the event monitoring system.


A problem arises with two of our eight web servers, and they drop out of the load balanced farm. The operations bridge can see that the problem has happened as they receive events indicating the web server issues. TBEC shows that there are two separate issues, so this is not a cascading failure – and the operations staff can see that these web servers are part of the online ordering service.


However, they also know that the web servers are part of redundant infrastructure and there should be plenty of spare capacity in the six remaining load balanced web servers. As they have no other events relating to the online ordering service, they decide to leave the web server issues for a little while as they are busy dealing with some database problems for another business service.


The entire transaction load that would normally be spread across eight web servers is now focused on the remaining six. They were already busy but now are being pushed even harder, not enough to cause CPU utilization alerts but enough to increase the time that it takes them to process their component of the customer’s online ordering transactions. As a result, response time, as seen by customers, is terrible. The Operations Bridge are unaware as they see no performance alerts from the event management system.
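
A quick back-of-the-envelope calculation shows how this can slip past threshold-based alerts. The numbers in this Python sketch are assumptions for illustration only, not measurements from the scenario: 60% CPU per server before the failure, a 90% CPU alert threshold, and a simple M/M/1-style approximation in which response time scales with 1 / (1 - utilization).

```python
def utilization_after_failure(util_before: float, servers_before: int, servers_after: int) -> float:
    """Spread the same total load across fewer servers."""
    return util_before * servers_before / servers_after


def relative_response_time(utilization: float) -> float:
    """M/M/1-style approximation: response time grows as 1 / (1 - utilization)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


if __name__ == "__main__":
    ALERT_THRESHOLD = 0.90  # assumed CPU alert threshold
    before = 0.60           # assumed per-server utilization with eight servers

    after = utilization_after_failure(before, servers_before=8, servers_after=6)
    slowdown = relative_response_time(after) / relative_response_time(before)

    print(f"utilization: {before:.0%} -> {after:.0%}")
    print(f"CPU alert raised: {after >= ALERT_THRESHOLD}")
    print(f"response time multiplier: {slowdown:.1f}x")
```

Under these assumed numbers, per-server utilization rises from 60% to about 80%, which is still below the alert threshold, while the estimated response time roughly doubles. Real systems will behave differently, but the shape of the problem is the same: customers can feel the slowdown well before the infrastructure monitors raise an alert.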


EUM is our backstop here; it will detect the response time issue and raise an alert. This alert – indicating that the response time for the online ordering application is unacceptable – is sent to the Operations Bridge.
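
The kind of check an EUM probe performs can be sketched in a few lines of Python. This is not how HP's EUM products are implemented or configured; `run_probe`, `ProbeResult`, the two-second threshold, and the stand-in `place_test_order` transaction are all hypothetical. The sketch simply times one synthetic transaction and raises an alert if it fails or takes too long.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    available: bool
    response_seconds: Optional[float]
    alert: Optional[str]


def run_probe(transaction: Callable[[], None], max_seconds: float,
              clock: Callable[[], float] = time.perf_counter) -> ProbeResult:
    """Run one synthetic transaction and compare its duration to a threshold."""
    start = clock()
    try:
        transaction()
    except Exception as exc:  # any failure counts as "unavailable"
        return ProbeResult(False, None, f"availability: transaction failed ({exc})")
    elapsed = clock() - start
    if elapsed > max_seconds:
        return ProbeResult(True, elapsed,
                           f"response time {elapsed:.2f}s exceeds {max_seconds:.2f}s")
    return ProbeResult(True, elapsed, None)


if __name__ == "__main__":
    def place_test_order() -> None:
        # Stand-in for a scripted browse-and-order transaction against the site.
        time.sleep(0.05)

    print(run_probe(place_test_order, max_seconds=2.0))
```

Note that the probe knows nothing about web servers, CPU, or load balancers. It measures what the customer experiences, which is why it can catch the problem that the infrastructure events missed.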


The Operations Bridge team now know that they need to re-prioritize resources to investigate an ongoing business service impacting issue. And they need to do this as quickly as possible. They need to gather all available information about the affected business service and try to understand why response time has suddenly become unacceptable. This is where Problem Isolation helps.


PI works to correlate more than just events. It will pull together data from multiple sources - performance history (resource utilizations), events, even help-desk incidents that have been logged and work to determine the likely issue.


So we've come full circle. I spent a lot of time talking about OMi, events, and how an Operations Bridge is assisted by TBEC. But it's not the one and only tool that you need in your bag. Technologies like EUM and PI help catch and diagnose all of the stuff that just cannot be detected by 'simply' (I use that term lightly) monitoring infrastructure.


Once again, if you want to understand PI better, I encourage you to take a look at the posts by Michael Procopio over on the BAC blog.



For HP Operations Center, Jon Haworth.

Innovation Week Part 2 - Operations Manager i 8.1

While OMU 9.0 represents evolutionary innovation, Operations Manager i 8.1 (OMi) is truly revolutionary. Operations Manager i is a set of add-on products which extends existing HP Operations Manager to provide advanced event correlation and system health capabilities.


It uses the proven causal engine from NNMi with a completely new set of rules designed for server infrastructure rather than network components. This topology-based event correlation reduces duplication of effort in the Operations Bridge by automatically determining which events are symptoms and which are the cause of a problem. If you don’t have TBEC, you have to write and maintain a ton of rules to eliminate events that are symptoms and not causes. This is time consuming (eight full-time people for a medium-sized European bank) and error prone.
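
To make the cause-versus-symptom idea concrete, here is a minimal Python sketch. It is not OMi's causal engine or its rule format; the topology, the CI names, and the `classify_events` helper are all hypothetical. It assumes one simple rule: an event is a symptom if anything its configuration item depends on, directly or indirectly, also has an open event.

```python
from typing import Dict, List, Set

# Hypothetical topology: each configuration item (CI) maps to the CIs it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "online_ordering": ["web_farm"],
    "web_farm": ["app_server"],
    "app_server": ["database"],
    "database": ["storage_array"],
    "storage_array": [],
}


def upstream(ci: str, topology: Dict[str, List[str]]) -> Set[str]:
    """Return every CI that `ci` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(topology.get(ci, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(topology.get(node, []))
    return seen


def classify_events(event_cis: List[str], topology: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each CI that has an open event as 'cause' or 'symptom'."""
    open_events = set(event_cis)
    labels = {}
    for ci in event_cis:
        # A symptom has at least one failing dependency further down the stack.
        labels[ci] = "symptom" if upstream(ci, topology) & open_events else "cause"
    return labels


if __name__ == "__main__":
    storm = ["online_ordering", "app_server", "database", "storage_array"]
    for ci, label in classify_events(storm, DEPENDS_ON).items():
        print(f"{ci}: {label}")
```

The single rule refers to the topology rather than to specific hosts, so when the topology changes (a virtual machine moves, a server is added), only the `DEPENDS_ON` data changes and the rule stays the same. That is the contrast with hand-maintained, per-host correlation rules described above.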


Topology-Based Event Correlation


Conservative estimates indicate that OMi can save over $3 million annually in event processing for a company the size of HP. If I can get the real numbers next year, I will post them here.


To learn more about OMi, you can attend a Vivit webinar on July 21, 2009 or EMA webinar on July 28, 2009.


For Operations Center, Peter Spielvogel.

Free Webinar: 5 Tips to Reduce Incident Resolution Costs

On Tuesday, July 28, I will be participating in an EMA webinar with researcher and Vice President Dennis Drogseth. The official title, “What is New in the Not-so-New Area of Event Management: Five Tips to Reduce Incident Resolution Costs,” is very telling. Many people believe that there is nothing new in managing IT infrastructure. The reality is that some of HP’s biggest R&D investments have been in this area.


Displaying disparate events may not be rocket science, but correlating events from different IT domains to determine which is the cause and which are the symptoms certainly is. This is exactly the premise of OMi, which uses topology-based event correlation (TBEC) to consolidate event storms into actionable information.


Here’s the webinar abstract:


Event management may not be the next new thing but it is quietly making dramatic advances that can save your company both time and money. These new approaches rely on understanding up-to-date service dependencies to accelerate problem resolution.


During the 45-minute webinar, we will answer the following questions.



  • Why should you reconsider your event and performance management strategy?

  • What is the impact of ITIL v3 and the concept of an operations bridge on your people, processes, and tools?

  • What innovations can help you more cost-effectively manage events?


We will also leave time at the end to address your questions.


Register for the EMA event management webinar.
www.enterprisemanagement.com/hpeventmanagement



For Operations Center, Peter Spielvogel.

Rip and Replace - Never (Operations Manager has 15 years of stability)

There has been some FUD thrown around by one of our competitors about HP’s commitment to our Operations management products. This nameless competitor is calling for HP customers to migrate to this competing suite of BSM products. They even have specific plays for HP OpenView Operations (now called Operations Center), SiteScope, and several other HP Software products.


Let me be very clear about one thing:
HP Operations Center has NEVER required a rip and replace upgrade!



HP understands the production nature of its IT infrastructure monitoring products and is very sensitive about forcing its customers to migrate. The same cannot be said about our unmentioned competitor. Their migrations are neither easy nor free.


A good example of our commitment to stability is OMi. It introduces significant new functionality such as Topology-Based Event Correlation (TBEC) as an overlay to our existing Operations Manager products that fits seamlessly into existing deployments. This allows customers to leverage their existing investment in management servers, agents, and Smart Plug-Ins - with no rip and replace.


If you have any questions about this or want to discuss our competitor's false claims, please let me know.


For Operations Center, Peter Spielvogel

OMi Webinar and Demo Now Available

Every time I speak to customers about consolidated event and performance management, they want to know HP’s vision. What does the end-state look like? How do all the pieces fit together to save my company money? How does an Operations Bridge drive efficiencies? How does OMi extend my existing monitoring infrastructure? Now, we have a recorded webinar that answers these questions.



In 25 minutes, Jon Haworth, one of the Product Marketing Managers for Operations Center, will explain how to:



  • increase the efficiency of managing IT Operations

  • cut costs while improving quality of business services

  • speed the time to problem resolution


In addition, Dave Trout shows a short demo of topology-based event correlation in action, including how to:



  • filter events and identify root causes

  • use system health indicators and KPIs to summarize availability and performance

  • visualize configuration items in the context of business services


See the OMi webinar now.


For Operations Center, Peter Spielvogel.

The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation