Infrastructure Management Software Blog

Automated Event Correlation for the 21st Century – optimized for dynamic environments

Event Correlation technologies have been around for years so there is nothing new to say, right?

Nope... STATIC event correlation technologies have been around for years, and they don't fit flexible environments - such as virtualized clusters - well. So how do you implement DYNAMIC event correlation that reacts to changes in your IT infrastructure? OMi TOPOLOGY Based Event Correlation could be the answer.

OH, IL, WI, IN, MI Operations Center Technical Roadshow - April 20th to April 29th - Don't miss it!

Ever wish you could talk face-to-face with more technical people about Operations Center and Network Management Center products? Don’t really have the time or budget to travel very far to do so? Well, here is a great opportunity to meet and talk with technical experts on products like Operations Manager and NNMi – right in your backyard.


Vivit will be hosting a series of six one-day sessions, with a nice mix of presentations and Q&A around these products.  The sessions will be held in the following locations on the following days:


- Columbus, Ohio – April 20, 2010
- Orrville, Ohio – April 21, 2010
- Dearborn, Michigan – April 22, 2010
- Wisconsin – April 27, 2010
- Chicago, Illinois – April 28, 2010
- Fishers, Indiana – April 29, 2010


Feel free to contact me at asksonja@hp.com if you have any further questions about this roadshow.


Extending out-of-the-box integration capabilities of HP software products with APIs

A guest post by Alfred Hermann, technical marketing manager for Operations Center.
- Peter


I was looking at Closed Loop Incident Process (CLIP) and wanted to introduce a new member of the HP operations family of products, Operations Manager i (OMi).  My goal was to use OMi as the only operational console, as I hate to switch between consoles for day-to-day operational tasks.


It quickly became apparent that while there are many out-of-the-box integrations with HP Service Manager, there is no direct integration between OMi and Service Manager. Since OMi is still relatively new, it does not yet include some of these integration adapters. However, there is an existing integration between HP Operations Manager and HP Service Manager, and as OMi sits on top of HP Operations Manager, I explored some of the existing OM interfaces hoping to improve the situation.


And this is what I wanted to achieve: OMi has some fancy capabilities around topology based event correlation (TBEC), and thus can identify cause/symptom relationships between events. The existing “scauto” based integration between HP Operations Manager and HP Service Manager, however, will not exchange this important piece of information, so a user at the Service Manager console is unable to see how events that have become incidents are related.


What I found is that HP Operations Manager (in my case the Windows management server version) has a wealth of WMI interfaces. Some of them can be used to investigate OM messages as they are stored on the OM for Windows management server. You can walk through the set of custom message attributes (CMAs) that are attached to an OM message and create new annotations. In my case I was looking for a particular CMA, “CauseEventId”, being added to the message, and generated an annotation out of it. The interesting thing is that annotations are synchronized between Operations Manager and Service Manager, so by adding a small VB script and a WMI policy I was able to synchronize causal message relationships.
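
To make the pattern concrete, here is a minimal sketch in Python (using the third-party wmi module) of the walk-the-CMAs-and-annotate approach described above. The WMI namespace, class, property and method names below are placeholders that would need to be checked against the OM for Windows WMI documentation - the original solution was a small VB script deployed via a WMI policy, not this code.

import wmi  # third-party module, Windows only: pip install wmi

# Placeholder namespace - verify against the OM for Windows WMI documentation.
OM_NAMESPACE = r"root\HewlettPackard\OpenView\Data"

def annotate_cause(message_id):
    """Look up the 'CauseEventId' custom message attribute (CMA) on an OM
    message and record it as an annotation, so that the existing OM <->
    Service Manager synchronization carries the cause/symptom link across."""
    conn = wmi.WMI(namespace=OM_NAMESPACE)
    # OV_Message and OV_Message_CustomAttribute are assumed class names.
    for msg in conn.query("SELECT * FROM OV_Message WHERE Id = '%s'" % message_id):
        for cma in conn.query(
                "SELECT * FROM OV_Message_CustomAttribute WHERE MessageId = '%s'" % message_id):
            if cma.Name == "CauseEventId":
                # Annotations flow to Service Manager via the scauto integration,
                # making the relationship visible at the SM console.
                msg.AddAnnotation("Caused by event %s" % cma.Value)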


This leads me to the question of how widely APIs are used with, for example, HP Operations Manager for Windows. Please comment if you have been able to extend the product’s out-of-the-box capabilities by using the provided interfaces.


For HP Operations Center, Alfred Hermann.


Get the latest updates on our Twitter feed @HPITOps http://twitter.com/HPITOps


Join the HP OpenView & Operations Management group on LinkedIn.

The full stack (OMW, SiteScope, OMi, NNM, Service Desk, CMDB)

As I was getting ready to leave yesterday, a colleague stopped by my desk and asked “do you want to be a hero?” That certainly piqued my interest. It turned out we had a customer downstairs in our executive briefing center who wanted some clarification about how all the pieces of our stack fit together.


Background
The customer was the CTO of a major IT firm in the Asia-Pacific region. They manage approximately 4,000 servers using OMW 8.1. They use both agents and SPIs, as well as SiteScope agentless monitoring. In addition, they monitor the faults and performance of their network using NNM, and roll those events into their Operations Manager console. They also use Service Desk 4.5 along with a CMDB (configuration management database) that tracks all the configuration items and relationships among them across their enterprise. A *very* rough schematic of what they have appears in the diagram below in red.



Our discussions were divided into two main areas:
1. What are they doing today, and what should they be doing?
2. What can they do in the future?


Current Situation
The first question was about best practices. Were they using the software correctly to manage their infrastructure? The answer is a resounding yes. They use OMW as the central event management console, collecting data from agents, SiteScope (agentless monitoring), and NNM for network events.


And, they integrate their service desk with OMW, opening and closing tickets, and tracking changes to the IT infrastructure in their CMDB. They implemented the CMDB about two years ago, in conjunction with their Service Desk implementation.


Next Steps
The next questions focused on what they should be doing, or what they can do next, to improve their IT management.


We started with a discussion about OMi. The customer was confused about how OMi fits with OMW - the first question was whether it replaces OMW, whether they receive OMi as part of an upgrade (entitlement), and finally, what specific value OMi provides since they currently use OMW as the centralized event consolidation tool.


As readers of this blog know, OMi is a separate product that adds on to Operations Manager. (See the green box at the top of the above diagram.) Its main value is that it leverages the system topology information in the CMDB to greatly speed the time to repair IT problems, especially in complex environments. We have many resources to learn more about OMi, including:
Product overview
High-level webinar on OMi
Deep-dive technical webinar on OMi
Answers to technical questions on OMi


The next topic was automation. We talked about how companies use Operations Orchestration (OO) to automate their IT processes (runbooks). OO uses events in OM to trigger its process flows. The good news was that this customer has spent the past two years documenting and improving their IT processes. They already know which processes occur frequently and how much manual effort they require. This may be the next logical step for them, as it leverages their existing IT infrastructure and processes. EMA recently wrote a white paper on how process automation augments event consolidation.


Migration Challenges
One issue that arose was that the CMDB connected to their service desk is not the latest UCMDB that OMi uses for its topology-based event correlation (TBEC). The customer has two options here.
1. Leave the existing CMDB in place and let OMi create an operational data store that contains the configuration information it needs. The advantage of this approach is that it leaves the current management infrastructure intact and just adds OMi on top. OMi uses the SPIs to auto-discover the IT infrastructure and relationships among the elements. OMi’s data store is self-contained and requires minimal external input.
2. Migrate the existing CMDB associated with Service Desk to the latest version of UCMDB. The advantage of this approach is that the customer ends up with a single CMDB. They can migrate their existing data using a tool such as ICM (information consolidation manager) from Netscope.


Conclusions
If you are already integrating your events into a single Operations Manager console, you are on the right track. If you already use a CMDB to track your IT infrastructure, you are very far along the IT management maturity curve, even more so if you use some means of automatic discovery to keep it current.


To take things to the next level, you have two options: focus on further event correlation and reduction with OMi or automate your existing IT processes with Operations Orchestration. You can pursue these in series or in parallel, depending on your priorities. Both will deliver a tangible return on investment and fast payback period.


For HP Operations Center, Peter Spielvogel.


Get the latest updates on our Twitter feed @HPITOps http://twitter.com/HPITOps


Join the HP OpenView & Operations Management group on LinkedIn.

Event Correlation: OMi TBEC and Problem Isolation - What's the difference (part 3 of 3)

If you have not done so already, you may want to start with part 1 in this series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-1-of-3.aspx


Read part 2 in the series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-2-of-3.aspx



This is the final part in my three-post discussion of the event correlation technologies within OMi Topology Based Event Correlation (TBEC) and Problem Isolation. I've been focusing on how TBEC is used and how it helps IT Operations Management staff be more effective and efficient.


In my last post I started to mention why End User Monitoring (EUM) technologies are important - because they are able to monitor business applications from an end user perspective. EUM technologies can detect issues which Infrastructure monitoring might miss.


 


In the example we worked through in the last post I mentioned how EUM can detect a response time issue and alert staff that they need to expedite the investigation of an ongoing incident. This is also where Problem Isolation helps. PI provides the most effective means to gather all of the information that we have regarding possible causes of the response time issue and analyze the most likely cause.


 


For example: our web based ordering system has eight load balanced web servers connected to the internet. These are where our customers connect. The web server farm communicates back to application, database and email servers on the intranet, and the overall system allows customers to search and browse available products, place an order and receive email confirmations of order placement and shipping status.


 


The event monitoring system includes monitoring of all of the components. We also have EUM probes in place running test transactions and evaluating response time and availability. The systems are all busy but not overloaded - so we are not seeing any performance alerts from the event monitoring system.


 


A problem arises with two of our eight web servers, and they drop out of the load balanced farm. The operations bridge can see that the problem has happened as they receive events indicating the web server issues. TBEC shows that there are two separate issues, so this is not a cascading failure – and the operations staff can see that these web servers are part of the online ordering service.


 


However, they also know that the web servers are part of redundant infrastructure and there should be plenty of spare capacity in the six remaining load balanced web servers. As they have no other events relating to the online ordering service, they decide to leave the web server issues for a little while as they are busy dealing with some database problems for another business service.


 


The entire transaction load that would normally be spread across eight web servers is now focused on the remaining six. They were already busy but now are being pushed even harder; not enough to cause CPU utilization alerts, but enough to increase the time that it takes them to process their component of the customer’s online ordering transactions. As a result, response time, as seen by customers, is terrible. The Operations Bridge is unaware as they see no performance alerts from the event management system.


 


EUM is our backstop here; it will detect the response time issue and raise an alert. This alert – indicating that the response time for the online ordering application is unacceptable – is sent to the Operations Bridge.


 


The Operations Bridge team now know that they need to re-prioritize resources to investigate an ongoing business service impacting issue. And they need to do this as quickly as possible. They need to gather all available information about the affected business service and try to understand why response time has suddenly become unacceptable. This is where Problem Isolation helps.


 


PI works to correlate more than just events. It will pull together data from multiple sources - performance history (resource utilizations), events, even help-desk incidents that have been logged - and work to determine the likely cause.


 


So we've come full circle. I spent a lot of time talking about OMi, and events, and how an Operations Bridge is assisted by TBEC. But it's not the one and only tool that you need in your bag. Technologies like EUM and PI help catch and diagnose all of the stuff that just cannot be detected by 'simply' (I use that term lightly) monitoring infrastructure.


 


Once again if you want to understand PI better I encourage you to take a look at the posts by Michael Procopio over on the BAC blog.



For HP Operations Center, Jon Haworth.

Event Correlation: OMi TBEC and Problem Isolation - What's the difference (part 2 of 3)

If you have not done so already, you may want to start with part 1 in this series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-1-of-3.aspx


This is part 2 of 3 of my discussion of the event correlation technologies within OMi Topology Based Event Correlation (TBEC) and Problem Isolation. I'm going to focus on talking about how TBEC is used and how it helps IT Operations Management staff be more effective and efficient. My colleague Michael Procopio has discussed PI in more detail over in the BAC blog here: PI and OMi TBEC blog post 


If you think about an Operations Bridge (or "NOC"… but I've blogged my opinion of that term previously) then fundamentally its purpose is very simple.


 


The Ops Bridge is tasked with monitoring the IT Infrastructure (network, servers, applications, storage etc.) for events and resource exceptions which indicate a potential or actual threat to the delivery of the business services which rely on the IT infrastructure. The goal is to fix issues as quickly as possible in order to reduce the occurrence or duration of business service issues.


 


Event detection is an ongoing, 24x7 process, and the Ops Bridge will monitor events during all production periods, often around the clock using shift-based teams.


 


Event monitoring is an inexact discipline. In many cases a single incident in the infrastructure will result in numerous events – only one of which actually relates to the cause of the incident; the other events are just symptoms.


 


The challenge for the Ops Bridge staff is to determine which events they need to investigate and to avoid chasing the symptom events. The operations team must prioritize their activities so that they invest their finite resources in dealing with causal events based on their potential business impact, and avoid wasting time in duplication of effort (chasing symptoms) or, even worse, in chasing symptoms down in a serial fashion before they finally investigate the actual causal event, as this prolongs the potential downtime of business services.


 


TBEC helps the Operations Bridge in addressing these challenges. TBEC works 24x7, examining the event stream, relating it to the monitored infrastructure and the automatically discovered dependencies between the monitored components. TBEC works to provide a clear indication that specific events are related to each other (related to a single incident) and to identify which event is the causal event and which are symptoms.


 


Consider a disk free space issue on a SAN which is hosting an Oracle database. With comprehensive event monitoring in place, this will result in three events:



  • a disk space resource utilization alert

  • quickly followed by an Oracle database application error

  • and a further event indicating that a WebSphere server which uses the Oracle database is unhappy


 


Separately, all three events seem ‘important’ – so considerable time could be wasted in duplicate effort as the Ops Bridge tries to investigate all three events. Even worse, with limited resources, it is quite possible that the Operations staff will chase the events ‘top down’ (serially) – look at Websphere first, then Oracle, and finally the SAN – this extends the time to rectification and increases the duration (or potential) of a business outage.


 


TBEC will clearly show that the event indicating the disk space issue on the SAN is the causal event – and the other two events are symptoms.
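
To illustrate the principle (this is not OMi's actual engine, just a toy sketch in Python with invented CI names), the logic boils down to: map each event to a configuration item, then flag as the likely cause the event whose CI has no dependency that is also reporting an event - everything else is treated as a symptom.

# Discovered topology: each CI lists the CIs it depends on.
depends_on = {
    "websphere-srv1": ["oracle-db1"],
    "oracle-db1": ["san-array1"],
    "san-array1": [],
}

# Events from the monitoring tools, already mapped to CIs.
events = {
    "websphere-srv1": "WebSphere data source error",
    "oracle-db1": "Oracle tablespace write failure",
    "san-array1": "Disk space utilization critical",
}

def likely_causes(events, depends_on):
    """Flag events on CIs none of whose (transitive) dependencies also report
    an event; the remaining events are treated as symptoms."""
    causes = []
    for ci in events:
        stack, seen, is_symptom = list(depends_on.get(ci, [])), set(), False
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            if dep in events:
                is_symptom = True
                break
            stack.extend(depends_on.get(dep, []))
        if not is_symptom:
            causes.append(ci)
    return causes

print(likely_causes(events, depends_on))  # ['san-array1'] - the SAN disk space event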


 


In a perfect world the Ops Bridge can monitor everything, detect every possible event or compromised resource that might impact a business service and fix everything before a business service impact occurs.


 


The introduction of increasingly redundant and flexible infrastructure helps with this – redundant networks, clustered servers, RAID disk arrays, load balanced web servers etc. But, it also can add complications which I’ll illustrate later.


 


One of the challenges of event monitoring is that it simply can NOT detect everything that can impact business service delivery. For example, think about a complex business transaction, which traverses many components in the IT infrastructure. Monitoring of each of the components involved may indicate that they are heavily utilized – but not loaded to the point where an alert is generated.


 


However, the composite effect on the end to end response time of the business transaction may be such that response time is simply unacceptable. For a web based ordering system where customers connect to a company’s infrastructure and place orders for products, this can mean the difference between getting orders and the customer heading over to a competitor’s web site.


 


This is why End User Monitoring technologies are important. I'll talk about EUM in the next, and final, edition of this blog serial.




Read part 3 in the series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-3-of-3.aspx



For HP Operations Center,  Jon Haworth.

Event Correlation: OMi TBEC and Problem Isolation - What's the difference (part 1 of 3)

I often get asked questions about the differences between two of the products in our Business Service Management portfolio: BAC Problem Isolation and OMi Topology Based Event Correlation. Folks seem to get a little confused by some of the high level messaging around these products and gain the impression that the two products "do the same thing".


I guess that, as part of HP's Marketing organization, I have to take some of the blame for this, so I'm going to blog my conscience clear (or try to).


For brevity I'll use the acronyms PI for Problem Isolation and TBEC for OMi Topology Based Event Correlation.


On the face of it, there are distinct similarities between what PI and TBEC do.



  • Both products try to help operational support personnel to understand the likely CAUSE of an infrastructure or application incident.

  • Both products use correlation technologies (often referred to as event correlation) to achieve their primary goal.



I'll try to summarize the differences in a few sentences.



  • TBEC correlates events (based on discovered topology and dependencies) continuously to indicate the cause event in a group of related events. TBEC is "bottom up" correlation that works even when there is NO business impact - it is driven by IT infrastructure issues.

  • PI correlates data from multiple sources to determine the cause (or causal configuration item) where a business service impacting incident has occurred. PI performs correlation "on demand" and based on a much broader set of data than TBEC. PI might be considered "top down" correlation because it starts from the perspective of a business service impacting issue.



In reality, the differences between the products are best explained by looking at how they are used and I'll use my next couple of blog posts to do exactly that for TBEC. If you want the detail on PI then visit this post from my colleague, Michael Procopio: PI and OMi in the BAC blog.


 Read part 2 in the series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-2-of-3.aspx


Read part 3 in the series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-3-of-3.aspx


For HP Operations Center, Jon Haworth.

Everything you wanted to know about OMi... (Q&A from Vivit technical webinar)

Thank you to everyone who attended the Vivit webinar. The recording is now available for viewing on Vivit’s web site. You can also download or view the presentation slides in PDF format. There were many questions from the audience. Jon Haworth and Dave Trout's answers appear below. I have grouped questions by topic.


Product Structure

Q: Are these 3 different modules (topology, event and service views) to be purchased separately?
A: Yes, three different modules. OMi Event Management Foundation is the base product and is a requirement before either of the other two products can be installed. OMi Health Perspective Views and OMi Topology Based Event Correlation are optional modules.

Q: How is the licensing done?
A: There are three separate OMi modules. OMi Event Management Foundation is the base product and is a requirement before either of the other two products can be installed. OMi Health Perspective Views and OMi Topology Based Event Correlation are optional modules. Each module is priced / licensed separately and the pricing model is 'flat' - you purchase the license(s) required and that is all (no CPU or tier or connection based pricing).

Q: How does that scale to thousands of machines?
A: Since we have just introduced OMi, we don't yet have a lot of "real" scalability data to report. However, our internal testing so far indicates that OMi can handle the typical event rates handled by OMW/OMU in terms of forwarding events. Like OM today, the scalability of the total solution is not so much limited by how many thousands of machines are being managed but by the total event rate being handled.


Integration with Operations Manager, BAC, UCMDB

Q: Is there any description of the interface between OM and OMi?
A: There are two interfaces used: 1) message forwarding from OM to OMi, and 2) a Web Services interface for message changes and topology synchronization.

Q: How is the integration with Operations Manager on Unix?
A: As mentioned during the webinar, OMi requires either OMU or OMW as the event consolidation point for forwarding events into OMi. The event forwarding is configured in OM exactly the same way as if forwarding to another OM server. For message updates and topology synchronization, a Web Services interface is used.

Q: Since it was mentioned it works with both OMU 9.0 and OMW 8.10, does it work with the mentioned SPIs on both platforms?
A: Yes. We are updating the SPIs to be "OMi ready". What this really means is that we're adding a little extra information to the event messages (via Custom Message Attributes) to make it 'easier' for OMi to associate a message with the correct CI in the UCMDB and to include specific indicators needed for the TBEC rules in OMi. For OMU 9 we will release some updated SPIs soon which include enhanced discovery - very similar levels of discovery to what OMW has. The discovery subsystem is an area that we enhanced in OMU 9 and we want to be able to use the SPI discovery data as the 'starting point' for populating and maintaining CI and relationship information in the UCMDB - which is what helps to drive the logic in OMi.

Q: How flexible are the integrations with BAC products? Are these factory built, and do they need factory modification due to target environment requirements?
A: OMi and BAC use the same UCMDB instance, so they are tightly integrated 'out of the box'. OMi is completely built on top of the BAC platform technology. It supports the same security mechanisms, the same HA configuration options, the same user/group definitions, etc. In short, OMi is just like any other BAC "application" that is leveraging the platform.

Q: In the installation guide, it says that one of the requirements is to install the "BSM platform". What exactly do you understand by "BSM platform"?
A: BSM platform means "BAC". OMi 8.10 requires BAC 8.02 as the BSM platform.

Q: Can you run OMi without BSM?
A: No, the BSM platform provides the user interface 'framework' and the runtime UCMDB. OMi plugs into the BSM foundation.

Q: Which security model will take precedence - the OMU responsibility matrix or the BAC security for views?
A: OMi security is entirely based on the BAC platform features. Access to OMi views, admin UIs, etc. is all controlled through the standard BAC security features (users/groups, roles, permissions, etc.).

Q: What is the price policy if you have / have not BAC already installed?
A: Having BAC installed makes no difference to the price. OMi includes all components needed (runtime UCMDB etc.) in the license. Pricing is based on a 'flat' price for each of the three modules (see earlier question). You need to contact your local HP sales representative to obtain local pricing.

Q: How does the CI tree view scale?
A: The CI tree view is basically a UCMDB view/TQL under the covers. TQLs in UCMDB are tuned for VERY efficient retrieval of CI information.


Integration with Ticketing Systems (Service Manager, Service Center)

Q: How does OMi interact with a ticketing system like Service Manager or Service Center? Will the CIs' health be reflected based on ticket info?
A: In this first release of OMi, there is no direct interaction with a ticketing system. The interaction is driven through the existing OM (OMW or OMU) to Service Manager / Service Center interface. Because OMi synchronizes message changes back to the OM server that it is connected to, trouble tickets can be triggered from that OM server.

Q: How does this interface to Service Manager 7?
A: The interface to SM 7 is driven through the existing OM (OMW or OMU) interface to Service Manager. Because OMi synchronizes message changes back to the OM server that it is connected to, trouble tickets can be triggered from that OM server.

Q: The slides implied "assignment", which looked similar to NNMi. How do the new features of OMi integrate to Service Manager?
A: The concept of assignment is 'internal' to OMi. In many organizations the tier 1 support personnel will deal with non business service impacting issues without raising a trouble ticket. NOTE: this is purely dependent on the individual process and organization structure that is selected; we know that a lot of companies work this way to minimize the number of TTs. Some organizations insist that every actionable 'incident' becomes a TT. Where an event is dealt with in OMi then assignment makes sense; where events are forwarded to SM7 or another TT system then assignment will likely take place in the Incident / Helpdesk system.

Q: Will OMi integrate with ITSM (the change management app from Front Range)? Also, I'm assuming that we will need to purchase a CMDB for event correlation regardless - is that true?
A: We cannot comment on the Front Range application. It is likely that an integration may be possible, but it would be wise to verify with the vendor what external interfaces they provide for integrating event management systems with their product. No, you do not need to purchase UCMDB - we provide a 'free' runtime with OMi.


UCMDB, Discovery and Smart Plug-Ins (SPIs)

Q: Is it necessary to have UCMDB to have OMi?
A: OMi ships with a "BAC 8.02" media kit. This actually provides the BSM platform - including UCMDB - and is licensed using your OMi license key. If you do not have an existing UCMDB then this will provide a runtime UCMDB as part of the OMi product package. If you have an existing BAC 8.02 installed (which includes UCMDB) then you can utilize that for OMi.

Q: Is discovery best done in OMi or UCMDB?
A: All discovery data is maintained in the UCMDB. The 'base' discovery for OMi will be provided by the Smart Plug-ins that have been deployed from the OMW or OMU instance that OMi is connected to. Additional discovery data can be added to the UCMDB - for example from NNMi or DDM - and OMi will make use of this discovery data if it exists. If using DDM for discovery, DDM-Advanced is recommended since it can discover not only hosts but also applications and their relationships.

Q: Can you please tell me if DDMi can be used as a feed?
A: Yes. Servers discovered by DDMi are inserted into UCMDB. However, be aware that DDMi does not discover applications and dependencies/relationships. DDM-Advanced is the recommended discovery approach if you plan to use OMi and leverage the TBEC rules in particular.

Q: If UCMDB already has CIs populated by DDM, would new sources like NNMi or SPIs conflict with them - in other words, do we need a clean UCMDB?
A: No. A clean UCMDB is not required. OMi is designed to work with CIs regardless of how they are discovered and inserted into the UCMDB. In general, reconciliation of CIs discovered from multiple sources is handled automatically.

Q: Can you clarify what you mean by "we are including these SPIs"? Does this mean it's part of the shrink wrap deliverable with OMi? What specifically will the virtualization SPI provide? We were considering another product for that space, but want to hear more about those capabilities.
A: We are not including SPIs with OMi. We are including pre-defined content (event type indicator mappings, health indicators, TBEC correlation rules) for the SPIs that we noted. If you have these SPIs deployed then the time to value when OMi is deployed will be very quick. HP released a SPI for Virtualized Infrastructure monitoring earlier this year. Initial focus is on VMware but we will be providing an update soon with more features. You can contact your HP Software Sales Representative to get more details of the specific functionality provided.

Q: What is the virtualization SPI? Is it the nWorks SPI?
A: No. HP released the Smart Plug-in for Virtualized Infrastructure early in 2009. This is an HP developed and marketed product.

Q: nWorks is the "SPI" we were considering.
A: This is a different SPI and is based on a different architecture (agentless polling). It has no OMi content at present and it will be the responsibility of nWorks / Veeam to provide this.


KPIs (Key Performance Indicators)

Q: What is a KPI?
A: KPI means Key Performance Indicator.

Q: Where do you define the KPIs?
A: OMi provides four KPIs to the BAC platform: Operations Performance, Operations Availability, Unresolved Events, Unassigned Events. These are defined by OMi, not by users. What IS configurable is which Health Indicators (HIs) are assigned to impact either the Operations Performance or Operations Availability KPI for specific CI Types. This is done using the Indicator Manager in OMi.

Q: If the difference is KPIs, why is data not collected from PM? Instead I see that the data is collected from OVPA & OV agents.
A: OMi is focused around event processing. Events (alerts) are 'collected' from OVPA and OV agents to enable operations staff to understand what needs to be 'fixed'. PM (Performance Manager) is one tool that can be used to assist in the analysis / diagnosis of performance problems. PM is actually integrated into the OMi user interface.


Topology-Based Event Correlation (TBEC)

Q: In the slide with "Carol" and "Bill", they applied their knowledge to (I guess) develop some rules? Is that work that still has to be done manually? What were they developing - KPIs?
A: No, not KPIs. The example is there to show how TBEC rules are simple to create but that the correlation engine 'chains' them together to provide quite complex correlation logic which adapts based on the topology that has been discovered. We (HP) are providing content (Event Type Indicators, Health Indicators, TBEC rules as per "Carol and Bill") for a number of our existing Operations Manager Smart Plug-ins with OMi and we will continue to add additional content moving forward. The example in the slide is there to illustrate the (simple) process of creating very powerful correlation rules which adapt to changes in the discovered infrastructure. You would only need to undertake this process where HP does not provide out of the box content with OMi.

Q: I have some questions regarding TBEC: is there any experience regarding the performance? How many events can be handled by the correlation engine per second?
A: The engine is tuned for very high performance. It is basically the same engine that is used in NNMi for correlations.

Q: With topology synchronization with NNMi, do you have to have OMi licenses for every node in NNMi as well? I.e., if you are using topology synchronization with NNMi, will it only show the nodes from NNMi that have OMi agents installed?
A: No. All CIs in the UCMDB are visible to OMi. No additional license costs are required for NNMi nodes which are added to the UCMDB.

Q: Which language is used for the correlation rules? And where are the rules defined (UCMDB?)
A: TBEC is configured in the OMi Correlation Manager GUI; there is no programming language involved. The rules are based on topology (a view from the UCMDB) and on specific Health Indicators with specific HI values.

Q: Does OMi support the execution of validation routines when closing an Alert/Event that also closes other related items?
A: Not currently out of the box. There are several configurable settings which affect TBEC behavior (e.g. correlation time window, automatic extension of time windows, etc.), but currently this is not one of them. We are considering additional options for the future.


OMi Features

Q: Scalability, high availability cluster support? Estimated max seats before going distributed?
A: OMi supports the same cluster/HA features as supported by BAC. For example, you can have multiple gateway servers connected to a clustered Data Processing Server and a remote database server. In this case, OMi software is installed on each of these separate servers (gateways and DPS). In general, the "max seats before going distributed" (i.e. adding gateway servers) would be driven by the same considerations as documented for BAC itself. More information specific to OMi environments will be available over time as we have a chance to do further testing and characterization.

Q: Does OMi have a reports generator showing things like daily TBEC, etc.?
A: Not currently. However, the BAC reports (e.g. KPIs over Time) can be used to look at how the OMi KPIs are changing over time on CIs.

Comment: We feel that most of these features being discussed in OMi should have come as an upgrade to OMW. Too many modules to buy and try to integrate ourselves. For example, we wanted a better version of the OVOWeb to come as an upgrade in OMW 8.1. Too many products to buy just to manage our network.
A: OMi is providing discrete and incremental value above and beyond what is provided in OMW or OMU. We are continuing to enhance both OMW and OMU (for example the recent release of OMU 9.0) and customers who are happy with the capabilities of these platforms can continue to move forward and take advantage of the enhancements that we are providing. There is no requirement to move to OMi.

Comment: We feel we are being charged for features that were supposed to be in products that we already purchased. We are not happy about the tactic of releasing new products to fix features that were advertised in prior software. As a consultant, even I get lost in the vast amount of monitoring tools being sold by HP.
A: OMi is providing discrete and incremental value above and beyond what is provided in OMW or OMU. This functionality was never offered as part of OMW or OMU - it is new and unique to OMi. The reality is that it would have been extremely difficult, and time consuming (slow to release), to provide the high value capabilities of OMi within OMW or OMU. The strategy we have chosen is to base these new capabilities on a 'clean' build based on contemporary technologies - but HP has specifically ensured that existing OM customers who wish to take advantage of these new capabilities can do so without having to disrupt their existing OM installation.

Q: I had some issues when trying to set up and run the synchronization tool and event forwarding. Who can I contact?
A: You should contact your normal HP support channel for assistance.


Other

Q: Is there an estimated timeline for detailed technical training on OMi?
A: We have just run a series of virtual instructor-led training sessions for our partners. HP Education Services will be releasing an OMi class in the near future.

Q: Where can I get an evaluation version of OMi?
A: You can request a DVD from the trial software web site. A download will be available at http://www.hp.com/go/omi soon.


 


 For HP Operations Center, Peter Spielvogel.


Get the latest updates on our Twitter feed @HPITOps http://twitter.com/HPITOps


Join the HP OpenView & Operations Management group on LinkedIn.



 

Innovation Week Part 2 - Operations Manager i 8.1

While OMU 9.0 represents evolutionary innovation, Operations Manager i 8.1 (OMi) is truly revolutionary. Operations Manager i is a set of add-on products which extends existing HP Operations Manager to provide advanced event correlation and system health capabilities.


It uses the proven causal engine from NNMi with a completely new set of rules designed for server infrastructure rather than network components. This topology-based event correlation reduces duplication of effort in the Operations Bridge by automatically determining which events are symptoms and which are the cause of a problem. If you don’t have TBEC, you have to write and maintain a ton of rules to eliminate events that are symptoms and not causes. This is time consuming (eight full-time people for a medium-sized European bank) and error prone.


Topology-Based Event Correlation


Conservative estimates indicate that OMi can save over $3 million per year in event processing for a company the size of HP. If I can get the real numbers next year, I will post them here.


To learn more about OMi, you can attend a Vivit webinar on July 21, 2009 or an EMA webinar on July 28, 2009.


For Operations Center, Peter Spielvogel.

Free Webinar: HP Operations Manager i Software Deep Dive Presentation

My colleague and consolidated event management expert Jon Haworth is the guest speaker at an upcoming Vivit webinar on Tuesday, July 21. Vivit is the independent HP Software users community.



Jon will talk about using an operations bridge effectively and how the latest advanced correlation and visualization technology can help you reduce downtime. His presentation will address:
• What are the major differences between HP Operations Manager and HP Operations Manager i software?
• How does Topology Based Event Correlation (TBEC) work?
• How does HP OMi fit into my existing Operations Manager environment?


There will be plenty of time for Jon to answer your questions at the end of the session.


For Operations Center, Peter Spielvogel.

Free Webinar: 5 Tips to Reduce Incident Resolution Costs

On Tuesday July 28, I will be participating in an EMA webinar with researcher and Vice President Dennis Drogseth. The official title “What is New in the Not-so-New Area of Event Management: Five Tips to Reduce Incident Resolution Costs” is very telling. Many people believe that there is nothing new in managing IT infrastructure. The reality is that some of HP’s biggest R&D investments have been in this area.


Displaying disparate events may not be rocket science, but correlating events from different IT domains to determine which is the cause and which are the symptoms certainly is. This is exactly the premise of OMi, which uses topology-based event correlation (TBEC) to consolidate event storms into actionable information.


Here’s the webinar abstract:


Event management may not be the next new thing but it is quietly making dramatic advances that can save your company both time and money. These new approaches rely on understanding up-to-date service dependencies to accelerate problem resolution.


During the 45 minute webinar, we will answer the following questions.



  • Why should you reconsider your event and performance management strategy?

  • What is the impact of ITIL v3 and the concept of an operations bridge on your people, processes, and tools?

  • What innovations can help you more cost-effectively manage events?


We will also leave time at the end to address your questions.


Register for the EMA event management webinar.
www.enterprisemanagement.com/hpeventmanagement



For Operations Center, Peter Spielvogel.

Rip and Replace - Never (Operations Manager has 15 years of stability)

There has been some FUD thrown around by one of our competitors about HP’s commitment to our Operations management products. This nameless competitor is calling for HP customers to migrate to its competing suite of BSM products. They even have specific plays for HP OpenView Operations (now called Operations Center), SiteScope, and several other HP Software products.


Let me be very clear about one thing:
HP Operations Center has NEVER required a rip and replace upgrade!



HP understands the production nature of its IT infrastructure monitoring products and is very sensitive about forcing its customers to migrate. The same cannot be said about our unmentioned competitor. Their migrations are neither easy nor free.


A good example of our commitment to stability is OMi. It introduces significant new functionality such as Topology-Based Event Correlation (TBEC) as an overlay to our existing Operations Manager products that fits seamlessly into existing deployments. This allows customers to leverage their existing investment in management servers, agents, and Smart Plug-Ins - with no rip and replace.


If you have any questions about this or want to discuss our competitor’s false claims, please let me know.


For Operations Center, Peter Spielvogel

Automated Infrastructure Discovery - Extreme Makeover

Good Discovery Can Uncover Hidden Secrets
Infrastructure discovery has something of a bad reputation in some quarters. We've done some recent surveys of companies utilizing a variety of vendors’ IT operations products. What's interesting is that, in our survey results, automated infrastructure discovery fared pretty badly in terms of the support that it received within organizations - and also in terms of the success that they believed they had achieved.
 
There are a number of reasons underlying these survey results. Technology issues and organizational challenges were highlighted in our survey. But I believe that one of the main 'issues' that discovery has is that people have lost sight of its basic value and the benefits that it can bring. Organizations see 'wide reaching' discovery initiatives as complex to implement and maintain - and they do not see compelling short term benefits.
 
I got to thinking about discovery and the path that it has taken over the last 15 or 20 years. I remember the excitement when HP released its first cut of Network Node Manager. It included discovery that showed people things about their networks that they just did not know. There were always surprises when we took NNM into new sites to demonstrate it. Apart from showing folks what was actually connected to the network, NNM also showed how the network was structured, the topology.
 
Visualization --> Association --> Correlation
And once people can see and visualize those two sets of information they start to make associations about how events detected in the network relate to each other - they use the discovery information to optimize their ability to operate the network infrastructure.
 
So the next logical evolution for tools like NNM was to start building some of the analysis into the software as 'correlation'. For example, the ability to determine that the 51 "node down" events you just received are actually just one "router down" event and 50 symptoms generated by the nodes that are 'behind' the router in the network topology. Network operators could ignore the 'noise' and focus on the events that were likely causes of outages. Pretty simple stuff (in principle) but very effective at optimizing operational activities.
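
As a toy illustration of that kind of rule (a sketch in Python with invented names, not NNM's actual implementation): if the topology records which router each node sits behind, the correlation is little more than suppressing "node down" events whose router is itself down.

# Simplified topology: which router each monitored node sits behind.
behind_router = {"node-%02d" % i: "router-1" for i in range(50)}

# One real outage plus the 50 resulting symptom events.
incoming = [("router-1", "router down")] + [(n, "node down") for n in behind_router]

def correlate(events, behind_router):
    """Separate likely causes from symptoms using the network topology."""
    down_routers = {src for src, kind in events if kind == "router down"}
    causes, symptoms = [], []
    for src, kind in events:
        if kind == "node down" and behind_router.get(src) in down_routers:
            symptoms.append((src, kind))   # unreachable behind a down router
        else:
            causes.append((src, kind))     # surfaced to the operator
    return causes, symptoms

causes, symptoms = correlate(incoming, behind_router)
print(len(causes), "causal event(s),", len(symptoms), "suppressed symptoms")  # 1 and 50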
 
Scroll forward 15 years. Discovery technologies now extend across most aspects of infrastructure and the use cases are much more varied. Certainly inventory maintenance is a key motivator for many organizations - both software and hardware discovery play important roles in supporting asset tracking and license compliance activities. Not hugely exciting for most Operational Management teams.
 
Moving Towards Service Impact Analysis
Service impact analysis is a more significant capability for Operations Management teams and is a goal that many organizations are chasing. Use discovery to find all my infrastructure components - network devices, servers, application and database instances - and tie them together so I can see how my Business Services are using the infrastructure. Then, when I detect an event on a network device or database I can understand which Business Services might be impacted and I can prioritize my operational resources and activities. Some organizations are doing this quite successfully and getting significant benefits in streamlining their operational management activities and aligning them with the priorities of the business.
 
But there is one benefit of discovery which seems to have been left by the side of the road. The network discovery example I started with provides a good reference. Once you know what is 'out there' and how it is connected together, then you can use that topology information to understand how failures in one part of the infrastructure can cause 'ghost events' - symptom events - to be generated by infrastructure components which rely in some way on the errant component. When you get 5 events from a variety of components - storage, database, email server, network devices - then if you know how those components are 'connected' you can relate the events together and determine which are symptoms and which is the likely cause.
 
Optimizing the Operations Bridge
Now, to be fair, many organizations understand that this is important in optimizing their operational management activities. In our survey, we found that many companies deploy skilled people with extensive knowledge of the infrastructure into the first level operations bridge to help make sense of the event stream - try to work out which events to work on and which are dead ends. But it's expensive to do this - and not entirely effective. Operations still end up wasting effort by chasing symptoms before they deal with the actual cause event. Inevitably this increases mean time to repair, increases operational costs and degrades the quality of service delivered to the business.
 
So where is the automation? We added correlation to network monitoring solutions years ago to help do exactly this stuff, so why not do 'infrastructure wide' correlation?
 
Well, it's a more complex problem to solve of course. And there is also the problem that many (most?) organizations just do not have comprehensive discovery across all of their infrastructure. Or if they do have good coverage it's from a variety of tools so it's not in one place where all of the inter-component relationships can be analyzed.
 
Topology Based Event Correlation - Automate Human Judgment
This is exactly the problem which we've been solving with our Topology Based Event Correlation (TBEC) technology. Back to basics - although the developers would not thank me for saying that, as it's a complex technology. Take events from a variety of sources, do some clever stuff to map them to the discovered components in the discovery database (discovered using a number of discrete tools) and then use the relationships between the discovered components to automatically do what human operators are trying to do manually - indicate the cause event.
 
Doing this stuff automatically for network events made sense 15 years ago, doing it across the complexity of an entire infrastructure makes even more sense today. It eliminates false starts and wasted effort.
 
This is a 'quick win' for Operational Management teams. Improved efficiency, reduced operational costs, freeing up senior staff to work on other activities… better value delivered to the business (and of course huge pay raises for the Operations Manager).
 
So what do you need to enable TBEC to help streamline your operations? Well, you need events from infrastructure monitoring tools - and most organizations have more than enough of those. But you also need infrastructure discovery information - the more the better.
 
Maybe infrastructure discovery needs a makeover.

 

For HP Operations Center, Jon Haworth


 
