Infrastructure Management Software Blog

OH, IL, WI, IN, MI Operations Center Technical Roadshow - April 20th to April 29th - Don't miss it!

Ever wish you could talk face-to-face with more technical people about Operations Center and Network Management Center products? Don’t really have the time or budget to travel very far to do so? Well, here is a great opportunity to meet and talk with technical experts on products like Operations Manager and NNMi – right in your backyard.


Vivit will be hosting a series of six (6) one-day sessions, each offering a mix of presentations and Q&A around these products. The sessions will be held in the following states on the following days:


- Columbus, Ohio – April 20, 2010
- Orrville, Ohio – April 21, 2010
- Dearborn, Michigan – April 22, 2010
- Wisconsin – April 27, 2010
- Chicago, Illinois – April 28, 2010
- Fishers, Indiana – April 29, 2010


Feel free to contact me at asksonja@hp.com if you have any further questions about this roadshow.


Extending out-of-the-box integration capabilities of HP software products with APIs

A guest post by Alfred Hermann, technical marketing manager for Operations Center.
- Peter


I was looking at Closed Loop Incident Process (CLIP) and wanted to introduce a new member of the HP operations family of products, Operations Manager i (OMi).  My goal was to use OMi as the only operational console, as I hate to switch between consoles for day-to-day operational tasks.


It quickly became apparent that while there are many out-of-the-box integrations with HP Service Manager, there is no direct integration between OMi and Service Manager. Since OMi is still relatively new, it does not yet include some of the integration adapters. However, there is an existing integration between HP Operations Manager and HP Service Manager, and as OMi sits on top of HP Operations Manager, I explored some of the existing OM interfaces hoping to improve the situation.


And this is what I wanted to achieve: OMi has some fancy capabilities around topology-based event correlation (TBEC), and thus can identify cause/symptom relationships between events. The existing “scauto”-based integration between HP Operations Manager and HP Service Manager, however, does not exchange this important piece of information, so a user at the Service Manager console is unable to see how events that have become incidents are related.


What I found is that HP Operations Manager (in my case the Windows management server version) has a wealth of WMI interfaces. Some of them can be used to investigate OM messages as they are stored on the OM for Windows management server. You can walk through the set of custom message attributes (CMAs) attached to an OM message and create new annotations. In my case I looked for a particular CMA, “CauseEventId”, being added to a message, and generated an annotation from it. The interesting thing is that annotations are synchronized between Operations Manager and Service Manager, so by adding a small VB script and a WMI policy I was able to synchronize causal message relationships.
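The logic of that script can be sketched in a few lines. This is a conceptual Python illustration only, with plain data structures standing in for the real OM for Windows WMI classes: the `Message` shape, CMA dictionary, and helper name are invented for the example, not the actual WMI interface.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    """Stand-in for an OM message with custom message attributes (CMAs)."""
    msg_id: str
    cmas: dict = field(default_factory=dict)        # CMA name -> value
    annotations: list = field(default_factory=list)

def annotate_causal_relationships(messages):
    """Walk each message's CMAs; when a 'CauseEventId' CMA is present,
    record it as an annotation so the scauto integration can carry the
    cause/symptom relationship over to Service Manager."""
    for msg in messages:
        cause_id = msg.cmas.get("CauseEventId")
        if cause_id and cause_id != msg.msg_id:
            msg.annotations.append(f"Symptom of causal event {cause_id}")

msgs = [
    Message("e1"),                               # causal event: no CMA
    Message("e2", cmas={"CauseEventId": "e1"}),  # symptom pointing at e1
]
annotate_causal_relationships(msgs)
print(msgs[1].annotations[0])  # Symptom of causal event e1
```

The real implementation would enumerate live messages via the OM WMI provider rather than an in-memory list, but the walk-CMAs-then-annotate flow is the same.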


This leads me to the question of how widely APIs are used with, for example, HP Operations Manager for Windows. Please comment if you have been able to extend the product’s out-of-the-box capabilities by using the provided interfaces.


For HP Operations Center, Alfred Hermann.


Get the latest updates on our Twitter feed @HPITOps http://twitter.com/HPITOps


Join the HP OpenView & Operations Management group on LinkedIn.

Event Correlation: OMi TBEC and Problem Isolation - What's the difference (part 2 of 3)

If you have not done so already, you may want to start with part 1 in this series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-1-of-3.aspx


This is part 2 of 3 of my discussion of the event correlation technologies within OMi Topology Based Event Correlation (TBEC) and Problem Isolation. I'm going to focus on talking about how TBEC is used and how it helps IT Operations Management staff be more effective and efficient. My colleague Michael Procopio has discussed PI in more detail over in the BAC blog here: PI and OMi TBEC blog post 


If you think about an Operations Bridge (or "NOC"… but I've blogged my opinion of that term previously) then fundamentally its purpose is very simple.


 


The Ops Bridge is tasked with monitoring the IT Infrastructure (network, servers, applications, storage etc.) for events and resource exceptions which indicate a potential or actual threat to the delivery of the business services which rely on the IT infrastructure. The goal is to fix issues as quickly as possible in order to reduce the occurrence or duration of business service issues.


 


Event detection is an ongoing, 24x7 process, and the Ops Bridge will monitor events during all production periods, often around the clock using shift-based teams.


 


Event monitoring is an inexact discipline. In many cases a single incident in the infrastructure will result in numerous events – only one of which actually relates to the cause of the incident, the other events are just symptoms.


 


The challenge for the Ops Bridge staff is to determine which events they need to investigate and to avoid chasing symptom events. The operations team must prioritize their activities so that they invest their finite resources in dealing with causal events based on their potential business impact. They must avoid wasting time in duplicated effort (chasing symptoms) or, even worse, chasing symptoms down in a serial fashion before finally investigating the actual causal event, as this extends the potential downtime of business services.


 


TBEC helps the Operations Bridge in addressing these challenges. TBEC works 24x7, examining the event stream, relating it to the monitored infrastructure and the automatically discovered dependencies between the monitored components. TBEC works to provide a clear indication that specific events are related to each other (related to a single incident) and to identify which event is the causal event and which are symptoms.


 


Consider a disk free-space issue on a SAN which is hosting an Oracle database. With comprehensive event monitoring in place, this will result in three events:



  • a disk space resource utilization alert

  • quickly followed by an Oracle database application error

  • and a further event indicating that a WebSphere server which uses the Oracle database is unhappy


 


Separately, all three events seem ‘important’ – so considerable time could be wasted in duplicate effort as the Ops Bridge tries to investigate all three events. Even worse, with limited resources, it is quite possible that the Operations staff will chase the events ‘top down’ (serially) – look at Websphere first, then Oracle, and finally the SAN – this extends the time to rectification and increases the duration (or potential) of a business outage.


 


TBEC will clearly show that the event indicating the disk space issue on the SAN is the causal event – and the other two events are symptoms.
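The cause/symptom determination in this example can be sketched as a toy Python illustration (not the OMi implementation): given a dependency map and a set of events, an event is a symptom if the component it occurred on depends, directly or transitively, on another component that also raised an event. The topology here is the invented SAN/Oracle/WebSphere example from above.

```python
# component -> components it depends on (illustrative topology)
depends_on = {
    "websphere": {"oracle"},
    "oracle": {"san"},
    "san": set(),
}

def transitive_deps(component, topo):
    """All components `component` relies on, directly or indirectly."""
    seen, stack = set(), list(topo.get(component, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(topo.get(dep, ()))
    return seen

def correlate(events, topo):
    """Split events into (causal, symptoms) using the topology."""
    causal, symptoms = [], []
    for comp in events:
        # symptom if something this component relies on also raised an event
        if (transitive_deps(comp, topo) & set(events)) - {comp}:
            symptoms.append(comp)
        else:
            causal.append(comp)
    return causal, symptoms

cause, symptoms = correlate(["websphere", "oracle", "san"], depends_on)
print(cause, sorted(symptoms))  # ['san'] ['oracle', 'websphere']
```

The real product works against automatically discovered topology and a live event stream, but the principle is the same: the SAN event survives as the cause because nothing the SAN relies on is also in trouble.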


 


In a perfect world the Ops Bridge can monitor everything, detect every possible event or compromised resource that might impact a business service and fix everything before a business service impact occurs.


 


The introduction of increasingly redundant and flexible infrastructure helps with this – redundant networks, clustered servers, RAID disk arrays, load balanced web servers etc. But, it also can add complications which I’ll illustrate later.


 


One of the challenges of event monitoring is that it simply can NOT detect everything that can impact business service delivery. For example, think about a complex business transaction, which traverses many components in the IT infrastructure. Monitoring of each of the components involved may indicate that they are heavily utilized – but not loaded to the point where an alert is generated.


 


However, the composite effect on the end-to-end response time of the business transaction may be such that response time is simply unacceptable. For a web-based ordering system where customers connect to a company’s infrastructure and place orders for products, this can mean the difference between getting orders or the customer heading over to a competitor’s web site.


 


This is why End User Monitoring technologies are important. I'll talk about EUM in the next, and final, edition of this blog serial.




Read part 3 in the series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-3-of-3.aspx



For HP Operations Center,  Jon Haworth.

Event Correlation: OMi TBEC and Problem Isolation - What's the difference (part 1 of 3)

I often get asked questions about the differences between two of the products in our Business Service Management portfolio: BAC Problem Isolation and OMi Topology Based Event Correlation. Folks seem to get a little confused by some of the high-level messaging around these products and gain the impression that the two products "do the same thing".


I guess that, as part of HP's marketing organization, I have to take some of the blame for this, so I'm going to blog my conscience clear (or try to).


To aid brevity I'll use the acronyms PI for Problem Isolation and TBEC to refer to OMi Topology Based Event Correlation.


On the face of it, there are distinct similarities between what PI and TBEC do.



  • Both products try to help operational support personnel to understand the likely CAUSE of an infrastructure or application incident.

  • Both products use correlation technologies (often referred to as event correlation) to achieve their primary goal.



I'll try to summarize the differences in a few sentences.



  • TBEC correlates events (based on discovered topology and dependencies) continuously to indicate the cause event in a group of related events. TBEC is "bottom up" correlation that works even when there is NO business impact - it is driven by IT infrastructure issues.

  • PI correlates data from multiple sources to determine the cause (or causal configuration item) where a business-service-impacting incident has occurred. PI performs correlation "on demand" and based on a much broader set of data than TBEC. PI might be considered "top down" correlation because it starts from the perspective of a business service impacting issue.



In reality, the differences between the products are best explained by looking at how they are used, and I'll use my next couple of blog posts to do exactly that for TBEC. If you want the detail on PI, then visit this PI and OMi in the BAC blog post from my colleague, Michael Procopio.


 Read part 2 in the series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-2-of-3.aspx


Read part 3 in the series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-3-of-3.aspx


For HP Operations Center, Jon Haworth.

Automated Infrastructure Discovery - Extreme Makeover

Good Discovery Can Uncover Hidden Secrets
Infrastructure discovery has something of a bad reputation in some quarters. We've done some recent surveys of companies utilizing a variety of vendors’ IT operations products. What's interesting is that, in our survey results, automated infrastructure discovery fared pretty badly in terms of the support that it received within organizations - and also in terms of the success that they believed they had achieved.
 
There are a number of reasons underlying these survey results. Technology issues and organizational challenges were highlighted in our survey. But I believe that one of the main 'issues' discovery has is that people have lost sight of its basic value and the benefits it can bring. Organizations see 'wide reaching' discovery initiatives as complex to implement and maintain - and they do not see compelling short-term benefits.
 
I got to thinking about discovery and the path that it has taken over the last 15 or 20 years. I remember the excitement when HP released its first cut of Network Node Manager. It included discovery that showed people things about their networks that they just did not know. There were always surprises when we took NNM into new sites to demonstrate it. Apart from showing folks what was actually connected to the network, NNM also showed how the network was structured, the topology.
 
Visualization --> Association --> Correlation
And once people can see and visualize those two sets of information they start to make associations about how events detected in the network relate to each other - they use the discovery information to optimize their ability to operate the network infrastructure.
 
So the next logical evolution for tools like NNM was to start building some of the analysis into the software as 'correlation'. For example, the ability to determine that the 51 "node down" events you just received are actually just one "router down" event and 50 symptoms generated by the nodes that are 'behind' the router in the network topology. Network operators could ignore the 'noise' and focus on the events that were likely causes of outages. Pretty simple stuff (in principle) but very effective at optimizing operational activities.
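That router-down suppression can be sketched in a few lines of Python. This is an illustrative toy, not NNM's actual correlator; the topology map and event shapes are invented for the example.

```python
# router -> set of nodes reachable only through that router (illustrative)
behind = {"router-1": {f"node-{i}" for i in range(1, 51)}}

def suppress_symptoms(events, behind):
    """Return (causes, symptoms): 'node down' events for nodes sitting
    behind a down router are reclassified as symptoms of that router."""
    down_routers = {e["node"] for e in events if e["node"] in behind}
    shadowed = set()
    for router in down_routers:
        shadowed |= behind[router]
    causes = [e for e in events if e["node"] not in shadowed]
    symptoms = [e for e in events if e["node"] in shadowed]
    return causes, symptoms

# one router-down event plus 50 node-down events from behind it
events = [{"node": "router-1", "type": "node down"}] + [
    {"node": f"node-{i}", "type": "node down"} for i in range(1, 51)
]
causes, symptoms = suppress_symptoms(events, behind)
print(len(causes), len(symptoms))  # 1 50
```

Real correlators derive the "behind" relationship from discovered topology rather than a hand-written map, which is exactly the point of the discovery discussion that follows.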
 
Scroll forward 15 years. Discovery technologies now extend across most aspects of infrastructure and the use cases are much more varied. Certainly inventory maintenance is a key motivator for many organizations - both software and hardware discovery play important roles in supporting asset tracking and license compliance activities. Not hugely exciting for most Operational Management teams.
 
Moving Towards Service Impact Analysis
Service impact analysis is a more significant capability for Operations Management teams and is a goal that many organizations are chasing. Use discovery to find all my infrastructure components - network devices, servers, application and database instances - and tie them together so I can see how my Business Services are using the infrastructure. Then, when I detect an event on a network device or database I can understand which Business Services might be impacted and I can prioritize my operational resources and activities. Some organizations are doing this quite successfully and getting significant benefits in streamlining their operational management activities and aligning them with the priorities of the business.
 
But there is one benefit of discovery which seems to have been left by the side of the road. The network discovery example I started with provides a good reference. Once you know what is 'out there' and how it is connected together, you can use that topology information to understand how failures in one part of the infrastructure can cause 'ghost events' - symptom events - to be generated by infrastructure components which rely in some way on the errant component. When you get 5 events from a variety of components - storage, database, email server, network devices - then if you know how those components are 'connected' you can relate the events together and determine which are symptoms and which is the likely cause.
 
Optimizing the Operations Bridge
Now, to be fair, many organizations understand that this is important in optimizing their operational management activities. In our survey, we found that many companies deploy skilled people with extensive knowledge of the infrastructure into the first level operations bridge to help make sense of the event stream - try to work out which events to work on and which are dead ends. But it's expensive to do this - and not entirely effective. Operations still end up wasting effort by chasing symptoms before they deal with the actual cause event. Inevitably this increases mean time to repair, increases operational costs and degrades the quality of service delivered to the business.
 
So where is the automation? We added correlation to network monitoring solutions years ago to help do exactly this stuff, so why not do 'infrastructure wide' correlation?
 
Well, it's a more complex problem to solve of course. And there is also the problem that many (most?) organizations just do not have comprehensive discovery across all of their infrastructure. Or if they do have good coverage it's from a variety of tools so it's not in one place where all of the inter-component relationships can be analyzed.
 
Topology Based Event Correlation - Automate Human Judgment
This is exactly the problem which we've been solving with our Topology Based Event Correlation (TBEC)  technology. Back to basics - although the developers would not thank me for saying that, as it's a complex technology. Take events from a variety of sources, do some clever stuff to map them to the discovered components in the discovery database (discovered using a number of discrete tools) and then use the relationships between the discovered components to automatically do what human operators are trying to do manually - indicate the cause event.
 
Doing this stuff automatically for network events made sense 15 years ago, doing it across the complexity of an entire infrastructure makes even more sense today. It eliminates false starts and wasted effort.
 
This is a 'quick win' for Operational Management teams. Improved efficiency, reduced operational costs, free up senior staff to work on other activities… better value delivered to the business (and of course huge pay raises for the Operations Manager).
 
So what do you need to enable TBEC to help streamline your operations? Well, you need events from infrastructure monitoring tools - and most organizations have more than enough of those. But you also need infrastructure discovery information - the more the better.
 
Maybe infrastructure discovery needs a makeover.

 

For HP Operations Center, Jon Haworth


 

The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.