Infrastructure Management Software Blog

Live in Ohio or Michigan? Coming next week - talk face-to-face with HP Product Managers about HP Software!

Just wanted to remind people who work/live in Ohio and Michigan about the upcoming Vivit roadshow around HP BTO software.  This is your chance to talk face-to-face with more technical people about BSM products like OpenView/Operations Manager, Real User Monitor, and Network Node Manager – and it won’t require any travel budget.  There are six locations, one of which is probably very close to you. 


Here is the tentative agenda for each day:


- Overview of Operations Center, Business Availability Center, and Network Management Center product portfolios.


- Demo of how all of these products work together


- Deep dive into Operations Center products and how best to leverage this software in your environment


- Deep dive into Business Availability Center products and how best to leverage this software in your environment


- Deep dive into NNMi and how best to leverage this software in your environment


The sessions will be held in the following locations on the following days:


- (Columbus) Ohio – April 20, 2010


- (Orrville) Ohio – April 21, 2010


- (Dearborn) Michigan – April 22, 2010 


If you have any further questions about this roadshow, feel free to contact me at asksonja@hp.com.

Event Correlation: OMi TBEC and Problem Isolation - What's the difference (part 3 of 3)

If you have not done so already, you may want to start with part 1 in this series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-1-of-3.aspx


Read part 2 in the series.
http://www.communities.hp.com/online/blogs/managementsoftware/archive/2009/09/25/event-correlation-omi-tbec-and-problem-isolation-what-s-the-difference-part-2-of-3.aspx



This is the final part in my three-post discussion of the event correlation technologies within OMi Topology Based Event Correlation (TBEC) and Problem Isolation. I've focused on how TBEC is used and how it helps IT Operations Management staff be more effective and efficient.


In my last post, I started to explain why End User Monitoring (EUM) technologies are important: they monitor business applications from an end user perspective. EUM technologies can detect issues which infrastructure monitoring might miss.


 


In the example we worked through in the last post, I mentioned how EUM can detect a response time issue and alert staff that they need to expedite the investigation of an ongoing incident. This is also where Problem Isolation (PI) helps. PI provides the most effective means to gather all of the information that we have regarding possible causes of the response time issue and determine the most likely cause.


 


For example: Our web-based ordering system has eight load-balanced web servers connected to the internet. These are where our customers connect. The web server farm communicates back to application, database and email servers on the intranet, and the overall system allows customers to search and browse available products, place an order, and receive email notifications of order confirmation and shipping status.


 


The event monitoring system includes monitoring of all of the components. We also have EUM probes in place running test transactions and evaluating response time and availability. The systems are all busy but not overloaded - so we are not seeing any performance alerts from the event monitoring system.


 


A problem arises with two of our eight web servers, and they drop out of the load balanced farm. The operations bridge can see that the problem has happened as they receive events indicating the web server issues. TBEC shows that there are two separate issues, so this is not a cascading failure – and the operations staff can see that these web servers are part of the online ordering service.
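
To make the "two separate issues" conclusion concrete, here is a minimal Python sketch of the idea behind topology-based correlation. It is not how OMi TBEC is implemented; the topology, the CI names and the `find_causes` helper are all made up for illustration. An event is treated as a symptom only when something its CI depends on also has an event.

```python
# Hypothetical topology: each CI maps to the CIs it depends on.
DEPENDS_ON = {
    "online_ordering": ["web01", "web02", "web03"],
    "web01": ["app01"],
    "web02": ["app01"],
    "web03": ["app01"],
    "app01": ["db01"],
    "db01": [],
}

def upstream(ci, topology):
    """Return every CI that `ci` depends on, directly or indirectly."""
    seen = set()
    stack = list(topology.get(ci, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(topology.get(node, []))
    return seen

def find_causes(event_cis, topology):
    """Split CIs that have events into likely causes and symptoms."""
    event_cis = set(event_cis)
    causes, symptoms = set(), set()
    for ci in event_cis:
        if upstream(ci, topology) & event_cis:
            symptoms.add(ci)   # something it depends on is also failing
        else:
            causes.add(ci)     # nothing upstream explains this event
    return causes, symptoms

causes, symptoms = find_causes(["web01", "web02"], DEPENDS_ON)
print(sorted(causes), sorted(symptoms))
```

In this toy topology neither web server depends on the other, so both events come out as independent causes. If `db01` had an event as well, the web server events would be classified as symptoms of it.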


 


However, they also know that the web servers are part of redundant infrastructure and there should be plenty of spare capacity in the six remaining load balanced web servers. As they have no other events relating to the online ordering service, they decide to leave the web server issues for a little while as they are busy dealing with some database problems for another business service.


 


The entire transaction load that would normally be spread across eight web servers is now focused on the remaining six. They were already busy but now are being pushed even harder, not enough to cause CPU utilization alerts but enough to increase the time that it takes them to process their component of the customer’s online ordering transactions. As a result, response time, as seen by customers, is terrible. The Operations Bridge are unaware of this as they see no performance alerts from the event management system.
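
A rough back-of-the-envelope calculation shows how losing two servers can hurt response time without tripping a CPU alert. The numbers below are assumptions, not measurements from the example: each server starts at 60% utilization, the CPU alert threshold is 90%, and response time is approximated with the simple single-queue (M/M/1) formula, service time / (1 - utilization).

```python
def utilization_after_failure(util_before, servers_before, servers_after):
    """Same total load spread over fewer servers."""
    return util_before * servers_before / servers_after

def response_time(service_time, utilization):
    """M/M/1 approximation; only valid for utilization below 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)

CPU_ALERT_THRESHOLD = 0.90   # assumed alert threshold
SERVICE_TIME = 0.2           # assumed seconds of work per request

before = 0.60                # assumed utilization with all 8 servers
after = utilization_after_failure(before, 8, 6)

print(f"utilization: {before:.0%} -> {after:.0%}")
print(f"CPU alert raised: {after >= CPU_ALERT_THRESHOLD}")
print(f"response time: {response_time(SERVICE_TIME, before):.2f}s -> "
      f"{response_time(SERVICE_TIME, after):.2f}s")
```

Under these assumptions, per-server utilization rises by a third (8/6) but stays below the alert threshold, while the estimated response time roughly doubles. Real systems will behave differently, but the shape of the problem is the same.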


 


EUM is our backstop here; it will detect the response time issue and raise an alert. This alert – indicating that the response time for the online ordering application is unacceptable – is sent to the Operations Bridge.
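
The idea of a synthetic probe can be sketched in a few lines of Python. This is only an illustration, not HP's EUM product: `run_probe`, the threshold and the stand-in transaction are all hypothetical.

```python
import time

def run_probe(transaction, threshold_seconds):
    """Run one test transaction and report availability and response time."""
    start = time.perf_counter()
    try:
        transaction()
        available = True
    except Exception:
        available = False
    elapsed = time.perf_counter() - start

    if not available:
        status = "ALERT: transaction failed"
    elif elapsed > threshold_seconds:
        status = "ALERT: response time unacceptable"
    else:
        status = "OK"
    return {"available": available, "seconds": elapsed, "status": status}

def slow_order_transaction():
    """Stand-in for a scripted 'search, browse, place order' transaction."""
    time.sleep(0.05)

result = run_probe(slow_order_transaction, threshold_seconds=0.01)
print(result["status"])
```

The point is that the probe measures what a customer would experience, so it can raise an alert even when every individual server looks healthy.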


 


The Operations Bridge team now know that they need to re-prioritize resources to investigate an ongoing issue that is impacting a business service. And they need to do this as quickly as possible. They need to gather all available information about the affected business service and try to understand why response time has suddenly become unacceptable. This is where Problem Isolation helps.


 


PI works to correlate more than just events. It pulls together data from multiple sources - performance history (resource utilization), events, even help-desk incidents that have been logged - and works to determine the likely cause.
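
As a very rough illustration of combining several data sources, the Python sketch below counts pieces of evidence per CI and ranks the CIs. It is not how PI actually analyzes data; the sample records, the `rank_suspects` helper and the 20% change cutoff are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical evidence gathered for the affected business service.
events = [{"ci": "web01", "text": "web server down"},
          {"ci": "web02", "text": "web server down"}]
performance = [{"ci": "web03", "metric": "cpu", "change_pct": 33},
               {"ci": "web04", "metric": "cpu", "change_pct": 33},
               {"ci": "db01", "metric": "cpu", "change_pct": 2}]
incidents = [{"ci": "web01", "text": "ordering site is slow"}]

def rank_suspects(events, performance, incidents, min_change_pct=20):
    """Count pieces of evidence per CI; most-implicated CIs come first."""
    score = Counter()
    for event in events:
        score[event["ci"]] += 1
    for sample in performance:
        # Only count metrics that changed noticeably (assumed cutoff).
        if abs(sample["change_pct"]) >= min_change_pct:
            score[sample["ci"]] += 1
    for incident in incidents:
        score[incident["ci"]] += 1
    return score.most_common()

ranked = rank_suspects(events, performance, incidents)
print(ranked)
```

Even this naive count points at the web tier rather than the database, which is the kind of pointer an operator needs when time is short.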


 


So we've come full circle. I spent a lot of time talking about OMi, and events and how an Operations Bridge is assisted by TBEC. But it's not the one and only tool that you need in your bag. Technologies like EUM and PI help catch and diagnose all of the stuff that just cannot be detected by 'simply' (I use that term lightly) monitoring infrastructure.


 


Once again, if you want to understand PI better, I encourage you to take a look at the posts by Michael Procopio over on the BAC blog.



For HP Operations Center, Jon Haworth.

Making the Best Use of the Tools You Have

I spent the day today at our executive briefing center with a customer that provides online spend management services. They have a number of our data center products, so we spent most of the day discussing integration. The agenda was simple. They wanted to learn how to:




  1. Make the best use of the tools they have


  2. Determine where they should be looking next

First, let’s examine their environment, which in turn drives their requirements.


Hardware
They have a variety of hardware, some HP servers and storage and some from other vendors. Some of this hardware is at the end of its lifecycle and in line for replacement. They want to dramatically reduce their power consumption with the replacement servers. Part of the savings will come from consolidation onto fewer, more powerful machines; part from newer hardware that is more energy efficient. We did not discuss replacement hardware explicitly today, but one topic of concern was that their current infrastructure management software must have the flexibility to manage future hardware purchases.


Virtualization
They are very interested in moving aggressively towards virtualizing most of their IT infrastructure. They want to be able to manage both physical and virtual servers and storage using a single set of instrumentation.


Software
Service Desk. They are using most of the modules within HP Service Manager. This allows them to manage their help desk efficiently and track changes throughout the enterprise.
Configuration Management Database. They have HP’s uCMDB (“u” for universal), which manages all the configuration items (CI) within their enterprise. In an effort to streamline their operations, they also purchased our Discovery and Dependency Mapping (DDM) software to automatically discover their IT infrastructure and populate the CMDB. The CMDB is the foundation layer that ties together all the components within our Business Technology Optimization suite. In addition to maintaining state and configuration information about individual CIs, it understands relationships among them, and how these align with business services.
Network Management. They use an open source network management package.
End-User Monitoring. They use a commercial product (non-HP). They purchased it last year to replace home-grown scripts. It runs synthetic scripts (similar to our End User Management software).
Operations Manager. They just purchased Operations Manager, along with agents, but have not yet deployed it. The prospect of consolidating several existing management consoles was one of the main reasons driving the purchase.


Presenting a Complete View of the IT Infrastructure
We started with the usual slide presentations that showed all the nice relationships among the products. Of course, heads nodded in agreement when we mentioned self-inflicted IT problems, the finger-pointing among groups during troubleshooting, and the challenge of seeing everything through a single console.


The key problem that emerged is that they lack a holistic view of the entire environment. Fortunately, deploying Operations Manager will solve this problem. It provides a “single pane of glass” in which they can view events from across their entire infrastructure, including the non-HP servers, non-HP network management, and non-HP user monitoring, in addition to all their HP hardware.


Generate (Enriched) Service Tickets from Events
Operations Manager can also automatically open tickets in Service Manager. In addition to opening tickets based on events, Operations Manager enriches the events with all the relevant information from the CMDB, including the affected business service. Once the incident is closed, either manually or automatically, Operations Manager will clear the event in its console and then tell Service Manager to close the ticket. That wrapped up the section on making the most of what they already have.
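
The event-to-ticket flow described above can be sketched as follows. This is a simplified Python illustration, not the actual Operations Manager or Service Manager integration; the CMDB dictionary, the `ServiceDesk` class and every field name are hypothetical.

```python
# Hypothetical CMDB extract: CI name -> configuration details.
CMDB = {
    "web01": {"business_service": "Online Ordering", "owner": "Web Team"},
}

class ServiceDesk:
    """Stand-in for a ticketing system."""

    def __init__(self):
        self.tickets = {}
        self._next_id = 1

    def open_ticket(self, details):
        ticket_id = self._next_id
        self._next_id += 1
        self.tickets[ticket_id] = {"status": "open", **details}
        return ticket_id

    def close_ticket(self, ticket_id):
        self.tickets[ticket_id]["status"] = "closed"

def enrich(event, cmdb):
    """Return a copy of the event with the CMDB details for its CI added."""
    return {**event, **cmdb.get(event["ci"], {})}

desk = ServiceDesk()
event = {"ci": "web01", "message": "web server not responding"}

# Open a ticket that already names the affected business service.
ticket_id = desk.open_ticket(enrich(event, CMDB))
print(desk.tickets[ticket_id])

# Once the issue is resolved, close the ticket as well.
desk.close_ticket(ticket_id)
print(desk.tickets[ticket_id]["status"])
```

The value of the enrichment step is that whoever picks up the ticket sees the affected business service immediately, without having to look it up.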


Automation Cuts Costs
Then things got really interesting when we went to the whiteboard. We outlined how much money they can save by implementing Operations Orchestration, our runbook automation solution, to automate some of the routine actions an operator would perform using Operations Manager. We used the example of another customer who saved $400K per year just by automating a database fix that takes only one minute to perform by hand. That problem occurs 400,000 times per year. At $1 per minute for support costs, do the math.
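
Spelled out, the arithmetic from that example looks like this (the figures are the ones quoted above; the variable names are just for illustration):

```python
occurrences_per_year = 400_000   # times the problem occurs each year
minutes_per_fix = 1              # manual effort per occurrence
cost_per_minute = 1.00           # support cost in dollars

annual_savings = occurrences_per_year * minutes_per_fix * cost_per_minute
print(f"${annual_savings:,.0f} per year")
```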


This paints a clear picture of where they should be looking next. And, all the discussions were based on released technology that is available to anyone today.


Let us know how you are making the best use of the tools you have. We’ll give you some expert guidance about what steps to take next that will further increase the return on your investment in infrastructure management software.


For Operations Center, Peter Spielvogel.

The opinions expressed above are the personal opinions of the authors, not of HP.