Business Service Management (BAC/BSM/APM/NNM)
More than just monitoring, BSM provides the means to determine how IT impacts the bottom line. Its purpose, and main benefit, is to ensure that IT Operations can reactively and proactively determine where to spend their time to best serve the business. This covers event management to resolve immediate issues, resource allocation, and performance reporting based on data from applications, infrastructure, networks and third-party platforms. BSM includes powerful analytics that give IT the means to prepare, predict and pinpoint by learning behavior and analyzing IT data forward and backward in time, using Big Data analytics applied to IT Operations.

Operations: application performance sucks, what do I do now?

by Michael Procopio


In my IT days (it has been a while), as still happens today, this is the question many have asked. It's more complicated now: applications are far more distributed. However, you still have to go through the triage process. The discipline these days is called "Application Performance Management", or APM.


APM has two parts: the traditional view of infrastructure resources, and measuring performance from the end user's perspective.


You probably detected this problem in one of two ways: either you are ahead of the curve and have end user monitoring in place, or a user called the help desk to complain.


A typical web-based application today uses a web server, an application server and a backend, typically a database, though the backend might have multiple parts if a service-oriented architecture (SOA) is used. The good news: Operations Manager (agent based) and SiteScope (agentless) will report on the condition of those servers.


These tools can also look at how packaged applications are doing on the server. Oracle, WebSphere, MS Exchange and MS Active Directory, to name a few, can be monitored by either Operations Agents or SiteScope templates (a SiteScope template is a prepackaged set of monitors). These tools might point to something as detailed as database locks running far higher than normal and beyond the current setting on the database; a quick parameter change might fix that.


Next, we have the code. We hope this isn't the case, because it typically moves the problem from Operations to development. However, Operations is still responsible for pinpointing the problem area. This is typically the domain of application support, which in some organizations sits inside Operations and in others is a separate group.


Here Business Transaction Management (BTM) tools can help. BTM manages from a transaction point of view and includes transaction tracing. TransactionVision and Diagnostics work in a complementary fashion to give you the next level of detail, although each is usable separately. TransactionVision traces individual critical transactions (as you define them) through multiple servers; it gives you information on a specific transaction, including the value of the transaction.


Diagnostics provides aggregate information on all transactions in a composite application, giving you timing information. It can pinpoint:


· where time is spent in an application; either processing data or waiting for a response from another part of the application.


· the slowest layers.


· the slowest server requests which are the application entry points.


· outliers to help diagnose intermittent problems.


· threads that may be contributing to performance issues.


· memory problems and garbage collection issues.


· the fastest growing and largest size collections.


· leaking objects, object growth trends, object instance counts, and the byte size for objects.


· the slowest SQL query and report query information.


· exception counts and trace information which often go undetected.
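The kind of aggregate analysis described in the list above can be illustrated with a small sketch. The layer names, timing samples and outlier threshold below are invented for illustration; they show how averaging per-layer timings surfaces the slowest layer, and how a simple deviation test flags the outliers behind intermittent problems.

```python
from statistics import mean, stdev

# Hypothetical per-layer response-time samples (seconds); the data and
# layer names are invented to illustrate aggregate timing analysis.
samples = {
    "web":      [0.04, 0.05, 0.04, 0.06],
    "app":      [0.30, 0.28, 0.31, 0.29, 0.30, 0.32, 0.29, 5.0],  # one spike
    "database": [0.12, 0.11, 0.13, 0.12],
}

# The slowest layer is the one where the most time is spent on average
slowest_layer = max(samples, key=lambda layer: mean(samples[layer]))

def outliers(times, threshold=2.0):
    """Samples more than `threshold` standard deviations above the mean;
    these often point at intermittent problems."""
    mu, sigma = mean(times), stdev(times)
    return [t for t in times if t > mu + threshold * sigma]

print(slowest_layer)              # app
print(outliers(samples["app"]))   # [5.0] - the intermittent spike
```

A real APM tool does this continuously over millions of samples, but the principle is the same: aggregate, rank, and flag deviations.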


TransactionVision and Diagnostics also integrate with Business Availability Center, which means you can start with a topology view and drill all the way down to find the status of the most valuable transactions running through your systems.


You can't manage what you can't measure. So what do I do now? If you are properly instrumented, the problem will show itself. And if you don't find something you can fix, you can tell the app developers where they need to look to fix the problem.


   


Related Items:


· End User Monitoring


· Operations Manager


· SiteScope


· SiteScope Administrator Forum


· TransactionVision


· Diagnostics


· Business Availability Center





BSM at HP Software Universe

by Michael Procopio


 


HP Software Universe is next week, 16-18 June, in Las Vegas. Business Service Management (BSM) will be well represented.


In the Business Transaction Management area there are 13 sessions, most of them led by customers. The sessions are listed below.


In the Network Management track, Aruna Ravichandran is speaking in three sessions; you can find information on those in her post HP Software Universe/HP Technology Forum (HPTF) - Network Management sessions. The rest of the track is listed in the post Network Management at HP Software Universe.


Amy Feldman, Dennis Corning and Peter Spielvogel, the ITOps bloggers, have covered a number of the sessions in the Consolidated Event and Performance Management track. Here is a list of the posts:



 


Business Transaction Management Track

Session ID | Title | Presenting company
1114 | Confessions of a product manager: get the real scoop on the latest HP Business Availability Center | HP
1165 | The MITRE Corporation: higher operational effectiveness at lower cost through automated alert management | MITRE Corporation, AlarmPoint
1233 | Key decisions and practical techniques in configuring business transaction management |
1236 | Real User Management: know how your TCP/IP applications perform for your users | HP
1267 | Using HP Business Availability Center to analyze and triage application and infrastructure anomalies and problems | BCBS of Florida
1303 | Sodexo: partnering with HP Software-as-a-Service to ensure critical e-business application performance and availability | Sodexo
1342 | Wrigley: HP Business Availability Center deployed on Software-as-a-Service yields big improvements in IT monitoring without increasing staff | Wrigley
1360 | Lockheed Martin: deploying HP Business Availability Center in a virtual environment and forwarding alerts through an iPhone Twitter-based application | Lockheed Martin
1363 | DIRECTV: an HP Business Availability Center and HP Operations implementation | DIRECTV
1401 | Liberty Life: taking the fast track to implementing HP Business Availability Center and gaining business value in 6 months | Liberty Life
1425 | Sentara Healthcare: improving the availability of critical business services and fixing IT problems before they impact customers | Sentara Healthcare
1436 | Lockheed Martin: practical advice for configuring and operating HP End User Management solutions | Lockheed Martin
1452 | Vale: deploying HP Business Availability Center solutions to monitor applications and systems and to help ensure availability and performance | Vale

 You can get the details of all the BSM sessions at the HP Software Universe Track Session Catalog.


I hope to see you there, but if you can’t make it we will be doing follow-up posts. You can also follow on Twitter, the hashtag is #HPSU09. There are already a number of Tweets and the show hasn’t started yet. The Twitter account for the show is HPSU09, if you’d like to follow us. Or visit the HP Software Universe Facebook page.


For the Business Availability Center, Michael Procopio


 

BSM Evolution: The CIO/Ops Perception Gap

 


There are many potential culprits for why IT organizations struggle to make substantive progress in evolving their ITSM/BSM effectiveness. A customer research project we did a few years ago offered an interesting insight into one particular issue that I rarely see the industry address. The research showed that most CIOs simply had a different perception, when compared to their IT operations managers, of their IT organization's fundamental service delivery maturity and capability. This seemingly benign situation often proved to be a powerful success inhibitor.


 


The Gap:


A substantial sample of international, Global 2000 enterprise IT executives participated in the study. When asked to prioritize investments across a broad range of IT capabilities, we saw a definite gap. IT operations managers consistently ranked "Investing to improve general IT service support and production IT operations" as a top 1 or 2 priority, whereas CIOs ranked this same capability much lower, as priority 6 or 7.


 


The Perception:


When pressed further, CIOs believed that the IT service management basics of process and technology were already successfully completed, and they had mentally moved on to other priorities such as rolling out new applications, IT financial management, or project and portfolio management.


 


Most of the CIOs in the study could clearly recall spending thousands of dollars sending IT personnel to ITIL education, and thousands more purchasing helpdesk, network, and system management software. Apparently, these CIOs thought of their investment in service operations as a one-time project, rather than an ongoing journey that requires multiple years of investment, evolution, reevaluation, and continuous improvement.


 


IT operations managers, on the other hand, clearly had a different view of the world. They were generally pleased with the initial progress from the service operations investments, but realized they were far from the desired end state. The Ops managers could plainly see the need to get proactive, to execute advanced IT processes and adopt more sophisticated management tools, but could not drain the proverbial swamp while fighting off the alligators.


 


The Trap:


We probed deeper in the research, diligently questioning the IT operations managers on why they didn't dispel the CIOs' inaccurate perception. In order to secure the substantial budget, these Ops managers had fallen into the trap of over-promising the initial service management project's end state, ROI and time to value. (I wouldn't be surprised if they had been helped along by the process consultants and software management vendors!)


 


These Ops managers saw it as "a personal failure" to re-approach the CIO and ask for additional budget to continue improving the IT fundamentals. Worse yet, they had to continually reinforce the benefits of the original investment so the CIO didn't think they had wasted the money. So, the IT operations staff enjoyed the result: reactively working nights and weekends to meet the business' expectations and make sure everyone kept their jobs. Meanwhile, the CIOs slept well at night thinking, "Hey, we are doing a pretty darn good job", but faced the next day asking, "Why are my people burnt out?" A vicious cycle.


 


Recommendation through Observation:


I'm not wild about making recommendations, since I merely research this stuff and don't actually perform hands-on implementations. Instead, I will offer some observations of best practices from companies who appear to be breaking through on BSM: lowering costs, raising efficiency and improving IT quality of service.


 



  1. Focus on Fundamentals: It is boring and basic, but absolutely critical to continually look for ways to improve the foundational service management elements of event, incident, problem, change, and configuration management. Successful IT organizations naturally assume that if they implemented these core processes more than 3 years ago, they likely need to update both technology and process. If FIFA World Cup Football clubs and Major League Baseball teams revisit their fundamental skills each and every year, why wouldn’t IT?

 



  2. Assume a Journey: IT leaders who develop a step-wise, modular path of realistic projects that deliver a defined ROI at each step have the best track record of securing ongoing funding from the business. The danger here is defining modular steps that are so disconnected and siloed that IT never progresses toward an integrated BSM/ITSM process and technology architecture. This balance continues to be one of the most difficult to manage.

 



  3. Empowered VP of IT Operations: The advantages of a CIO empowering a VP of IT operations and holding them accountable for end-to-end business service have been discussed in previous posts. Having a strong VP of operations with executive focus on service operations and continual service improvement, along with end-to-end service performance responsibility, does appear to be a growing trend and success factor.

 



  4. Focus on the Applications: In the same research study that showed the perception gap on "Investing to improve general IT service support and production IT operations", there was consistent agreement on "Investing to improve business critical application performance and availability". The CIOs, Ops managers and Business Relationship managers all ranked this capability as a top 1 or 2 priority.

 


Successful BSM implementations focus on the fundamentals of process and infrastructure management, but do so from a business service, or an application perspective. This approach not only enables an advantageous budget discussion with the business, but it also hones the scope and execution of projects.


 


It is difficult to assess the relative impact of this CIO/IT Ops perception gap, considering the wide variety of challenges that IT faces. But hopefully, this post gives you something to consider when assessing your own IT organization’s situation and evolution.


 


Let us know where your organization fits: please take our two-question survey (plus two demographic questions). We'll publish the results on the blog.


 



  • Describe the perception of your IT's fundamental service delivery process

  • How often does your IT organization significantly evaluate and invest to update your fundamental IT processes?

 


Click Here to take survey


 


Bryan Dean – BSM Research

OpEx versus CapEx

Forrester just posted on how the recession is hitting capital budgets (CapEx) and suggested that you consider using operating expenses (OpEx) to purchase software (http://www.idc.com/getdoc.jsp?containerId=lcUS21765009).


About 18 months ago, we introduced one-year term licenses on the Business Availability Center (application and business transaction management) software so that it is more likely to fit within OpEx budgets.


Mike Shaw.

BSM customer evolution paths: Samples and observations

When developing and marketing products, we often have questions which can only be answered by going out there and seeing what people are doing. We have a guy on the BSM team who does this for us. His name is Bryan Dean. I've worked with Bryan for many years and I've always been impressed by his objectivity and the insight he brings to his analysis (i.e. he doesn't just present a set of figures; he gets behind them).


 


At the end of last year, we asked Bryan to analyze the top 20-odd BSM deals of 2008. He formed a number of conclusions from this research. One set of conclusions concerned how people "get to BSM" - how they evolve towards an integrated BSM solution. I asked Bryan to help me with a series of posts to share what he learnt about evolutions towards BSM because I think that knowing what our other BSM customers are doing may help you.


 


________


 


Mike: Bryan, can you give a summary of what you learnt?


Bryan: There is no one evolution path. It's fascinating to me that a hundred different IT organizations can have virtually the same high-level goals, fundamentally agree on the key factors for success, and yet end up with a hundred unique execution paths.


 


Before I answer your question, can I create a definition? The term "BSM" is very poorly defined within the IT industry - different vendors have different versions, and so do the industry analysts (in fact, some other research I did last year concluded that very few people had a clear idea of what BSM means).  So, I'd like to introduce the term "Automated Business/IT Service Management"  or AB/ITSM.


 


Back to your question, I think I can group all the different evolution paths into five key types:  




  1. ITSM incident, problem change & configuration:  this evolution is driven out of the need for process-driven IT service management with the service desk as a key component


  2. Consolidated infrastructure event, performance and availability: this is driven by a recognition that having a whole ton of event management and performance monitoring systems is not an efficient way to run IT, and so there is a drive to consolidate them into one console.


  3. Business service visibility & accountability:  this is more of a top-down approach - start with monitoring the customer's quality of experience and then figure out what needs to happen underneath. This is popular in industries where the "web customer experience" is everything - if it's not good, you lose your business


  4. Service discovery & model: this is where evolution towards integration is driven from the need for a central model (the CMDB). Often, the main driver for such a central model is the need to control change


  5. Business transaction management: today, this is the rarest starting point. It's driven by a need to monitor and diagnose complex composite transactions. We see this need most strongly in the financial services sector

Mike: How about the politics of such AB/ITSM projects?  (I don't see the AB/ITSM term taking hold, by the way :-) )


Bryan: Politics (or, more specifically, the motivational side) is important. I think many heavy thinkers in our industry have the mistaken assumption that there is a single evolution path, controlled from the top down by the CIO following a master plan. Trying to manage such a serialized mega-project is a huge challenge and too slow, not to mention that 99% of CIOs are not in the habit of forcing tactical execution edicts on their lieutenants (I know I'll get some argument on that one :-) ).


 


What I see from my research is that the most successful IT organizations are those who have figured out how to balance discrete, doable projects against an overall AB/ITSM end-goal context and roadmap. Typically, the CIO lays down a high-level vision that ties to specific business results, and then allows key lieutenants to assess and drive a prioritized set of federated, manageable projects that independently drive incremental ROI. Some IT organizations may have a well-defined integrated roadmap, but the majority run federated projects in a fairly disjointed fashion.


 


These parallel paths are owned by many independent personas within IT, each trying to solve the specific set of issues at hand. For them, being bogged down in how their federated project aligns and integrates with all the other AB/ITSM projects is daunting… if not fatal.


 


And on reflection this makes sense to me - the human side of things plays a large role in such endeavors.


 


Mike: What do you mean?


Bryan: IT organizations of all shapes and sizes have goals to reduce costs, increase efficiency, improve business/IT service quality, and mitigate risk all while applying technology in an agile way to boost business performance.   What I find interesting is how specific, funded initiatives are created by specific personas to achieve the goals.


 


In future posts, I will share some specific examples of how customers evolved through these paths, the key driver personas, the core motivations and how these paths come together.

There are a number of ways of populating the service dependency map

 


In a post two weeks ago on this blog, I listed all the ways that we use service dependency maps (model-based event correlation, service impact analysis, top-down performance problem isolation, SLAs, etc.). What can be used to discover service dependency information?


 


OperationsCenter Smart Plug-ins (SPIs) now feed their discoveries into the CMDB


If you're using the agent-based side of OperationsCenter (OpC), then each managed node will have an agent on it. You can put a smart plug-in (SPI) onto that agent. SPIs have specialized knowledge of the domain they are managing. There are many SPIs for all kinds of things from infrastructure up to applications like SAP. Many of the SPIs discover (and continue to discover) the environment they are monitoring. This is agent-based discovery using all the credentials you've already configured into the OpC agent.




The OMi team are working on putting SPI-based discovery information into the HP CMDB (the Universal CMDB or uCMDB).


 


Agentless monitoring populates the uCMDB


If you have agentless monitoring (HP SiteScope) this will populate the uCMDB too (as of SiteScope version 10).




Whatever SiteScope monitors you have configured will send their configuration information to the uCMDB. So, if you're monitoring a server with a database on it, all the information about the server and its database will be sent to the uCMDB.


 


Network Node Manager populates the uCMDB


As of the latest version of Network Node Manager (NNMi 8.10), discovered network end-points are also put into the uCMDB. "Network end-points" are anything with a network terminator - network devices, servers, and printers. NNMi provides no service dependency information, but it does provide an inventory of what's out there.




This inventory discovery is useful for rogue-device investigation: noticing an unknown device and creating a ticket for the group responsible for that type of device so they can look into it.


 


Standard Discovery


Our Standard Dependency Discovery Mapping product (DDM-Standard) will discover your hosts for you. It also discovers network artifacts (but see the NNMi discovery above; if you have NNMi, that is a more detailed network discovery mechanism).


 


Advanced Discovery


Advanced Dependency Discovery Mapping will discover storage, mainframes, virtualized environments, LDAP, MS Active Directory, DNS, FTP, MQSeries buses, app servers, databases, Citrix, MS Exchange, SAP, Siebel, and Oracle Financials.




You can also create patterns for top-level business services, and DDM-Advanced will discover those too.


 


Transaction Discovery


Our Business Transaction Management product, TransactionVision, deploys sensors to capture application events (not operational events) from the application and middleware tiers. These sensors feed the events to the TransactionVision Analyzer, which automatically correlates these events into an instance of a transaction. TransactionVision also classifies the transactions by type (bond trade, transfer request, etc.). Thus, TransactionVision is discovering transactions for you.




TransactionVision puts this transaction information into the CMDB. In other words, the CMDB doesn't just know about "single node" CI types like servers, it also knows about flow CI types - transactions.




Also, if the CMDB notices that the transaction flows over a J2EE application, it links the transaction to information in the CMDB about that J2EE application; the transaction step and the J2EE app are now linked in the model.
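Conceptually, the Analyzer's correlation step groups captured events into transaction instances by a shared identifier. The following is a minimal sketch of that idea with invented field names; it is not TransactionVision's actual event schema.

```python
from collections import defaultdict

# Hypothetical application events captured by sensors; the field names
# and identifiers are invented for illustration.
events = [
    {"txn_id": "T1", "tier": "web", "type": "bond trade", "ts": 1},
    {"txn_id": "T2", "tier": "web", "type": "transfer request", "ts": 1},
    {"txn_id": "T1", "tier": "app", "type": "bond trade", "ts": 2},
    {"txn_id": "T1", "tier": "database", "type": "bond trade", "ts": 3},
]

# Correlate events into transaction instances via their shared identifier
transactions = defaultdict(list)
for event in events:
    transactions[event["txn_id"]].append(event)

# Each instance is the ordered path of one transaction through the tiers
for txn_id in sorted(transactions):
    evts = sorted(transactions[txn_id], key=lambda e: e["ts"])
    print(txn_id, evts[0]["type"], "->".join(e["tier"] for e in evts))
```

The real Analyzer also handles events that carry no single common key, correlating on application-level data instead, but the grouping principle is the same.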


__________


 


By the way, my colleague Jon Haworth has just posted on the value of discovery in the realm of Operations Management at ITOpsBlog (28th January, "Automated Infrastructure Discovery - Extreme Makeover").

Answers to questions on "What's new in Business Availability Center 8.0?"


I recently mentioned a "what's new" webinar conducted on BAC v8.0. You can now access the on-demand webinar at


https://h30406.www3.hp.com/campaigns/2008/events/sw-01-20-09/index.php?rtc=3-2CDASIY


 


Here are some of the questions which came up during the live webinar.


 


Q: When will 8.0 be available?


A: The 8.0 release will be made available the first week of February.


 


Q: How will the new modeling changes affect my current custom views?


A: There are no more instance views; there are just views, with custom perspectives providing the content in the view. The upgrade should be straightforward for most customers, unless they have changed the model, created new CI types with custom links, or are heavily using pattern views with impact analysis, correlation rules and alerts.


 


Q: How about integration with HP Operations Manager? Can we leverage our current HPOV infrastructure monitoring capabilities and marry data with BAC application monitoring?


A: Yes. With HP Problem Isolation we have support for OM (Operations Manager) through event correlation to the application problem/anomaly start time.


 


Q: Does v8.0 support the Oracle 11g platform?


A: Yes, it is supported in v8.0.


 


Q: I was told that the DDM portion of the new 8.0 can discover WebLogic 10.x; is it true?


A: Yes, it is supported in v8.0.

Can I get away without using discovery?

When I was at our European HP Software event before Christmas / The Holidays, I spent a good deal of time talking to people about our new product releases and the future of BSM. One customer looked a little worried and said, "wow - you seem to rely on discovery a lot".


I guess there are two things to say in answer to that observation. The first is "yes - because we rely on the service hierarchy model a lot". And the second is, "but there are a number of different types of discovery - and a number of them you already have".


 


So, in a two-part post, I thought I'd answer that observation more comprehensively. Let's first look at how we use inventory and service hierarchy information in the management of service health (and thanks to Jon Haworth from the OperationsManager team for his significant help on this post):


 



  • It helps with administering the monitoring deployment of the managed environment. It tells us what is out there, what we need to manage, what has disappeared, and so on. This only requires discovery of the infrastructure inventory – "tell me what servers exist" (unless everything is virtualized, in which case it needs a lot more. The OperationsCenter team has posted on the new virtualization SPI recently at ITOpsBlog. This SPI discovers, and more importantly, continues to discover, virtualized environments).

 



  • It helps OMi understand the stream of events being detected in the infrastructure and applications. The hierarchy of the monitored items ("configuration items" or CIs) allows OMi to tell us which events are causal and which are symptoms: what we need to work on and what we can ignore. I talked about how OMi does this in a post last year.

 



  • It allows all parts of the BSM stack to perform service impact analysis. This is where events are related to infrastructure and applications, and their impact or potential impact on the services above them in the hierarchy is established. We can then use this impact information to prioritize the events. Service impact analysis requires a model of the hierarchy of CIs and services. Maintaining the service hierarchy manually is untenable; things just change too rapidly for humans to keep up.

 



  • When a disk has a set of read/write errors, is that catastrophic? If it's a single disk, then yes: the infrastructure element is in trouble. If it's part of a RAID array, then no, provided the rest of the array is OK. If we know the type of CI we are seeing events against, we can make better decisions about its true health.

 


This is also a new feature in OMi: when CIs are discovered, we know their type. OMi ships with a database of health indicators for each CI type. For example, for a single disk it's a problem if the disk gets bad errors; for a RAID array, provided a high percentage of the other disks are OK, it is not a serious problem; and so on.


 


This feature makes calculating the true health of a CI much easier. You don't need to define a set of propagation rules; OMi uses the discovered CI type information and its lookup table to figure out propagation itself.
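The lookup-table idea can be sketched as follows. The CI types, thresholds and statuses here are invented for illustration; OMi's real health indicators are far richer than a pair of lambdas.

```python
# Hypothetical health-indicator rules keyed by CI type, illustrating the
# lookup-table idea; the CI types, threshold and statuses are invented.
HEALTH_RULES = {
    # Read/write errors on a single disk are critical
    "single_disk": lambda ok_members, total: "critical",
    # A RAID array stays healthy while most of its members are OK
    "raid_array": lambda ok_members, total: (
        "ok" if ok_members / total >= 0.8 else "critical"
    ),
}

def ci_health(ci_type, ok_members=0, total=1):
    """Look up the health rule for the discovered CI type and apply it."""
    return HEALTH_RULES[ci_type](ok_members, total)

print(ci_health("single_disk"))                        # critical
print(ci_health("raid_array", ok_members=5, total=6))  # ok
print(ci_health("raid_array", ok_members=3, total=6))  # critical
```

The key design point is that the rule is chosen by discovered CI type, so nobody has to hand-author propagation logic per device.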


 


This all ties into a new feature in OMi called "Health Indicators". Jon Haworth has promised to post on this at his team's OperationsCenter blog.


 



  • Our top-down performance Problem Isolation software needs to understand the service hierarchy on which the end user application rests. For example, if I have a web user interface, I need to understand what services that user interface depends on. As I discussed in a post last year, problem isolation uses statistical correlation analysis to suggest the likely cause of such top-down performance problems.

 



  • We need the service hierarchy for defining SLAs. I may define a compound SLA that depends on a number of OLAs and a top-level measured SLA. The modeling user interface for this, and the subsequent off-line SLA calculation, are based on the service hierarchy.
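As a toy illustration of such an off-line calculation, one could treat a compound SLA as met only when its top-level measurement hits target and every underpinning OLA is met. This is a deliberate simplification with invented names, not how the BAC SLA engine actually computes compliance.

```python
# Simplified sketch: a compound SLA over a service hierarchy is met only
# if the measured top-level availability hits target AND all underpinning
# OLAs were met. Names and numbers are invented for illustration.
def compound_sla_met(measured_availability, target, ola_results):
    return measured_availability >= target and all(ola_results.values())

olas = {"network_ola": True, "database_ola": True, "storage_ola": False}
print(compound_sla_met(99.95, 99.9, olas))                   # False: storage OLA breached
print(compound_sla_met(99.95, 99.9, {"network_ola": True}))  # True
```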

 


In the second part of the post, I'll talk about all the things that now populate the host inventory and service dependency map.  Hint: if you have SPIs, you'll like what we have to say :-)


 


Mike Shaw

How do I learn more about BAC 8.0?

My last post was about the new BAC 8.0. I've had a couple of queries as a result of that post asking if there is more information on BAC 8.0.

A couple of my colleagues are going to be giving a webinar on BAC 8.0 on the 20th Jan. For more details and to register, please go to.... 

https://h30406.www3.hp.com/campaigns/2008/events/sw-01-20-09/index.php?rtc=3-2CDASIY 


 


Mike Shaw.

Announcing Business Availability Center 8.0

In a post last year, I talked about how, to move from user experience monitoring to user experience management, you need to be able to figure out the cause of a measured user experience problem, like slow online check-in times. I talked about a tool we have called Problem Isolation that helps with this figuring out.


Until now, Problem Isolation has used just the performance data measured by our agentless probes (from a product called SiteScope) to correlate a top-line performance metric (like online check-in times) with the health of the services that metric depends on (database, app server, integration bus, etc.). But there is another source of data we haven't included until now: the events collected by our operations product, HP Operations Manager. If you have HP Operations Manager, you have a massive source of information that can also be used to determine where top-down performance problems lie.


 


This is how Problem Isolation now uses HP Operations Manager data:


 



  1. A business service problem is identified. For example, through synthetic or real-user monitoring we determine that online check-ins are running too slowly

  2. A “time buffer” around the problem start time is determined

  3. The model for the business service in the CMDB is traversed, returning a list of all services supporting the business service

  4. Events that occurred within the above-mentioned time-buffer relating to those supporting services are determined

  5. The services with the best-correlating events (taking into account severity as well) are identified as likely suspects
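The five steps above could be sketched like this. The data structures, severity weights and scoring below are invented for illustration; they are not how the real Problem Isolation engine works, but they show the shape of the algorithm.

```python
from datetime import datetime, timedelta

# Invented severity weights for ranking events near the problem start
SEVERITY_WEIGHT = {"warning": 1, "minor": 2, "major": 4, "critical": 8}

def likely_suspects(problem_start, supporting_services, events, buffer_minutes=15):
    """Rank supporting services by severity-weighted events that occurred
    inside a time buffer around the problem start (steps 2-5 above)."""
    window_start = problem_start - timedelta(minutes=buffer_minutes)
    window_end = problem_start + timedelta(minutes=buffer_minutes)
    scores = {service: 0 for service in supporting_services}
    for event in events:
        if event["service"] in scores and window_start <= event["time"] <= window_end:
            scores[event["service"]] += SEVERITY_WEIGHT[event["severity"]]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Example: online check-in slowed at 09:00; the database looks suspicious
start = datetime(2009, 2, 1, 9, 0)
events = [
    {"service": "database",   "severity": "critical", "time": datetime(2009, 2, 1, 8, 55)},
    {"service": "app_server", "severity": "warning",  "time": datetime(2009, 2, 1, 8, 50)},
    {"service": "database",   "severity": "major",    "time": datetime(2009, 2, 1, 9, 5)},
]
ranking = likely_suspects(start, ["database", "app_server", "web_server"], events)
print(ranking[0])   # ('database', 12)
```

In the product, the list of supporting services comes from traversing the business service's model in the CMDB (step 3), rather than being passed in by hand.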

 


This algorithm applies to any event, whether it's from a third-party enterprise management system (e.g. Tivoli), from HP Operations Manager, or from HP Network Node Manager.


 


-------------


 


In our quest to move from service health monitoring to service health management, we're trying to provide as much information relating to a problem/incident as possible, all in one place, in such a way that the information is easily visualized.


 


In BAC 8.0, you can see the following regarding a problem service, all from one place:


 



  • The current performance of the service

  • The performance of the service over time

  • SLAs resting on the service and their closeness to jeopardy

  • Business processes using that service and the impact of the problem on those business processes. If you have our Business Transaction Management modules of BAC, you can see exactly which business process instances are affected or at risk. In industries like financial services this matters because the value of transactions can vary hugely, and business operations wants to know which important business instances are affected (e.g. a $10m inter-bank transfer) so that they can initiate manual work-arounds

  • Measured user experiences resting on this service. Imagine an app server is having a problem. You can "look upwards" and see that this app server is used to serve the online check-in business service. You can then see the measured impact of the app server problem on the online check-in user experience. This would be measured using either synthetic or real-user monitoring

  • Real changes that have occurred under the problem service. The real changes are inferred by the discovery technology that notices deltas between the state of CIs today versus yesterday

  • Planned changes against the problem service as taken from the change/release management system

  • Outstanding incidents against the problem service. You can "look across" to the details of the incidents to see if they provide the app support team with any insight into how to solve the problem

  • Non-compliance state of servers under the problem service. Our server automation technology now updates server compliance state into the CMDB and this can be viewed in this 360 degree view of the problem service
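The bullets above can be thought of as one aggregation over several data sources. A minimal sketch, assuming hypothetical in-memory stand-ins for the CMDB, SLA, incident, and change records (BAC 8.0 pulls the real equivalents from the CMDB, service level management, and the service desk):

```python
def service_360_view(service, cmdb, slas, incidents, changes):
    """Collect everything known about a problem service in one place:
    who depends on it, SLAs close to jeopardy, open incidents, changes."""
    return {
        "service": service,
        "dependents": cmdb.get(service, {}).get("used_by", []),
        "slas_in_jeopardy": [s["name"] for s in slas
                             if s["service"] == service and s["headroom_pct"] < 10],
        "open_incidents": [i for i in incidents
                           if i["service"] == service and i["status"] == "open"],
        "recent_changes": [c for c in changes if c["service"] == service],
    }

# Illustrative records only -- not the BAC data model
cmdb = {"app-server": {"used_by": ["online-check-in"]}}
slas = [{"name": "check-in-response", "service": "app-server", "headroom_pct": 4}]
incidents = [{"service": "app-server", "status": "open", "id": "INC-101"}]
changes = [{"service": "app-server", "summary": "JVM heap resize"}]

view = service_360_view("app-server", cmdb, slas, incidents, changes)
print(view["slas_in_jeopardy"])  # ['check-in-response']
```

The point of the design is the "one place" part: the app support team looks across one view rather than querying four tools.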

 


------------


Mike Shaw.

One brand new product and two major enhancements to the BSM stack - Vienna HP Software Universe 2008

Today is the first day of our software user conference here in sunny Vienna, Austria. We just announced a brand new product, and two major upgrades.


I'll start with the new product ...


 


HP Operations Manager i (part of HP Operations Center) is our next-generation consolidated event and performance management product, following on from HP Operations Manager. Internally, we call it OMi. Three key points about OMi:


 



  • You can take events from anywhere into OMi because it sits directly on top of our CMDB which holds business transaction, user experience, application, middleware, and infrastructure information.

  • OMi does root event analysis using the discovered service dependency map held in the CMDB. This means that only root events are shown in the console and subsequent events caused by the root event are hidden.

  • OMi gives you more than simply an "event stream" view of the world. It can also give you a service health view of the services you are responsible for. The exact make-up of a service's health is up to you - it will obviously include availability and performance, but it can also include the number of open incidents, for example.
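The root event analysis in the second bullet can be sketched as a walk over the discovered dependency map: an event is surfaced only if none of the service's (transitive) dependencies are also raising events. The dependency map and service names below are illustrative, not the OMi data model:

```python
# service -> the services it depends on (discovered dependency map)
DEPENDS_ON = {
    "online-check-in": ["app-server"],
    "app-server": ["database"],
    "database": [],
}

def root_events(raising, depends_on):
    """Return only root events: services whose failure is not explained
    by a failing dependency further down the map."""
    def has_failing_dep(svc, seen=frozenset()):
        for dep in depends_on.get(svc, []):
            if dep in raising or (dep not in seen and
                                  has_failing_dep(dep, seen | {svc})):
                return True
        return False
    return {svc for svc in raising if not has_failing_dep(svc)}

print(root_events({"online-check-in", "app-server", "database"}, DEPENDS_ON))
# Only the database event surfaces; the other two are symptoms of it.
```

With all three services raising events, the console would show only the database event; the app server and check-in events are hidden as consequences.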

 


I'll write more about OMi in a post at the end of this week.


 


HP Business Availability Center 8.0 (BAC 8.0) for application management uses HP Labs' patented statistical analysis to cut through the volume of performance and operations event data, helping customers predict and proactively resolve business service performance problems before they impact end users.


 


I did a post recently on the difference between user experience monitoring and user experience management noting how important performance problem isolation was. The latest version of BAC 8.0 does such analysis using both performance information and the rich source of events that HP Operations Manager, our operations management software, can give you.


 


I'll post on BAC 8.0 in more detail next week.



HP Network Node Manager i Advanced: we released a brand new network management product, NNMi, this time last year, incorporating a clever root event analysis engine (now found in OMi) and a new, much faster spiral network discovery engine. The new Advanced edition of NNMi is targeted at large enterprises.


 


NNMi Advanced helps you predict the service impact of network degradation before business services are negatively affected, through its integration with our CMDB.


 


And it natively uses our run-book automation technology, Operations Orchestration, to automatically collect data, fix problems and verify state once a fix has been actioned.


 


More on NNMi Advanced in the NNM blog. 

From User Experience Monitoring to User Experience Management

This post discusses the difference between monitoring user experience and actually managing it. The difference between the two lies mainly in our ability to figure out where user experience problems lie.


 


A recent EMA study found that 52% of user experience problems (problems with things like front-ends to web sites) were reported by customers. In other words, the majority of  user experience problems were found by customers and not by IT.  Not good - this is probably a statistic we might want to keep hidden from the business!


 


How do we get to a situation where we're not using our customers as very expensive monitoring devices, particularly in these days when each piece of  customer business is important?


 


We use "user experience monitoring" or EUM. There are two flavors or EUM - synthetic and real-user.



     


  1. Synthetic monitoring uses scripts to simulate customer activity. Let's imagine we want to ensure our online check-in web user interface performs well at all times. After all, if it doesn't, there is a good chance that customers will not choose our airline again. We record a script that retrieves a dummy booking, chooses a seat, and opts to print a boarding pass. We run this three-step script every ten minutes from three different locations around the world. We can now proactively and automatically know when customers are having problems with our online check-in. And because we are monitoring from three different points, we can determine if it's the actual business service or something on the way to the business service that is causing any problems.
     

  2. Real user monitoring is like a network probe: it listens to HTTP network traffic (or TCP/IP traffic) and builds up a picture of the performance of a web user interaction. We would use it to monitor our online check-in process. Real user monitoring gives us a very rich source of diagnostic information should there be a problem.

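The three-step synthetic script described above has a simple shape: run each step, time it, and alert if the total breaches a threshold. A minimal sketch with stubbed-out steps (a real script would drive HTTP requests against the check-in front end and fail on bad responses; the step names and threshold are illustrative):

```python
import time

def retrieve_booking():      # step 1 (stub for the real HTTP interaction)
    time.sleep(0.01)

def choose_seat():           # step 2 (stub)
    time.sleep(0.01)

def print_boarding_pass():   # step 3 (stub)
    time.sleep(0.01)

def run_script(steps, threshold_s=2.0):
    """Run each step in order, timing it; report per-step timings, the
    total, and whether the total breached the response-time threshold
    (which would raise a proactive alert)."""
    timings = {}
    for step in steps:
        start = time.perf_counter()
        step()
        timings[step.__name__] = time.perf_counter() - start
    total = sum(timings.values())
    return timings, total, total > threshold_s

timings, total, breached = run_script(
    [retrieve_booking, choose_seat, print_boarding_pass])
print(f"total={total:.3f}s breached={breached}")
```

Running this from several locations on a schedule is what turns a one-off script into proactive monitoring.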


Which should we use, synthetic or real-user monitoring? Both. Synthetic gives you the proactive notification - we have one customer who runs 30 different scripts at 7:30am every morning, before its customers come online, so it can proactively fix problems before the customers notice. But real user monitoring gives a richer source of diagnostic information.


 


And that brings me to the title of this blog posting: "from user experience monitoring to user experience management".


 


Everything I've described so far is about telling you that you have a problem before your irate customers ring in and tell you first. This is important, but it's not the whole story. Imagine I could tell you your car was having problems before you noticed. If I told you of a problem, your first question would most certainly be, "and what's causing the problem?"


 


And that's a very important question to ask. Forrester estimates that 80% of the time spent fixing a problem lies in finding where the problem is.  And the typical performance problem of the sort we're talking about ping-pongs between different expert groups as they try to determine whose fault it really is. Going back to your car, it would go to the electrics group first. They would say, "not us". Then to the transmission group. "Not us". And then to the fuel system group. "Not us". Then the wheels and tyres group. "OK - it's us. You need new tyres and a re-alignment".


 


And why does this allocation ping-pong occur? Because we don't give the appropriate tools to first-line support - to the Operations Bridge. What they need is a tool that allows them to figure out what is causing a user experience performance problem.


 


We have such a tool (which probably doesn't surprise you - I wouldn't write about a problem we couldn't solve and had no plans to solve!). And this tool moves us from User Experience Monitoring to User Experience Management - to detecting a problem early and then fixing it quickly and efficiently. This tool does the following:



     


  1. It takes you step by step through a workflow looking at different potential sources of problem cause. Once I've talked about the areas the tool considers, you'll see why a guided workflow is a helpful thing to have.
     

  2. It re-runs any synthetic scripts against the problem business service to see if the problem is still occurring
     

  3. It looks for patterns in the behaviour of the problem business service's performance. Does the problem occur weekly? Is it only from one location?
     

  4. It looks to see if a dependent service is causing the problem. These critical business services are complex beasts - they can depend on a lot of "stuff" underneath - and we always under-estimate just how much "stuff" there is under a business service. A colleague of mine was recently at a customer site. They were estimating how many IT artefacts were under their claims processing system. The consensus was "about 12 systems". They looked in the CMDB and found it was 42 systems, all the way down to network paths, DNS servers and the like.  Anyway - our tool looks at the performance of all these dependent services and uses some clever statistical analysis tools to determine the most likely suspect - the dependent service that most highly correlates with the business service performance problem.
     

  5. It looks to see what changes have occurred under the problem business service. It gets this information from the discovered CMDB: discovery notices when a service has changed and flags this. There is an IDC statistic that says that if a change has occurred under a problem service, there is an 80% probability that the change caused the problem.
     

  6. It looks at incidents against this problem service in the service desk. This allows us to get the service desk's view of what is happening.

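Step 4's "clever statistical analysis" can be approximated with something as plain as Pearson correlation between the business service's response-time series and each dependent service's series. The data and the choice of plain Pearson are my own illustration, not the product's actual (patented) analysis:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# Response times sampled over the same intervals (illustrative data)
business = [1.0, 1.1, 3.9, 4.2, 1.0]           # online check-in
deps = {
    "database":   [0.2, 0.2, 2.5, 2.8, 0.2],   # spikes with the problem
    "app-server": [0.5, 0.6, 0.5, 0.6, 0.5],   # flat, uncorrelated
}

# Rank dependent services by how well they track the business service
suspects = sorted(deps, key=lambda d: pearson(business, deps[d]), reverse=True)
print(suspects[0])  # database correlates most strongly
```

The database's response time moves in lockstep with the check-in slowdown, so it surfaces as the most likely suspect; the flat app-server series is ruled out.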


These guided analyses allow us to get a much better idea of what is causing a user experience performance problem, thus allowing us to "cut into the 80%" - cutting into the most time-consuming part of solving a complex performance problem and stopping that inefficient "allocation ping-pong".


 
