Business Service Management (BAC/BSM/APM/NNM)
More than just monitoring, BSM provides the means to determine how IT impacts the bottom line. Its purpose and main benefit is to ensure that IT Operations can reactively and proactively determine where to spend their time to best impact the business. This spans event management to solve immediate issues, resource allocation, and performance reporting based on data from applications, infrastructure, networks and third-party platforms. BSM includes powerful analytics that give IT the means to prepare, predict and pinpoint by learning behavior and analyzing IT data forward and backward in time, using Big Data analytics applied to IT Operations.

Don’t miss this discussion on storage management!

Join me in a panel discussion at HP Discover, Dec 2-4, on the challenges and solutions faced in large storage environments. We will also be covering our current release, Storage Essentials 9.70, and at the event we will announce a rollup patch build that includes support for new items.

 

Guest post by Kara McMillan, HP Storage Essentials Product Manager

Get the latest in Business Service Management—in just 3 days!!

I know that showcase events like HP Discover can often feel overwhelming. Let me remove the confusion and act as your tour guide to the Business Service Management sessions that you won’t want to miss at this year’s event in Barcelona.

 

Keep reading to follow me through the event!

Optimize your cloud and reclaim infrastructure capacity – today

As you move to the cloud for agility, scalability and on-demand access to IT services, you face a serious challenge: how to optimize the capacity of your rapidly growing cloud. We are bringing together CSA and vPV for cloud optimization, helping both the cloud administrator and the cloud consumer deliver and use services in an optimized fashion.

 

Read on to learn how.

HP Storage Essentials capacity planning for thin provisioning

As customers aim to control the cost of growing storage capacity demands, they move away from physical environments into virtual storage environments. As the demand for productivity continues to increase, hardware and resource budgets decrease.

 

This forces many organizations to re-evaluate their data center designs in the midst of rapidly accelerating data growth. With business-critical data protection challenging storage teams, many IT decision makers are directing their attention toward maximizing storage return on investment.

 

Keep reading to find out how thin provisioning, in a shared-storage environment, provides a method for optimizing utilization of available storage.
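Curious what the capacity math looks like? Below is a minimal sketch (with hypothetical pool sizes and volume names) of why thin provisioning lets allocated capacity exceed the physical pool, and why consumed capacity is the number capacity planners actually watch.

```python
# Illustrative only: thin provisioning lets the capacity "promised" to volumes
# exceed the physical pool; planning then tracks consumed (written) capacity.
# All numbers and volume names below are hypothetical.

pool_physical_tb = 100.0                      # raw capacity of the shared pool

volumes = {                                   # allocated vs. actually written, TB
    "erp_db":    {"allocated": 40.0, "consumed": 12.0},
    "mail":      {"allocated": 60.0, "consumed": 18.0},
    "analytics": {"allocated": 80.0, "consumed": 25.0},
}

allocated = sum(v["allocated"] for v in volumes.values())
consumed = sum(v["consumed"] for v in volumes.values())

overcommit_ratio = allocated / pool_physical_tb   # >1.0 means thin-provisioned
utilization = consumed / pool_physical_tb         # what capacity planning tracks

print(f"Allocated {allocated:.0f} TB against a {pool_physical_tb:.0f} TB pool "
      f"(overcommit {overcommit_ratio:.1f}x); {utilization:.0%} physically used")
```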

 

Guest post by Inge De Maere

Functional Architect for Storage Essentials

Next-generation network management at Robert Bosch GmbH

Are you struggling with network management? Tobias Huck is sharing the experience that Robert Bosch GmbH had as they recently went through the deployment process.

 

Keep reading to find out how to attend the session to learn how Tobias and his colleagues collaborated with the HP Software team to drive key product capabilities for the Bosch enterprise network.

 

Guest post by Balaji Venkatraman, Director of Business Service Management, HP Software

Learn how to manage globally distributed datacenters straight from DHL

DHL runs a business of immense complexity, coordinating the activities of more than 285,000 employees in over 220 countries and territories globally. Now part of Deutsche Post DHL, it processes millions of transactions every day, and runs a fully 24x7 operation.

 

Timothy Tan, Head of Tools and System Management Services at DHL, will share at HP Discover Barcelona 2014 how his team standardized monitoring policies and processes across three globally distributed datacenters, using HP Operations Manager to streamline and centralize IT Operations Management.

 

Keep reading to find out what lessons they have to share and how you can attend the session yourself.

Unravelling the Mysteries of Stackable Switch Management – NNMi 10.00 Stackable Feature


 

 

The data center network is getting more complex, and stackable switches have become the workhorses of the next-generation virtualized data center.

 

Keep reading to find out how stackable switches offer a cost effective alternative to traditional network design.

NNMi HPLN page is now live

We have just launched the new Network Node Manager i software (NNMi) page on HP Live Network. Use this page to stay abreast of the latest HP Network Node Manager i Software (NNMi) announcements, download sites, product documentation, and videos…all in a one-stop-shop web page.

 

Keep reading to find out how you can join us in this network.

 

Guest post by Carole Balawender

Information Engineering Team Lead

HP Network Management Center 

Improve how you investigate and diagnose network problems

What is your first reaction when you encounter a network problem?  These are some of the most difficult problems to fix because there are so many “moving parts” within every network.

 

Keep reading to find out how HP Network Node Manager i can help make this job easier.

Network Node Manager (NNMi) 10.00 - Making a perfect 10!


Two months ago we released HP Network Node Manager i 10.00 and the response to the software has been incredible. To help answer some of the most common questions, I have created a top 10 list of the new functionalities.

 

Keep reading to see the list and to learn how you can experience these advancements for yourself.

How inventory works in HP Network Automation software

In my previous post, I discussed how Network Node Manager i helps with network inventory. Today I take a closer look at its powerful counterpart, Network Automation.

 

Keep reading to discover the keys of Network Automation and how they can work for you.

Now there is one solution for monitoring your multi-tier WebLogic applications

As a web application owner, have you struggled to isolate the root cause of your degrading application performance? This doesn’t have to be your continuing reality.

 

If you are interested in learning how to better isolate the root cause of your degrading application performance, keep reading.

HP IT: Getting to Root Cause Faster with Data Analytics - Webinar

Join Gary Brandt, HP Global IT Functional Architect, in this free live webinar to learn how HP IT incorporates operational best practices to collect and analyze structured and unstructured data using big data analytics at enterprise scale.

Vivit webinar: “What’s New in Network Management Center 10.0”

Learn what’s new with HP Software network products HP Network Node Manager i 10.0 (NNMi) and Network Automation 10.0 (NA) at this interactive webinar on 9 Sep.

What’s new in HP Network Management Software 10.00

The next big release of HP Network Management Software is available: version 10.0 of HP Network Node Manager i (NNMi) and HP Network Automation (NA).

 

Read on to see what's new...

New Video: HP RUM for Mobile – In-depth reporting on the health of your mobile apps

Are you a mobile application owner who needs to monitor your customers’ user experience? Perhaps you are an application support engineer, who wants to quickly detect and isolate users’ problems.

 

HP Real User Monitor (RUM) for Mobile Monitoring can help! Continue reading to learn how you can take advantage of RUM Mobile Health reports to quickly identify and solve problems with your mobile app!

New Video: HP RUM for Mobile Monitoring – Instrument your app in a snap!

Do you need to monitor the performance of your mobile apps to keep your customers satisfied? Does this sound like an impossible task?

 

Watch this new video to see how RUM for Mobile Monitoring makes this incredibly easy!

Bring your own code: How to simplify customizing event correlation software


Extensibility and customization are key capabilities that every enterprise software product should offer. As is so often the case, there is no one-size-fits-all enterprise software solution, because your business process will always differ from any “standard”—that’s what gives you a competitive edge.

Leading event correlation and monitoring software will not force users to write code for simple customization. No coding is required to adjust most event correlation features to your needs. But what if you need even more customization? Is any complex coding required? Does the overall system become complex, or even worse, does it become fragile because of custom code that you inject?
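To make the pattern concrete, here is a minimal sketch in Python of the “bring your own code” idea. The interfaces are hypothetical (this is not the extension API of any HP product): the platform’s pipeline handles routine processing, and a small user-supplied hook covers the case that configuration alone can’t.

```python
# Hypothetical sketch of code-based customization in an event pipeline.
# The platform covers common cases via configuration; user hooks handle the rest.

from dataclasses import dataclass, field

@dataclass
class Event:
    node: str
    category: str
    severity: str
    attributes: dict = field(default_factory=dict)

def custom_enrich(event: Event) -> Event:
    """User-supplied hook: tag database events with the owning support team."""
    if event.category == "database":
        event.attributes["assigned_team"] = "dba-oncall"
    return event

def pipeline(events, hooks):
    """Simplified platform loop: apply each registered hook to every event."""
    for hook in hooks:
        events = [hook(e) for e in events]
    return events

events = [Event("db01", "database", "critical")]
for e in pipeline(events, [custom_enrich]):
    print(e.node, e.severity, e.attributes)   # db01 critical {'assigned_team': 'dba-oncall'}
```

Because the hook is an ordinary function, the custom code stays small and isolated, so the overall system does not become fragile as customizations accumulate.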

Do I need OneView when I have OpenView?


HP offers a new infrastructure management product: HP OneView. It is a fresh approach to infrastructure management.

 

Many users of HP’s former OpenView products are asking if OneView is a complementary solution - and it is. In this blog post, I will take a closer look at the relationship between OneView and the HP Software products formerly known as OpenView.

Breaking down monitoring silos - combining systems, network, and application management

As you might have read on the Infrastructure Management blog and Network Management blog, we are migrating those blogs into this BSM blog. We will continue to cover Application Performance Management too. This mirrors the actions that many of our customers are taking in consolidating their monitoring silos into a single Operations Bridge.

Fighting or friendly, Problem Isolation and OMi

by Michael Procopio


In the post Event Correlation OMi TBEC and Problem Isolation What's the Difference, my fellow blogger Jon Haworth discussed the differences between TBEC and Problem Isolation. To be consistent, I'll use the acronym PI for Problem Isolation and TBEC to refer to OMi (Operations Manager i series) Topology Based Event Correlation.


Briefly, he mentioned that TBEC works “bottom up”, that is starting from the infrastructure, with events. PI works “top down”, that is, starting from an end user experience problem, primarily with metric (time series) data.


Jon did an excellent job describing TBEC; I’ll do my best on PI because like Jon I have a conscience to settle.


Problem Isolation is a tool to:

1. automate the steps a troubleshooter would go through

2. run additional tests that might uncover the problem

3. look at all metric/performance data from end-user experience monitoring and all the infrastructure it depends on

4. find the infrastructure metric that most closely matches the end-user problem, using behavior learning and regression analysis techniques (developed by HP Labs)

5. bring additional data, such as events, help/service desk tickets and changes, to the troubleshooter

6. allow the troubleshooter to execute run books to potentially solve the problem


Potentially the biggest difference in the underlying technology is that Problem Isolation does not require any correlation rules or thresholds to be set for it to do the regression analysis to point to the problem. Like TBEC, it does require that an application be modeled in a CMDB.


An example: presume a typical composite application - web server, application server and database. No infrastructure thresholds were violated; therefore, there are no infrastructure alerts. As mentioned in the previous post, end-user monitoring (EUM) is the backstop. EUM alerts on slow end-user performance; now what?


Here is what Problem Isolation does:

1. determines which infrastructure elements (ITIL configuration items, or CIs) support the transaction

2. reruns the test(s) that caused the alert – this validates it is not a transient problem

3. runs any additional tests defined for the CIs

4. collects Service Level Agreement information

5. collects all available infrastructure performance metrics (web server, application server, database server and the operating systems for each) and compares them to the EUM data using behavior and regression analysis

[Problem Isolation screenshot showing the performance correlation between end-user response and SQL Server database locks]

6. determines and displays the most probable suspect CI and alternates

7. displays run books available for all infrastructure CIs for the PI user to run directly from the tool

8. allows the PI user to attach all the information to a service ticket, either an existing one or a newly created one


Another key differentiator between OMi/TBEC and PI is the target user. There is such a wide variance in how organizations work that it is hard to name the role, but let me give a brief description, and I think you will be able to determine the title in your organization.


There are some folks in the organization whose job is to take a quick look at a situation (typically less than 10 minutes; in one organization I interviewed, less than 1 minute) and determine if they have explicit instructions on what to do via scripts or run books. When they have no instructions for a situation, they pass it on to someone who has a bit more experience and does some free-form triage.


This person might be able to fix the problem or may have to pass it on to a subject matter expert - for example, to an Exchange admin if they believe it is an MS Exchange problem. It is this second person that Problem Isolation is targeted at. It helps automate her job, reducing what might take tens of minutes to hours and performing it in seconds. If it turns out she can’t solve the problem, it automatically provides full documentation of all the information collected. That alone might take someone five minutes to write up.


OMi’s target is the operations bridge console user. Ops Bridge operators tend to be lower skilled and face hundreds if not thousands of events per hour. Jon described how OMi helps them work smarter.


TBEC and Problem Isolation both work to find the root cause of an incident but in different ways. Much like a doctor might use an MRI or CAT scan to diagnose a patient based on what the situation is, TBEC and Problem Isolation are complementary tools each with unique capabilities.


Problem Isolation will not find problems in redundant infrastructure that OMi will. Conversely, OMi can’t help with EUM problems when no events are triggered, where Problem Isolation will.


We know this can be a confusing area. We welcome your questions to help us do a better job of describing the difference. But these two are definitely friendly.


For Business Availability Center, Michael Procopio


Get the latest updates on our Twitter feed @HPITOps http://twitter.com/HPITOps


Join the HP Software group on LinkedIn and/or the Business Availability Center group on LinkedIn.


Related Items



  1. Advanced analytics reduces downtime costs - detection

  2. Advanced analytics reduces downtime costs – isolation

  3. Problem Isolation page

  4. Operations Manager i page

Announcing a New BSM Solution Offering for Virtualization

by Michael Procopio


The new offerings include enhancements to HP Server Automation, HP Client Automation, HP Storage Essentials, HP Network Automation, and HP Operations Manager software.


According to a recent report from Gartner(1), "Virtualization’s impact on the overall IT industry has been dramatic, and virtualization will continue to be the leading catalyst for infrastructure and operations software change through 2013. Organizations are looking at ways to cut costs, better utilize assets, and reduce implementation and management time and complexity."


Although virtualization is often adopted to help reduce capital expenditures, it can trigger increased management expenses and lead to more pronounced organizational silos. The new HP offerings bridge all the physical and virtual data center silos through management and automation. This reduces complexity and, ultimately, management costs.


"HP business service automation software helps us eliminate manual, error-prone tasks by automating server lifecycle management, including provisioning multiple operating systems, software installation, deployment of patches, configuration management and audits," said Ron Cotten, senior manager of IT OSS Engineering at Level 3 Communications, a leading international provider of voice, video, and data communications services. "With HP Server Automation, we are able to patch over 1800 servers in 24 hours, which helps us reduce scheduled downtime."


The updated HP business service offerings help you:


 


· Increase administrator effectiveness with HP Operations Manager for virtualization by monitoring the availability and performance of all virtual and physical assets through a common dashboard.


· Reduce the risk of downtime with HP Network Automation, which for the first time gives the network administrator control of the VMware vSwitch in addition to the physical network environments.


· Provision the right amount of storage to keep applications performing properly, without overspending on excess storage, with HP Server Automation, the first solution in the industry that gives server administrators this capability.


· Reduce problem resolution times with HP Storage Essentials Performance Edition by quickly identifying, troubleshooting and reporting on performance metric trends in physical and virtual environments.


 


“To make virtualization cost effective, customers must minimize operating expenses and have seamless management of infrastructure silos,” said Erik Frieberg, vice president of Product Marketing, Software & Solutions, HP. “Our newly enhanced HP business service offerings help customers manage all aspects of the physical and virtual application infrastructure to unlock the true promise of virtualization.”


HP Software Professional Services provides solution consulting services to accelerate the value of business service automation and business service management software investments.


HP Server Automation, HP Client Automation, HP Storage Essentials and HP Network Automation are available now. The HP Operations Manager Virtualization Smart Plug-In will be available next month.


 


 

Not true, IBM

by Mike Shaw


IBM recently made some incorrect claims on their web site about HP's management products. The network side of those claims was handled on our network management blog.  I wanted to handle the application management claims here.


 


J2EE Diagnostics Claims


IBM claimed that HP's BAC solution (our solution for application management) cannot provide drill-down into J2EE applications. This is not true:


 



  • HP Diagnostics software for J2EE provides a top-down, end-to-end lifecycle approach for seamlessly monitoring, triaging and diagnosing critical problems with J2EE and Java applications – in both pre-production and production environments.

  • HP Diagnostics for J2EE starts with the end user (real and synthetic), then drills down into application components, systems layers and back-end tiers – helping you rapidly resolve the problems that have the greatest business impact.

  • HP Diagnostics will monitor any Java application, and will discover and monitor the relationships between applications (Java and .NET).


 


Application and infrastructure data integration claims


IBM further claimed that HP's BAC does not have the capability to correlate application data to infrastructure data. This is not true. Our integration between the application and infrastructure layers is two-way: bottom-up and top-down.


 




  • Bottom-up:



    • You can see how an event impacts business services above by looking upwards through the service topology held in HP's CMDB. The services you can look up to may be applications, they may be a user experience (e.g. the online check-in user experience) or they may be steps in a business process. And you can see what SLAs are resting on the impacted services and those SLAs' closeness to jeopardy.

    • This service topology information can be discovered using a number of different methods, all under the overall control of the dynamic discovery manager. For example, if you have Operations Center's Smart Plug-ins (SPIs), many of these discover their domains, and this is now fed into the CMDB. Or, if you are doing agentless monitoring (less expensive to buy and manage, but not the same level of fidelity and action control as with agents - it's horses for courses), this will also discover the hierarchies under the items it's monitoring. And if you have NNMi, our network management product, it will put its end-point discovery into the CMDB. If you want everything discovered from the business service on down, you can use our advanced discovery technology. As I said earlier, our discovery manager is the overall controller, orchestrating the other discovery methods like SPIs and NNMi should you choose to use them.

    • The new OMi "TBEC" (topology-based event correlation) technology is able to take an event stream, map the events to services, and then group events related by services in the service topology and thus infer which are causal events (events we need to take action on) and which are symptomatic events (events that occur as a consequence of a causal event and thus don't need to be actioned). Included in the symptomatic events may well be an event from our user experience or business transaction monitoring technology. Imagine a DB is having a performance problem. This, in turn, causes a user application to slow. The real user monitor notices this and raises an event. OMi TBEC will notice both events, realize they are related in the service topology, and infer that the DB problem is the cause and the real user monitor event is a symptom. Is this new? No - the technology was invented by Bell Labs and has been in our NNMi network management product for about 18 months now.

    • Summary: bottom-up we have two links up to the application / business service layer. The first is for extensive "service impact analysis" and the second is for TBEC - for analysis so you just get to see the actionable events you need to do something about. (A toy sketch of the TBEC idea follows this list.)
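Here is that toy sketch. It illustrates the idea only, not HP's actual TBEC algorithm: given a dependency topology and the CIs that raised events, an event is treated as causal when nothing beneath it in the topology also raised an event, and symptomatic otherwise. The topology and event names are made up.

```python
# Toy topology-based correlation: events whose dependencies also raised events
# are symptoms; events with no "eventful" dependency are the causal ones.

depends_on = {                      # hypothetical service topology
    "checkin_app": ["app_server"],
    "app_server": ["database"],
    "database": [],
}

events = {"checkin_app": "slow response", "database": "lock contention"}

def has_eventful_dependency(ci):
    """True if any CI below this one (transitively) also raised an event."""
    stack = list(depends_on.get(ci, []))
    while stack:
        below = stack.pop()
        if below in events:
            return True
        stack.extend(depends_on.get(below, []))
    return False

for ci, event in events.items():
    kind = "symptom" if has_eventful_dependency(ci) else "causal"
    print(f"{ci}: {event} -> {kind}")
# checkin_app: slow response -> symptom; database: lock contention -> causal
```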






  • Top-down



    • Our performance triage technology takes performance and event information from dependent services (those services that the business service having a performance problem rests on). It uses an HP Labs patented algorithm to infer causal relationships between infrastructure service performance and faults and the business service's performance. So what? This allows you to know which area is causing the performance problem. Useful, given that the average performance problem goes through 6 to 8 groups before being solved! By the way, the event stream doesn't have to come from Operations Center. We can take events from Tivoli (or any other event management system), should you still have it and not yet have swallowed the rip'n'replace mega-pain.

    • The performance triage module doesn't just look at performance and event streams. It looks at recent changes in the dependent services as determined by the discovery manager (e.g. Server XYZ has had 4 GB of memory ripped out). I'm sure you've heard the stat that if a change has occurred, there's an 80% chance it's the cause of the problem.

    • And, as of last November, the performance triage module also considers the compliance state of the dependent services. How does it do this? The ex-OpsWare Server Automation product now puts its discovered information into the CMDB too, and compliance state is one of the things it discovers.  I'm sure there's a stat on how non-compliant systems screw up business services above :-)






  • And finally, something we are very proud of, and something that people really like - the 360-degree view. Take a service, any service. For that service, you can see the following:



    • The performance of the service versus its KPIs. Now and over time.

    • What services are above it

    • What user experiences are resting on it and what their state is

    • The business processes resting on it and the throughput of those processes (i.e. are they slowing down because of this service?)

    • The SLAs resting above this service and their closeness to jeopardy

    • The status of the services this service is resting on

    • The change state of services at and below this service

    • The compliance state of services at and below this service

    • The planned changes for this service

    • What the service desk knows about this service in terms of incidents - "do we get an incident on this every Monday at this time?"




 


 


 


OK. I've gone to town on this response a little bit. But telling HP Software we can't correlate application data to infrastructure is like telling Usain Bolt he can't run!


 


Rip out Operations Center and replace it with NetCool


Finally, in this piece of their web site, IBM was suggesting people move from Operations Manager to NetCool. As you probably know, the migration from Tivoli to NetCool is a rip'n'replace. Operations Manager has never done this to our customer base. As a recent and concrete example, the new OMi functionality with its ability to do topology-based (i.e. no writing of event correlation rules) event correlation to reduce event streams to actionable events is an ADD-ON to existing Operations Center installations. No rip, no replace.


 


If however, you have a predilection for rippin' and replacin', then please do consider the move from Operations Center to NetCool. Personally, I'd add OMi instead because I'd want the topology based event correlation and easy life - but maybe that's just me!


 

HP Software Universe - day 1

by Michael Procopio


 


Today was the first day of Software Universe, and I had customer meetings all day. Here are some interesting items from my conversations.



  1. Most said budgets were down in 2009 and will be flat to down in 2010. But a few who were tied to government stimulus said theirs will be up.

  2. Co-sourcing and outsourcing continue as ways to reduce costs.

  3. A few were focusing on asset management with the express purpose of getting rid of things in the environment they don’t need anymore. They know the unneeded items are out there, but they need to find them first.

  4. Most customers I spoke to said they keep aggregated performance data for about 2 years; the range was 18 months to 5 years.

  5. There was an interesting discussion about the definition of a business service versus an IT service. The point being made was that a business service by definition involves more than IT. While I agree this is a good point, I think the IT industry has focused on “business service” as a way to say, “I’m thinking about this IT service in the context the business thinks about it, not just from my own IT-based perspective.”

  6. A number of customers have implemented or are about to implement NNMi. If this is something you are interested in, check out the NNMi Portal.

  7. Many customers are moving to virtualized environments; the highest percentage I heard was 70%. Another customer requires all internal developers to deliver software as a virtual image.

  8. Another topic was how to monitor out-tasked items. For example, some part of what you offer is delivered by a third party; how do you make sure they are living up to your standards? Two methods I heard were 1) use HP Business Process Monitor, and 2) have the third party send you alerts from their monitoring system.

  9. On the question of whether your manager of managers sends data back to sync the original tools, one did and one didn’t. For the one that did, it was part of a closed-loop process:

    • The monitoring tool finds a problem and sends an alert to the MOM (manager of managers).

    • The MOM sends an event ID back to the monitoring tool.

    • A subject matter expert uses the monitoring tools to diagnose the problem.

    • Once the problem is diagnosed, the expert updates the monitoring tool, which updates the MOM.




A very productive day for me. I hope some of this is useful information to you.


For additional coverage my blogger buddy Pete Spielvogel is also here and beat me to the first post. You can read his posts at the ITOps Blog.


There are a variety of Twitter accounts you can follow as well as the hashtag #HPSU09


HPITOps – Covers BSM, Operations and Network Management


HPSU09 – show logistics and other information


HPSoftwareCTO


informationCTO


HPSoftware


BTOCMO – HP BTO Chief Marketing Officer 


 


For HP BSM, Michael Procopio

BSM at HP Software Universe

by Michael Procopio


 


HP Software Universe is next week, 16-18 June, in Las Vegas. Business Service Management (BSM) will be well represented.


In the Business Transaction Management area there are 13 sessions. Most of them are led by customers. The sessions are listed below.


In the Network Management track, Aruna Ravichandran is speaking in three sessions; you can see information on those at her post HP Software Universe/HP Technology Forum (HPTF) - Network Management sessions. The rest of the track is listed in the post Network Management at HP Software Universe.


Amy Feldman, Dennis Corning and Peter Spielvogel, the ITOps bloggers, have covered a number of the sessions in the Consolidated Event and Performance Management area. Here is a list of the posts:



 


Business Transaction Management Track

Session ID | Title | Presenting company
1114 | Confessions of a product manager: get the real scoop on the latest HP Business Availability Center | HP
1165 | The MITRE Corporation: higher operational effectiveness at lower cost through automated alert management | MITRE Corporation, AlarmPoint
1233 | Key decisions and practical techniques in configuring business transaction management |
1236 | Real User Management: know how your TCP/IP applications perform for your users | HP
1267 | Using HP Business Availability Center to analyze and triage application and infrastructure anomalies and problems | BCBS of Florida
1303 | Sodexo: partnering with HP Software-as-a-Service to ensure critical e-business application performance and availability | Sodexo
1342 | Wrigley: HP Business Availability Center deployed on Software-as-a-Service yields big improvements in IT monitoring without increasing staff | Wrigley
1360 | Lockheed Martin: deploying HP Business Availability Center in a virtual environment and forwarding alerts through an iPhone Twitter-based application | Lockheed Martin
1363 | DIRECTV: an HP Business Availability Center and HP Operations implementation | DIRECTV
1401 | Liberty Life: taking the fast track to implementing HP Business Availability Center and gaining business value in 6 months | Liberty Life
1425 | Sentara Healthcare: improving the availability of critical business services and fixing IT problems before they impact customers | Sentara Healthcare
1436 | Lockheed Martin: practical advice for configuring and operating HP End User Management solutions | Lockheed Martin
1452 | Vale: deploying HP Business Availability Center solutions to monitor applications and systems and to help ensure availability and performance | Vale

 You can get the details of all the BSM sessions at the HP Software Universe Track Session Catalog.


I hope to see you there, but if you can’t make it we will be doing follow-up posts. You can also follow on Twitter, the hashtag is #HPSU09. There are already a number of Tweets and the show hasn’t started yet. The Twitter account for the show is HPSU09, if you’d like to follow us. Or visit the HP Software Universe Facebook page.


For the Business Availability Center, Michael Procopio


 

Advanced analytics reduces downtime costs – isolation

by Michael Procopio, Product Manager, BAC 



In the world of advanced analytics, two areas are of particular interest to the IT management world: detection of a problem and isolation of a problem. Previously I wrote Advanced analytics reduces downtime costs – detection; in this post I’ll cover isolation.


In the previous post, I covered how advanced analytics finds an anomaly, potentially before a threshold is crossed.


Problem Isolation is the process of determining which component in the infrastructure is causing the problem* or incident* that we found. We will presume we are monitoring the service that is having the issue.


If one had no management tools (amazingly, I have spoken to customers in this situation), the method of trying to find a problem is to log in to each system, router, switch and potentially application (e.g. Oracle), look at the items with whatever tools are available (e.g. Windows Perfmon) and hope you find it. If you are interested in advanced analytics, this is probably not your situation.


The more typical case is that you have multiple management tools: network, system, virtualization, database and perhaps others. So if you know the domain the problem exists in, you have a good place to start. I’ve listened to podcasts and read reports which bring up a few problems with this (if you know of any good IT podcasts, please send them along):



  • ~80% of problems are sent to the network team with only ~20% being network issues

  • ~60% of problems take >10 experts to resolve

  • ~80% of the time to restore service is spent isolating the problem


Here is an analogy I use with my non-IT friends to explain why this area is needed. You are monitoring the speed of a car going across the country (pick your favorite country). You are separately monitoring all of the infrastructure:



  • roads

  • bridges

  • ferries to take cars across the water


What you don’t know is where the car is (old car, no GPS). You are getting many alerts from the roads, bridges and ferries. Which one is affecting the car? Since you don’t know what road the car is on you don’t know if any given alert is the one affecting your car.


This is where the CMDB comes into the isolation process. The CMDB has the route the car is taking or, in our case, the items in the IT infrastructure that make up the service that has the problem.


Part one of the isolation process is to restrict what we are looking at to the relevant IT items. This greatly reduces the computational power required. For example, one customer I recently visited told me he has 2000+ servers. If we can reduce that to a few app servers and a few database servers (isn’t SOA wonderful for us operations types?), that is a reduction by a factor of ~200.
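A minimal sketch of this scoping step, assuming a hypothetical service model: walk the CMDB from the ailing service downward and keep only the CIs that support it, so the later math runs against a handful of items rather than thousands.

```python
# Sketch of isolation part one: scope analysis to the CIs under one service.
# The service model below is hypothetical; real CMDBs are far richer.

cmdb = {                                  # service/CI -> CIs it directly rests on
    "order_entry": ["web01", "app01", "app02"],
    "app01": ["db01"],
    "app02": ["db01"],
}

def supporting_cis(service):
    """Collect every CI under a service, transitively."""
    found, stack = set(), [service]
    while stack:
        ci = stack.pop()
        for below in cmdb.get(ci, []):
            if below not in found:
                found.add(below)
                stack.append(below)
    return found

# 4 CIs instead of a whole 2000+ server estate: that is the ~200x reduction.
print(supporting_cis("order_entry"))      # {'web01', 'app01', 'app02', 'db01'}
```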


Part two of the isolation is the heavy math from HP Labs, with more patent filings. It is a form of regression analysis, where application or end-user response time is the dependent variable and all the infrastructure metrics are independent variables. In plain terms: if end-user response gets worse, find the infrastructure metrics that get worse. When end-user response gets better, find the metrics that get better. The more closely an infrastructure metric tracks the end-user response, the more likely it is to be the cause.
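As a rough stand-in for that math (the HP Labs algorithm is patented and considerably more sophisticated), here is a sketch that ranks infrastructure metrics by how closely they track end-user response time, using plain Pearson correlation on fabricated samples:

```python
# Rank infrastructure metrics by |correlation| with end-user response time.
# Pearson correlation is a simple stand-in for the real analysis; all data
# below is fabricated for illustration.

import statistics

def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

end_user_ms = [210, 240, 480, 900, 870, 300, 220]      # response-time samples

infra_metrics = {
    "db01.lock_waits": [3, 4, 22, 51, 48, 7, 3],       # tracks the slowdown
    "web01.cpu_pct":   [40, 42, 41, 45, 39, 43, 40],   # flat: likely unrelated
    "app01.heap_mb":   [500, 510, 540, 560, 550, 520, 505],
}

for metric in sorted(infra_metrics,
                     key=lambda m: abs(pearson(infra_metrics[m], end_user_ms)),
                     reverse=True):
    print(f"{metric}: correlation {pearson(infra_metrics[metric], end_user_ms):+.2f}")
```

The database lock metric rises and falls with end-user response, so it sorts to the top of the suspect list, which is exactly the intuition behind the screenshots below.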


Again, while the math is interesting, pictures work better for me.


 


The thick grey line is the end-user response; the red-purple line is the most closely correlated metric -- in this case a database metric. So you don’t have to strain your eyes, we provide a table like this (from a different problem) showing the weighted correlation scores.


 


Isolation part three is to include non-time-series data. In the screen capture below you see planned changes and incident details (think alerts) on the timeline. Unplanned changes can also be displayed. Changes are pulled from the CMDB, and incidents can come from any management system that can send alerts. Since we know that most problems stem from changes, that is an important component. Finally, tickets from the helpdesk are included on the timeline, for the case where users are doing the monitoring.
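A minimal sketch of this part, with fabricated records: flag any change or incident that landed on a suspect CI shortly before the degradation began, since those deserve the first look.

```python
# Overlay non-time-series records (changes, incidents) on the degradation
# window. Timestamps, CIs and records below are fabricated for illustration.

from datetime import datetime, timedelta

degradation_start = datetime(2009, 6, 10, 14, 0)
lookback = timedelta(hours=4)             # how far before the slowdown to look

records = [
    {"ci": "db01", "kind": "planned change", "what": "index rebuild",
     "when": datetime(2009, 6, 10, 12, 30)},
    {"ci": "web01", "kind": "incident", "what": "failover test",
     "when": datetime(2009, 6, 9, 9, 0)},  # a day earlier: less interesting
]

for r in records:
    if degradation_start - lookback <= r["when"] <= degradation_start:
        print(f"{r['when']:%b %d %H:%M} {r['kind']} on {r['ci']}: "
              f"{r['what']} <- review first")
```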


 


Altogether, this automates a number of things the operations teams already do and adds some math to help isolate problems.


 


*Incident and problem are ITIL terms. There may be many incidents that are symptoms of an underlying problem.


 




 


Related Items



Since I asked for podcasts, here are some I listen to:


Fuel Efficient IT Operations

Mike Shaw, BSM Product Marketing.


My wife just bought a BMW 118D. The 118D won the "Green Car of the Year" award in 2008 at the New York Auto Show. It does an amazing number of miles to the gallon (km to the litre / miles to the US gallon). Her old car (also a BMW) did about 26 miles per gallon; the 118D does 63 miles per gallon. Now, the new car is slightly smaller, so we're not comparing apples to apples. However, you get the point -- car manufacturers are pushing fuel economy to new limits. At the cost of acceleration? Not that I've noticed - when you put your foot to the floor in the 118D, it most certainly accelerates.


I think there are parallels between fuel economy and IT operations.  During a down-turn, because there is less activity, there is less pressure on IT operations (fewer events, fewer system overloads, etc). This is like a car that is only required to go at 30 miles per hour and accelerate slowly because that's what everyone else on the road is doing.  In an attempt to cut the costs of motoring, one might be tempted to adjust the fuel injector so that a smaller amount of fuel is available. This will cut fuel costs during this recessionary period.


 


BUT, when we come out of recession (some time in 2010??), acceleration will be required. Actually, our competitors will be accelerating - it's up to us whether or not we match them. If we've chosen to create a fuel efficient car (like the BMW 118D), then we can match the required acceleration and have fuel efficiency. If we've decided to simply cut the fuel that goes into the car without any consideration for fuel efficiency, our competitors will accelerate away from us come the upturn.


 


During a down-turn, we are under pressure to cut IT operations costs. In fact, in a recent IDC study performed for HP Europe, 40% of customers surveyed said they were very likely to cut IT operating costs while 74% said it was likely they would cut IT ops costs.


 


We have two choices in how we respond to this pressure to cut costs. We can take the simple "let's cut people and that's it" path, or we can take the "fuel efficiency" path and create an IT operations engine to match the BMW 118D. If we just cut people, we'll drown in IT operations work when the upturn comes. If we create a fuel-efficient IT ops engine, we'll be able to embrace the acceleration when the upturn comes.


 


This sentiment is echoed by recent comments made by HP's CEO, Mark Hurd (I'm sure Mark will be greatly comforted to know that he and I are in sync on this one). Mark said he didn't want to simply cut heads because when the upturn comes, he won't have the "people muscle" required to handle the upturn. HP's IT department is taking the BMW 118D approach - data centre consolidation, network operations efficiency, centralized event management, pro-active user experience management, constrained self-serve of IT product, etc.


 


So, how do we create a fuel efficient IT operations? I'm not an expert across the whole IT operations stack, so I'll talk to the area I know about - availability and performance management.  And in the interests of keeping these blog posts to a manageable size, I'll do that in the next post.


 


(Footnote: I'm sure all car manufacturers are producing more fuel efficient cars. My wife just happens to like BMWs, and she only looked at BMW!  I'll bet the average HP sales rep wished their customers were so loyal (naive ??))

BSM Evolution: The CIO/Ops Perception Gap

 


There are many potential culprits for why IT organizations struggle to make substantive progress in evolving their ITSM/BSM effectiveness. A customer research project we did a few years ago offered an interesting insight into one particular issue that I rarely see the industry address. The research showed that most CIOs simply had a different perception (when compared to their IT operations managers) of their IT organization’s fundamental service delivery maturity and capability. This seemingly benign situation often proved to be a powerful success inhibitor.


 


The Gap:


A substantial sample of international, Global 2000 enterprise IT executives participated in the study. When asked to prioritize investment across a broad range of IT capabilities, we saw a definite gap. IT operations managers consistently ranked “Investing to improve general IT service support and production IT operations” in their top 1 or 2 priorities, whereas CIOs ranked this same capability much lower, as priority 6 or 7.


 


The Perception:


When pressed further, CIOs believed that the IT service management basics of process and technology were already successfully completed, and they had mentally moved on to other priorities such as rolling out new applications, IT financial management, or project and portfolio management.


 


Most of the CIOs in the study could clearly recall spending thousands of dollars sending IT personnel to ITIL education, and thousands more purchasing helpdesk, network, and system management software. Apparently, these CIOs thought of their investment in service operations as a one-time project, rather than an ongoing journey that requires multiple years of investment, evolution, reevaluation, and continuous improvement.


 


IT operations managers, on the other hand, clearly had a different view of the world. They were generally pleased with the initial progress from the service operations investments, but realized they were far from the desired end state. The Ops managers could plainly see the need to get proactive, to execute advanced IT processes and deploy more sophisticated management tools, but they could not drain the proverbial swamp while fighting off the alligators.


 


The Trap:


We probed deeper in the research, diligently questioning the IT operations managers on why they didn’t dispel the CIOs’ inaccurate perception. In order to secure the substantial budget, these Ops managers had fallen into the trap of over-promising the initial service management project’s end state, ROI and time to value. (I wouldn’t be surprised if they had been helped along by the process consultants and software management vendors!)


 


These Ops managers saw it as “a personal failure” to re-approach the CIO and ask for additional budget to continue improving the IT fundamentals. Worse yet, they had to continually reinforce the benefits of the original investment so the CIO didn’t think they had wasted the money. So, the IT operations staff enjoyed the result: reactively working nights and weekends to meet the business’s expectations and make sure everyone kept their jobs. Meanwhile, the CIOs slept well at night thinking, “Hey, we are doing a pretty darn good job,” but faced the next day asking, “Why are my people burnt out?” A vicious cycle.


 


Recommendation through Observation:


I’m not wild about making recommendations, since I merely research this stuff… I don’t actually perform hands-on implementation. Instead, I will offer some observations of best practices from companies that appear to be breaking through on BSM: lowering costs, raising efficiency and improving IT quality of service.


 



  1. Focus on Fundamentals: It is boring and basic, but absolutely critical to continually look for ways to improve the foundational service management elements of event, incident, problem, change, and configuration management. Successful IT organizations naturally assume that if they implemented these core processes more than 3 years ago, they likely need to update both technology and process. If FIFA World Cup Football clubs and Major League Baseball teams revisit their fundamental skills each and every year, why wouldn’t IT?

 



  2. Assume a Journey: IT leaders who develop a step-wise, modular path of realistic projects that deliver a defined ROI at each step have the best track record of securing ongoing funding from the business. The danger here is defining modular steps that are so disconnected and silo’d that IT never progresses toward an integrated BSM/ITSM process and technology architecture. This balance continues to be one of the most difficult to manage.

 



  3. Empowered VP of IT Operations: The advantages of a CIO empowering a VP of IT operations and holding them accountable for end-to-end business services have been discussed in previous posts. Having a strong VP of operations with executive focus on service operations and continual service improvement, along with end-to-end service performance responsibility, does appear to be a growing trend and success factor.

 



  4. Focus on the Applications: In the same research study that showed the perception gap on “Investing to improve general IT service support and production IT operations,” there was consistent agreement on “Investing to improve business critical application performance and availability.” The CIOs, Ops managers and Business Relationship managers all ranked this capability as a top 1 or 2 priority.

 


Successful BSM implementations focus on the fundamentals of process and infrastructure management, but do so from a business service, or an application perspective. This approach not only enables an advantageous budget discussion with the business, but it also hones the scope and execution of projects.


 


It is difficult to assess the relative impact of this CIO/IT Ops perception gap, considering the wide variety of challenges that IT faces. But hopefully, this post gives you something to consider when assessing your own IT organization’s situation and evolution.


 


Let us know where your organization fits - please take our two-question survey (plus two demographics questions). We’ll publish the results on the blog.


 



  • Describe the perception of your IT's fundamental service delivery process

  • How often does your IT organization significantly evaluate and invest to update your fundamental IT process

 


Click Here to take survey


 


Bryan Dean – BSM Research

Monitoring your cloud computing is as easy as calling an airport shuttle

HP made an announcement about new cloud computing management capabilities today: HP Unveils "Cloud Assure" to Drive Business Adoption.


HP currently offers Software-as-a-Service (SaaS) for individual management applications such as HP Business Availability Center (BAC) and HP Service Manager primarily for intranet and extranet applications.




HP Cloud Assure helps customers validate:



  • Security – by scanning networks, operating systems, middleware layers and web applications. It also performs automated penetration testing to identify potential vulnerabilities. This provides customers with an accurate security-risk picture of cloud services to ensure that provider and consumer data are safe from unauthorized access.

  • Performance – by making sure cloud services meet end-user bandwidth and connectivity requirements and provide insight into end-user experiences. This helps validate that service-level agreements are being met and can improve service quality, end-user satisfaction and loyalty with the cloud service.

  • Availability – by monitoring cloud-based applications to isolate potential problems and identify root causes with end-user environments and business processes and to analyze performance issues. This allows for increased visibility, service uptime and performance.
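As a generic illustration of the availability idea above (not how HP Cloud Assure is implemented), here is a minimal synthetic probe; the URL and thresholds are placeholders:

```python
# A toy synthetic availability/performance probe: fetch a URL, time it, and
# classify the result. Real services would probe full business transactions.

import time
import urllib.request

def probe(url, timeout_s=5.0, slow_ms=2000.0):
    """One synthetic check: is the service up, and is it fast enough?"""
    start = time.monotonic()
    try:
        # urlopen raises on HTTP errors (4xx/5xx) and network failures alike
        with urllib.request.urlopen(url, timeout=timeout_s):
            elapsed_ms = (time.monotonic() - start) * 1000
    except Exception as exc:
        return f"DOWN: {exc}"
    status = "DEGRADED" if elapsed_ms > slow_ms else "OK"
    return f"{status}: {elapsed_ms:.0f} ms"

print(probe("https://example.com/"))      # placeholder URL
```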




HP Cloud Assure provides control over the three types of cloud service environments:



  • For Infrastructure as a Service, it helps ensure sufficient bandwidth ability and validates appropriate levels of network, operating system and middleware security to prevent intrusion and denial-of-service attacks.

  • For Platform as a Service, it helps ensure customers who build applications using a cloud platform are able to test and verify that they have securely and effectively built applications that can scale and meet the business needs.

  • For Software as a Service, it monitors end-user service levels on the cloud applications, load tests from a business process perspective and tests for security penetration.




A diagram showing the differences in the services is at Cloud Computing Basics.




In the end it doesn't matter where the service is; you need to be sure it is available and performing to expectations. Cloud Assure provides this capability in a way that is very agile. You say "I need this service monitored" and it is monitored. It's just like calling for an airport shuttle -- you call, they show up.




Related articles:





For Business Availability Center, Michael Procopio, Product Manager, HP Problem Isolation.

Business Service Visibility & Accountability: Where is it Homed?

Virtually every customer that I have studied has a critical moment in their BSM evolution where they realize the need for viewing, measuring and reporting business service performance in a business-relevant way.  We could discuss the technical complexities of integrating service model discovery, end-user experience, transaction management, performance and event data to develop this business service view, but in this post I’m going to examine the most common key personas, core motivations, and organizational impact.

In the previous post, BSM Evolution Paths: Auto Industry Sample, we saw how the core motivation came tops-down from senior IT management. Let’s compare three different models.


Line of Business / Application Driven


Key Personas:  Application owner, Business Relationship Manager, Business Unit CIO


 


Core Motivations:  These personas are typically closest to how business utilizes IT to execute a business process or function.  They usually report into the business unit itself, rather than into IT Operations.  They have responsibility for the application, but the business perceives them as owning the end-to-end service performance, even though they often have little control of the underlying IT infrastructure and service delivery processes.


 


At some point, a business-critical service melts down, or endures a never-ending spree of performance degradations where Global IT Operations says, “All the systems and the network are green.” This is the point where many business unit managers take matters into their own hands and fund a significant investment in end-to-end business service visibility tools.


 


Software:  Since they do not control the infrastructure, the application owners often look for tools that require minimal agentry and do not require a lot of feeds from the individual domain management tools.  They gravitate towards sophisticated end-user experience tools, probes, application diagnostics, and the ability to traverse composite application middleware. 


 

Organization:  They use these tools to prove accountability to the business units, but they also use the tools (not always politely) to hold infrastructure operations accountable. The animosity usually wanes, and the separate IT groups work out the process integration… but often not the tool integration. This leaves the end-to-end group outside of IT Operations. We also see this model where infrastructure operations are outsourced, and the service provider is held accountable to specific Service Level Agreements.


Infrastructure Operations Hero


Key Personas:  Infrastructure Operations Manager, Data Center Manager, NOC manager


 


Core Motivations:  These personas traditionally have the responsibility for care and feeding of the vast shared-service IT infrastructure environment.  They have likely done a reasonable job of consolidated event management, and domain-level configuration, performance and capacity management.  But, they have a vision of elevating IT to demonstrate the value delivered to the business, and proactively solve issues before end users report them.  This effort can be either in conjunction or parallel to an ITIL-driven service management initiative.


 


Software:  Often very budget constrained, they don’t always have the funding that the application owners do.  They look first toward leveraging investment of their existing tool set, gathering agent-based data from their infrastructure and augmenting with lighter-weight end-user experience tools.  Converting this data to business-relevant information is difficult, as they often don’t have the deep business process or application knowledge, but it is much better than the previous IT element statistic data. 


 

Organizational:  The Hero Operations manager then faces the daunting task of taking the new service oriented visibility and reporting capability to upper management and business unit managers.  Sometimes they yawn. Sometimes the strategy is embraced, and the operations manager is elevated to strategic status. The Operations manager keeps both tools and processes very integrated.  New end-to-end skill sets are developed, but usually not new organizational groups.  

Tops-down Service Management  
Key Personas:  CIO, CTO, VP IT Operations


Core Motivations: These personas have the luxury of controlling the organization, budget and overall priority of IT, yet their job is likely on the line.  Pressure from the business units, a personal drive to elevate IT to a strategic partner and sometimes fear of being outsourced are the powerful drivers.


 


Business service visibility and accountability is usually part of a larger, multi-project, multi-step roadmap that includes a hefty process component.  Since these initiatives tend to be “horizontal” in nature across all IT, many companies fall into the trap of trying to institute end-to-end business service performance tools too broadly.  The successful organizations focus on a discrete business service and satisfy key metrics that are specific to the particular business and application. 


 


Software:   These personas tend to focus on service level management and the ability to demonstrate the value IT is delivering to the business. This typically requires a substantial investment in tools that can abstract the business services into something meaningful to the business, that looks hot to business stakeholders, yet also improves service delivery time to diagnose and repair. This ends up requiring a rationalization of the service discovery model, the CMDB, and the enterprise operational tools.


 


Organization:  I’ve seen some CIO’s form executive business relationship management functions, keeping the team independent from both business and IT Operations.  Other CIO’s formally extend the VP of IT operations charter to include this new end-to-end function that bridges the infrastructure operations teams and the helpdesk/service desk teams.  

Conclusion?
Here’s a news flash… there is a wide variety of organizational models. But there are some definite patterns, and in my next post I will offer some evidence that the model will be more predictable in the future.

Bryan Dean, BSM Research

 

About the Author(s)
  • Doug is a subject matter expert for network and system performance management. With an engineering career spanning 25 years at HP, Doug has worked in R&D, support, and technical marketing positions, and is an ambassador for quality and the customer interest.
  • Dan is a subject matter expert for BSM, now working in a Technical Product Marketing role. Dan began his career in R&D as a developer and team manager. He most recently came from the team that created and delivered engaging technical training to HP pre-sales and Partners on BSM products/solutions. Dan is the co-inventor of 6 patents.
  • This account is for guest bloggers. The blog post will identify the blogger.
  • Over 11 years of experience in the design and development of NMS/EMS products, presently working on Device Content support covering broad-based features for a multitude of device vendors in NNMi.
  • Manoj Mohanan is a Software Engineer working in the HP OMi Management Packs team. Apart being a developer he also dons the role of an enabler, working with HP Software pre-sales and support teams providing technical assistance with OMi Management Packs. He has experience of more than 8 years in this product line.
  • HP Software BSM Social Media
  • Nimish Shelat is currently focused on Datacenter Automation and IT Process Automation solutions. Shelat strives to help customers, traditional IT and Cloud based IT, transform to Service Centric model. The scope of these solutions spans across server, database and middleware infrastructure. The solutions are optimized for tasks like provisioning, patching, compliance, remediation and processes like Self-healing Incidence Remediation and Rapid Service Fulfilment, Change Management and Disaster Recovery. Shelat has 21 years of experience in IT, 18 of these have been at HP spanning across networking, printing , storage and enterprise software businesses. Prior to his current role as a World-Wide Product Marketing Manager, Shelat has held positions as Software Sales Specialist, Product Manager, Business Strategist, Project Manager and Programmer Analyst. Shelat has a B.S in Computer Science. He has earned his MBA from University of California, Davis with a focus on Marketing and Finance.
  • Architect and User Experience expert with more than 10 years of experience in designing complex applications for all platforms. Currently in Operations Analytics - Big data and Analytics for IT organisations. Follow me on twitter @nuritps
  • 36-year HP employee who writes technical information for HP Software customers.
  • Pranesh Ramachandran is a Software Engineer working in HP Software’s System Management & Virtualization Monitoring products’ team. He has experience of more than 7 years in this product line.
  • Ramkumar Devanathan (twitter: @rdevanathan) works in the IOM-Customer Assist Team (CAT) providing technical assistance to HP Software pre-sales and support teams with Operations Management products including vPV, SHO, VISPI. He has experience of more than 12 years in this product line, working in various roles ranging from developer to product architect.
  • Ron Koren is a subject matter expert for BSM / APM, currently in the Demo Solutions Group acting as a Senior Architect. Ron has over fourteen years of technology experience, and a proven track record in providing exceptional customer service. Ron began his career in R&D as a software engineer, and later as a team manager. Ron joined HP software in 2007 as an engineer in the Customer-Oriented R&D team. Prior to joining HP, Ron held a leadership development role at Israel’s largest bank. Ron holds a B.S. in Computer Science from The Interdisciplinary Center, Herzliya Israel.
  • Stefan Bergstein is chief architect for HP’s Operations Management & Systems Monitoring products, which are part HP’s business service management solution. His special research interests include virtualization, cloud and software as a service.
  • With 11-plus years of very broad experience as a deployment expert for all the NMC products, my deliverables include helping the Sales and Pre-Sales teams in sizing and architecting the solution and hardware, assisting implementers in product deployment, and helping customers directly when the products are deployed in production. As part of the Customer Assist Team, I participate in a lot of customer-facing activities from the R&D side and provide best practices for using HP SW NMC products for efficient network management, leveraging my rich experience in Network Node Manager and related iSPI products.