Enterprise Service Management – Architecting a service foundation

In my last blog I touched upon the five stages for building a robust Enterprise Service Management (ESM) platform - foundation, standardization, optimization, intelligence and excellence. Here I will show how an ESM platform can be architected using HP's suite of products. This will be a series of three blogs, starting with the ESM service foundation in this one. More information about the HP Enterprise software portfolio, such as Business Service Management or Automation and Cloud, can be found here.

 

What prompted me to write this blog was a set of recent meetings with senior IT managers at three large multinational companies, one each in the media, pharmaceutical and electronic design automation industries. The discussions with all three were quite similar. They all wanted to improve two areas of IT - service quality and productivity. At the media and electronic design automation companies, whenever there was a service outage the IT operations team had too many management tools to look into. Not only did it take time to restore service, a lot of effort also went into root cause analysis. The pharmaceutical company had a different approach to addressing service quality and optimizing operational cost. They wanted to insource ESM platform engineering for better governance and outsource only the operations.

 

In fact, these are established companies that have an IT management strategy in place, have purchased commercial IT management tools, have changed IT management tool vendors at least twice, and have also outsourced the IT management function. However, they had reached a threshold point that required them to revisit their current ESM platform, because the service quality was no longer optimal (some might even say it was no longer tolerable). In addition, there was always pressure on IT to improve productivity while addressing new and disruptive technologies such as cloud, mobility and virtualization.

 

This is where my conversation on ESM stages helped them to prioritize and arrive at a roadmap for building a long-term ESM blueprint. Sometimes I drew an analogy between the ESM stages and the levels of the SEI CMM (Capability Maturity Model): Initial, Repeatable, Defined, Managed and Optimizing. At other times I compared them with the “Business Service Management” maturity levels described in the HP BSM solution discovery brief: Reactive, Applied, Managed, Proactive and Predictive operations. My objective was to ensure clarity in the solution roadmap, along with the business benefits that the ESM stages bring.

 

 “Service foundation” has three functional components:

 

  1. IT infrastructure fault monitoring component
  2. Configuration management DB component
  3. Service Desk with Incident management

 

These are fundamental, and most enterprises have these functional components in some form or another. That’s because IT infrastructure and application vendors normally supply management tools, which also have incident management capabilities. If that is the reality, why is it necessary to architect an ESM platform? Well, there are many reasons an ESM platform can be beneficial. As the IT environment changes with hardware and software refresh cycles (and grows organically, or inorganically through acquisitions and mergers), so does the heterogeneity of devices bought from multiple vendors, along with an equal number of infrastructure element management tools. In fact, one of the above-mentioned MNC customers realized during an IT audit that more than 100 of its IT infrastructure elements were not even under monitoring and management control. The implication of this is chaotic operations and poor IT service quality.

 

The following figure illustrates the service foundation functional components and corresponding HP products that support these functions.

 

[Figure: Service Desk Diagram.jpg]

 

IT Infrastructure Fault Monitoring

 

The first step in building the ESM foundation is to put an infrastructure fault monitoring capability in place, so that when a network, server or storage element goes down, the IT operator is immediately notified. Where possible, I would also try to consolidate fault monitoring into one tool for each domain - Network, Computing and Storage.

 

There is also a fourth domain, 'Cloud', without which everything is considered incomplete, including this blog. In the service foundation stage, however, I would just have incident ticket integration at the service desk layer.

 

Network Management: HP Network Node Manager (NNMi) - A comprehensive network management solution (watch the introductory video here) with almost three decades of best practices and patents embedded to effectively manage IP networks of all sizes. It is simple to use - by configuring the discovery seed with an IP address range and an SNMP community string, NNMi discovers:

 

  • The network
  • Sub-network
  • Network devices
  • Network links
  • Servers
  • Printers
  • Any IP v4 or v6 addressable devices connected on the network

As soon as a network element is discovered, NNMi automatically starts monitoring it. The intelligence is in its algorithms, which understand IP network topology and so can quickly detect link failures and device outages without overloading the network with management traffic.
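
To make the idea of a discovery seed concrete, here is a minimal Python sketch of how an address range expands into the candidate addresses a discovery pass would have to probe. It is not NNMi's API or configuration format. The `expand_seed` function, the `seed` dictionary and the exclusion list are hypothetical names, and the range is a documentation-only address block.

```python
import ipaddress
from typing import List, Optional, Set


def expand_seed(cidr: str, exclude: Optional[Set[str]] = None) -> List[str]:
    """Return the host addresses covered by a seed range, minus exclusions."""
    exclude = exclude or set()
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(host) for host in network.hosts() if str(host) not in exclude]


# Hypothetical seed: an RFC 5737 documentation range and a placeholder
# community string. A real deployment would use its own values.
seed = {"range": "192.0.2.0/29", "snmp_community": "public"}

candidates = expand_seed(seed["range"], exclude={"192.0.2.1"})
for address in candidates:
    print(f"would probe {address} with community '{seed['snmp_community']}'")
```

Even this toy version shows why the seed matters: a /29 yields a handful of addresses, while a carelessly wide range yields thousands of probes, which is the management traffic a topology-aware tool tries to avoid.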

 

For network administrators, it hides the complexity of managing event storms, filtering and de-duplication of events, event auto-correlation, intelligent polling (both ICMP and SNMP), overlapping address domains, spiral discovery, discovery of device types/vendors, L2/L3 paths, VLANs, MPLS VPNs, HSRP and the intelligence required to discover and manage "Software Defined Networks".
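
As an illustration of what event de-duplication means in practice, the following Python sketch keeps the first event for each source and event type and drops repeats that arrive within a time window. This is a simplified stand-in that I am assuming for illustration, not NNMi's actual algorithm. The `Event` fields and the 60-second window are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # device or interface that raised the event
    kind: str         # event type, e.g. "linkDown"
    timestamp: float  # seconds since some reference point


def deduplicate(events, window_seconds=60.0):
    """Keep one event per (source, kind) per window; drop the repeats."""
    last_kept = {}  # (source, kind) -> timestamp of the last event we kept
    kept = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.source, event.kind)
        previous = last_kept.get(key)
        if previous is None or event.timestamp - previous >= window_seconds:
            kept.append(event)
            last_kept[key] = event.timestamp
    return kept


storm = [
    Event("router-01", "linkDown", 0.0),
    Event("router-01", "linkDown", 10.0),   # repeat inside the window
    Event("switch-02", "nodeDown", 5.0),
    Event("router-01", "linkDown", 70.0),   # window has expired, kept again
]
for event in deduplicate(storm):
    print(event)
```

The window is measured from the last event that was kept, so a device that keeps flapping still produces a reminder once per window instead of going silent.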

 

Apart from managing many enterprise networks for large banks, retailers, airlines and so on, NNMi is used by large communication service providers to manage IP backbones (Multi-Protocol Label Switching and Carrier Ethernet networks) with more than 50,000 network devices. It easily scales to handle more than a million interfaces and 50 traps/second, as required by the service provider environment. NNMi is also used by VoIP, Remote Infrastructure Management and Wi-Fi service providers to manage multi-customer environments from a single Network Operations Center. Well, having worked on the NNMi product as an engineer nearly two decades back, and having deployed it for many years, I could not resist talking about it.

 

Considering that the network is the foundation of IT service, its availability and performance have a direct impact on business processes (revenue) and employee productivity. Thus, having a good network management platform (even if the network management is outsourced) is business critical.

 

NNMi's advanced capabilities go beyond fault management and are necessary for building the additional functionality required in later stages of the ESM architecture. iSPI and Network Automation are additional components that, when integrated with NNMi, manage network configuration, capacity, performance, service levels and availability.

 

Server Management: While most network elements are SNMP enabled, servers are not always SNMP enabled for management from products such as NNMi. HP Sitescope is an 'agentless' server and application monitoring tool (watch the introductory video here). It uses remote command execution to monitor more than 100 different target types, on both physical and virtual systems, for critical health and performance characteristics. Sitescope monitors system resources such as memory, disk space and CPU utilization to assure the availability and performance of the systems and the applications running on them. Sitescope can also monitor Syslog, Eventlog and application log files for error string patterns that indicate problem situations. Sitescope comes with a library of fault monitors for standard applications such as Microsoft Exchange, LDAP, Oracle, MS SQL and so on. Sitescope offers a fast way to start fault and performance monitoring of servers.
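
Log file monitoring for error string patterns is easy to picture with a small Python sketch. This only illustrates the technique and is not how Sitescope is configured. The `ERROR_PATTERNS` list and the sample log lines are hypothetical.

```python
import re

# Hypothetical patterns that an operations team might treat as problem indicators.
ERROR_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"out of memory", r"disk .* full", r"\bORA-\d{5}\b")
]


def scan_log(lines):
    """Return (line_number, text) for every line matching an error pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            hits.append((number, line.rstrip()))
    return hits


sample = [
    "INFO service started",
    "ERROR Disk /var is full",
    "ORA-00600: internal error code",
    "WARN retrying connection",
]
for number, text in scan_log(sample):
    print(f"line {number}: {text}")
```

In an agentless setup the lines would be fetched over a remote session, so the pattern list is the part worth maintaining carefully: too broad and operators drown in alerts, too narrow and real problems slip through.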

 

How do you manage mission-critical servers where remote command execution is not possible? HP Operations Manager (HP OM) has agent-based management capability. An agent with a small footprint is installed on the managed server. The benefits of an agent-based solution are that management network traffic is significantly reduced, security improves because remote command execution can stay disabled, duplicate and irrelevant events are correlated and suppressed, and event actions are executed locally on the managed server. Further, the agent maintains a cache of events during a network outage and forwards them when connectivity with the management server is re-established. This agent-based approach allows sophisticated capabilities to be developed for managing complex application environments. The Operations Manager Datasheet captures the product capability in detail. Here is a good blog from my HP colleague on agent vs. agentless monitoring.
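
The event cache that an agent keeps during a network outage is a classic store-and-forward buffer. Here is a minimal Python sketch of that idea. It assumes a hypothetical `send` callable that raises `ConnectionError` when the management server is unreachable, and it does not represent the HP OM agent's real interface.

```python
from collections import deque


class AgentBuffer:
    """Queue events locally and forward them, in order, when the server is reachable."""

    def __init__(self, send, max_events=1000):
        self._send = send  # hypothetical callable; raises ConnectionError when offline
        # With maxlen set, the oldest event is dropped once the cache is full.
        self._pending = deque(maxlen=max_events)

    def report(self, event):
        self._pending.append(event)
        return self.flush()

    def flush(self):
        """Try to forward everything; stop at the first failure and keep the rest."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False
            self._pending.popleft()
        return True


# Simulated management server link for demonstration purposes only.
delivered = []
link = {"up": False}


def send_to_server(event):
    if not link["up"]:
        raise ConnectionError("management server unreachable")
    delivered.append(event)


agent = AgentBuffer(send_to_server)
agent.report("cpu high")       # cached, link is down
agent.report("disk full")      # cached, link is down
link["up"] = True
agent.report("service down")   # flushes the cache in arrival order
print(delivered)
```

An event is removed from the cache only after it has been sent successfully, so a failure in the middle of a flush does not lose anything.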

 

Large enterprises also have hardware management solutions such as HP Systems Insight Manager, Dell OpenManage, Fujitsu ServerView, Cisco UCS and so on. As these management solutions typically support an SNMP trap interface, I would integrate them directly into Operations Manager or Sitescope. Both HP Sitescope and HP Operations Manager go beyond fault monitoring of servers and into application management. This allows ESM functionality to be expanded without having to introduce additional tools.

 

Storage management: HP Storage Essentials (HP SE) is a centralized, heterogeneous storage capacity and utilization management solution. The need for storage capacity is always increasing, and so is the business dependency on information and its speed of access. At the same time, storage technology is evolving rapidly to keep up with business requirements. Typically, large enterprises have two or three storage vendors so that they get the best value for money. Each storage vendor typically supplies its own management software for configuration, capacity, performance and fault management.

 

From a pure storage fault management point of view, it is possible to integrate (via SNMP) these vendor-specific management solutions into NNMi, Sitescope or Operations Manager for consolidated event management.

 

When an enterprise has a large array of storage from multiple vendors and there is a business need to provide a high-quality storage service with "application to storage" visibility of all physical and virtual storage assets across SAN, DAS and NAS, HP Storage Essentials is the technology that supports this capability.

 

Configuration Management Database: Having an inventory of all physical and logical infrastructure elements that are commissioned to support the IT service is key to successful IT operations. HP UCMDB (Universal Configuration Management DB) ensures there is a single source of truth for all configuration items. HP UCMDB stores software and IT infrastructure components along with their associated relationships and dependencies (topology). With UCMDB it is possible to visualize how components are related to each other, providing a deeper view for the business. UCMDB has a lot more capability, which I will touch upon in the “Service Optimization” stage of ESM.

 

At the "Service Foundation" stage, UCMDB has two main functions to support:

  1. Centralized store for all commissioned software and IT infrastructure elements. All changes in the IT environment are maintained in this store.
  2. Source of "Configuration item" information for all other management systems - Sitescope, NNMi, OM and Service Manager.
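
The value of storing configuration items together with their relationships is that you can ask "what is affected if this element fails?". The Python sketch below shows that idea with a tiny in-memory dependency graph. The `ConfigurationStore` class and the CI names are hypothetical, and this is in no way the UCMDB data model or API.

```python
from collections import defaultdict, deque


class ConfigurationStore:
    """Toy store of configuration items (CIs) and 'depends on' relationships."""

    def __init__(self):
        self._dependents = defaultdict(set)  # CI -> CIs that depend on it

    def add_dependency(self, dependent, depends_on):
        self._dependents[depends_on].add(dependent)

    def impacted_by(self, ci):
        """Breadth-first walk over everything that depends on `ci`, directly or not."""
        seen = set()
        queue = deque([ci])
        while queue:
            current = queue.popleft()
            for dependent in self._dependents.get(current, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen


store = ConfigurationStore()
store.add_dependency("server-01", depends_on="switch-01")
store.add_dependency("server-02", depends_on="switch-01")
store.add_dependency("mail-service", depends_on="server-01")
store.add_dependency("crm-service", depends_on="server-03")

print(sorted(store.impacted_by("switch-01")))
```

The same traversal is what lets a service desk agent see from an incident on a switch which business services are at risk, which is why the other management systems should draw their CI information from one store.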

 

Service Desk: Having a system in which users can log an IT incident is a necessity for IT operations to be able to react. The HP Service Manager (HP SM) service desk module supports Incident, Problem and Knowledge management, all of which are required to simplify the IT support process. HP SM allows IT users to raise an incident, which can then be assigned to an agent and resolved. HP SM is an ITILv3 compliant solution that can be extended to support additional processes such as change, configuration and SLA management, enabling a single point of contact for all core IT processes. I am very much tempted to propose HP’s new Service Catalog product, HP Propel. However, most of the time the Service Catalog will still be evolving during the service foundation stage. Hence it is appropriate to bring it in later. Feel free to read this recent HP Propel blog about the service catalog and the self-service capability of HP SM, which includes integration with HP Autonomy IDOL for search across multiple knowledge sources.

 

Integration: Now that we have the foundation components in place, let us look at how they need to be integrated with each other. Integration capability is critical, as I often find that enterprises have already invested in other products, such as BMC Remedy for the service desk or CA Spectrum for network management. Like HP Software, most of these support northbound and southbound interface APIs for integration purposes. Hence, it should not be a major concern.

 

The following two figures illustrate how these components are integrated. The figure on the left is with HP OM and the figure on the right is when only HP Sitescope is proposed. The numbers in the circles represent the integration points.

 

[Figure: HP Service Manager diagram.jpg]

 

0. This is the integration of Network, Server and Storage management systems with IT infrastructure elements. This is also an integration point for IT infrastructure vendor-specific element management systems such as CiscoWorks for network, HP Insight Manager or EMC's storage manager.

 

1. Events, traps and alarms from NNMi and the storage manager are integrated into HP Sitescope or HP Operations Manager. There is a slight difference in the integration depending on whether HP Operations Manager is used. If HP Operations Manager (HP OM) is present, then HP OM forms the consolidated event management system. Otherwise, HP NNMi acts as the consolidated event management system. Thus the picture on the right has alarms flowing from Sitescope to NNMi.

 

2. This integration keeps the configuration items discovered by NNMi, Sitescope and Storage Essentials (or another storage manager) synchronized with HP UCMDB. At this stage (Service foundation), this synchronization could be carried out manually using a simple Excel sheet.

 

3. This integration enables synchronization (via federation) of configuration items between HP SM and UCMDB. UCMDB also supports integration with BMC Remedy (documented here) or CA, which is described here (HP passport id required).

 

4. Finally, this integration allows automatic incident creation in HP Service Manager. It is a best practice to enable policies in NNMi or Operations Manager that ensure an auto-incident is created in HP Service Manager only for known problems.
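
Integration points 2 and 4 above lend themselves to a short Python sketch. The first part compares a sheet of discovered configuration items with a sheet exported from the CMDB, which is roughly what the manual spreadsheet synchronization amounts to. The second part is a policy check that allows an auto-incident only for known problems. The function names, the `name` column, the `KNOWN_PROBLEMS` list and the check against already open incidents are all assumptions of mine, and none of this is an HP product interface.

```python
import csv
import io


def load_ci_names(csv_text, column="name"):
    """Read CI names from CSV text, e.g. a sheet saved from Excel as CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip().lower() for row in reader if row.get(column)}


def reconcile(discovered, recorded):
    """Point 2: what did monitoring find that the CMDB lacks, and vice versa?"""
    return {
        "missing_from_cmdb": sorted(discovered - recorded),
        "not_discovered": sorted(recorded - discovered),
    }


# Hypothetical list of event types that operations has agreed to auto-ticket.
KNOWN_PROBLEMS = {"nodeDown", "linkDown", "diskFull"}


def should_open_incident(event_kind, ci_name, open_incidents):
    """Point 4: auto-create only for known problems.

    Skipping a CI/problem pair that already has an open incident is an
    extra assumption, added to avoid duplicate tickets.
    """
    if event_kind not in KNOWN_PROBLEMS:
        return False
    return (ci_name, event_kind) not in open_incidents


discovered = load_ci_names("name\nrouter-01\nserver-01\nserver-02\n")
recorded = load_ci_names("name\nrouter-01\nserver-01\nsan-01\n")
report = reconcile(discovered, recorded)
print(report)
print(should_open_incident("linkDown", "router-01", open_incidents=set()))
```

Running the reconciliation regularly, even by hand, is what keeps the incident policy meaningful, because an incident raised against a CI that the CMDB does not know about is much harder to route.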

 

Note that if an enterprise has different infrastructure monitoring tools or a different service desk product, the ESM architecture described above for building the “Service Foundation” still holds good. It is possible to integrate products from different vendors as described above.

 

In my next blog, I will discuss architecting the next ESM stages: Service Standardization and Optimization.

Labels: software
Comments
iti-ip on ‎07-15-2014 01:16 AM - last edited on ‎07-15-2014 04:47 AM Prasanna_A

it's very interesting blog

 

Thank you

 

SergioPuggelli | ‎08-28-2014 08:53 AM

Great Blog. Is it possible to have an idea (probably it depends by too many factors) about the effort ($ or days) required for the described integration?

Regards.

Sergio.

 

About the Author
Enterprise Architect helping large enterprises and telecom service providers with business aligned IT solutions for over two decades.
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.