Providing just enough meat for (service) models!!!

Have you ever seen large service models of computers showing 32 CPUs or so? Does having 32 CPU elements attached to the server in a service model make sense from a business service monitoring standpoint?

[Image: server with 32 CPUs]

Thinking about it, it makes sense to have an inventory of the CPUs (or cores) from the asset management standpoint. From the service monitoring standpoint, however, an inventory of individual CPUs is not required. With multi-tasking being the norm and processes easily moving from one processor to another, one CPU or a few CPUs undergoing a spike does not affect the working of the applications. Modern operating systems automatically balance load across processors.
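The point can be illustrated with a small sketch (all names here are hypothetical, not from any particular monitoring product): rather than surfacing 32 per-core metrics into the service model, the monitoring layer can collapse them into one system-level CPU indicator.

```python
# Hypothetical sketch: collapse per-core utilization readings into a
# single system-level CPU health indicator for the service model.

def cpu_health(per_core_util, warn=80.0, crit=95.0):
    """Return one health state from many per-core utilization samples.

    A single busy core does not flag the system; modern schedulers
    rebalance load, so we look at the average across all cores.
    """
    avg = sum(per_core_util) / len(per_core_util)
    if avg >= crit:
        return "critical"
    if avg >= warn:
        return "warning"
    return "normal"

# One core spiking to 100% barely moves the 32-core average.
cores = [10.0] * 31 + [100.0]
print(cpu_health(cores))  # -> normal
```

The thresholds are placeholders; the design point is that the service model sees one aggregated state, not 32 raw metrics.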

 

So it boils down to this - would the operations bridge team be interested in monitoring each CPU core? Is there a situation where an app is affected by one or a few CPUs having high utilization or maybe electrical faults?

 

The service model is required to show that a business service is performing optimally and is able to handle the number of transactions that the SLA states. If there is any shortfall against the SLAs, the root cause may come down to the network, the app or the system. That level of abstraction is sufficient. Coupled with health indicators or similar state-representation labels on the element in the service model, a system can be shown to have a CPU or memory bottleneck, or even some other 'system-fault' state altogether.

 

If I may dare to generalize this, the rule is clearly: 'Present only what is relevant to the service model'.

 

NOTE: With CPU affinity settings, it is possible for instance to restrict certain apps/processes/virtual machines to just a few cores. In such cases, it becomes 'relevant' to present these cores to the service model.
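Here is a minimal sketch of that relevance rule (hypothetical function and naming scheme, assuming we know which cores each app is pinned to): cores appear as model elements only when some app is restricted to them.

```python
# Hypothetical sketch: when an app is pinned to specific cores (CPU
# affinity), those cores become relevant and get their own model elements.

def model_elements(system, pinned_apps):
    """Build service-model elements for a system.

    `pinned_apps` maps app name -> set of core ids it is restricted to.
    Cores only appear in the model when some app is pinned to them;
    otherwise the system element alone is enough.
    """
    elements = [system]
    for app, cores in pinned_apps.items():
        for core in sorted(cores):
            elements.append(f"{system}/cpu{core} (pinned: {app})")
    return elements

# No affinity: just the system. With affinity: the pinned cores too.
print(model_elements("srv01", {}))
print(model_elements("srv01", {"db": {0, 1}}))
```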

 

On the same basis, you would not want to model the memory elements, nor things so intrinsic to the operating system as the process table, the kernel and so on. These are best abstracted into the system element itself.

 

The rule above might lead you to ask - should we show individual NICs and storage volumes on the system?

 

It might be prudent to note here that NICs and storage disks/volumes are unlike CPU or memory from a resource-usage standpoint, and therefore from a modelling standpoint too. Here's the low-down.

 

For most applications the disk is the fundamental unit of storage, and the binding between the app and its storage is quite tight. With local storage, there is no way an app will start writing its data to another volume just because there's no space on the disk. So it is important that we model each disk, and each NIC complete with its IP address (and MAC address), as part of the computer system model.

 

I/O problems occurring on one disk or one network card can cause applications to fail entirely. So it is important to present the potential root causes (here, that one lousy disk or NIC) in the service model.
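A sketch of what such a computer system model might look like (hypothetical classes and field names, not any specific product's schema): the system keeps one element per disk and per NIC, each a distinct potential root cause.

```python
# Hypothetical sketch: unlike CPUs, each disk and NIC gets its own model
# element, since a failure there is a distinct potential root cause.
from dataclasses import dataclass, field

@dataclass
class Nic:
    name: str
    ip: str
    mac: str

@dataclass
class Disk:
    mount: str
    size_gb: int

@dataclass
class ComputerSystem:
    hostname: str
    nics: list = field(default_factory=list)
    disks: list = field(default_factory=list)

srv = ComputerSystem(
    "srv01",
    nics=[Nic("eth0", "10.0.0.5", "00:11:22:33:44:55")],
    disks=[Disk("/var/data", 500)],
)
print(srv.hostname, len(srv.nics), len(srv.disks))
```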

 

While on this topic, we must also discuss some special cases such as teaming and aggregation.

 

It is common nowadays to do something called NIC teaming (a.k.a. link aggregation). Teaming combines two or more NICs under a single interface name, so instead of having four 1 Gb NICs the system appears to have one 4 Gb NIC, thereby getting greater bandwidth.

 

Again, I suggest keeping the abstraction at the right level - we must show the 'teamed' NIC interface first and foremost in the computer system model. Unless there are really compelling business reasons, we should not attempt to show the drill-down into the bonded NICs in the service model.
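For concreteness, here is roughly what such teaming looks like on a Linux system using NetworkManager (a hedged example; interface names and the bonding mode are assumptions, and other OSes use different tooling). Note that only `bond0` would be surfaced to the service model:

```shell
# Hypothetical Linux example (NetworkManager/nmcli): create a bonded
# interface bond0 over two member NICs using 802.3ad link aggregation.
nmcli con add type bond con-name bond0 ifname bond0 bond.options "mode=802.3ad"
nmcli con add type ethernet con-name bond0-port1 ifname eth0 master bond0
nmcli con add type ethernet con-name bond0-port2 ifname eth1 master bond0
nmcli con up bond0
```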

 

For the same reason you would not show that the data is striped across multiple disks in a RAID configuration (and/or try to show associations to each disk/volume) - again, not relevant for the business service models that enterprises typically like to use in their bridge.

 

That leads to the interesting question - who is going to monitor the lower-level elements below the abstraction layer? This is where the element managers (HP SIM and OneView, Cisco UCS Manager) and domain managers (HP NNMi, HP OM, HP Storage Essentials) come into the picture. Here's a picture just to clarify.

 

[Image: operations bridge and domain managers]

 

It must be noted here that the operations bridge would also be served events, 'enough' topology and performance data from the element managers, to allow for advanced event correlation reaching up to the business service layers and other interesting BSM use cases.

 

Then there are other cases such as clustering with failover and load-balanced configurations. In these cases, it is important to show the health of the cluster as affecting the clustered application, rather than the health of the individual nodes affecting the app. It is quite possible (thanks to the whole redundancy plan behind clustering) that some nodes are not 'healthy' but the cluster is still quite healthy, and so the apps running in the cluster are performing optimally too. Use propagation and calculation rules when dealing with aggregated elements such as cluster systems, to ensure that the health state of the clustered nodes is rolled up to the cluster level. The idea here is to not call out a fault only because one node malfunctions. Maybe that's a warning alert - 'Redundancy affected. Node X down'. But it is definitely nothing to raise a critical alert over, as would be done in a similar situation without any clustering implemented.
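Such a propagation rule can be sketched in a few lines (a hypothetical rule, assuming simple up/down node states): one failed node yields at most a warning, and only total failure is critical.

```python
# Hypothetical propagation rule: roll node states up to the cluster so a
# single failed node yields a warning, not a critical alert.

def cluster_health(node_states):
    """node_states: list of 'up'/'down', one entry per cluster node."""
    down = node_states.count("down")
    if down == len(node_states):
        return "critical"   # no node left to serve the app
    if down > 0:
        return "warning"    # redundancy reduced, app still fine
    return "normal"

print(cluster_health(["up", "down", "up"]))  # -> warning
```

Real propagation rules would also weigh node capacity and quorum, but the shape is the same: the cluster element, not the node, drives the app's state.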

 

When talking of system models, one cannot escape the stark reality - what you see is not what you really get :) Virtualization is a great leveler. It allows you to think that you have a disk for reads and writes, but what you really have is a file to which all reads and writes are happening. It allows you to think you have CPUs and memory slots, but again these are only 'threads' in the OS kernel and regions in memory.

 

So how really should one model virtual systems? I will cover this part in my next blog article.

 

Links

 

HP BSM 9 video

HP BSM Software home page

BSM 9.22 Best Practices: RSTM (pdf)

HP OMi

HP OM

HP NNMi

HP Storage Essentials

HP vPV

 

 

To be continued

About the Author
Ramkumar Devanathan (twitter: @rdevanathan) works in the IOM-Customer Assist Team (CAT) providing technical assistance to HP Software pre-sa...