2 tips and tricks for effective system performance analysis

Guest post by Ramakrishna Baipadithaya Kenchabhatre and Sunil Lingappa, HP Operations Agent R&D Leads 

 

Andrew is a system administrator at a large IT enterprise. He uses HP GlancePlus, HP Operations Agent, and HP Operations Manager in his environment to monitor and troubleshoot system performance issues. However, two tricky problems haunt him once in a while.

 

Problem 1:

 

Andrew comes to the office one fine morning and sees that his HP Operations Manager message browser is full of alarms generated from two systems in his environment. Concerned, he gets down to the issue at hand. Using his debugging tools, he figures out that these two systems are critical database systems in the infrastructure and that the alarms were generated between 2 a.m. and 4 a.m. He begins his investigation here.

 

Andrew opens the alarmdef file and sees alarm definitions that raise an alert when CPU or disk utilization crosses 70 percent, so the alarms themselves are expected behavior. He now wonders where the gap is. He figures out that these systems handle business database transactions every day between 9 a.m. and 3 p.m. They also back up their databases to a backend server between 2 a.m. and 4 a.m., which is exactly when the alarms were raised.

 

Problem: The need for ‘Intelligent Alarming’ wherein alarms can be masked at customizable times of the day.

 

We can define ‘shifts’ in alarm definitions so that the alarm thresholds apply only during chosen times of the day. Let’s see an example of how this can be achieved.

 

Sample alarm definition:

BSM Alarm 1.png

 

In this alarm definition, I have defined start_shift and end_shift time-of-day parameters. The alarm for a CPU bottleneck takes these shifts into account. Alarms are generated only if:

- The alarm threshold condition is met, and

- The time of day is between 04:20 and 04:23
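The two conditions above can be modeled in a few lines of Python. This is only an illustrative sketch, not how HP Operations Agent evaluates an alarmdef file: the function names are hypothetical, the 70 percent default is borrowed from the thresholds mentioned earlier, and the strict comparison at the window edges is an assumption.

```python
from datetime import time

# Shift window from the example above (04:20 to 04:23).
START_SHIFT = time(4, 20)
END_SHIFT = time(4, 23)


def in_shift(now: time, start: time = START_SHIFT, end: time = END_SHIFT) -> bool:
    """Return True when `now` lies strictly inside the shift window.

    Windows that wrap past midnight are not handled in this sketch.
    """
    return start < now < end


def should_alarm(cpu_util: float, now: time, threshold: float = 70.0) -> bool:
    """Raise an alarm only if the threshold is crossed AND we are inside the shift."""
    return cpu_util > threshold and in_shift(now)


if __name__ == "__main__":
    print(should_alarm(85.0, time(4, 21)))  # high CPU inside the window
    print(should_alarm(85.0, time(3, 0)))   # high CPU outside the window
    print(should_alarm(50.0, time(4, 21)))  # low CPU inside the window
```

Only the first call satisfies both conditions, so high utilization outside the window is masked rather than reported.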

 

 

 

 

 Message Browser window of HP Operations Manager:

BSM Alram 2.png

 

As you can see from the Message Browser window of HP Operations Manager, alarms are generated only between the customized times (start_shift, end_shift). The message with severity “normal” indicates that shift window processing is complete. In this way, Andrew can use a common alarm definition to monitor his environment intelligently and efficiently.

 

Problem 2:

 

Andrew has a system in his IT infrastructure where upgrades and patches are applied to a DBServer application on a regular basis. He wants to monitor the resource utilization of these application upgrades, and in particular the memory growth, because the system has only just enough memory. He wants to proactively detect a potential memory leak and take corrective action, such as rolling back the patch or any other suitable action, if the need arises.

 

Andrew then spends the next few hours racking his brain thinking of an easy and automated way to do this.

 

Problem: Proactively detect a memory leak in an application

 

Andrew uses the alarmdef file that comes with HP Operations Agent software to achieve his purpose. He sets up a customized alarm definition as below.

 

Customized Alarmdef:

 BSM Alarm 3.png

 

He has defined an application in the parm file that covers all processes belonging to the DBServer application. He has set a threshold of 100 MB as the memory growth limit. APP_MEM_VIRT gives the virtual memory used by the defined application, so its growth can be tracked over time.

 

Unfortunately, the alert messages keep coming to Andrew’s inbox every five minutes. That’s a lot of messages to deal with overnight or during the weekend. To reduce the frequency of these messages, he uses the repeat keyword in the alarm definition. The repeat interval is configurable and works like a ‘snooze’ timer: the alert is sent again only if the condition is still true after the repeat interval, and an email then goes to the administrator indicating that the issue still exists.
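The effect of the ‘snooze’ timer can be illustrated with a small Python simulation. This is a simplified model under stated assumptions, not the agent's actual implementation: the function name `alert_times` is hypothetical, samples arrive every five minutes, and a sample where the condition is false ends the alarm so that the next breach starts fresh.

```python
from typing import List, Optional, Tuple


def alert_times(samples: List[Tuple[int, bool]], repeat_every: int) -> List[int]:
    """Return the minutes at which a message would be sent.

    samples      -- (minute, condition_is_true) pairs in time order
    repeat_every -- repeat interval in minutes (the 'snooze' length)
    """
    sent: List[int] = []
    last_sent: Optional[int] = None
    for minute, breached in samples:
        if not breached:
            last_sent = None  # condition cleared; the next breach starts a new alarm
            continue
        if last_sent is None or minute - last_sent >= repeat_every:
            sent.append(minute)
            last_sent = minute
    return sent


if __name__ == "__main__":
    # Three hours of five-minute samples with the threshold exceeded throughout.
    leak = [(m, True) for m in range(0, 181, 5)]
    print(len(alert_times(leak, repeat_every=5)))  # one message per sample
    print(alert_times(leak, repeat_every=60))      # one message per hour
```

With a 60-minute repeat interval the same three-hour breach produces four messages (at minutes 0, 60, 120 and 180) instead of 37.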

 

Thus Andrew is able to monitor the DBServer application proactively and take the necessary actions so that his entire IT system does not go down due to a memory crunch.

 

The alarm definitions need not be edited on individual systems. You can change the alarmdef policy on the HP Operations Manager and deploy it to all interested nodes.

 

This is the third part of a three-part series. I encourage you to read part 1 and part 2 here

 

Comments
| ‎08-06-2013 05:48 PM

Editing alarm.def files on individual servers is not easy, intuitive, or scalable. Wouldn't it be better to set shift windows and alarm thresholds in Operations Manager directly, rather than by editing files on individual servers? Wouldn't it be easier to just define a process to watch, and thresholds to alarm on, using the Operations Manager policies?

 

Similarly with editing alarm.def to send email to individual users - this is problematic to manage across multiple systems, and is better handled at an Operations Manager level (probably using the xMatters integration).

HP Expert | ‎08-06-2013 09:19 PM

Ramki, nice tips for using alarmdef/adviser file effectively. Keep giving us expert tips like this.

 

It would be helpful if the alarmdef syntax above is copy-able (like a code snippet). Useful for copy-paste freaks - a large population among us. :)

 

 

Here's a nice gotcha to turn off alerting from perfalarm component entirely.

 

# agsysdb -ovo off

 

Also have a look at the other options for this command, agsysdb (located in the same folder as the other perf binaries - /opt/perf/bin or %OvInstallDir%\bin).

 

I use this when I have deployed OM policies to do the system monitoring and so I don't want direct alerts from perfalarm.

Guest Blogger (HPSW-Guest) | ‎08-08-2013 09:37 AM

Hello Lindsay,

 

Thank you for the feedback. We agree with you that editing individual alarmdef files is not a scalable solution.

We have updated the blog content accordingly. Please note the addition of the following lines at the end:

The alarm definitions need not be edited on individual systems. You can change the alarmdef policy on the HP Operations Manager and deploy it on to all interested nodes

 

--------------------------------------------------

 

Thanks and regards,

Ramki

Field Service Program | ‎08-14-2013 11:26 PM

The tips and tricks discussed above by Ramakrishna Baipadithaya Kenchabhatre and Sunil Lingappa are effective. That editing individual alarmdef files is not a scalable solution is a point one must always keep in mind.

HP Expert ‎08-20-2013 04:08 AM - edited ‎08-20-2013 04:09 AM

Hello Ram,

 

Thanks for the feedback. Here are the sample alarmdefs for both scenarios:

1. Shift-based alarming


start_shift = "08:00"
end_shift = "17:00"

symptom CPU_Bottleneck type=CPU
  rule GBL_CPU_TOTAL_UTIL > 75 prob 25
  rule GBL_CPU_TOTAL_UTIL > 85 prob 25
  rule GBL_CPU_TOTAL_UTIL > 90 prob 25
  rule GBL_RUN_QUEUE > 2 prob 25

ALARM CPU_Bottleneck > 80 AND GBL_STATTIME > start_shift AND GBL_STATTIME < end_shift for 10 minutes
  type = "CPU"
  start
    if CPU_Bottleneck > 90 then
      red alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
    else
      yellow alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
  repeat every 10 minutes
    if CPU_Bottleneck > 90 then
      red alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
    else
      yellow alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
  end
    reset alert "End of CPU Bottleneck Alert"



2. Detecting memory leaks:

 

# Watch for DBServer application using over 100 MB of virtual memory (VSS)
# (100000 equals 100 MB only if APP_MEM_VIRT is reported in KB; check the unit on your platform)

VSSthreshold = 100000

alarm DBServer:APP_MEM_VIRT > VSSthreshold for 5 minutes
  start {
    yellow alert "DBServer app memory threshold exceeded"
    exec "echo 'DBServer app memory alert' | mail root@adminbox"
  }
  repeat every 60 minutes {
    yellow alert "DBServer application still hogging memory"
    exec "echo 'DBServer app alert continuing' | mail root@adminbox"
  }

The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.