12-11-2013 07:01 AM
We are running the 7.2.0 distribution with the following hotfixes on the server:
HOTFIX72_001.jar Tuesday, 5/7/2013, 2:54 PM EDT
HOTFIX72_002.jar Tuesday, 5/7/2013, 2:55 PM EDT
HOTFIX72_003.jar Tuesday, 5/7/2013, 2:57 PM EDT
HOTFIX72_004.jar Wednesday, 5/8/2013, 2:37 PM EDT
HOTFIX72_005.jar Tuesday, 5/7/2013, 3:01 PM EDT
HOTFIX72_006.jar Tuesday, 5/7/2013, 3:02 PM EDT
HOTFIX72_007.jar Tuesday, 5/7/2013, 3:06 PM EDT
HOTFIX72_008.jar Tuesday, 5/7/2013, 3:12 PM EDT
HOTFIX72_009.jar Friday, 4/26/2013, 9:36 AM EDT
Starting earlier this week, a number of false positives reporting systems as unreachable began to occur every morning. According to the Insight Manager logs, the systems in question would be unavailable at 2:30 in the morning, and then become available five minutes later.
I checked the uptime of the systems in question, and none of them indicated that a reboot had occurred during the past week. I checked with the facilities staff to see if any work had been done on the network infrastructure, and they reported that no work was being done at the time indicated.
I checked the log files on the client systems, and there were no apparent error messages.
The client systems are all running the CentOS 5.3 64-bit distribution. Does anyone have an idea as to why the false positives are occurring, and how they can be corrected?
12-11-2013 01:02 PM
What is the exact event? Could it be that the agents on the servers crash and restart, so that it is the agent going down rather than the server itself? Do you get these events from all of your servers or just a few?
12-12-2013 05:27 AM
Hi Andrew --
Thank you for your reply, and my apologies for not responding sooner. The same event occurred this morning. I checked the servers that reported the problem, and while it is a majority of the systems, it is not all of them. I checked the cma.log file on several of the systems, and none of them had entries from the past week indicating the agents had crashed at that time. The same can also be said for the hpasmd.log file.
Are there any other log files that I can check for entries indicating an agent crash? If not, how can I determine if the agent did crash on the systems?
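One way to check for an agent restart without relying on the agent's own logs is to compare the agent process start time with the time of the event: a process that started at or after 2:30 AM was restarted around the outage. The sketch below assumes Python 3.7 or later is available (the stock Python on CentOS 5.3 is much older, so the `ps` command can also be run by hand and its output pasted in), that `ps` prints `lstart` in the default C locale, and that the agent process is named `hpasmd`, which is only a guess based on the log file name.

```python
import subprocess
from datetime import datetime


def parse_lstart(text):
    """Parse a `ps -o lstart=` timestamp such as 'Thu Dec 12 02:31:07 2013'."""
    # Collapse the padding ps adds in front of single-digit days.
    # Month and weekday names are parsed in the current locale (assumed C/English).
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")


def started_since(lstart_text, since):
    """Return True if the process started at or after `since`."""
    return parse_lstart(lstart_text) >= since


def agent_start_times(process_name):
    """Return the raw lstart strings for every process with this command name."""
    # ps exits non-zero when nothing matches, so the return code is not checked.
    result = subprocess.run(
        ["ps", "-o", "lstart=", "-C", process_name],
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    outage = datetime(2013, 12, 12, 2, 30)  # time of the event under investigation
    for stamp in agent_start_times("hpasmd"):  # assumed process name
        print(stamp, "restarted since outage:", started_since(stamp, outage))
```

If every agent process reports a start time well before the event, an agent crash is an unlikely explanation.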
12-12-2013 11:21 AM
Did the error occur on all of the servers at the same time? Are they all on the same subnet or software version? As you can tell, I'm looking for some common cause. Can you attach a screenshot of the events you get for one example server?
12-13-2013 03:55 AM
Hi Andrew --
The error did occur on all of the servers at the same time, and they are all on the same subnet. I went through my e-mail this morning, and there were no e-mails indicating the condition that I reported in my original posting. I will keep an eye on the systems, and if and when the next event occurs, I will post it here.
12-13-2013 02:10 PM
Same subnet, same time: that sounds like something is happening to that segment. You may want to consider whether any Layer 2/3 devices in that path are the culprit. Typically, false positives are random occurrences when they happen at all.
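To test the "same subnet, same time" pattern against the event list rather than by eye, the events can be grouped by subnet and time window. This is only a rough sketch: the IP addresses, subnets, and timestamps below are made-up placeholders, not values from this thread, and would need to be replaced with entries copied from your own Insight Manager event list.

```python
from collections import defaultdict
from datetime import datetime
from ipaddress import ip_address, ip_network

# Hypothetical sample data: (host IP, time the host was marked unreachable).
EVENTS = [
    ("10.1.20.11", "2013-12-12 02:30"),
    ("10.1.20.12", "2013-12-12 02:30"),
    ("10.1.20.13", "2013-12-12 02:31"),
    ("10.1.30.5", "2013-12-12 14:07"),
]

# Hypothetical subnets the monitored hosts live on.
SUBNETS = [ip_network("10.1.20.0/24"), ip_network("10.1.30.0/24")]


def subnet_of(ip, subnets):
    """Return the first subnet containing ip, or None if none matches."""
    addr = ip_address(ip)
    for net in subnets:
        if addr in net:
            return net
    return None


def clusters(events, subnets, window_minutes=5, min_hosts=2):
    """Group events by (subnet, time bucket) and keep groups with several hosts."""
    groups = defaultdict(list)
    for ip, stamp in events:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        # Round down to the start of the window, e.g. 02:31 -> 02:30.
        bucket = when.replace(minute=when.minute - when.minute % window_minutes)
        groups[(subnet_of(ip, subnets), bucket)].append(ip)
    return {key: hosts for key, hosts in groups.items() if len(hosts) >= min_hosts}


if __name__ == "__main__":
    for (net, bucket), hosts in clusters(EVENTS, SUBNETS).items():
        print(net, bucket, hosts)
```

Several hosts from one subnet landing in the same window points toward the network segment or the monitoring path rather than the individual servers. Because the buckets have fixed boundaries, two events a minute apart can fall into neighbouring buckets, so treat the grouping as a first pass only.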