08-08-2013 07:37 AM
Wondering if anyone else has come across this little situation.
We have an Oracle/Sun Unix server that had a hardware panic and then rebooted itself into maintenance mode. The reboot happened fast enough that the SiteScope ping monitor did not pick up any disruption in network connectivity.
However, we have additional SiteScope monitors that we figured should have picked up the reboot. Most (if not all) of them require an SSH connection to respond to the SiteScope poll, and those polls are not being serviced, because a server in maintenance mode can only be logged in to from the console. The irony is that we make all of these other monitors dependent on a monitor for inetd (anyone catching on to the problem yet?). Even the inetd monitor is not tripping to an error condition.
It was several hours before the hardware problem was identified, and then only because end users logged calls with our Service Desk. Not a very good endorsement of our monitoring capabilities with SiteScope. Does anyone have any thoughts on how we could pick this up?
As an aside, we intend to add the OM Agent to this type of tier 1 critical server, and hope it will detect this kind of failure better.
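To make the gap described above concrete, here is a rough sketch of the kind of check that would catch a fast reboot: compare the server's uptime between polls, and treat "no answer" as an error instead of as "no data". This is only an illustration, not SiteScope functionality. The function name `classify_poll` and the sample values are made up, and how the uptime figure is actually collected (for example an SNMP uptime counter or a script) is an assumption left outside the sketch.

```python
from typing import Optional


def classify_poll(prev_uptime_s: Optional[float],
                  curr_uptime_s: Optional[float]) -> str:
    """Classify a single poll of a server's uptime (hypothetical helper).

    prev_uptime_s: uptime in seconds seen at the previous poll, or None
                   if there is no earlier reading.
    curr_uptime_s: uptime in seconds seen now, or None if the server
                   did not answer the poll at all.
    """
    # No answer is a failure in its own right. A server sitting in
    # maintenance mode would most likely end up here.
    if curr_uptime_s is None:
        return "error"

    # Uptime went backwards since the last poll, so the server must
    # have rebooted in between, even if ping never missed a beat.
    if prev_uptime_s is not None and curr_uptime_s < prev_uptime_s:
        return "rebooted"

    return "ok"


if __name__ == "__main__":
    print(classify_poll(1000.0, 1300.0))  # steady state
    print(classify_poll(1000.0, 40.0))    # uptime reset between polls
    print(classify_poll(1000.0, None))    # poll got no answer
```

The point of the sketch is that both outcomes we missed (the reboot itself and the unanswered polls afterwards) map to a non-OK state, instead of being hidden behind a dependency on another monitor.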