Re: [Issue] NNMi Alerted only one device when entire site went down (413 Views)
Advisor
Rahul_Nayak
Posts: 30
Registered: ‎07-18-2011
Message 1 of 7 (436 Views)
Accepted Solution

[Issue] NNMi Alerted only one device when entire site went down

Hi All,

 

We have NNMi 9.1 installed on Windows 2008 in our environment. We recently had a power outage in a region with routers and multiple switches. When the site went down, only one device [the Secondary Router] raised an alert in the NNMi Incident view, even though the entire site was down for some time. Ideally, all devices in the region should be alerted as down, right?

 

Can anyone please suggest a solution?

 

Thanks 

Valued Contributor
pafreire
Posts: 140
Registered: ‎01-10-2011
Message 2 of 7 (425 Views)

Re: [Issue] NNMi Alerted only one device when entire site went down

Hello Rahul,

NNMi should create an incident for each device that is down.

A new incident is only created after the device has been polled (default 5 minutes) and the dampening period has passed (default 6 minutes).

If the outage was short, it may not have lasted long enough to create a new incident.

If this is your case, you can lower the dampening value to detect node down incidents more quickly.
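
As a rough illustration of the timing described above, here is a minimal Python sketch. It is a simplified model and not NNMi's actual logic: it assumes the failure is first noticed at the next poll, and that the incident only appears if the device is still down when the dampening period ends. The function names are hypothetical, and the 5 and 6 minute values are simply the defaults quoted above.

```python
from datetime import timedelta

# Defaults quoted in this thread; verify them against your own configuration.
POLL_INTERVAL = timedelta(minutes=5)
DAMPENING = timedelta(minutes=6)


def worst_case_detection_delay(poll_interval=POLL_INTERVAL, dampening=DAMPENING):
    """Longest wait before an incident appears, assuming the device
    fails immediately after a successful poll (simplified model)."""
    return poll_interval + dampening


def outage_raises_incident(outage, wait_for_next_poll, dampening=DAMPENING):
    """True if the outage lasts past the next poll plus the dampening
    period. Assumption: a device that recovers earlier raises nothing."""
    return outage >= wait_for_next_poll + dampening


# An 8-minute outage that starts 4 minutes before the next poll is missed,
# while a 30-minute outage under the same conditions is reported.
short_outage = outage_raises_incident(timedelta(minutes=8), timedelta(minutes=4))
long_outage = outage_raises_incident(timedelta(minutes=30), timedelta(minutes=4))
```

Under these assumptions, a short outage can come and go without any node down incident, which is why lowering the dampening value shortens the detection time.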


HTH,

Paulo

“The greatest challenge to any thinker is stating the problem in a way that will allow a solution.”
Bertrand Russell
Honored Contributor
AndyKemp
Posts: 751
Registered: ‎05-17-2010
Message 3 of 7 (421 Views)

Re: [Issue] NNMi Alerted only one device when entire site went down

That assumption is incorrect; it's working as designed. Root cause analysis determined that the single device that alerted was the cause of the remaining devices becoming isolated. The isolated devices would have turned blue, meaning Unknown.

 

 

Unless you put devices into the "Important Nodes" node group, topological connectivity is used to determine the status of every node.

Have a nice day :)

Andy Kemp,  CISSP
Advisor
Rahul_Nayak
Posts: 30
Registered: ‎07-18-2011
Message 4 of 7 (415 Views)

Re: [Issue] NNMi Alerted only one device when entire site went down

Thanks for the explanation, Andy.

I also thought it might be due to RCA performed on the network and, as you mentioned, all the other devices were in Unknown status.

But the issue was not caused by the router being down; it was a power failure across the entire region. So it is not the case that only that router lost power and took the other devices down with it. If power fails for the whole site, all devices should alert, right?

And please explain a bit more about the Important Nodes group.
Honored Contributor
AndyKemp
Posts: 751
Registered: ‎05-17-2010
Message 5 of 7 (413 Views)

Re: [Issue] NNMi Alerted only one device when entire site went down

When it uses topology to decide whether a device is isolated, it is because it has identified how devices are connected from L2 and L3 sources. It keeps track of this and updates its model every time discovery or spiral discovery collects information. Essentially, it matches the information available in table form on each device against the tables from all the other devices.

 

Logically, when the power went out and all of those devices became unreachable, NNMi was telling you that it knew something was wrong and that the most likely point to investigate was the last logical hop it could reach. It does not assume that the nodes past that point are down, because it cannot poll them directly.

 

 

The Important Nodes group sidesteps this logic: if any node in that group becomes unreachable, it will alarm as node down.
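
To make the isolation logic above more concrete, here is a minimal Python sketch. It is only an illustration of the idea and not NNMi's actual causal engine: the `classify` function, the node names, and the dictionary-based topology are all hypothetical. It assumes a node is reported as down when it stops responding while a neighbour is still reachable, that nodes hidden behind it become unknown, and that nodes flagged as important are reported as down whenever they are unreachable.

```python
from collections import deque


def classify(topology, manager, alive, important=frozenset()):
    """Return node -> 'up' | 'down' | 'unknown' (hypothetical model).

    topology:  dict mapping each node to its directly connected neighbours
    manager:   the management station doing the polling
    alive:     nodes that are actually powered on and responding
    important: nodes that alarm as down whenever they are unreachable
    """
    # Breadth-first search from the manager, passing only through alive nodes.
    reachable = {manager}
    queue = deque([manager])
    while queue:
        node = queue.popleft()
        for neighbour in topology.get(node, ()):
            if neighbour in alive and neighbour not in reachable:
                reachable.add(neighbour)
                queue.append(neighbour)

    status = {}
    for node, neighbours in topology.items():
        if node == manager:
            continue
        if node in reachable:
            status[node] = "up"
        elif node in important or any(n in reachable for n in neighbours):
            # First unreachable hop, or a node flagged as important.
            status[node] = "down"
        else:
            # Hidden behind a failed node, so it cannot be polled directly.
            status[node] = "unknown"
    return status


# Hypothetical site: one router in front of two switches, all without power.
topology = {
    "nnmi": ["core"],
    "core": ["nnmi", "site-router"],
    "site-router": ["core", "sw1", "sw2"],
    "sw1": ["site-router"],
    "sw2": ["site-router"],
}
result = classify(topology, "nnmi", alive={"core"})
result_important = classify(topology, "nnmi", alive={"core"}, important={"sw1"})
```

In this sketch, `result` marks only `site-router` as down and both switches as unknown, while `result_important` also marks `sw1` as down because it was flagged as important.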

Have a nice day :)

Andy Kemp,  CISSP
Valued Contributor
pafreire
Posts: 140
Registered: ‎01-10-2011
Message 6 of 7 (386 Views)

Re: [Issue] NNMi Alerted only one device when entire site went down

Hello Rahul,

Let me clarify something here.

Is the router that alerted the one that connects all devices in this region?


BR,

Paulo
Advisor
Rahul_Nayak
Posts: 30
Registered: ‎07-18-2011
Message 7 of 7 (374 Views)

Re: [Issue] NNMi Alerted only one device when entire site went down

Thanks a lot for the explanation, AndyKemp.

Let me rephrase: NNMi checks the availability of devices by following the topology and communicating with the next connected device to determine its availability.

When the entire site went down, NNMi checked the region's availability through the primary router in the region. When it found that the router was down, it was unable to check the other devices connected to it, so they turned to Unknown status. That is the reason only one device alerted and the others didn't.

@pafreire: Yes, it was the router that connects to all devices in that region.
