02-11-2013 06:18 AM
This past Sunday we scheduled a maintenance window to reboot a switch in one of our branch offices. The reboot was needed to enable QoS queues. Since we had to reboot anyway, I also opted to upgrade the software. I decided to leverage the two management cards in an effort to reduce downtime. I elected to do this because our HQ has the same switch, but it passes traffic 24/7, so any downtime there is a big issue. The article I followed for instructions is pasted below:
The switch was running K.15.07.0008 and I upgraded to K.15.09.0012.
The boot ROM is K.15.28.
The secondary management card rebooted just fine with the new image. When I switched it over from standby to active, we ran into a failure. The switch was no longer able to pass traffic properly. My ping times were between 700 ms and 900 ms over our VPN (typically they are about 90 ms; the VPN is not handled by this switch but by another device). Users in the branch office had limited or no access to the public Internet or our internal network.
In order to solve the problem we had to completely reboot the switch.
Does anyone have any theories as to why this would happen? The article I followed is older, but it is the only one I could dig up. Is the process deprecated? Are there any known bugs regarding HP switches and dual management cards? How does one properly leverage two management cards to minimize downtime should a reboot be required?
Thank you kindly to anyone who has taken the time to read and/or reply to my post.
02-12-2013 06:29 AM
Sounds like maybe it got stuck syncing between the two MMs after the update. I've noticed high latency when the MMs are going through their initial syncs. How long was it before you rebooted the entire switch, and did you check whether it was in the process of syncing the MMs with the "show redundancy" command?
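If you want to script that check rather than eyeball it, here is a rough sketch in Python. It is only an illustration under some assumptions: `fetch_status` is a hypothetical callable that returns the "show redundancy" output as text (however you collect it), and `synced_marker` is whatever string in that output tells you the standby MM is in sync. I'm not quoting the real output format here, so the marker is a parameter you would fill in yourself.

```python
import time
from typing import Callable


def wait_for_sync(
    fetch_status: Callable[[], str],   # hypothetical: returns "show redundancy" text
    synced_marker: str,                # assumption: substring that indicates "in sync"
    timeout_s: float = 1800.0,         # give up after 30 minutes by default
    interval_s: float = 30.0,          # pause between checks
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the status text until the marker appears or the timeout expires.

    Returns True if the marker was seen, False if the timeout was reached.
    """
    deadline = clock() + timeout_s
    while True:
        if synced_marker in fetch_status():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The idea would be to hold off on the switchover (or on a hard reboot) until this returns True, and to treat a False result as a sign the sync is stuck and worth raising with support.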
02-12-2013 12:54 PM
Thank you for the reply,
I agree that you'll see high latency while things "settle down". Unfortunately, in this situation that was not the case. MM2 was the active mgmt module and MM1 was in standby (and vice versa during the second attempt). During the first attempt there was a delay of about 30 to 40 minutes between the switchover and the hard reboot. During the second attempt, the delay was about 20 to 30 minutes.
I have sent support the output of "show tech all". I received a Lync IM from the support rep, but no replies yet.
What I'm afraid of now is that the redundant mgmt cards in the various 8212zl switches we have deployed don't in fact function the way they should.
02-12-2013 01:43 PM
02-12-2013 03:01 PM - edited 02-12-2013 03:03 PM
I'm really interested to find out what happens, as I am troubleshooting an issue with our 8212. Different symptoms and failure scenario, but still tied to the MMs and, in our case, the fabric cards. I actually found the issue while testing the switch failure scenarios, so I'm glad I found it during a maintenance window and not when a failure actually happened. Still waiting on a response from support.