09-30-2010 11:43 PM
I have a 6-node (blade) PolyServe cluster set up with the following configuration.
HP Proliant BL460c G6
OS: Windows Server 2003 Enterprise x64 SP2
NIC: HP NC532i Dual Port 10GbE Multifunction BL-c Adapter
Polyserve Matrix Server 3.6.1
Build version 3.6.1.0574
Installed solution packs:
HP EVA SAN array as storage
Sybase Open Client 12.5.1
For a couple of weeks I have observed two issues:
1) Nodes get rebooted (possibly fenced) for no apparent reason. The event logs (system/application/matrix) don't say much, except for the "matrix server terminated" message.
2) While the restarted node is coming back online, all the other nodes in the cluster go into a hung state, and failover doesn't happen until this node is fully back up.
The rebooted node takes a considerable amount of time to return to a fully operational state. Once that's done, the other nodes are OK too.
Has anyone here come across such a situation?
I'd appreciate any help or suggestions :)
10-01-2010 11:42 AM
I'll poke through my notes and see what I can find.
10-01-2010 03:35 PM
Once the node is fenced, the cluster should be OK: the node should simply reboot and rejoin, and the cluster should keep running without the fenced node in the meantime.
It sounds like the node gets fenced, but for some reason the cluster doesn't think it fenced the node.
Check your iLO fencing settings to make sure the credentials are correct.
10-03-2010 11:15 PM
Thanks for your responses.
I verified the NIC drivers to be of the following version .
Is anyone aware of this version causing issues?
10-04-2010 04:00 AM
If this were my cluster, I would contact HP Support and have them review the HPS reports (our data and log collector) from each node. Otherwise we are just guessing. Yes, blades take a long time to reboot; however, if the node is fenced, it will NOT affect the other nodes.
1) Is the underlying hardware firmware current, or at most one revision back, on all your blades and the enclosure? This includes HBA, Broadcom, ProCurve switches, etc. I recommend using the HP Firmware Maintenance CD and keeping other firmware close to current for the BL460c G6; see the drivers section at www.hp.com.
2) Are your HP drivers current, or at most one revision back, for the PSP? I recommend downloading the current PSP from www.hp.com, Support and Drivers, BL460c G6.
3) Is the underlying OS current for this EOL Microsoft OS?
4) Are you running the current PolyServe hotfixes? 3.6.1 has a few, depending on how you use the product. See www.hp.com, Support and Drivers, PolyServe (case sensitive).
Caveat: before applying any updates to a node:
a) Management console: right-click, pause node
b) Command line: net stop matrixserver
Update the node, reboot, test, then move on to the next node...
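If it helps to keep track while working through six blades, the one-node-at-a-time order above could be written out as a checklist. The following is only a rough sketch: the function name and the node names are made up for illustration, and it prints the steps rather than running anything against the cluster. The only real command in it is the net stop matrixserver line quoted above.

```python
from typing import List


def rolling_update_plan(nodes: List[str]) -> List[str]:
    """Return an ordered checklist that updates one node at a time.

    Hypothetical helper: it only builds text, it does not touch the cluster.
    """
    steps: List[str] = []
    for node in nodes:
        # Take the node out of service before changing anything on it.
        steps.append(f"{node}: pause node in the management console (right-click, pause node)")
        steps.append(f"{node}: run 'net stop matrixserver' from the command line")
        # Apply firmware, driver, OS, or hotfix updates, then verify.
        steps.append(f"{node}: apply updates")
        steps.append(f"{node}: reboot")
        steps.append(f"{node}: test before moving on to the next node")
    return steps


if __name__ == "__main__":
    # Node names below are placeholders, not taken from the real cluster.
    for number, step in enumerate(rolling_update_plan(["node1", "node2"]), start=1):
        print(f"{number}. {step}")
```

The point of finishing all five steps for one node before starting the next is that only one node is ever out of the cluster at a time.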
10-11-2010 06:04 AM