Polyserve Cluster unexpected reboots! (703 Views)
Occasional Visitor
reachkrishna
Posts: 2
Registered: ‎09-30-2010
Message 1 of 6 (703 Views)

Polyserve Cluster unexpected reboots!

Hi All,
I have a 6-node (blade) PolyServe cluster set up with the following configuration.

HP ProLiant BL460c G6
OS: Windows Server 2003 Enterprise x64 SP2
NIC: HP NC532i Dual Port 10GbE Multifunction BL-c Adapter
PolyServe Matrix Server 3.6.1
Build version 3.6.1.0574
Installed Solution Packs:
SSAS 3.6.2.0202
SQL2000 3.6.2.0202
MSDTC 3.6.2.0202

HP EVA SAN array as storage
Sybase Open Client 12.5.1

For a couple of weeks I have observed two scenarios/issues:
1) The nodes get rebooted (maybe fenced) for whatever reason. The event logs (system/application/matrix) don't say much, except for the "matrix server terminated" message.
2) While the restarted node is coming back online, all the other nodes in the cluster go into a hung state for some reason; failover doesn't happen until this node is fully back up.
The rebooted node takes a considerable amount of time to return to a fully operational state. Once that's done, the other nodes are OK too.

Has anyone here come across such a situation?
I'd appreciate any help/suggestions :)

thanks,
krishna

Occasional Advisor
BlackHawkEH!
Posts: 8
Registered: ‎02-26-2009
Message 2 of 6 (703 Views)

Re: Polyserve Cluster unexpected reboots!

We had something similar with Fencing and it came down to NIC settings and binding orders for the most part.

I'll poke through my notes and see what I can find.
Binary: easy as 1, 10, 11
Honored Contributor
Emil Velez
Posts: 1,450
Registered: ‎05-17-2000
Message 3 of 6 (703 Views)

Re: Polyserve Cluster unexpected reboots!

Check what version of the Broadcom driver you have. There are a few known issues with that.

Once a node is fenced, it should just reboot and rejoin the cluster, and the cluster should be OK without that fenced node in the meantime.

It sounds like the node gets fenced, but for some reason the cluster doesn't think it fenced the node.

Check your iLO fencing settings to make sure the credentials are correct.



Occasional Visitor
reachkrishna
Posts: 2
Registered: ‎09-30-2010
Message 4 of 6 (703 Views)

Re: Polyserve Cluster unexpected reboots!

Hi Guys,
Thanks for your responses.

Emil,
I verified that the NIC driver is the following version:
Broadcom Corp
5.0.13.0

Are you aware of this version causing issues?

regards,
Krishna
Advisor
Dan Tyndale
Posts: 22
Registered: ‎10-20-2009
Message 5 of 6 (703 Views)

Re: Polyserve Cluster unexpected reboots!

If the node is fenced and you are using iLO fencing (vs. SAN fencing), then it will be rebooted. That is the expected behavior to protect the integrity of the data. What is not expected is the hang of every other node.

If this were my cluster, I would contact HP Support and have them review the data from the HPS reports (our data and log collector) from each node. Otherwise we are just guessing. Yes, blades take a long time to reboot; however, if the node is fenced, it will NOT affect the other nodes.

Some guesses:
1) Is the underlying hardware firmware current, or at most one revision back, on all your blades and the enclosure? That includes the HBAs, Broadcom NICs, ProCurve switches, etc. I recommend using the HP Firmware Maintenance CD and keeping other firmware close to current for the BL460c G6; see the drivers section at www.hp.com.

2) Are your HP drivers (PSP) current or one revision back? I recommend downloading the current PSP from www.hp.com, Support and Drivers, BL460c G6.

3) Is the underlying OS patched to current for this EOL Microsoft OS?
http://support.microsoft.com/?id=935640

4) Are you running the current PolyServe hotfixes? 3.6.1 has a few, depending on how you use the product. See www.hp.com, Support and Drivers, PolyServe (case sensitive).

Caveat: before applying any updates to a node:
a) Management console: right-click the node and pause it
b) Command line: net stop matrixserver
Then update the node, reboot, test, and move on to the next node...
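
For anyone who wants to script the checklist above, here is a minimal Python sketch of the one-node-at-a-time sequence. Only the "net stop matrixserver" command comes from this thread; the function names and node names are hypothetical, the pause step stays manual (I am not assuming any command-line equivalent for the management console), and the update, reboot and test steps are only printed as reminders.

```python
import subprocess
from typing import Callable, List, Sequence

# Service name taken from the "net stop matrixserver" command quoted above.
SERVICE = "matrixserver"


def stop_service_command(service: str = SERVICE) -> List[str]:
    """Build the Windows command that stops the cluster service."""
    return ["net", "stop", service]


def stop_matrix_service(runner: Callable = subprocess.run) -> None:
    """Run 'net stop matrixserver' on the local node.

    The runner is injectable so the sequence can be dry-run on a test box.
    """
    runner(stop_service_command(), check=True)


def rolling_update_plan(nodes: Sequence[str]) -> List[str]:
    """Return the ordered checklist, one node at a time (hypothetical helper)."""
    stop_cmd = " ".join(stop_service_command())
    plan: List[str] = []
    for node in nodes:
        plan.append(f"{node}: pause node in the management console (manual step)")
        plan.append(f"{node}: run '{stop_cmd}'")
        plan.append(f"{node}: apply the update")
        plan.append(f"{node}: reboot")
        plan.append(f"{node}: test before moving to the next node")
    return plan


if __name__ == "__main__":
    # Node names are placeholders, not taken from the original post.
    for step in rolling_update_plan(["node1", "node2"]):
        print(step)
```

The point of keeping it as a printed plan is that each node is finished and tested before the next one is touched, which is the intent of the caveat.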
Advisor
Tammy Lawson
Posts: 23
Registered: ‎06-26-2009
Message 6 of 6 (703 Views)

Re: Polyserve Cluster unexpected reboots!

I had a similar issue on my 3.6.1 cluster. I verified NIC settings, flow control, etc. per the documentation and HP Support. What it ended up being was a self-inflicted problem: before we moved to PolyServe, we had implemented a procedure to gzip our SQL backups via a DOS script every night (we get charged per gig). Anyway, on PolyServe it would fence occasionally when the DOS script tried to compress a file or LUN that was currently in use. I'm not sure of the specifics there; the DOS script was just the common denominator. On a side note, we now use LiteSpeed for compression and have no issues.
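
If you still need a nightly compression job, one cautious approach is to skip anything that might still be in use. The Python sketch below is not the DOS script described above, and the root cause of the fencing was never pinned down, so treat it as an illustration only. It assumes backups are "*.bak" files and uses "not modified for an hour" as a stand-in for "not in use"; both the pattern and the threshold are assumptions. It leaves the original files in place.

```python
import gzip
import shutil
import time
from pathlib import Path
from typing import List, Optional


def compress_idle_backups(
    folder: Path,
    min_idle_seconds: float = 3600.0,  # assumed quiet period, tune as needed
    pattern: str = "*.bak",            # assumed backup file naming
    now: Optional[float] = None,
) -> List[Path]:
    """Gzip backup files that have not been modified recently.

    Files touched within min_idle_seconds are skipped on the assumption
    that a backup may still be writing to them.
    """
    now = time.time() if now is None else now
    compressed: List[Path] = []
    for src in sorted(folder.glob(pattern)):
        if now - src.stat().st_mtime < min_idle_seconds:
            continue  # possibly still being written; leave it for the next run
        dst = src.with_name(src.name + ".gz")
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        compressed.append(dst)
    return compressed
```

A modification-time check is only a heuristic; it does not prove that no process holds the file open, so schedule the job well clear of the backup window.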