06-06-2014 08:07 AM
CMU version 7.2
SL4540 Gen8 running RHEL 6.4 x86_64
When I try to back up one of the SL nodes, the node reboots and PXE boots successfully, but then goes into an endless loop of kernel call traces and USB disconnect/uhci_hcd events (see attached screenshot).
The backup job eventually times out and fails, but the SL node stays stuck in this loop.
I have blacklisted ahci as suggested in the user guide for the B120i controller.
I have also blacklisted hpsa to prevent the driver for the P420i controller from loading.
When starting the backup job, I select partition 3 on sda (sda3), which holds the root (/) file system.
Has anyone run into an issue like this?
06-07-2014 11:42 AM
Is this an internal cluster or a customer cluster?
If it is a customer cluster, please open a support call with your local HP support center.
Does this node have Mellanox cards? If so, what is the firmware version?
I couldn't find the attached screenshot. Could you please attach it again, showing the stack traces?
06-09-2014 07:43 AM
Please download the patch PATCH-CMU_7.2.1-X86_64-0002 from the HPSC site. This patch fixes the CMU netboot kernel crashes seen on servers with Mellanox NICs.
Patch management -> Find patches by product -> HP Insight Cluster Management Utility -> Insight Cluster Management Utility V7.2 -> PATCH-CMU_7.2.1-X86_64-0002.
Let us know how it goes.
06-09-2014 08:31 AM
This is an internal test cluster.
Yes, it does have Mellanox cards:
[root@SL4540-01 ~]# ethtool -i eth0
version: 2.1.6 (Aug 27 2013)
However, I am booting from the onboard 1Gb NIC.
I've attached the screenshot again