10-15-2012 12:42 PM
i've got an rx3600 (11.31, of course) that always crashes on shutdown -r.
when i was first made aware of the problem, the system would actually get into a crash-fail-to-dump loop that was only resolved by powercycling the system. the powercycle always resolved the issue (albeit inelegantly) and the system would come up as expected. we manually booted (hpux> boot /stand/current/vmunix) over the weekend, thinking the system was confused about which kernel to boot from. that only marginally helped: the system still crashes on shutdown, but at least now it produces a crashdump (see attached).
to me, this crashdump looks like there's a hardware problem -- but nothing is raising a red flag.
i noticed the (dire :-) warning about the USB driver in the crashinfo file. could that be it? the driver is old (C.01.05.08). the HAA boot device is USB:
Primary bootpath : 0/6/0/0/0/0/1/0/0/0.0x0.0x4000000000000000 (/dev/rdisk/disk3)
HA Alternate bootpath : 0/0/2/1.0x0.0x10 (/dev/deviceFileSystem/Usb/MassStorage/dsk/disk@hp
the system has been recently patched with the latest support bundle, fwiw.
10-15-2012 04:59 PM - edited 10-15-2012 05:01 PM
Definitely get that bad USB driver out of there. And unless you have a good reason for the USB path in the HAA path, use setboot to set HAA to a real boot path.
Also, latest patch bundle? That would be Sep 2012 for QPK, Feature and HWE. I don't see a recent ARPA/TCP cumulative patch in the crashinfo list (PHNE_41436) but it may not be reported.
The double panic may indeed point to a hardware error. Connect to the (real) console and get a list of the last 10-20 entries in the SL (hint: SL, then E for error logs, and F for forward progress). That may provide a better clue as to what is going on.
10-16-2012 09:11 AM
both system logs look clean.
i'm arranging for the latest USB driver to be installed and we'll see if that brings us any joy.
about setboot...i have notes that say to do "setboot -h <primary boot disk>", but is this for pa-risc boxes only? does it work for itaniums as well?
about the patches....i spent an entire minute (ha!) looking at the list of patches in the latest bundle and it sure looks to me that the latest arpa patch (for example and since we're using it as a guinea pig) is NOT included. i suspect i missed something...but....
10-16-2012 04:49 PM
setboot -h <legacy hardware path, lunpath hardware path or a persistent dsf>
No difference between PA and IA as far as setboot is concerned. It is just setting stable storage on the system board.
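To confirm afterwards that HAA no longer points at the USB device, you could scan the setboot output for it. This is only a minimal sketch: it assumes the output uses the "HA Alternate bootpath :" line format shown in the first post, and the function name haa_is_usb is made up for illustration.

```shell
# haa_is_usb: read setboot-style output on stdin and succeed (exit 0)
# if the "HA Alternate bootpath" line mentions a USB device.
# Assumption: the line format matches the listing in the first post.
haa_is_usb() {
    grep 'HA Alternate bootpath' | grep -qi 'usb'
}

# Example use on the live system (not run here):
#   setboot | haa_is_usb && echo "HAA still points at USB"
```

If the function succeeds, the HA alternate path is still the USB device and setboot -h has not been applied yet.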
As far as PHNE_41436, oops -- that should have been PHNE_41617, 41714 or 42017 for 11.31 (41436 is for 11.23). The ARPA cumulative patch is full of fixes for panics and other networking issues. But you have the second most recent so you should be OK. There are warnings for 41714 as well as its successor. For 41714 (which you have), it is only a problem when you have HP-UX Transport SRP installed. Check with:
swlist -l bundle HPUXTransportSRP
If nothing is found, then 41714 is OK.
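If you want that check in a script, something like the following might do. It is a sketch only: it assumes swlist exits non-zero when the bundle is not found, and both the function name srp_installed and the SWLIST override variable are invented here for illustration.

```shell
# srp_installed: succeed if the bundle named in $1 appears to be installed.
# Assumption: swlist exits non-zero when the selection is not found.
# SWLIST is a hypothetical override for dry runs; it defaults to swlist.
srp_installed() {
    "${SWLIST:-swlist}" -l bundle "$1" >/dev/null 2>&1
}

# Example use on the live system (not run here):
#   if srp_installed HPUXTransportSRP; then
#       echo "SRP is installed: read the 41714 warning"
#   else
#       echo "SRP not found: 41714 should be OK"
#   fi
```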
Your best bet is to look at the SL logs.
10-24-2012 07:48 AM
updating the usb driver has helped....somewhat. at least now the box doesn't crash while shutting down.
it still won't reboot on its own without a powercycle. (don't ask for console log...it didn't get captured this time around...)
has anyone heard of /sbin/reboot actually failing? i was wondering if we did:
1) shutdown (with no parms) and then
...would that be safe and helpful?
10-24-2012 01:07 PM
it reboots fine -- provided you powercycle the box.
if you do a typical/normal 'shutdown -r', it won't reboot on its own.
you can see in previous console logs where it is (for example) closing lvm volumes, so i don't believe this is a problem with some rc script.
10-24-2012 01:17 PM
>if you do a typical/normal 'shutdown -r', it won't reboot on its own.
Where does it stop? Anything in the shutdown log?
>you can see in previous console logs
I don't see those?
Have you tried disconnecting that USB disk?
10-24-2012 01:23 PM
the shutdownlog isn't helpful...just the normal one-liner.
anyhow, in this last go-round (where the usb driver was updated) we didn't get the shutdown dialog because the connection was lost.
the previous console/shutdown logs are a horror to look at and probably not helpful at this point.
afaik, the usb dvd drive is still connected. but between updating the driver and modifying setboot, i don't think it's an issue any longer. if you think otherwise, let me know.
i'm hoping for (yet another) reboot over the weekend....
10-24-2012 03:05 PM
>the shutdownlog isn't helpful... just the normal one-liner.
That means it got from shutdown(1m) to reboot(1m), so your suggestion probably wouldn't work?
>if you think otherwise, let me know.
Only if you have nothing better to do. ;-)
10-26-2012 07:39 AM
Look into /etc/rc.log.OLD: if something went wrong in the shutdown, this file is where the error messages would be recorded.
(Originally it would have been /etc/rc.log, but renamed to /etc/rc.log.OLD in the next system startup.)
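A quick way to skim that file for trouble is to grep it for suspicious lines. This is a rough sketch: the patterns are just a guess at what is worth flagging, and the function name rc_log_errors is made up for illustration.

```shell
# rc_log_errors: print (with line numbers) the lines of an rc log that
# look like failures. The file defaults to /etc/rc.log.OLD.
# Assumption: the patterns below are only a guess at what to flag.
rc_log_errors() {
    grep -n -i -E 'not found|fail|error' "${1:-/etc/rc.log.OLD}"
}

# Example use on the live system (not run here):
#   rc_log_errors
#   rc_log_errors /etc/rc.log
```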
11-12-2012 01:41 PM
ok...with the dvd drive pulled, 'shutdown -r' is still resulting in the box panicking upon shutdown completion.
the console listing is attached. this still feels like something hardware related...
11-12-2012 02:22 PM - edited 11-16-2012 12:00 PM
>the console listing is attached.
These errors indicate you have a garbage file left in /etc/rc.config.d/:
/sbin/rc: 1500: not found.
/sbin/rc1.d/K410Rpcd: 1500: not found.
I don't think it is related to your crash (unless you incorrectly fiddled with a config file) but you should fix it.
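One way to track down the offending file: the "1500: not found" message suggests the shell tried to run a stray "1500" as a command, so you could list the files in that directory that mention it. A minimal sketch, where the function name files_mentioning is invented for illustration and the match is deliberately broad (any occurrence of the string):

```shell
# files_mentioning: list files in a directory that contain the given string.
# $1 = string to look for, $2 = directory (defaults to /etc/rc.config.d).
# Assumption: the stray "1500" appears literally in one of the files.
files_mentioning() {
    grep -l -- "$1" "${2:-/etc/rc.config.d}"/* 2>/dev/null
}

# Example use on the live system (not run here):
#   files_mentioning 1500
```

Any backup or scratch copy that turns up should be moved out of the directory rather than left alongside the real config files.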
>Closing open logical volumes... Done
>panic : Failed to halt all processors
>Closing open logical volumes... Done
>Bad News: pr == 0x144000406c61001d
It seems that this is where it goes wrong: during the shutdown.
11-12-2012 02:39 PM
yeah...i noticed the rc.config.d messages. i, too, don't think they're related.
yeah #2 -- that's where it all goes wrong. perhaps some kind hp-insider could plug some of these messages/numbers into their call tracker and say <this> is the problem? :-)
11-12-2012 07:22 PM - edited 11-12-2012 07:23 PM
>I noticed the rc.config.d messages.
But have you fixed it yet? :-)
>perhaps some kind hp-insider could plug some of these messages/number into their call tracker ...
That's what the HPSC is for.
11-16-2012 10:51 AM
also, please bring the box down into single user mode first, then note any errors, and reboot from there. I will bet you a donut it will come up clean.
11-16-2012 10:54 AM
there's nothing (interesting) in the shutdown log. i think we (collectively) have decided that the machine is actually shutting down ok.
it's when the machine goes to restart (we're at the hardware level now) that it has problems.
if you are correct, how are you going to collect your donut? :-)
11-16-2012 11:13 AM - edited 11-16-2012 11:13 AM
I'd suggest going down to single-user mode in steps and watching the console for anything irregular.
I'm pretty sure something is hanging when it goes down. I'd almost bet it's not unmounting all the disks or some other hardware device for some reason, and is actually hanging at the very bottom of the process, before it attempts to come up on reboot.
do you have a large amount of san attached disks? or nfs mounts on this server?
11-16-2012 12:05 PM
>I think we (collectively) have decided that the machine is actually shutting down ok.
I don't think so. It is panicking when you are trying to shut it down: ux06.console.listing.txt
panic : Failed to halt all processors
Seems like a hardware problem.
>I don't see that you have reviewed the shutdown log.
Did you mean the rc.log? There were problems with it. Possibly unrelated?
11-16-2012 12:49 PM
yea rc.logs, iirc the shutdown log only shows when the box went down.
you might also inspect your /dev/ (devices) directory for any unusual entries. Make sure you can account for every device, hopefully not too many there!
grasping at straws here, but if possible, recreate your /dev/random and /dev/zero devices
Review swlist for any unnecessary products you can remove.
Random 3rd party software not managed by your depot...
12-18-2012 12:29 PM
just to *wrap* this issue up (ho ho ho)...
turned out the MP board was faulty. it took a fair amount of digging on the CE's part to diagnose the issue...but all is well now.
12-18-2012 11:22 PM
The Unified Core-I/O board has not only the MP/iLO but also the BMC and other components, so a failure could affect more than just the MP.
From your initial post I noticed you still use the OLD USB driver (for the internal DVD and vMedia) - you should update this to get USB 2.0 support.
Hope this helps!