HP L1000 running HPUX 11.0 is crashing often
Occasional Contributor
Lynn T. Ladner
Posts: 2
Registered: ‎07-06-2003
Message 1 of 5 (98 Views)

HP L1000 running HPUX 11.0 is crashing often -

Each week we get a show-stopping crash, and I have to cycle the power to bring the system back up. This is what I am seeing in the syslog. I am new to this and need a clue:

Jul 4 02:45:12 vision vmunix: SCSI: Abort abandoned -- lbolt: 31165162, dev: 1f022000, io_id: 201acf4, status: 200
Jul 4 02:45:13 vision vmunix:
Jul 4 02:45:13 vision vmunix: SCSI: Read error -- dev: b 31 0x022000, errno: 126, resid: 2048,
Jul 4 02:45:13 vision vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 2048.
Jul 4 02:45:13 vision vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f022000) is POWERFAILED
Jul 4 06:17:41 vision /usr/lbin/ups_mond[1099]: /usr/lbin/ups_mond: UPS /dev/tty0p1 AC POWER FAILURE - running on UPS battery
Jul 4 06:17:41 vision /usr/lbin/ups_mond[1099]: /usr/lbin/ups_mond: AC Power to all recognized, system critical UPS's OK! System will not shutdown.
Jul 4 06:17:44 vision /usr/lbin/ups_mond[1099]: /usr/lbin/ups_mond: UPS /dev/tty0p1 OK: AC Power back on
Jul 4 06:17:44 vision /usr/lbin/ups_mond[1099]: /usr/lbin/ups_mond: AC Power to all recognized, system critical UPS's OK! System will not shutdown.

Thanks,
Lynn
Trusted Contributor
Krishna Prasad
Posts: 525
Registered: ‎09-24-1997
Message 2 of 5 (98 Views)

Re: HP L1000 running HPUX 11.0 is crashing often -

I agree it looks like a bad drive.

If you type in vgdisplay -v it may help you detect which drives are causing the problem.
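
To make the vgdisplay suggestion concrete, here is a minimal sketch that filters its output down to the physical volumes LVM does not report as available. It assumes the usual "PV Name" / "PV Status" line pairs in `vgdisplay -v` output; the function name `unavailable_pvs` is made up for this example.

```shell
# Reads `vgdisplay -v` output on stdin and prints each PV whose status
# is anything other than "available", followed by that status.
# ASSUMPTION: every PV is reported as a "PV Name" line followed by a
# "PV Status" line.
unavailable_pvs() {
    awk '
        $1 == "PV" && $2 == "Name"   { name = $3 }
        $1 == "PV" && $2 == "Status" && $3 != "available" { print name, $3 }
    '
}

# Typical use on the affected host:
#   vgdisplay -v | unavailable_pvs
```

A disk that only fails now and then may still show as available when you look, so an empty result does not clear the drive.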

Positive Results requires Positive Thinking
Honored Contributor
Stuart Abramson_2
Posts: 565
Registered: ‎05-06-2001
Message 3 of 5 (98 Views)

Re: HP L1000 running HPUX 11.0 is crashing often -

Here is how you decode "lbolt" messages. It will tell you the exact device. (HP-UX could tell you directly, but that would be too easy):

How to decode an lbolt error
=====================================================================
gbo390-d:~abramss/doc/11/lbolt.decode SDA 11/11/98


1. Get the "dev:" entry from the lbolt:

# dmesg | grep lbolt | grep dev:

SCSI: Abort -- lbolt: 18346341, dev: e7015000, io_id: 122e9a3
SCSI: Request Timeout -- lbolt: 18351441, dev: e7015000
SCSI: Abort -- lbolt: 18351441, dev: e7015000, io_id: 122e9be
SCSI: Request Timeout -- lbolt: 18356641, dev: e7015000
SCSI: Abort -- lbolt: 18356641, dev: e7015000, io_id: 122e9cf
SCSI: Request Timeout -- lbolt: 18362141, dev: e7015000
SCSI: Abort -- lbolt: 18362141, dev: e7015000, io_id: 122e9e0
SCSI: Request Timeout -- lbolt: 74105435, dev: 1f000000
SCSI: Abort Tag -- lbolt: 74105435, dev: 1f000000, io_id: 4ead34

Here we have two:

1f
e7

2. The first two hex digits of "dev:" are the major number of the device in
question. Convert them from hex to decimal:

# printf "%d\n" 0x1f
31

3. Find out which driver owns this major number. It tells us the type of
device:

# lsdev 31

Character     Block       Driver          Class
   188          31        sdisk           disk

So, this is probably a disk !


4. Find the device file entry from the remainder of the lbolt error:

SCSI: Abort Tag -- lbolt: 74105435, dev: 1f000000, io_id: 4ead34

The last six hex digits of "dev:" (here 000000) are the minor number of the
device that is failing.

a. Block device:

# ll -R /dev/ | grep 31 | grep 0x000000

brw-r----- 1 bin sys 31 0x000000 Jul 15 16:25 c0t0d0

Or:

b. Character Device:

# ll -R /dev/ | grep 188 | grep 0x000000
crw-r----- 1 bin sys 188 0x000000 Oct 11 07:15 c0t0d0

5. Find the Hardware Address:

# lssf /dev/dsk/c0t0d0
sdisk card instance 0 SCSI target 0 SCSI LUN 0 section 0
at address 0/0/0.0.0 /dev/dsk/c0t0d0


6. Find the type of device:

# diskinfo /dev/rdsk/c0t0d0
SCSI describe of /dev/rdsk/c0t0d0:
vendor: DGC
product id: C2300WDR1
type: direct access
size: 4102875 Kbytes
bytes per sector: 512


So, we have a Nike disk at hardware address 0/0/0.0.0, device file
/dev/dsk/c0t0d0
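
Steps 1 through 4 can be sketched as one small shell function. It splits a "dev:" value into major and minor and guesses the device file name. The function name `decode_dev` is made up for this example, and the minor-number layout (card instance, target, LUN, option bits) is an assumption about the sdisk driver, so confirm the result with ll and lssf as shown above.

```shell
# Splits an 8-hex-digit dev value from a SCSI/LVM log line into its
# major and minor numbers and guesses the device file name.
# ASSUMPTION: the sdisk minor number is laid out as 0xIITLoo
# (II = card instance, T = target, L = LUN, oo = option bits).
decode_dev() {
    dev=${1#0x}                                   # accept 1f022000 or 0x1f022000
    major_hex=$(printf '%s' "$dev" | cut -c1-2)   # first two hex digits
    minor_hex=$(printf '%s' "$dev" | cut -c3-8)   # remaining six hex digits
    inst_hex=$(printf '%s' "$dev" | cut -c3-4)    # card instance
    tgt_hex=$(printf '%s' "$dev" | cut -c5)       # SCSI target
    lun_hex=$(printf '%s' "$dev" | cut -c6)       # SCSI LUN
    printf 'major=%d minor=0x%s file=c%dt%dd%d\n' \
        "0x$major_hex" "$minor_hex" "0x$inst_hex" "0x$tgt_hex" "0x$lun_hex"
}

decode_dev 1f000000   # prints: major=31 minor=0x000000 file=c0t0d0
decode_dev 1f022000   # prints: major=31 minor=0x022000 file=c2t2d0
```

The guess only makes sense when lsdev reports the major number as sdisk; for any other driver, ignore the file= part.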


Honored Contributor
Stuart Abramson_2
Posts: 565
Registered: ‎05-06-2001
Message 4 of 5 (98 Views)

Re: HP L1000 running HPUX 11.0 is crashing often -

Here is what this means:

LVM: vg[1]: pvnum=0 (dev_t=0x1f022000) is POWERFAILED

pvnum=0 is the "0-th" disk listed in /etc/lvmtab for the VG whose group file has minor number "01" (vg[1]):

# ll /dev/*/group
crw-r--r-- 1 root sys 64 0x4a0000 Mar 18 09:28 /dev/05inst98/group
crw-r--r-- 1 root sys 64 0x490000 Mar 5 10:24 /dev/05inst99/group
crw-r--r-- 1 root sys 64 0x010000 Aug 16 2001 /dev/05vg01/group

.. (The above is minor number "01".) ..

crw-r--r-- 1 root sys 64 0x020000 Aug 16 2001 /dev/05vg02/group
crw-r--r-- 1 root sys 64 0x030000 Aug 16 2001 /dev/05vg03/group
crw-r--r-- 1 root sys 64 0x050000 Aug 16 2001 /dev/05vg04/group

# strings /etc/lvmtab | grep dev | more
/dev/vg00
/dev/dsk/c1t6d0
/dev/dsk/c2t6d0
/dev/05vg01
/dev/dsk/c3t2d2 <== The "0-th" disk
/dev/dsk/c5t2d2
/dev/dsk/c7t2d2
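
The lookup above can be sketched as a small filter over the strings output. It assumes, as this post does, that the disks appear under their VG in pvnum order; /etc/lvmtab is a binary file, so reading it with strings is a heuristic at best. The function name `nth_pv` is made up for this example.

```shell
# Reads `strings /etc/lvmtab` output on stdin and prints the PV at a
# zero-based position within the named volume group.
# ASSUMPTION: each VG path is followed by its PVs in pvnum order.
nth_pv() {
    awk -v vg="$1" -v n="$2" '
        /^\/dev\/dsk\// { if (cur == vg && idx++ == n) print $1; next }
        /^\/dev\//      { cur = $1; idx = 0 }
    '
}

# Typical use:
#   strings /etc/lvmtab | nth_pv /dev/05vg01 0
```

With the listing shown above, asking for position 0 in /dev/05vg01 returns /dev/dsk/c3t2d2.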
Honored Contributor
Jeff Schussele
Posts: 6,795
Registered: ‎02-18-2002
Message 5 of 5 (98 Views)

Re: HP L1000 running HPUX 11.0 is crashing often -

Hi Lynn,

Appears to me that you have two separate problems here:

1) Disk c2t2d0 is causing SCSI errors - I'd replace it at your earliest opportunity.

2) You're either losing power or you have a bad/flaky power monitor board. If you can verify that in fact you're not losing power, then I'd log a HW call w/HP as there are known issues w/early L-class power monitor boards that cause all sorts of "false" errors. There is also a possibility that the UPS itself is the root cause of this. Either way this one is the problem that is most likely to be causing the reboots.
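
One way to check whether power is really being lost is to pull the ups_mond power-failure timestamps out of the syslog and compare them with the times of the crashes. A minimal sketch, assuming the syslog line layout shown earlier in this thread; the function name `power_failures` is made up for this example.

```shell
# Reads syslog text on stdin and prints the timestamp of every
# ups_mond "AC POWER FAILURE" report, one per line.
# ASSUMPTION: lines start with "Mon DD HH:MM:SS host ..." as in the
# syslog excerpt posted in this thread.
power_failures() {
    awk '/ups_mond/ && /AC POWER FAILURE/ { print $1, $2, $3 }'
}

# Typical use:
#   power_failures < /var/adm/syslog/syslog.log
```

If the reported failures do not line up with real outages or with the crashes, that points toward the monitor board or the UPS rather than the mains supply.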

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!