08-09-2011 11:42 AM
I am bringing up a DL380 G7 server with RHEL 6.1. I have an external SAS LTO-5 (HP 3000) drive connected through a P212 controller. I use NetBackup 7.1 to manage tape backups and it performs as expected. However, whenever the drive is accessed, I get messages logged in my /var/log/messages file like this:
Aug 2 07:31:26 advlsrv kernel: hpsa 0000:0b:00.0: cp ffff880037870000 has check condition: unknown type: Sense: 0x5, ASC: 0x24, ASCQ: 0x0, Returning result: 0x2, cmd=[1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
The first digits in the "cmd=" string vary a bit, but otherwise the message is always the same. I can cause these with a simple LTT diagnostic query, and when the NetBackup daemons are running I get about four a minute.
I logged a case with Symantec (NetBackup) and they told me it was a SCSI sense error and a hardware problem. I logged a case with Red Hat and they told me the same thing - talk to your hardware vendor.
I am running the latest firmware in the P212 and the tape drive and I am up to date on my Red Hat installation.
I have Googled the heck out of this with sparse results. Any ideas about what is going on?
08-10-2011 12:58 AM
Sense key 05h = Illegal Request
ASC/ASCQ 24h/00h = Invalid Field in CDB (Command Descriptor Block)
What this means is: the Linux OS sent a SCSI command to the tape device that contains a field the device does not accept. The drive is basically stating "I do not understand your request", only in SCSI language.
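If you want to decode these log lines without looking each value up by hand, here is a minimal Python sketch. It assumes the hpsa message format shown in the post above, and the lookup tables cover only the codes that appear in this thread (the names are the standard SCSI ones). The function name `decode_hpsa_line` is made up for illustration. The first byte of the "cmd=" string is the SCSI opcode.

```python
import re

# Lookup tables limited to the codes seen in this thread (standard SCSI names).
SENSE_KEYS = {0x2: "NOT READY", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x3A, 0x00): "MEDIUM NOT PRESENT",
}
OPCODES = {0x00: "TEST UNIT READY", 0x1D: "SEND DIAGNOSTIC"}

# Assumed layout: "Sense: 0x5, ASC: 0x24, ASCQ: 0x0, ... cmd=[1d 00 ...]"
PATTERN = re.compile(
    r"Sense: 0x([0-9a-fA-F]+), ASC: 0x([0-9a-fA-F]+), ASCQ: 0x([0-9a-fA-F]+),"
    r".*cmd=\[([0-9a-fA-F ]+)\]"
)


def decode_hpsa_line(line):
    """Return a dict describing the check condition, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    key, asc, ascq = (int(g, 16) for g in m.groups()[:3])
    opcode = int(m.group(4).split()[0], 16)  # first CDB byte is the opcode
    return {
        "sense_key": SENSE_KEYS.get(key, "unknown (0x%x)" % key),
        "additional": ASC_ASCQ.get(
            (asc, ascq), "unknown (0x%02x/0x%02x)" % (asc, ascq)
        ),
        "command": OPCODES.get(opcode, "unknown (0x%02x)" % opcode),
    }
```

Feeding it the line from the first post should report an ILLEGAL REQUEST / INVALID FIELD IN CDB raised against a SEND DIAGNOSTIC command. Anything outside the small tables is reported as unknown with its raw hex value.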
Now, there are various software components in the Linux OS that can issue these commands, but most likely they come from system management components that are trying to 'monitor' your tape drive. The error shows up alongside NetBackup because it basically is the only component that 'should' be monitoring the tape drive.
The solution is to disable system management agents one by one and observe when the messages stop.
I hope this explains things.
The SCSI sense data can be looked up on: www.t10.org
under the heading:
08-10-2011 04:47 AM
Thanks for the SCSI interpretation - that helps. However, when I have NetBackup down, I get no errors at all so I kinda doubt that it is my hp health stuff causing it. With NetBackup down, I can cause a single occurrence with a simple "mt -f /dev/st0 status" command. That results in:
Aug 10 07:36:30 advlsrv kernel: hpsa 0000:0b:00.0: cp ffff880037868000 has check condition: unknown type: Sense: 0x2, ASC: 0x3a, ASCQ: 0x0, Returning result: 0x2, cmd=[00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
I am afraid that there is a problem with the hpsa driver or the firmware on the device but I can't find where anyone else is having this problem.
BTW, I am running 64-bit RHEL 6.1
08-10-2011 07:30 AM
Sense: 0x2, ASC: 0x3a, means that there is no tape in the drive and represents the 'status' of the tape drive. This is expected behavior and part of how the tape drive communicates with the Initiator.
The reason you are not seeing the SCSI events while NetBackup is down is likely that NetBackup issues Test Unit Ready commands periodically to make sure it still has access to the tape drive, and each of those gets the same 'no tape' answer.
In any case, I do agree that this is an hpsa behavior (hpsa being the HP Smart Array driver). The SCSI messages themselves are not real errors.
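One way to check that periodic polling accounts for the bulk of the messages is to tally them by command and sense data. A rough Python sketch follows, assuming the same hpsa message format as in the lines quoted in this thread. The function name `tally_check_conditions` is hypothetical.

```python
import re
from collections import Counter

# Assumed layout of the hpsa check-condition message, as quoted in this thread.
# The last group captures only the first CDB byte, which is the SCSI opcode.
PATTERN = re.compile(
    r"hpsa .*has check condition.*"
    r"Sense: 0x([0-9a-fA-F]+), ASC: 0x([0-9a-fA-F]+), ASCQ: 0x([0-9a-fA-F]+),"
    r".*cmd=\[([0-9a-fA-F]{2})"
)


def tally_check_conditions(lines):
    """Count hpsa check-condition lines per (opcode, sense key, ASC, ASCQ)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            key, asc, ascq, opcode = (int(g, 16) for g in m.groups())
            counts[(opcode, key, asc, ascq)] += 1
    return counts
```

Run over an open handle on /var/log/messages, `tally_check_conditions(fh).most_common()` lists the noisiest combinations first. If polling is the cause, the entry with opcode 0x00 (Test Unit Ready), sense key 0x2 and ASC 0x3a should dominate while no tape is loaded.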
08-10-2011 08:13 AM
That all makes complete sense. When I did the status command, there was not a tape loaded. I am also sure that NetBackup tickles the drive regularly. That said, I want to eliminate the error messages from /var/log/messages. They obscure daily issues that are important to see. I am getting on the order of 20K lines per week - very annoying and I don't have this on my other RHEL 6 box (dl380 G5, Ultrium 920, SC44Ge HBA).
So, anyone know how to eliminate this logging?