10-08-2013 02:58 PM - edited 10-08-2013 06:53 PM
I have a DL 370 G4 with a Smart Array 641 controller. The controller runs six 300 GB Ultra320 SCSI drives in RAID 5. Last week, one of the drives died. I shut down the server, swapped the drive, and restarted the server. Immediately, I started having problems. The array rebuild was taking days (it got to 50% in 48 hours), and I got lots of timeouts from the array in Event Viewer. Eventually, the array went offline. I checked, and a drive next to the one I had replaced had failed. So I pulled and reseated both drives and restarted the server. The array then came up normally, and I got no more warnings, errors, or timeouts in Event Viewer.
The amber light on the second drive is still blinking, but both the ADU and ACU say the array is OK. The ACU also says the second drive is OK when I open the array, go to the physical drives, and look at the properties of that second drive. There are no Event Viewer messages about the array status (normal, degraded, rebuilding, etc.). So I don't know whether the array has rebuilt, is still rebuilding, or is ignoring the first drive I replaced and will fail completely if I pull the second drive and replace it.
My concern is: has the first drive I replaced actually been rebuilt? Eventually I want to replace the second failed drive, but before I do, I need to know that the first replacement drive is OK to use and that the controller will rebuild the array again onto the second one. Is there a way to find that out with the ADU or ACU utilities?
10-09-2013 03:47 AM
If the failure LED is flashing amber, then the drive is degraded: there is a minor problem with the drive.
The only tool that can tell you what is wrong is the SMH (System Management Homepage).
Here you can see the problem with the drive and read the drive statistics.
Beware: if a source drive has hard read errors, the array cannot be rebuilt.
You should replace the degraded drive.
You can use the ACU / ADU or the SMH to confirm whether the array has been rebuilt. If the array is not degraded, then the rebuild has completed.
But do check.
A degraded drive will not put the array in degraded mode, since the drive is still operational.
The SMH and the IML (Integrated Management Log) are the best tools for troubleshooting hardware problems in the first place. If they don't tell you what's wrong, then run Insight Diagnostics.
By the way, why did you shut down the server to replace a hot-swap drive?
10-09-2013 11:41 AM - edited 10-09-2013 04:39 PM
Let me try again:
I had a bad drive (drive 3) in a RAID 5 array and replaced it. During the rebuild, something went wrong: I had array timeout errors in Event Viewer while the array was being rebuilt.
I cannot tell whether the array was completely rebuilt after I replaced the drive. As far as I can tell, the rebuild status in the ACU didn't get much farther than 50-60% during this period. I never saw an "array status normal" event logged in Event Viewer.
Eventually, the whole array went offline. The system said the second drive (drive 4) was bad. So now I think I have two bad drives: this new bad one (drive 4), and the first one (drive 3), which may never have finished rebuilding.
I shut the system off, reseated both the drive I originally replaced (drive 3) and the second drive (drive 4), and restarted the system. The array controller said "replacement drives found drives 3,4", but the system and array came up. The amber light is now on for the second drive (drive 4), but it appears to be operational.
First drive (drive 3) now looks normal. No blinking light on the disk icon saying it was rebuilding.
No errors in event viewer anymore.
I ran Insight Diagnostics, the ACU, and the ADU. They give no information at all on the rebuild status of the first drive (drive 3), or on whether the array is OK, except for the imminent failure on drive 4.
They do say drive 4 is in imminent failure and should be replaced, and I want to replace it.
But, I'm not 100% certain of two things:
1. I'm not sure whether the rebuild on the first drive I replaced (drive 3) completed. If it didn't, would the controller still use the degraded drive (drive 4) to finish the rebuild after I restarted the system?
2. If Insight Diagnostics, the ACU, and the ADU say the array is OK (but drive 4 is in imminent failure), can I trust them? Or is there another way to make sure the rebuild of the array is complete after replacing drive 3, but before I swap out the second drive (drive 4)? Short of actually swapping the drive and seeing what happens.
10-10-2013 01:11 AM
Do check the rebuild status of the Array and logical drive(s).
Use both ACU and SMH. I prefer to use the SMH.
Legend in ACU and SMH:
Yellow = Degraded.
Red = Failed.
If the Array / Logical drive(s) is rebuilding, you will have yellow warnings, and you will see rebuild xx%.
If it has completed, you will see Green, and status OK.
A degraded drive is still in use.
In the ACU you will see that everything is OK.
In the SMH, you will have yellow warning for the degraded drive. But the Array and Logical drive(s) will be green, since the disk is still in use. This is why I prefer to use the SMH.
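The check described above can be written down as a small decision rule. This is only a sketch in Python of the reasoning, assuming you read the colours off the ACU or SMH by hand; the names `Status` and `can_replace_degraded_drive` are made up for illustration and are not part of any HP tool.

```python
from enum import Enum


class Status(Enum):
    """Colour legend as described in this post (hypothetical type)."""
    OK = "green"
    DEGRADED = "yellow"
    FAILED = "red"


def can_replace_degraded_drive(logical_drive: Status, rebuilding: bool) -> bool:
    """Return True only if the logical drive is green and no rebuild is running.

    A yellow physical drive on its own does not block the swap, because
    the drive is still in use; a yellow or red logical drive does.
    """
    return logical_drive is Status.OK and not rebuilding


# Logical drive green, no "rebuild xx%" shown: the earlier rebuild has finished.
print(can_replace_degraded_drive(Status.OK, rebuilding=False))
# Logical drive still yellow and rebuilding: wait before pulling another drive.
print(can_replace_degraded_drive(Status.DEGRADED, rebuilding=True))
```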
As always, make sure you have a good backup and a disaster recovery plan.
10-15-2013 03:31 AM
If you're up and running in your OS then the logical drive must have rebuilt OK onto the replaced disk 3. (Two failed disks in a RAID 5 would mean you wouldn't have an operating system or any data anymore!)
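To illustrate why RAID 5 survives one missing disk but not two, here is a minimal Python sketch of XOR parity on a single stripe. It is a simplification, not how the Smart Array firmware is implemented: the block contents and the `xor_blocks` helper are invented for the example, and a real controller rotates the parity block across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a six-disk RAID 5: five data blocks plus one parity block.
data = [b"disk1", b"disk2", b"disk3", b"disk4", b"disk5"]
parity = xor_blocks(data)
stripe = data + [parity]

# One disk missing (index 2): XOR of the five survivors gives its block back.
survivors = stripe[:2] + stripe[3:]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[2])

# Two disks missing (index 2 and 3): the survivors only yield the XOR of the
# two lost blocks, which is not enough to recover either of them.
survivors_two_lost = stripe[:2] + stripe[4:]
mixed = xor_blocks(survivors_two_lost)
print(mixed == xor_blocks([data[2], data[3]]))
```

Both comparisons print `True`: the single lost block is recovered exactly, while with two lost blocks the controller is left with only their combination.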
As long as the logical drive shows OK or green in ACU, then I'd go ahead and replace disk 4 as it is showing Imminent Failure with the flashing amber LED.
There is no need to shut the server down when replacing these disks as they are hot swappable.
As always, up-to-date firmware on your array controller and a good backup before you do anything else are a must...