ML350 G5 Disk Failure
PinnacleCS

Re: ML350 G5 Disk Failure

Were you having data corruption problems or blue-screen problems? I am getting the 1792 errors on reboot, but no BSOD or data corruption yet; the server isn't in production. I applied the KB AFTER the 7.9 support pack. I sure hope this resolves it!
francis obuon

I have a similar problem with exactly the same server, an ML350 G5. It keeps reporting that both PSUs have failed or are unplugged and should be replaced. I've cleared the error logs and reinstalled HP SIM, but to no avail. Any ideas?
Scott M. Harrison

Does anyone have the definitive fix for this? I have a brand-new ML350 G5 with the same issue: an intermittent 1792 notification on boot. I am supposed to go into production with this box in the next week or so, and after reading this I am reluctant to do so. I have the latest firmware and drivers (7.91 series) installed and am running SBS 2003 R2 SP2. I don't believe I have the aforementioned MS KB installed, but I will look into it.

Thanks.

Scott
PinnacleCS

Have both PSUs replaced and install the hotfix from the KB article mentioned earlier; that will fix it. I also rolled back to support pack 7.8 just to be safe, because 7.9 had a lot of issues. I have another ML350 G5 coming in, so I'll let you know what I find there too.

Blake
Scott M. Harrison

Thanks, PinnacleCS. I checked both supplies against the Customer Advisory and both are good. I installed hotfix 932755 and will beat the snot out of the system and reboot it 50 times before I go into production. The few reboots I've done since the hotfix have been clear of 1792s, although I have not done a lot of I/O on the system. Please do let me know how testing goes with your new G5 as well. Thanks again.

Scott
Scott M. Harrison

Well, I spent the weekend dinking around with this, and here's what I've found. If I use driver 6.6.2.32 or 6.8.0.32, I get the intermittent 1792s. If I downgrade to 6.6.0.32, I don't: I rebooted 26 times with that driver installed and all is clean. With the other two, I have about a 50% chance of getting the error on any given boot. I have everything patched and up to date with the latest SmartStart and hotfix 932755.
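Scott's reboot counts can be put into rough numbers. This is a minimal sketch, assuming every reboot is an independent trial and taking the roughly 50% per-boot error rate he saw with 6.6.2.32/6.8.0.32 at face value; both are simplifying assumptions, and the function names are hypothetical.

```python
def all_clean_probability(p_fail: float, reboots: int) -> float:
    """Chance of seeing no 1792 in `reboots` independent boots if each fails with p_fail."""
    return (1.0 - p_fail) ** reboots


def failure_rate_upper_bound(reboots: int, confidence: float = 0.95) -> float:
    """Largest per-boot failure rate still consistent with `reboots` clean boots.

    Solves (1 - p) ** reboots = 1 - confidence for p (exact binomial bound;
    the "rule of three", 3 / reboots, is the usual approximation at 95%).
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / reboots)


if __name__ == "__main__":
    # 26 clean reboots on 6.6.0.32, compared with a ~50% error rate on the newer drivers.
    print(f"P(26 clean boots | p=0.5) = {all_clean_probability(0.5, 26):.2e}")
    print(f"95% upper bound on p after 26 clean boots = {failure_rate_upper_bound(26):.3f}")
```

Under those assumptions, 26 clean boots would be very unlikely (about 1.5 in 100 million) if 6.6.0.32 had the same ~50% rate, so the driver difference looks real. The same data only bounds the older driver's rate at roughly 11% per boot with 95% confidence, so a rarer intermittent fault on 6.6.0.32 is not ruled out.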

HP sent me a new cache board based on output from the ADU, which I knew would not work. It didn't. I'm going to call them back and see what they have to say, which I'm sure will be helpful! ;)

Now I am faced with either leaving the old driver on and dealing with hangs or stops on shutdown, or putting the latest drivers on and waiting for corruption. Great choice.

Scott
Ab Kole

Hi All,

We have the same problem with five ML350 G5 servers now. All have SBS 2003 R2 installed and freeze now and again. We have a case open with MS and one with HP. We found out that even with the newest SmartStart 7.91 we get bus errors on our hard drives, even when there is no Microsoft OS on the server!! HP now says (after two weeks of testing) that this is a known bug in firmware 1.66 of the E200i RAID controller and that it will be solved in the next firmware update....

We suggested upgrading the E200i to a P400 controller to help our customers, but no can do. We have to test and test and test to solve HP's problem. We have spent over two weeks' worth of hours on this problem, and four servers are sitting in our office waiting to be completed. Installation is postponed... Clients are not very happy.

We see the problem on servers that have a lot of disk activity. I will keep you informed if there is a solution.

fricci

Scott,
as you can read in my post dated September 9, 2007, you had almost the same experience I had, but it is quite funny to discover that the driver that seems to be stable is 6.6.0.32 (I got these results using release 6.6.2.32).

Anyway, I can confirm that AFTER DISABLING THE ACCELERATOR (I suppose this means using a write-through algorithm in "HP language"), ALL HAS BEEN WORKING without problems since the end of August (but I didn't apply any new patch or driver, waiting for some "official" solution).
I am very interested in knowing if this workaround solves your issues.
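The guess that disabling the accelerator amounts to write-through can be illustrated with a toy model. This is a minimal sketch, not HP firmware behaviour; the class and method names are hypothetical. With write-back caching the controller acknowledges a write while the data is still only in cache, so anything not flushed before power-off is still there at the next boot (with a battery-backed cache it survives, which is presumably what the 1792 "Valid Data Found in Array Accelerator" POST message quoted later in this thread reports). With write-through, every acknowledged write is already on disk.

```python
from typing import Dict


class ToyArrayController:
    """Toy model of a controller cache (hypothetical, not HP's implementation)."""

    def __init__(self, write_back: bool) -> None:
        self.write_back = write_back
        self.disk: Dict[int, bytes] = {}   # blocks that have reached the drives
        self.dirty: Dict[int, bytes] = {}  # blocks acknowledged but still only in cache

    def write(self, block: int, data: bytes) -> None:
        if self.write_back:
            # Write-back: acknowledge immediately, flush to the drives later.
            self.dirty[block] = data
        else:
            # Write-through (accelerator disabled): data is on disk before the ack.
            self.disk[block] = data

    def flush(self) -> None:
        """In this toy model, an orderly shutdown is assumed to call this."""
        self.disk.update(self.dirty)
        self.dirty.clear()

    def pending_at_power_off(self) -> int:
        """Number of blocks left in cache if power goes away without a flush."""
        return len(self.dirty)


if __name__ == "__main__":
    for write_back in (True, False):
        ctrl = ToyArrayController(write_back=write_back)
        ctrl.write(0, b"boot sector")
        ctrl.write(7, b"registry hive")
        # Simulate a shutdown that never flushes the cache.
        label = "write-back" if write_back else "write-through"
        print(label, ctrl.pending_at_power_off())
```

The trade-off is the one described in this thread: write-through gives up the accelerator's write performance, but leaves nothing in cache to reconcile at boot, whether or not the OS shut down cleanly.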

After reading all these (dramatic) posts, this is my definitive opinion: the E200i controller (hardware + firmware + driver) is BAD, so the best thing you can do is: DON'T BUY IT.

I am very interested in knowing if using a P400 controller is the ultimate solution as expected.

If anyone has done some testing with a replacement for the E200i controller, please let us know.

Franco
Scott M. Harrison

Franco,

I concur with your findings. Disabling the write cache does indeed stop the 1792's from occurring.

I had a follow-on issue and unfortunately cannot say what caused it, but I am suspicious of the old driver that I was using. I found the machine crashed one night, with no indication of the cause evident in any of the logs. It had corrupted data on the partitions.


I decided to reload everything from scratch and retrace my steps. What I found makes me somewhat suspicious of SBS 2003 vs. the HP controller, but perhaps it is the combination. I found that the 1792's happen after every reboot once the first phase of the SBS install is performed (domain controller, etc.). When the remainder is installed (R2, patches, etc.) the problem becomes intermittent again, which makes me wonder if SBS is not shutting down properly and you only see it with a controller that has a battery-backed cache. At this point, I gave up and purchased a P400 controller w/o bbc (for a lot of reasons) just to get going.

I agree that the e200i hardware-firmware-driver combination is weak. I would add that HP's support is also Very BAD. I sent an engineer the logs he requested approximately three weeks ago and have heard nothing back (and I paid for 7x24-4hr). I am going to call them and complain today.

Scott
Antony Ryan

We have experienced the same problem with 2 new servers - ML350 G5 Quad Core with e200i. Both servers completely freeze when under heavy disk i/o.

HP have replaced the mainboard and the battery-backed cache. One thing we did notice is that if we run a disk I/O stress test, it passes on the RAID 1 configuration but not on the RAID 5 configuration (the system locks up after 10 seconds). We did this test with the HP tech standing next to us, so he could see the results himself. He is going to source another RAID controller so we are not using the e200i; will let you know once this has been done.
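The stress tool used here isn't named in the thread. As a rough stand-in, here is a minimal sketch, assuming Python 3 and a scratch directory on the array under test; `stress_volume` is a hypothetical helper, not an HP or Microsoft tool. It keeps several threads writing, fsync-ing and re-reading random files and counts checksum mismatches. The read-back may well be served from the OS file cache rather than the drives, so it mainly generates sustained write load and catches gross corruption; it is not a substitute for the vendor's diagnostics.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def _write_and_verify(path: str, size_bytes: int) -> bool:
    """Write random data to `path`, force it out of the OS buffers, read it back, compare."""
    data = os.urandom(size_bytes)
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to hand the data to the controller/drives
    with open(path, "rb") as f:
        # NOTE: this read may be served from the OS file cache, not the drives.
        actual = hashlib.sha256(f.read()).hexdigest()
    os.remove(path)
    return actual == expected


def stress_volume(target_dir: str, files: int = 64,
                  size_bytes: int = 8 * 1024 * 1024, workers: int = 8) -> int:
    """Hypothetical helper: concurrent write/fsync/verify passes; returns mismatch count."""
    paths = [os.path.join(target_dir, f"stress_{i}.bin") for i in range(files)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: _write_and_verify(p, size_bytes), paths))
    return results.count(False)


if __name__ == "__main__":
    # Small self-check in a temporary directory; point target_dir at the array for real runs.
    with tempfile.TemporaryDirectory() as scratch:
        print("mismatches:", stress_volume(scratch, files=8, size_bytes=1024 * 1024))
```

For example, `stress_volume(r"D:\stresstest", files=256)` would push about 2 GiB through a scratch folder on the RAID 5 volume (the path is only an example). A lock-up during the run is itself the symptom described above.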
fricci

Scott,
I get the same warning during POST (1792) on a Windows Server 2003 R2 installation, so I don't think the problem comes from SBS itself; it seems to affect Windows installations in general. I don't know if the problem exists in Linux too, but anyway HP servers are certified to work with Windows.
Yesterday I had a long talk with the reseller's technical support (HP Certified Partner), and we decided to try replacing the internal E200i with a P400 controller, probably with BBC. I hope this will be the ultimate solution.
As I already said in a previous post, HP technical support is terrible. They actively create damage.

Maybe next week they will call you asking if you fixed your issue, like they did with me..... :-(

Franco
Scott M. Harrison

Franco,

Please let me know what you find if indeed you get to test a P400 w/bbc. I'd be interested in the results.

As to SBS vs. generic Windows being the issue, I did a basic load of Windows Server 2003 (no domain, DNS, etc.) as a sort of control for the test. No 1792s. My approach wasn't all that scientific, but it makes me wonder about SBS's role in this.

Thanks.

Scott
Henry Boehlert

We have 412645-B21 (ProLiant ML350 G5), 436013-L21 (E5345, Intel Quad-Core Xeon), 351580-B21 (E200 128MB BBWC) and 395473-B21 (500GB 7.2k HP SATA).

We encountered the lockups on heavy disk i/o, too, even after applying all updates from SmartStart, SupportPack and Firmware Maintenance CD.

Also, on one of our servers the RAID array vanished after a single disk failure and had to be rebuilt from backup.

HP support first assumed that the WD SATA drives we're using were not supported by HP, but then had to realize that those are actually what they ship.

After exchanging reports from various analysis tools, HP confirmed the bus errors and now blames an inconsistency between the E200 BBWC and the SATA drives regarding Native Command Queuing (NCQ).

Now we're scheduled to get the E200 replaced with something else (most probably an E400) and the SATA drives replaced with SAS drives.

Interesting to learn that it's actually a firmware issue (i.e. easy to fix), given the cost it is incurring for us as well as for HP.
Antony Ryan

We had the HP tech on-site again today to install a P400 controller - but... he couldn't get it to work!!

The server would start to boot and then just fail (we had only installed the card and hadn't attached any drives to it yet, as per HP support's instructions). We tried numerous things, all to no avail. HP are going to come back next week with another P400 and see if they can get it working.
fricci

Scott,
I will keep you informed about any evolution, but I think it will take some time.... I hope to replace the controller before the end of January, but I am not sure about it.

Anyway, I always get the 1792 warning on a Windows Server 2003 *R2* SP1 server, which is a different release of Windows from SBS 2003 Standard R2 (that runs Windows Server 2003 (R1) SP1 plus the SP2 update), but in this case the warning *seems* harmless (I have had no problems with the disks).

This server is a domain controller (AD + DNS + DHCP + WINS) with two SAS disks (RAID 1); the SBS server uses four SAS disks (RAID 1+0).

Franco
Antony Ryan

To all

Today we installed another P400 controller, ran our stress tests, and the server passed with flying colours! Yay...

Now all we need to do is convince HP that it is some fault with the E200i (be it drivers or firmware, I don't really care) on the ML350 G5, get them to supply the part (a P400 controller) for free, and get them to compensate my colleagues and me for the many hours we have wasted!

We also need to get another P400 for our other client that is experiencing the exact same issues on the exact same hardware.

Ant
Scott M. Harrison

Ant,

Are you using the P400 w/bbc? And good luck trying to get HP to compensate you. ;)
fricci

Unfortunately, the reseller's technical support changed its mind.....
They told me (as HP technical support did) that this is a software issue caused by an **improper configuration** of the server. They made this decision after our talk; they never even looked at the server!
Maybe working with HP products causes this kind of intellectual damage? ;-)
So, considering that the customer doesn't want to pay more than 500 euros for a new controller, I think this could lead to legal action....

Ant,

please let us know any news about the P400 testing (with BBWC?).


Franco
Antony Ryan

Franco/Scott

No BBC on the P400, just the standard 256 MB cache. One thing I read about the E200i: it will support RAID 5 with the 128 MB cache. Both our servers had this upgrade in place but still failed.

I will let you all know how we go with HP in regards to getting these parts for free, and also what sort of response we get from HP about the inability of the E200i to work properly.

Ant
PinnacleCS

I originally had an ML350 G5 with the E200i controller and experienced the same issues. I sent it back and replaced it with an ML370 with a P400 controller. I received the same error on the P400 with BBWC installed, although not as often. I installed the 7.9 support pack and the MS hotfix for the Storport driver and haven't seen it again. This server was running SBS 2003 SP2. I am 99% sure these issues started after applying SP2.

About a month ago we needed to purchase a replacement server for one of our applications, so we knowingly purchased an ML350 with an E200 and BBWC. I did some testing on this server. It was a plain-Jane Windows 2003 build, and I intentionally did not put SP2 on it. I tested with 7.9, and after 20 reboots and gigs of data I did not receive the error. I put SP2 on the server, and after just two reboots I got the error message. I was not able to reproduce it by performing any particular action. I installed the Storport update from MS: another 20 reboots and gigs of data later, no error. The ML370 has been in production for 3 months and the ML350 for about a month. Knock on wood, we haven't had any issues with either. Fortunately for my customer, I caught this and questioned it before it went into production. I've built enough servers and been in this industry long enough to know that any array controller status message after the server is built is not a good thing, regardless of what first-line tech support says.

IMHO, SP2 is the culprit and the issue occurs during shutdown. I have not seen or heard of anyone having the same issue with a Linux- or Novell-based host. I don't think the SCSI subsystem in Windows is working correctly with the driver. I have not been able to replicate this issue since I installed the updated Storport driver from MS. I wish I had time to build a Novell or Linux box and test with that.

BTW, I second everyone's opinion on HP's support. The first-line techs don't know the product and don't seem to care. Why the hell doesn't HP see that? Oh wait, the only thing most execs see is green. Sorry......

Hope this helps.

Blake
crisscross

Blake, you are 100% correct. HP is staffed from top to bottom by MORONS. The execs are morons because they forsake customer satisfaction for a quick buck. The techs are all morons because they just are. There is nothing worse than dealing with a first-line tech who acts like he knows what he's doing. I'm on hold with them about this issue now. I demanded to speak to a second-tier engineer and they refused. I stated that I knew in advance that the first guy couldn't help me, yet I was forced to waste 15 minutes with him. He insisted that the 1792 and 1794 errors would go away if I installed firmware 1.66 for the E200i. I checked and told him it was already installed, so he was WRONG. Moreover, I don't care if the error reporting goes away; I only care if the problem is resolved. Frankly, I'm shocked that HP is even allowing this thread to stay published, given all the negative statements (backed by proof) about them. This community should move to another forum so we can stay in touch in the event that HP pulls their usual tactics of obscuring their own failures. These American companies operate like third-world dictatorships.
Scott M. Harrison

Amazing to me is how this is fixed for some when all of the patches and hotfixes are installed, and for others it isn't. I tried rebuilding TWICE, and the best that I was able to do in both cases was to make the problem intermittent.

Scott
crisscross

Scott, fortunately I haven't experienced any data loss or crashes yet, but there is no way I'm going into production after reading this thread and seeing the error messages myself.

I'm running firmware 1.66 on the E200i with driver 6.8.0.32. I manage about 30 of these ML350 G5s at various clients. They are all identical and were all purchased within 6 months of each other. Almost all report 1794 errors, which are preceded by 1792 errors, saying the battery charge is low. Fortunately, I'm running SAS drives, so I hope not to experience the failure of an entire SATA RAID array after a single drive fails, as reported above. I want to get these tech issues resolved, but I also want to send a loud message to HP corporate so it doesn't happen again. You know that many of their components are IBM, right? It's just how they integrate them that's different. God knows IBM has been sucking lately too.
Antony Ryan

HP are still faffing around with us. They provided us with a new firmware for the E200i that hasn't been published on the website yet. They assured us this would fix the problem; of course it didn't!

Now they have asked for one of my techs to go onsite again (so much time being wasted on this issue) so they can try playing around with the E200i settings, turn off write caching or something.

Will let you all know how it goes, of course. Needless to say (but I will say it anyway), I'm getting really p*ssed off with all this.
fricci

After six months, I can now confirm that DISABLING ACCELERATOR is the ultimate workaround to solve all E200i+BBWC issues.
After disabling it, I have not gotten a single "1792-Drive Array Reports Valid Data Found in Array Accelerator" message, nor any DATA CORRUPTION problem, with any driver version (tested versions: 6.6.2.32, 6.8.0.32, 6.10.0.32).
So, if you have problems with an E200i+BBWC controller, DISABLE THE ACCELERATOR!!!
I hope this helps.

In the meantime I discovered another nasty issue on the ML350 G5, but I will post it in a specific thread........

Franco
The opinions expressed above are the personal opinions of the authors, not of HP.