03-09-2005 06:54 PM
A bit low on details here. Can you provide more, such as:
- what OS is running
- which logs do you mean: H/W logs (SEL, FPL)
or OS logs?
- what H/W config has this RX4640 (I/O
- what is the interval at which they hang
(once a day, once a week)
- when they hang, is it remote access or
the console only i.e.
- what have you done to isolate the hang
(i.e. added/removed components or S/W, or
- when did the hangs start to show, did
they work fine before?
- what F/W level (SYSREV command from the
05-30-2007 01:19 AM
They are all running RHEL 4 update 4 in an Oracle RAC setup.
iLO MP: E.03.15
System Firmware: 03.17
The OS seems to be dead, but the machine still responds to ping. Every process stops, there is no way to get in, and the console is frozen too. I have to log into iLO to reboot it. Very annoying. The OS doesn't give any info; all the logs stop at the same time. Everything seems to indicate it's doing fine, then it freezes.
I can still probe TCP ports and they answer, which is the weird part. That's why Nagios doesn't think it's completely down. No kernel panic. I don't know what to think. Is the network layer of the kernel still alive?
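For what it's worth, a port that "answers" does not prove the service behind it is alive: the kernel completes the TCP handshake on its own, before the listening process ever calls accept(). A rough sketch of a check that separates the two cases is below. It assumes the service sends a banner on connect (sshd does; many services do not), and the function name and return labels are made up for illustration. This is not what Nagios does internally.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a TCP service as 'no_connect', 'connect_only', 'closed' or 'banner'.

    Hypothetical helper: the labels are illustrative, not from any monitoring tool.
    """
    try:
        # The kernel completes the three-way handshake by itself, so this can
        # succeed even when the listening process is completely stuck.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "no_connect"

    with sock:
        sock.settimeout(timeout)
        try:
            # Only a running user-space process can send application data.
            data = sock.recv(64)
        except socket.timeout:
            return "connect_only"  # handshake OK, but nobody is talking
        except OSError:
            return "no_connect"  # e.g. connection reset while waiting

    return "banner" if data else "closed"
```

Calling probe() against port 22 of a frozen node would be expected to return "connect_only" if user space is hung while the kernel network stack is still running. A check along these lines, run from the monitoring host, should flag the node as down where a plain port check does not.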
I have not taken any action yet. I'm planning to update the system firmware to 4.21 and see whether that helps.
05-12-2009 11:56 PM
Did you find a solution to your problem?
We have exactly the same problem with our 2-node Oracle RAC. Both servers freeze at random times: the console is stuck, but ping still works and the TCP ports are still open but don't respond. The local hard disk LEDs are constantly lit. The only thing to do is a hard reset.
When one node freezes, the other node has no problem. Nothing is found in any log after the crash.
A server can freeze as often as 3 times a week, and at other times run without problems for months. However, I noticed that when I clean up the database, the freezes disappear for a while.
2x HP Proliant DL380 G4 w/ 4GB RAM
RHEL 4 upd 7
Oracle RAC 10.2.0.2 (2nodes)