Re: rx2660 halted by itself (8785 Views)
Occasional Advisor
sakthi_disk
Posts: 11
Registered: ‎10-09-2012
Message 1 of 18 (8,851 Views)

rx2660 halted by itself

The rx2660 server halted by itself, and the system health LED on the front panel was flashing red.

When I powered the rx2660 back on, the red indication changed to amber.

After a while the amber indication went away. The server then ran for close to 1 hour, sometimes only 10 or 30 minutes, and then halted by itself again.

The MP -> SL -> E event log reports the messages below.

 

225   BMC      2   POWER_UNIT_DISABLED
                   03 Feb 2013 10:54:40
224   BMC      2   CHASSIS_CONTROL_REQUEST
                   03 Feb 2013 10:54:39
223   BMC     *7   SHUTDOWN_OR_RESET_ON_SENSOR
                   03 Feb 2013 10:54:39
222   BMC     *5   VOLTAGE_DEGRADES_TO_NON_RECOVERABLE
                   03 Feb 2013 10:54:38
221   SFW  0   2   BOOT_SWITCH_INSECURE_MODE
                   03 Feb 2013 10:54:02
220   SFW  0   2   BOOT_START
                   03 Feb 2013 10:53:44
219   SFW      2   CPU_START_BOOT
                   03 Feb 2013 10:53:44
218   BMC      2   ACPI_ON
                   03 Feb 2013 10:53:40
217   BMC      2   POWER_UNIT_ENABLED
                   03 Feb 2013 10:53:39
216   BMC      2   SOFT_RESET
                   03 Feb 2013 10:53:39
215   BMC      2   CHASSIS_CONTROL_REQUEST
                   03 Feb 2013 10:53:34
214   BMC      2   POWER_BUTTON_PRESSED
                   03 Feb 2013 10:53:34
213   BMC      2   POWER_UNIT_DISABLED
                   03 Feb 2013 10:53:27
212   BMC      2   CHASSIS_CONTROL_REQUEST
                   03 Feb 2013 10:53:26
211   BMC     *7   SHUTDOWN_OR_RESET_ON_SENSOR
                   03 Feb 2013 10:53:26
210   BMC     *5   VOLTAGE_DEGRADES_TO_NON_RECOVERABLE
                   03 Feb 2013 10:53:25

What could be the issue with the rx2660?

 

Honored Contributor
Patrick Wallek
Posts: 13,787
Registered: ‎06-21-2000
Message 2 of 18 (8,839 Views)

Re: rx2660 halted by itself

You have power problems.

222   BMC     *5   VOLTAGE_DEGRADES_TO_NON_RECOVERABLE
                   03 Feb 2013 10:54:38

 

It could be the power supply in the system.

 

 

Occasional Advisor
sakthi_disk
Posts: 11
Registered: ‎10-09-2012
Message 3 of 18 (8,834 Views)

Re: rx2660 halted by itself

I have already checked the voltage at the rack power junction and on the rx2660 power cord, and there does not seem to be a power problem.

That rack holds 8 servers: two rx2660s and six HP DL series servers.

The other rx2660 and the six DL series servers are running fine, with no apparent voltage problem.

Anyway, I detached the server from the rack and placed it in my room. When I start the server there, it still halts by itself.

The rx2660 has 2 SMPS modules, and both are flashing green at the back of the server. I think the power supplies are good.

Please find the MP -> CM -> PS output below for your reference.

 

MP:CM> ps


PS
For System Processor Status see the SS command.

System Power state : Off
System Power usage : 11  Watts used in Auxiliary Mode
Ambient temperature: 21  C
Temperature status : Normal


Power supplies                State
-----------------------------------------------------------
Power Supply 1                Normal
Power Supply 2                Normal

Fans                State               Fans                State
-------------------------------------------------------------------------------
Fan  1 (Mem)        Normal              Fan  7 (CPU)        Normal
Fan  2 (Mem)        Normal              Fan  8 (CPU)        Normal
Fan  3 (Mem)        Normal              Fan  9 (I/O)        Normal
Fan  4 (Mem)        Normal              Fan 10 (I/O)        Normal
Fan  5 (CPU)        Normal              Fan 11 (I/O)        Normal
Fan  6 (CPU)        Normal              Fan 12 (I/O)        Normal


 

Frequent Advisor
fhsjvl
Posts: 40
Registered: ‎05-18-2011
Message 4 of 18 (8,824 Views)

Re: rx2660 halted by itself

VOLTAGE_DEGRADES_TO_NON_RECOVERABLE pertains to the output voltage of the P/S, not to the rack power.

 

We have had the same problem twice. In both cases a P/S unit had to be replaced. Apparently even if one P/S unit continues to operate correctly, this problem can halt the system.

Occasional Advisor
sakthi_disk
Posts: 11
Registered: ‎10-09-2012
Message 5 of 18 (8,814 Views)

Re: rx2660 halted by itself

We have two P/S units of the same model in another machine, which I swapped into the rx2660.

The machine ran for only 1 hr 45 min and then halted again.

 

 

Occasional Advisor
sakthi_disk
Posts: 11
Registered: ‎10-09-2012
Message 6 of 18 (8,811 Views)

Re: rx2660 halted by itself

The MP -> SL -> E -> A -> 3 (Warning) alert-level filter shows MEM_NVM_XSUM_BAD:

 

158   BMC     *7  0x20511262F0020B30 6000A87060120300 SHUTDOWN_OR_RESET_ON_SENSOR
                                                      06 Feb 2013 14:04:32
157   BMC     *5  0x20511262EF020B20 FFFF030760020300 VOLTAGE_DEGRADES_TO_NON_RECOVERABLE
                                                      06 Feb 2013 14:04:31
147   BMC     *3  0x20511260A9020A30 FFFF016F41080300 POWER_SUPPLY_FAIL_OR_DISCONNECT
                                                      06 Feb 2013 13:54:49
141   SFW  0  *3  0x408015F900E009B0 0000000000000000 MEM_NVM_XSUM_BAD
                                                      06 Feb 2013 13:53:45

 

Is this a memory-related issue?

HP Pro
S_Logan
Posts: 312
Registered: ‎10-11-2008
Message 7 of 18 (8,791 Views)

Re: rx2660 halted by itself

Hi Sakthi,

 

Could you please confirm the firmware details of this system?

 

 

Regards,

Surendar

I work for HP
Occasional Advisor
sakthi_disk
Posts: 11
Registered: ‎10-09-2012
Message 8 of 18 (8,785 Views)

Re: rx2660 halted by itself

The machine has been running for a day now, but I can't yet confirm whether the problem is solved.

First we reset the BMC and iLO, but the same thing repeated (it halted after a few seconds, or after 10 min, 20 min, 45 min, 1 hr 10 min, and 1 hr 45 min).

Finally we tried a graceful shutdown from MP -> CM -> PC -> G and then manually powered on the server. It started, and I have monitored it for a day without the problem repeating.

The machine is still under observation.

For your reference, please find the firmware versions below.

 

MP:CM> sysrev

SYSREV

Current firmware revisions

 MP FW     : F.02.23
 BMC FW    : 05.24
 EFI FW    : ROM A 07.14, ROM B 07.14
 System FW : ROM A 04.11, ROM B 04.11, Boot ROM A
 PDH FW    : 50.07
 UCIO FW   : 03.0b
 PRS FW    : 00.08 UpSeqRev: 02, DownSeqRev: 01

 

 

HP Pro
S_Logan
Posts: 312
Registered: ‎10-11-2008
Message 9 of 18 (8,772 Views)

Re: rx2660 halted by itself

Hi Sakthi,

 

Please note the latest firmware available below, and update the system to it.

 

Latest Firmware link:
 
 
VERSION:
 
  System FW   : 04.30
  BMC FW      : 05.26
  iLO-2 MP FW : F.02.26
  PDH FW      : 50.07
  UCIO FW     : 03.0b
 
 
Regards,
 
Surendar
I work for HP
Occasional Advisor
sakthi_disk
Posts: 11
Registered: ‎10-09-2012
Message 10 of 18 (8,719 Views)

Re: rx2660 halted by itself

Hi Surendar,

The server ran under observation for 19 days without a shutdown; on the 19th day it halted by itself again.

Now when I start the server it halts suddenly, sometimes after 10 or 15 minutes, and at most after 25 minutes of running.

I have updated to the firmware released in 2009; see below.

 

OLD FIRMWARE

Current firmware revisions

 MP FW     : F.02.23
 BMC FW    : 05.24
 EFI FW    : ROM A 07.14, ROM B 07.14
 System FW : ROM A 04.11, ROM B 04.11, Boot ROM A
 PDH FW    : 50.07
 UCIO FW   : 03.0b
 PRS FW    : 00.08 UpSeqRev: 02, DownSeqRev: 01

 

UPDATED FIRMWARE

Current firmware revisions

 MP FW     : F.02.25
 BMC FW    : 05.26
 EFI FW    : ROM A 07.14, ROM B 07.14
 System FW : ROM A 04.11, ROM B 04.15, Boot ROM B
 PDH FW    : 50.07
 UCIO FW   : 03.0b
 PRS FW    : 00.08 UpSeqRev: 02, DownSeqRev: 01

 

The System FW ROM A was not updated, and I don't know why. Do you have any idea what the problem could be?

Otherwise, can I try the latest firmware?

 

Regards,

Sakthi

 

 

Honored Contributor
Robert_Jewell
Posts: 1,239
Registered: ‎06-26-2001
Message 11 of 18 (8,707 Views)

Re: rx2660 halted by itself

The following error means that a voltage rail in the server has gone out of acceptable range.

 

0x20511262EF020B20 FFFF030760020300  VOLTAGE_DEGRADES_TO_NON_RECOVERABLE

This does not necessarily mean a power supply fault as there are a number of voltage regulators within the server.

 

The IPMI Event Viewer decodes the error as follows:

 

Keyword
VOLTAGE_DEGRADES_TO_NON_RECOVERABLE
 
Summary
Voltage degraded to non-recoverable level - Check all boards with this voltage.
 
Description
Voltage degraded to non-recoverable level from less severe - Check all boards with this voltage
 
Generator
Baseboard Management Controller

Sensor Type
Voltage

Sensor Number
96

Cause
The voltage in the server has gone outside the factory set range. A bad component, blown fuse, poorly seated module, loose cable, or debris could be responsible for this failure.
 
Action
When this condition was detected the system should have been immediately shutdown to avoid damage. Contact your HP support representative as soon as possible to have the unit checked. Check all boards, power supplies, and modules that either supply or use this voltage rail.


 

The voltage sensor ID is 96 (or 0x60). The location of this sensor points to the cause of the problem.

If you have HP support they can confirm this, but I would think this has to do with the system board or processor voltages, since there isn't much else in the rx2660!
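
For reference, the sensor number can also be read straight from the hex Data Field. Here is a minimal Python sketch. It assumes the sensor number is the fifth byte of the Data Field (index 4, counting from 0); that position is inferred only from the hex strings posted in this thread, not from HP documentation, and `sensor_from_data_field` is just an illustrative name.

```python
def sensor_from_data_field(data_field: str) -> int:
    """Return the sensor number from a 16-hex-digit SEL Data Field.

    Assumption: the sensor number is byte index 4 of the field, as the
    examples in this thread suggest. This is not a documented layout.
    """
    raw = bytes.fromhex(data_field)
    if len(raw) != 8:
        raise ValueError("expected 8 bytes (16 hex digits)")
    return raw[4]


# Data Field of the VOLTAGE_DEGRADES_TO_NON_RECOVERABLE event quoted above
sensor = sensor_from_data_field("FFFF030760020300")
print(f"sensor {sensor} (0x{sensor:02X})")  # prints: sensor 96 (0x60)
```

The result agrees with the sensor number 96 reported by the IPMI Event Viewer for this event.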

 

-Bob

HP Pro
Holger G
Posts: 51
Registered: ‎11-16-2010
Message 12 of 18 (8,638 Views)

Re: rx2660 halted by itself

Hi Sakthi,

 

I agree with Bob, the power issue is on sensor 0x60.

This sensor monitors a voltage on the Front Side Bus (processor bus); it is generated on the system board and consumed by both CPU modules.

So the system board is the most likely suspect, but one of the CPUs could also cause this.

You may contact HP support to replace the system board, if this system is still under warranty or you have an HP contract for it.

You could also remove one CPU (if you have two installed) to check whether that alone solves the issue.

 

Regarding your firmware question:

It is normal that only one firmware bank is updated during a firmware update, so the non-active firmware bank may show an older version. That is OK.
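
To see at a glance which bank is active, a small Python sketch like the one below can read the `System FW` line from the `sysrev` output. It assumes the line always has the form shown in this thread ("ROM A x, ROM B y, Boot ROM z"); `active_system_fw` is a hypothetical helper name, not an HP tool.

```python
import re


def active_system_fw(line):
    """Return (active_bank, {bank: version}) from a sysrev 'System FW' line.

    Assumes the format seen in this thread, for example:
    'System FW : ROM A 04.11, ROM B 04.15, Boot ROM B'
    """
    # Collect the version reported for each bank.
    banks = dict(re.findall(r"ROM ([AB]) (\d+\.\d+)", line))
    # The bank named after 'Boot ROM' is the one the system boots from.
    boot = re.search(r"Boot ROM ([AB])", line)
    if boot is None or boot.group(1) not in banks:
        raise ValueError("unrecognized System FW line")
    return boot.group(1), banks


active, banks = active_system_fw(" System FW : ROM A 04.11, ROM B 04.15, Boot ROM B")
print(f"booting from ROM {active}, version {banks[active]}")
# prints: booting from ROM B, version 04.15
```

For the updated system above, this shows the machine booting from ROM B with the newer 04.15 firmware, while ROM A still holds 04.11.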

 

 

Kind Regards

Holger

I am an HP employee.

Occasional Advisor
PrashanthShetty
Posts: 5
Registered: ‎03-14-2013
Message 13 of 18 (8,623 Views)

Re: rx2660 halted by itself

Hi,

 

This could very well be a CPU issue.

Check the status of the CPUs with the MP:CM> ss command.

 

Best Regards,

Prashanth

 

HP Employee..

Occasional Visitor
unix_shiv
Posts: 2
Registered: ‎03-18-2013
Message 14 of 18 (8,590 Views)

Re: rx2660 halted by itself

Hi Holger,

 

Our rx2660 has a single processor. Can I move it to the second socket and boot the machine?

 

HP Pro
Holger G
Posts: 51
Registered: ‎11-16-2010
Message 15 of 18 (8,586 Views)

Re: rx2660 halted by itself

Hi Sakthi,

 

No, the system will not boot with a CPU only in socket 1. There has to be a CPU in socket 0.

So, if you have only one CPU installed in your rx2660, you cannot test the CPU by moving it.

 

 

Kind Regards

Holger

I am an HP employee.

Visitor
DougK
Posts: 2
Registered: ‎06-30-2011
Message 16 of 18 (8,250 Views)

Re: rx2660 halted by itself

I am running into the same error message on an HP Rx-2660.  

 

Here is the output of the MP System Event Logs:

 

#  Location|Alert| Encoded Field    |  Data Field    |   Keyword / Timestamp
-------------------------------------------------------------------------------
0     BMC      2  0x20526EDDCA020010 FFFF027000120300 SOFT_RESET
                                                      28 Oct 2013 21:57:30
1     BMC      2  0x20526EDDCA020020 FFFF006F04140300 POWER_BUTTON_PRESSED
                                                      28 Oct 2013 21:57:30
2     BMC      2  0x20526EDDCA020030 0401A37004120300 CHASSIS_CONTROL_REQUEST
                                                      28 Oct 2013 21:57:30
3     BMC      2  0x20526EDDD1020040 FFFF027000120300 SOFT_RESET
                                                      28 Oct 2013 21:57:37
4     BMC      2  0x20526EDDD1020050 FFFF010943090300 POWER_UNIT_ENABLED
                                                      28 Oct 2013 21:57:37
5     BMC     *5  0x20526EDDD2020060 FFFF03076F020300 VOLTAGE_DEGRADES_TO_NON_RECOVERABLE
                                                      28 Oct 2013 21:57:38
6     BMC     *7  0x20526EDDD2020070 6F00A8706F120300 SHUTDOWN_OR_RESET_ON_SENSOR
                                                      28 Oct 2013 21:57:38
7     BMC      2  0x20526EDDD2020080 6F00A3706F120300 CHASSIS_CONTROL_REQUEST
                                                      28 Oct 2013 21:57:38
8     BMC      2  0x20526EDDD3020090 FFFF000943090300 POWER_UNIT_DISABLED
                                                      28 Oct 2013 21:57:39
9     BMC      2  0x20000000040200A0 2450A17000120300 BMC_INITIALIZING
                                                                  00:00:04
10    BMC      2  0x20000000050200B0 1F81A37000120300 CHASSIS_CONTROL_REQUEST
                                                                  00:00:05
11    BMC      2  0x20000000060200C0 FFFF006FFA220300 ACPI_ON
                                                                  00:00:06
12    MP   0   2  0x5E800A7A00E000D0 0000000000000000 MP_SELFTEST_RESULT
                                                                  00:00:07
13    BMC      2  0x20000000080200F0 FFFF027000120300 SOFT_RESET
                                                                  00:00:08
14    BMC     *5  0x2000000009020100 FFFF03076F020300 VOLTAGE_DEGRADES_TO_NON_RECOVERABLE
                                                                  00:00:09
15    BMC     *7  0x2000000009020110 6F00A8706F120300 SHUTDOWN_OR_RESET_ON_SENSOR
                                                                  00:00:09
16    BMC      2  0x2000000009020120 6F00A3706F120300 CHASSIS_CONTROL_REQUEST
                                                                  00:00:09
17    BMC      2  0x20526FBAD6020130 FFFF0103FCC00300 TIME_SET
                                                      29 Oct 2013 13:40:38
18    BMC      2  0x20526FBAD6020140 FFFF000943090300 POWER_UNIT_DISABLED
                                                      29 Oct 2013 13:40:38
19    BMC      2  0x20526FBAFB020150 FFFF027000120300 SOFT_RESET
                                                      29 Oct 2013 13:41:15
20    BMC      2  0x20526FBAFB020160 FFFF006F04140300 POWER_BUTTON_PRESSED
                                                      29 Oct 2013 13:41:15
21    BMC      2  0x20526FBAFB020170 0401A37004120300 CHASSIS_CONTROL_REQUEST
                                                      29 Oct 2013 13:41:15
22    BMC      2  0x20526FBB03020180 FFFF027000120300 SOFT_RESET
                                                      29 Oct 2013 13:41:23
23    BMC      2  0x20526FBB03020190 FFFF010943090300 POWER_UNIT_ENABLED
                                                      29 Oct 2013 13:41:23
24    BMC     *5  0x20526FBB030201A0 FFFF03076F020300 VOLTAGE_DEGRADES_TO_NON_RECOVERABLE
                                                      29 Oct 2013 13:41:23
25    BMC     *7  0x20526FBB030201B0 6F00A8706F120300 SHUTDOWN_OR_RESET_ON_SENSOR
                                                      29 Oct 2013 13:41:23
26    BMC      2  0x20526FBB030201C0 6F00A3706F120300 CHASSIS_CONTROL_REQUEST
                                                      29 Oct 2013 13:41:23
27    BMC      2  0x20526FBB050201D0 FFFF000943090300 POWER_UNIT_DISABLED
                                                      29 Oct 2013 13:41:25

   -> This is the last entry in the selected log.

 

And current firmware:

 

SYSREV

Current firmware revisions

 MP FW     : F.02.23
 BMC FW    : 05.24
 EFI FW    : ROM A 07.14, ROM B 07.14
 System FW : ROM A 04.11, ROM B 04.11, Boot ROM A
 PDH FW    : 50.07
 UCIO FW   : 03.0b
 PRS FW    : 00.08 UpSeqRev: 02, DownSeqRev: 01

 How would I go about deciphering the voltage degrades/shutdown events to determine what sensor is causing the issues?  I know from reading through this thread as well as searching online that there are a number of components that could be generating this error.  I'd like to cut down on replacing components that don't need to be replaced if at all possible.

 

Thanks,

HP Pro
Holger G
Posts: 51
Registered: ‎11-16-2010
Message 17 of 18 (8,232 Views)

Re: rx2660 halted by itself

Hi,

Your power issue is on sensor 0x6F, which monitors a voltage used on the Front Panel and generated on the system board.

You can also view the MP logs in text mode, which gives more information than hex mode.
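
If you want to pull the sensor number and time out of the hex-mode lines directly, here is a rough Python sketch. It assumes, based only on the entries posted in this thread, that for BMC events bytes 1 to 4 of the Encoded Field hold a Unix timestamp (UTC) and byte 4 of the Data Field (counting from 0) holds the sensor number. `decode_line` and this field layout are illustrative, not an HP tool or a documented format.

```python
import re
from datetime import datetime, timezone

# Hex-mode event line: number, location, optional CPU number, alert level
# (prefixed with * when flagged), Encoded Field, Data Field, keyword.
LINE = re.compile(
    r"^\s*(\d+)\s+(\w+)\s+(?:\d+\s+)?(\*?)(\d+)\s+"
    r"0x([0-9A-Fa-f]{16})\s+([0-9A-Fa-f]{16})\s+(\w+)\s*$"
)


def decode_line(line):
    """Decode one hex-mode event line, or return None for any other line."""
    m = LINE.match(line)
    if m is None:
        return None
    number, location, star, alert, encoded, data, keyword = m.groups()
    event = {
        "number": int(number),
        "location": location,
        "alert": int(alert),
        "flagged": star == "*",
        "keyword": keyword,
    }
    if location == "BMC":
        # Assumption: Encoded Field bytes 1-4 are Unix seconds (UTC) and
        # Data Field byte 4 is the sensor number (inferred from this thread).
        seconds = int.from_bytes(bytes.fromhex(encoded)[1:5], "big")
        event["time"] = datetime.fromtimestamp(seconds, tz=timezone.utc)
        event["sensor"] = bytes.fromhex(data)[4]
    return event


sample = """\
5     BMC     *5  0x20526EDDD2020060 FFFF03076F020300 VOLTAGE_DEGRADES_TO_NON_RECOVERABLE
                                                      28 Oct 2013 21:57:38
6     BMC     *7  0x20526EDDD2020070 6F00A8706F120300 SHUTDOWN_OR_RESET_ON_SENSOR
                                                      28 Oct 2013 21:57:38
"""

# Report only the flagged (*) BMC events with their sensor and decoded time.
for raw in sample.splitlines():
    event = decode_line(raw)
    if event and event["flagged"] and "sensor" in event:
        print(event["number"], event["keyword"],
              f"sensor 0x{event['sensor']:02X}", event["time"].isoformat())
```

On the two sample entries this reports sensor 0x6F, and the decoded times line up with the timestamps shown in the log. Entries from other locations (SFW, MP) appear to use a different encoding, so the sketch leaves their time and sensor undecoded.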

 

 

Kind Regards

Holger

Visitor
DougK
Posts: 2
Registered: ‎06-30-2011
Message 18 of 18 (8,221 Views)

Re: rx2660 halted by itself

Thanks!

 

I wasn't aware of being able to get more information when I connected over the serial port.  

 

So on top of helping me, you've taught me something new today!
