Aries coredumps/memory limits.
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 1 of 19 (526 Views)

Aries coredumps/memory limits.

Hi Guys

The company I work for has bought an 'enterprise' application that, on IA64, runs as a PA-RISC 1.1 binary. That of course fires up ARIES to run the application. During the lifecycle of the application, it calls a native IA64 binary... or tries to. It coredumps with a big bang. Running tusc, I see that it coredumps with ENOMEM.

In the FAQ for ARIES I found:

http://h21007.www2.hp.com/portal/site/dspp/menuitem.863c3e4cbcdc3f3515b49c108973a801/?ciid=a408713ba...

"Answer: If the emulated HP 9000 HP-UX application fails with error ENOMEM while allocating memory - make sure that the value of kernel tunable parameter pa_maxssiz_32bit is not set to a very high value.

64-bit emulated processes are not likely to experience this issue under ARIES."

Looking at pa_maxssiz_32bit I found that it was set to the lowest possible. So nothing to tweak there.

Besides stating the obvious, that the 'enterprise application' sucks, can you help me do SOMETHING to get this piece of software to run?

The machine in question has 14GB of memory. The coredump of the process is 6GB(!)

Any input is received with warmth and beer (if you happen to be in Denmark ;-))
Acclaimed Contributor
Dennis Handly
Posts: 24,953
Registered: ‎03-06-2006
Message 2 of 19 (526 Views)

Re: Aries coredumps/memory limits.

>it calls a native IA64 binary. or tries to. It coredumps with a big bang. Running tusc I see that it coredumps with ENOMEM.

Which process aborts? Is your PA process a 64 bit process?
What are the last bunch of lines in your tusc output?

Do you have a stack trace? What is getting ENOMEM?
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 3 of 19 (526 Views)

Re: Aries coredumps/memory limits.

It is the PA-RISC binary that core dumps.

-rw------- 1 root sys 6303696 Jun 7 04:00 /core.

I have cut and pasted a bit; the trace attached to this post shows where it dumps. I have masked the application name with * in the output.
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 4 of 19 (526 Views)

Re: Aries coredumps/memory limits.

Forgot

tsiv@gobi|pts/1:/home/tsiv$ sudo file /opt/*************/aplication
/*************/application: PA-RISC1.1 shared executable dynamically linked dynamically linked

So 32-bit.
HP Pro
Rajesh K Chaurasia
Posts: 76
Registered: ‎01-31-2008
Message 5 of 19 (526 Views)

Re: Aries coredumps/memory limits.

Hi,

There isn't enough information as yet to know the root cause and suggest a resolution/workaround.

Which ARIES patch is installed on your system? If you are not already on PHSS_41422 (11.23) / PHSS_41423 (11.31), please update the ARIES patch.

What is the exact error message you see on stdout/stderr? What are the values of the kernel tunable parameters maxdsiz, maxssiz and base_pagesize (only on 11.31)?

Can you attach the last few lines of the tusc log, including the most recent brk() system calls?

Regards
-Rajesh
Acclaimed Contributor
Dennis Handly
Posts: 24,953
Registered: ‎03-06-2006
Message 6 of 19 (526 Views)

Re: Aries coredumps/memory limits.

It looks like it coredumps with signal 11, not ENOMEM.

It would help if you used: tusc -fp -ea

It seems like you have process or thread output interspersed. Add -u if you have threads.
HP Pro
Rajesh K Chaurasia
Posts: 76
Registered: ‎01-31-2008
Message 7 of 19 (526 Views)

Re: Aries coredumps/memory limits.

Dennis, you are right. The process died due to SIGSEGV for an address which does not appear to be mapped. The ENOMEM error is from the mprotect() call.
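
As an aside on why mprotect() can report ENOMEM without any shortage of memory: POSIX lists ENOMEM for mprotect() when the address range is not mapped. The sketch below reproduces that through Python's ctypes, assuming a Linux-like system where `ctypes.CDLL(None)` exposes the C library. It was not run on HP-UX or under ARIES, and the function name is made up for illustration.

```python
import ctypes
import errno
import mmap

# Load the C library of the running process (assumption: Linux-like system).
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.mprotect.restype = ctypes.c_int
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]

MAP_FAILED = ctypes.c_void_p(-1).value


def mprotect_on_unmapped_page():
    """Map one page, unmap it, then call mprotect() on the stale address."""
    size = mmap.PAGESIZE
    addr = libc.mmap(None, size, mmap.PROT_READ | mmap.PROT_WRITE,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if addr is None or addr == MAP_FAILED:
        raise OSError(ctypes.get_errno(), "mmap failed")
    if libc.munmap(addr, size) != 0:
        raise OSError(ctypes.get_errno(), "munmap failed")
    ctypes.set_errno(0)
    # The page is no longer mapped, so changing its protection must fail.
    rc = libc.mprotect(addr, size, mmap.PROT_READ)
    return rc, ctypes.get_errno()


rc, err = mprotect_on_unmapped_page()
print(rc, errno.errorcode.get(err, err))
```

Read this way, an ENOMEM from mprotect() in a tusc log points at a bad address, which fits the SIGSEGV rather than a memory limit.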

The tusc log is only partial, starting from SIGSEGV delivery to the application. From this point onwards, the rest of the log is for core file writing. Still, the log appears to be in order.

So it looks like the application is failing due to a corrupt address. I would suggest getting a backtrace from the core file using GDB, along with a listing of the faulting code sequence.

Regards
-Rajesh
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 8 of 19 (526 Views)

Re: Aries coredumps/memory limits.

Hi Guys

We are not on the patch level you suggested:

tsiv@gobi|pts/3:/home/tsiv$ sudo swlist -l patch | grep -i aries | grep -v Binaries | grep -vi libraries
# PHCO_36447 1.0 aries(5) man page patch
# PHCO_36448 1.0 Japanese aries(5) man page patch
# PHSS_38527 1.0 Aries cumulative patch

Attached a full trace. From 'idle' to 'idle' in the application.

The job I gave the application is 'get system info', which involves fetching a lot of hardware info and whatnot. And in the end it core dumps.
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 9 of 19 (526 Views)

Re: Aries coredumps/memory limits.

Installing the latest ARIES patch as suggested here does not help, unfortunately. It still coredumps. In fact, I now get two core dumps in /: one for the application, as before, and one called just /core.

If you need anything else to help just say so.
HP Pro
Rajesh K Chaurasia
Posts: 76
Registered: ‎01-31-2008
Message 10 of 19 (526 Views)

Re: Aries coredumps/memory limits.

If the failure persists with the PHSS_41423 ARIES patch, please submit the issue to the HP response center. We may not be able to find the root cause by debugging this over the ITRC forum :-)

BTW, the second core file, named "core", is the core file written by the kernel for the whole process image, which includes the ARIES memory regions as well.

Regards
-Rajesh
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 11 of 19 (525 Views)

Re: Aries coredumps/memory limits.

Hi Rajesh

I hear you. Will open a support case as well. It can just be unbelievably hard to get to the _right_ people through the proper channels in HP ;-)

Sometimes it is way easier just talking to them in here :-)
Acclaimed Contributor
Dennis Handly
Posts: 24,953
Registered: ‎03-06-2006
Message 12 of 19 (525 Views)

Re: Aries coredumps/memory limits.

>Attached a full trace.

Are you able to use /usr/ccs/bin/gdbpa to get a stack trace from your core file?
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 13 of 19 (525 Views)

Re: Aries coredumps/memory limits.

tsiv@gobi|pts/2:/home/tsiv$ sudo /usr/ccs/bin/gdbpa /opt/bmc/BladeLogic/8.0/NSH/bin/rscd_full /core.rscd_full
HP gdb 5.9 for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00
and target hppa1.1-hp-hpux11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.9 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
warning: Load module /opt/bmc/BladeLogic/8.0/NSH/bin/rscd_full has been stripped

(no debugging symbols found)...
Core was generated by `rscd_full'.
Program terminated with signal 11, Segmentation fault.
SEGV_UNKNOWN - Unknown Error
(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...
#0 0xd1054264 in _ZNSsC1ERKSs+0xe4 ()
from /opt/bmc/BladeLogic/8.0/NSH/daal/Implementation/SystemInfo_libdaalpluginsysinfo.sl/libdaalpluginsysinfo.sl

warning: core file might be corrupted.
(gdb) bt
#0 0xd1054264 in _ZNSsC1ERKSs+0xe4 ()
from /opt/bmc/BladeLogic/8.0/NSH/daal/Implementation/SystemInfo_libdaalpluginsysinfo.sl/libdaalpluginsysinfo.sl
#1 0xd1053028 in _ZN22SysinfoVideoCardParser11parseTokensERSt6vectorISsSaISsEE
+0x500 ()
from /opt/bmc/BladeLogic/8.0/NSH/daal/Implementation/SystemInfo_libdaalpluginsysinfo.sl/libdaalpluginsysinfo.sl
#2 0xd0fcc480 in _ZN19SysinfoDeviceParser11parseTokensERSt6vectorISsSaISsEE
+0x398 ()
from /opt/bmc/BladeLogic/8.0/NSH/daal/Implementation/SystemInfo_libdaalpluginsysinfo.sl/libdaalpluginsysinfo.sl
#3 0xd100ec74 in _ZN13SysinfoParser9parseLineERSs+0x4ec ()
from /opt/bmc/BladeLogic/8.0/NSH/daal/Implementation/SystemInfo_libdaalpluginsysinfo.sl/libdaalpluginsysinfo.sl
#4 0xd100cc5c in _ZN13SysinfoParser4loadEPN12plugincommon5AssetEb+0x58c ()
from /opt/bmc/BladeLogic/8.0/NSH/daal/Implementation/SystemInfo_libdaalpluginsysinfo.sl/libdaalpluginsysinfo.sl
#5 0xd1064d24 in _ZN7sysinfo12SysinfoAsset16onGetDescendantsERN12plugincommon6StreamEi+0x164 ()
from /opt/bmc/BladeLogic/8.0/NSH/daal/Implementation/SystemInfo_libdaalpluginsysinfo.sl/libdaalpluginsysinfo.sl
#6 0xd10797ec in _ZN12plugincommon5Asset21bl_OpenGetDescendantsEiPPK21blAssetAttrDescriptorP13blAssetStream+0x10c ()
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 14 of 19 (525 Views)

Re: Aries coredumps/memory limits.

Forgot something

from /opt/bmc/BladeLogic/8.0/NSH/daal/Implementation/SystemInfo_libdaalpluginsysinfo.sl/libdaalpluginsysinfo.sl
#7 0xd105d27c in blAsset_OpenGetDescendants+0xfc ()
from /opt/bmc/BladeLogic/8.0/NSH/daal/Implementation/SystemInfo_libdaalpluginsysinfo.sl/libdaalpluginsysinfo.sl
#8 0x27da14 in + 0x23c ()
#9 0x238760 in + 0xa78 ()
#10 0x1fc158 in + 0x130 ()
#11 0x18e740 in + 0x2e0 ()
#12 0x1b8024 in + 0x814 ()
#13 0xceabb1e4 in _ZN3Rpc9RpcServer13executeMethodEPNS_10RpcContextERKSsRNS_8RpcValueES6_+0x12c () from /opt/bmc/BladeLogic/8.0/NSH/lib/librpccommon.sl
#14 0xceac572c in _ZN3Rpc22RpcTCPServerConnection14executeRequestERNS_20RpcNormalizedMessageE+0x234 () from /opt/bmc/BladeLogic/8.0/NSH/lib/librpccommon.sl
#15 0xceac4d68 in _ZN3Rpc22RpcTCPServerConnection13writeResponseERNS_20RpcNormalizedMessageEi+0x3b0 () from /opt/bmc/BladeLogic/8.0/NSH/lib/librpccommon.sl
#16 0xceac479c in _ZN3Rpc22RpcTCPServerConnection11handleEventEji+0x5b4 ()
from /opt/bmc/BladeLogic/8.0/NSH/lib/librpccommon.sl
#17 0xceac2620 in _ZN3Rpc14RpcTCPDispatch4workEd+0x5e8 ()
from /opt/bmc/BladeLogic/8.0/NSH/lib/librpccommon.sl
#18 0xceb468e8 in agent_rpc_dispatch+0x38 ()
from /opt/bmc/BladeLogic/8.0/NSH/lib/libagentrpc.sl
#19 0x6e548 in + 0x1c0 ()
#20 0x7c344 in + 0x47c ()
#21 0x7bd30 in + 0x128 ()
#22 0x7bb28 in + 0x8f8 ()
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 15 of 19 (525 Views)

Re: Aries coredumps/memory limits.

The funny thing is

* that if I run the IA64 binary by hand it works.
* I have other machines, where it works.
HP Pro
Rajesh K Chaurasia
Posts: 76
Registered: ‎01-31-2008
Message 16 of 19 (525 Views)

Re: Aries coredumps/memory limits.

@UNIXTEK,

There are no clues in the tusc log. However, I noticed that process 17838 prints the following on stdout before receiving SIGSEGV:

06/08/11 13:42:3 IndexedName not found. one created is hardware.networkcard:1
06/08/11 13:42:3 IndexedName not found. one created is hardware.networkcard:2
06/08/11 13:42:3
Here comes SIGSEGV

Does this sound familiar from the application's point of view? Could this be related to application configuration?

You could try one more debugging step.
Create a $HOME/.ariesrc file with the following content:

-notrans

and restart the application. It will run very slowly, but if it does not fail with the same symptoms, the error could be related to the ARIES dynamic translator. If the application still fails, the issue could be related to application configuration/setup. What is the difference in application setup between the servers where it works and the server where it fails?

Do not forget to remove the .ariesrc file after the experimental run.

Regards
-Rajesh
Acclaimed Contributor
Dennis Handly
Posts: 24,953
Registered: ‎03-06-2006
Message 17 of 19 (525 Views)

Re: Aries coredumps/memory limits.

#0 0xd1054264 in _ZNSsC1ERKSs+0xe4

This is aborting in:
string::string(string const&)
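
For readers who want to check that reading by hand, here is a small sketch that walks the symbol token by token. It assumes the Itanium C++ ABI mangling scheme (suggested by the `_Z` prefix); the token table is written out for this one symbol only and is not a general demangler.

```python
# Token-by-token reading of the mangled name from frame #0.
# Assumption: Itanium C++ ABI mangling, as suggested by the _Z prefix.
SYMBOL = "_ZNSsC1ERKSs"

TOKENS = [
    ("_Z", "prefix of every mangled name"),
    ("N", "start of a nested (qualified) name"),
    ("Ss", "abbreviation for std::string"),
    ("C1", "complete-object constructor"),
    ("E", "end of the nested name"),
    ("R", "parameter: reference to ..."),
    ("K", "... const ..."),
    ("Ss", "... std::string"),
]


def explain(symbol, tokens):
    """Check that the tokens spell out the symbol and return one line per token."""
    pos = 0
    lines = []
    for tok, meaning in tokens:
        if not symbol.startswith(tok, pos):
            raise ValueError(f"expected {tok!r} at offset {pos}")
        lines.append(f"{tok:<3} {meaning}")
        pos += len(tok)
    if pos != len(symbol):
        raise ValueError("tokens do not cover the whole symbol")
    return lines


for line in explain(SYMBOL, TOKENS):
    print(line)
```

Put together, the tokens read as the std::string copy constructor, std::string::string(std::string const&).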

At frame 0, you might want to try:
info reg
disas $pc-4*16 $pc+4*8

>that if I run the IA64 binary by hand it works.

What IA64 binary? It is the PA executable that is aborting.
Frequent Advisor
UNIXTEK
Posts: 33
Registered: ‎01-29-1997
Message 18 of 19 (525 Views)

Re: Aries coredumps/memory limits.

Hi Guys

The cat is out of the bag. The BladeLogic agent is a PA-RISC 1.1 agent even on Itanium. If I instruct the agent to give me 'system info', on PA-RISC machines it will call a PA-RISC binary to fetch the info. On IA64 it will call an IA64 binary. That is what I mean by "if I run the IA64 binary by hand".

If I put
/opt/bmc/BladeLogic/8.0/NSH/bin/rscd_full -notrans

in /.ariesrc and restart the BL agent, it runs dog slow, but still produces a nice coredump ;-)

The disassembled code:

(gdb) info reg
flags: 41
r1: 7 rp/r2: d1054247 r3: 7ab11d78 r4: 7af408ec r5: 400309a0 r6: 7b0452d0
r7: 4006c368 r8: 0 r9: 4 r10: 8 r11: 10 r12: 91
r13: 2 r14: 1000 r15: 400598d0 r16: 0 r17: 0 r18: 0
r19: 7ab11d78 r20: 7b045ac8 r21: 336638 r22: 40030938 arg3/r23: 400b8854 arg2/r24: 400309a0
arg1/r25: 7b045ac8 arg0/r26: 656763 dp/gp/r27: 4002e590 ret0/r28: 65676f ret1/ap/r29: 0 sp/r30: 7b045c00
mrp/r31: d10c988b sar/cr11: 20 pcoqh: d1054264 pcsqh: 0 pcoqt: d1054268 pcsqt: 0
eiem/cr15: 0 iir/cr19: f501094 isr/cr20: 0 ior/cr21: 0 ipsw/cr22: 4000f goto: 2
sr4: 4 sr0: 7 sr1: 4 sr2: 2 sr3: 3 sr5: 5
sr6: 6 sr7: 7 rctr/cr0: 0 pidr1/cr8: 0 pidr2/cr9: 0 ccr/cr10: 0
pidr3/cr12: 0 pidr4/cr13: 0 cr24: 0 cr25: 0 cr26: 0 mpsfu_high: 7ade0098
mpsfu_low: 0 mpsfu_ovflo: 0 pad: 0 fpsr: 8000000 fpe1: 0 fpe2: 0
fpe3: 0 fpe4: 0 fpe5: 0 fpe6: 0 fpe7: 0
(gdb) disas $pc-4*16 $pc+4*8
Dump of assembler code from 0xd1054224 to 0xd1054284:
0xd1054224 <_ZNSsC1ERKSs+0xa4>: stw %r20,-0x100(%sp)
0xd1054228 <_ZNSsC1ERKSs+0xa8>: b,l 0xd1054230 <_ZNSsC1ERKSs+0xb0>,%r1
0xd105422c <_ZNSsC1ERKSs+0xac>: depwi 0,31,2,%r1
0xd1054230 <_ZNSsC1ERKSs+0xb0>: ldo 0xec(%r1),%r1
0xd1054234 <_ZNSsC1ERKSs+0xb4>: stw %r1,-0xfc(%sp)
0xd1054238 <_ZNSsC1ERKSs+0xb8>: stw %sp,-0xf8(%sp)
0xd105423c <_ZNSsC1ERKSs+0xbc>: b,l 0xd104d108 <_Unwind_SjLj_Register>,%rp
0xd1054240 <_ZNSsC1ERKSs+0xc0>: ldo -0x120(%sp),%r26
0xd1054244 <_ZNSsC1ERKSs+0xc4>: ldw -0xf0(%sp),%r19
0xd1054248 <_ZNSsC1ERKSs+0xc8>: ldw -0x164(%sp),%r20
0xd105424c <_ZNSsC1ERKSs+0xcc>: stw %r20,-0xf4(%sp)
0xd1054250 <_ZNSsC1ERKSs+0xd0>: ldw -0x168(%sp),%r20
0xd1054254 <_ZNSsC1ERKSs+0xd4>: ldw 0(%r20),%ret0
0xd1054258 <_ZNSsC1ERKSs+0xd8>: ldo -0xc(%ret0),%r26
0xd105425c <_ZNSsC1ERKSs+0xdc>: ldo -0x138(%sp),%r20
0xd1054260 <_ZNSsC1ERKSs+0xe0>: copy %r20,%r25
0xd1054264 <_ZNSsC1ERKSs+0xe4>: ldw 8(%r26),%r20
0xd1054268 <_ZNSsC1ERKSs+0xe8>: cmpib,> 0,%r20,0xd1054280 <_ZNSsC1ERKSs+0x100>
0xd105426c <_ZNSsC1ERKSs+0xec>: ldi 1,%r20
0xd1054270 <_ZNSsC1ERKSs+0xf0>: ldw -4(%ret0),%r20
0xd1054274 <_ZNSsC1ERKSs+0xf4>: ldo 1(%r20),%r20
0xd1054278 <_ZNSsC1ERKSs+0xf8>: b 0xd1054290 <_ZNSsC1ERKSs+0x110>
0xd105427c <_ZNSsC1ERKSs+0xfc>: stw %r20,-4(%ret0)
0xd1054280 <_ZNSsC1ERKSs+0x100>: stw %r20,-0x11c(%sp)
End of assembler dump.
Acclaimed Contributor
Dennis Handly
Posts: 24,953
Registered: ‎03-06-2006
Message 19 of 19 (525 Views)

Re: Aries coredumps/memory limits.

0xd1054250 <_ZNS+0xd0>: ldw -0x168(%sp),%r20
0xd1054254 <_ZNS+0xd4>: ldw 0(%r20),%ret0
0xd1054258 <_ZNS+0xd8>: ldo -0xc(%ret0),%r26
0xd1054264 <_ZNS+0xe4>: ldw 8(%r26),%r20

This could be memory corruption. R26 has: 0x656763
This is NUL followed by "egc" in ASCII, which does not look like a data address.
It comes from r28: 0x65676f - 0xc, and 0x65676f is NUL followed by "ego".

And R20 has a stack address: r20: 7b045ac8
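
The arithmetic above can be checked with a short sketch. It takes the register values from the `info reg` output and follows the three instructions quoted from the disassembly. PA-RISC is big-endian, so each 32-bit word is rendered most significant byte first; the helper name `as_ascii` is made up for this example.

```python
def as_ascii(word):
    """Render a 32-bit big-endian word as text, with '.' for non-printable bytes."""
    raw = word.to_bytes(4, "big")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


RET0 = 0x65676F       # ret0/r28: loaded by  ldw 0(%r20),%ret0
R26 = RET0 - 0xC      # computed by          ldo -0xc(%ret0),%r26
FAULT_ADDR = R26 + 8  # address read by      ldw 8(%r26),%r20  (the faulting pc)

print(f"ret0 = {RET0:#x}  text: {as_ascii(RET0)}")
print(f"r26  = {R26:#x}  text: {as_ascii(R26)}")
print(f"load address = {FAULT_ADDR:#x}")
```

The computed r26 matches arg0/r26 in the register dump (656763), so the faulting load is using a pointer that was derived from text bytes rather than from a valid address.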
