09-19-2013 03:41 PM - last edited on 09-25-2013 07:33 PM by maikoro
I am working on a program that creates a shared memory segment of 16000 bytes. The program works fine until it tries to write into shared memory at a location past 15k (past 15360), when it crashes with
Execution error : file '<xxx>'
error code: 114, pc=0, call=1, seg=0
114 Attempt to access item beyond bounds of memory (Signal 11)
115 Unexpected signal (Signal 4)
If I create shared memory with 16385 bytes (16k+1) it's fine. 16k exactly does not work. This is HP UX 15.5 64 bit. The program works under 15.0 32 bit. I can increase the amount of shared memory, but I don't know if that solves it or is just a band-aid that'll bite me later.
I'm wondering if shared memory is actually allocated in 1k increments, and it's having a problem with the amount between 15k and 16000, or something along those lines. Or more generally, is shared memory handled differently between 15.0 and 15.5. I have not found any documentation supporting this. Any thoughts?
P.S> This thread has been moved from HP-UX > General to HP-UX > Languages. - HP Forum Moderator
09-19-2013 04:06 PM
Brain fart. It's HP UX 11.31, and was HP UX 11.23. We compile 64 bit apps on the new server, 32 bit on the old. The problem happens on the new server. 15.0 and 15.5 are something unrelated that I happened to have up on my screen when I was typing. It's been a long day.
09-19-2013 08:40 PM - edited 09-24-2013 09:57 PM
>I'm wondering if shared memory is actually allocated in 1k increments,
I would assume it is allocated in whole pages of at least 4 KB, so the requested size should be rounded up by the OS.
>Brain fart. It's HP-UX 11.31 and was HP-UX 11.23.
You can use "Post Options > Edit Reply" and nobody will ever know. ;-)
09-20-2013 04:26 AM
The system aligns the size to next base page-size boundary. The base-page size is usually 4096 unless you have changed it.
It is unlikely that you would get a signal if the size was 16000 bytes. The system would have aligned it to 16384 bytes (assuming a base page of 4096). One of the signals you've got is 4, which is SIGILL, and that seems to suggest that something else is happening.
Can you please share your code, if possible?
09-20-2013 11:34 AM
I can't post the code and stay employed, but maybe I can work up a smaller sample that does the same thing.
This error being a red herring is one of my main concerns. I can increase the size of the shared memory, but if that's covering up something else, it's not fixed.
09-20-2013 11:45 AM - edited 09-24-2013 09:57 PM
>maybe I can work up a smaller sample that does the same thing.
How are you accessing your shared memory from your MF COBOL app?
09-23-2013 04:10 PM
I pulled all the shared memory code out into a separate program, and it, of course, runs fine. So the shared memory segment size is probably covering up the real problem. So I'm on to that.
As far as how the program is accessing shared memory, it's just writing into it with a memcpy. (The cobol routine is calling a C function.)
Thanks for looking!
09-23-2013 08:52 PM - edited 09-24-2013 09:57 PM
>it's just writing into it with a memcpy.
What's the length of the copy?
I assume you know that the source and target can't overlap in any manner?
09-24-2013 08:19 AM
It's writing 48 bytes at a time, and theoretically has a few hundred bytes left. I've played with that size, and the shared memory size in relation to the record size, and gotten no useful info.
Overlap: Yeah, it's nothing that easy. I've dumped addresses all over the place and nothing is close. Plus the same code works fine when recompiled on a 32 bit system.
09-24-2013 02:09 PM
Found it. It was a very obscure little pointer math problem. Seems that if you have a pointer to a structure and add an integer to it, you get garbage. Then when you write to that address you trash memory. I was trashing a function address that was out past my shared memory range, so the fact that it was shared memory was not relevant.
cptr = (void *)2305843009203077152; // cptr is defined as a pointer to a struct of 2 ints, a double, and 2 ptrs
anint = 48;
cptr2 = (void *)(cptr + anint); // cptr2 is a pointer to the same struct
anint2 = 48
cptr = 2305843009203077152
cptr2 = 2305843009203078688
Casting both the address and the int as a long fixed it.
Thanks for looking!
09-24-2013 03:15 PM
On further reflection (and before anyone else says it :-) it's not garbage, it's adding 48 * the size of the structure, which is expected. I just wasn't looking at it that way. Casting the pointer as a long is what made it work.
09-24-2013 09:55 PM
>It was a very obscure little pointer arithmetic problem. Seems that if you have a pointer to a structure and add an integer to it, you get garbage.
No, this is well defined by the C and C++ Standards. The result of ptoT + N is &ptoT[N] and is the same as
(T*)((char*)ptoT + N * sizeof(T)).
>Casting both the address and the int as a long fixed it.
The proper fix is to cast the pointer to a char*, then add, then cast back to the right type.