02-26-2009 06:51 AM
One of our development teams is getting the following error message from the Pascal compiler.
%PAS-F-ERRDURNEW, error during NEW
-LIB-F-BADBLOADR, bad block address
%TRACE-F-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
PAS$RTL 0 0000000000034430 000000007C132430
PAS$RTL 0 000000000003438C 000000007C13238C
(I've attached the output as well.)
The HELP/MESS indicates that this is a problem with invalid virtual addresses being passed to LIB$GET_VM or LIB$FREE_VM. The development team has not been able to reproduce this problem in isolation; it only occurs when the full process is up and running.
Any thoughts as to what the problem could be? The pointer that is the target of the NEW is local to the procedure.
02-26-2009 07:20 AM
I've posted up articles on implementing and using fenceposts and debugging memory management. That material was for C, but this most certainly looks like there's something that's stomped on the heap here.
That "something" here could be code in the application that's overrunning a variable or a buffer, an unsynchronized and late-arriving I/O that lands in its buffer from a previously-active stack frame, etc.
Start instrumenting your memory management code, and start centralizing your memory management. Also apply current ECOs for OpenVMS and the LIBRTL.
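For anyone who hasn't seen the fencepost idea, here is a minimal sketch in portable C. It is not the code from those articles and it is not a LIBRTL interface; the names fp_alloc, fp_check and fp_free are made up for illustration. Every allocation goes through one wrapper that puts a known byte pattern in front of and behind the caller's block, and the pattern is checked when the block is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical; none is a LIBRTL or Pascal RTL routine. */
#define FENCE_SIZE sizeof(max_align_t)  /* keeps the user data suitably aligned */
#define FENCE_BYTE 0xA5

/* Header stored in front of the leading fence. */
typedef union {
    size_t size;        /* number of bytes the caller asked for */
    max_align_t align;  /* pads the header to the strictest alignment */
} fp_header;

/* Allocate size bytes with a fence on each side; returns NULL on failure. */
void *fp_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(fp_header) + 2 * FENCE_SIZE + size);
    if (raw == NULL)
        return NULL;
    ((fp_header *)raw)->size = size;
    unsigned char *user = raw + sizeof(fp_header) + FENCE_SIZE;
    memset(user - FENCE_SIZE, FENCE_BYTE, FENCE_SIZE);  /* front fence */
    memset(user + size, FENCE_BYTE, FENCE_SIZE);        /* rear fence */
    return user;
}

/* Returns 0 if both fences are intact, 1 if the front fence is
   damaged, 2 if the rear fence is damaged. */
int fp_check(const void *ptr)
{
    assert(ptr != NULL);
    const unsigned char *user = ptr;
    const unsigned char *front = user - FENCE_SIZE;
    const fp_header *hdr = (const fp_header *)(front - sizeof(fp_header));
    const unsigned char *rear = user + hdr->size;

    for (size_t i = 0; i < FENCE_SIZE; i++)
        if (front[i] != FENCE_BYTE)
            return 1;
    for (size_t i = 0; i < FENCE_SIZE; i++)
        if (rear[i] != FENCE_BYTE)
            return 2;
    return 0;
}

/* Check the fences, report any damage, release the block.
   Returns the fp_check result so callers can react to it. */
int fp_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    int damage = fp_check(ptr);
    if (damage != 0)
        fprintf(stderr, "fp_free: %s fence of block %p overwritten\n",
                damage == 1 ? "front" : "rear", ptr);
    free((unsigned char *)ptr - FENCE_SIZE - sizeof(fp_header));
    return damage;
}
```

Because every allocation and release goes through these three routines, extra diagnostics (logging the caller, counting live blocks, checking at other times than release) can be added in one place. A damaged front fence also makes the stored size untrustworthy, so a real version would probably stop at that point rather than go on to look at the rear fence.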
02-26-2009 08:25 AM
Consider using SET PROC/DUMP before running the image that produces this error. A process/image dump will then be written if the error occurs, which allows further diagnosis of the memory contents at the time of the error.
02-26-2009 02:09 PM
Encapsulate your allocations and deallocations in a layer of calls so you can easily add diagnostics.
Try adding calls to LIB$SHOW_VM and LIB$VERIFY_VM_ZONE in various places. If you don't know which zone you're in, write a routine which loops using LIB$FIND_VM_ZONE to verify all zones. Call it from strategic points in your code to try to home in on where the heap is being trodden on.
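The "verify everything at strategic points" idea can be sketched in portable C as below. This is only a stand-in: on OpenVMS the walking and checking would be done by the LIB$FIND_VM_ZONE and LIB$VERIFY_VM_ZONE routines, which are deliberately not called here. The names tracked_alloc, tracked_free and verify_all, the fixed-size table, and the guard pattern are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; a portable stand-in for a verify-all-zones loop. */
#define MAX_LIVE   1024  /* capacity of the tracking table */
#define GUARD_LEN  8     /* guard bytes appended to each block */
#define GUARD_BYTE 0x5A

typedef struct {
    unsigned char *ptr;  /* NULL marks a free table slot */
    size_t size;         /* number of bytes the caller asked for */
} live_entry;

static live_entry live[MAX_LIVE];

/* Allocate size bytes followed by a guard, and remember the block.
   Returns NULL if malloc fails or the table is full. */
void *tracked_alloc(size_t size)
{
    for (size_t i = 0; i < MAX_LIVE; i++) {
        if (live[i].ptr == NULL) {
            unsigned char *p = malloc(size + GUARD_LEN);
            if (p == NULL)
                return NULL;
            memset(p + size, GUARD_BYTE, GUARD_LEN);
            live[i].ptr = p;
            live[i].size = size;
            return p;
        }
    }
    return NULL;
}

/* Forget the block and release it. */
void tracked_free(void *ptr)
{
    if (ptr == NULL)
        return;
    for (size_t i = 0; i < MAX_LIVE; i++) {
        if (live[i].ptr == ptr) {
            live[i].ptr = NULL;
            free(ptr);
            return;
        }
    }
    fprintf(stderr, "tracked_free: %p was not allocated here\n", ptr);
}

/* Check the guard of every live block. The 'where' label identifies
   the checkpoint in the report. Returns the number of damaged blocks. */
int verify_all(const char *where)
{
    int damaged = 0;
    for (size_t i = 0; i < MAX_LIVE; i++) {
        if (live[i].ptr == NULL)
            continue;
        for (size_t j = 0; j < GUARD_LEN; j++) {
            if (live[i].ptr[live[i].size + j] != GUARD_BYTE) {
                fprintf(stderr, "%s: block %p (%lu bytes) overrun\n",
                        where, (void *)live[i].ptr,
                        (unsigned long)live[i].size);
                damaged++;
                break;
            }
        }
    }
    return damaged;
}
```

Calling verify_all("before parse"), verify_all("after parse") and so on brackets the damage between two checkpoints; adding checkpoints inside the bracket narrows it further. This sketch only detects overruns past the end of a block, and the table is not thread safe.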
02-26-2009 11:58 PM
I'll ask the development team to do the SET PROC/DUMP.
The errors come out of a call to NEW, so we have no means of instrumenting that. I will ask the team to add some calls to check the VM zones.
I'll also do a check on the code and check for some of the more obvious coding errors (IOSBs used by a QIO being defined as procedure local etc.)
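To make the IOSB case concrete, here is a small portable C simulation of the pattern we'd be looking for and its fix. It does not call SYS$QIO; complete_io stands in for the system writing the status block when the I/O finishes, the iosb_t layout is only illustrative, and all the names are invented for the example. The point is that the status block belongs to a request object that outlives the procedure that started the I/O.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative status block; the field layout is an assumption. */
typedef struct {
    unsigned short status;  /* completion status */
    unsigned short count;   /* bytes transferred */
    unsigned int info;      /* device-dependent information */
} iosb_t;

/* The request owns the status block, so the block stays valid until
   the request itself is released, whatever the initiator's stack does. */
typedef struct {
    iosb_t iosb;
    int pending;  /* 1 until the simulated completion has run */
} io_request;

/* Start a simulated asynchronous read and return immediately.
   The buggy variant would declare 'iosb_t iosb;' as a local here, hand
   its address to the asynchronous call, and return before completion,
   leaving the completion to write into a dead stack frame. */
io_request *start_read(void)
{
    io_request *req = calloc(1, sizeof *req);
    if (req == NULL)
        return NULL;
    req->pending = 1;
    return req;
}

/* Stand-in for the system delivering the completion some time later. */
void complete_io(io_request *req, unsigned short status, unsigned short count)
{
    req->iosb.status = status;
    req->iosb.count = count;
    req->pending = 0;
}

/* Release the request once the completion has been consumed. */
void finish_request(io_request *req)
{
    free(req);
}
```

The caller keeps the io_request pointer, waits for or is notified of completion, reads req->iosb, and only then calls finish_request. A static or global status block would also outlive the procedure, but a per-request object avoids two outstanding I/Os sharing one block.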
02-28-2009 02:49 AM
Did you note that the traceback stack dump does not include any of the application routines? This may indicate that the stack is also severely corrupted.
If you think you know the location of the failing NEW statement, you could at least add an explicit LIB$VERIFY_VM_ZONE for all zones into that module prior to the NEW statement.
02-28-2009 08:58 AM
I'd replace these NEW calls with my own calls and my own memory management. Don't scatter NEW (or malloc or any other VM calls) throughout your code. Centralize your calls.
Your developers will hate that. But it'll find and fix this, and it'll be easier to find and fix similar bugs in the future.
Scattered memory management calls with one or more latent heap corruptions can be somewhere between difficult and extremely difficult to debug, and latent errors can be more widespread than they might appear. Some of the various members of this class of errors can be rather more subtle than a stackdump. And the faults tend to be fairly far upstream in the program execution path, or subtle, or both.
03-02-2009 12:07 PM
As for replacing Pascal's NEW with direct calls to LIB$GET_VM and family, I'd like to point out that while NEW itself allocates memory using LIB$GET_VM, the compiler generates additional code around calls to NEW to perform many Pascal language semantics. Things like applying initial-state, initializing schema types, etc. Simply replacing calls to NEW with LIB$GET_VM may not work for certain types.
03-02-2009 12:45 PM
Some of the options are posted at the URL that was included earlier.
Another option is building your own, on top of the low-level VM zone routines and zone verify routines in LIBRTL.
03-03-2009 12:26 AM
Thanks for all your responses. In the short term I asked the development team to move the definition of the pointer out of the procedure and into the globals. This has fixed the problem for now (or at least moved it elsewhere).
What I have now is a 15-year-old piece of code which has been poorly maintained, has had numerous changes applied by engineers who were not necessarily au fait with OpenVMS, and generally (in the opinion of the lead developer) needs a major overhaul. I've also just been informed that the code itself is known to crash for no apparent reason (indicative of memory corruption).
I'll be discussing the options with the developers, but I think I may end up having to perform a sanity check on the code. We have had several occasions where IOSBs for a QIO (not QIOW) have been declared locally to the initiating procedure, and the completion of the QIO has then caused odd crashes.
cheers and thanks again