10-31-2012 05:51 AM
I wonder if there is anything in HP BASIC similar to Fortran's and Pascal's "VOLATILE"?
I compile with /NOOPT.
I have not tried /SYNCH, and I wonder if that would help.
Sometimes it seems that the code isn't executed in the order it's written...
10-31-2012 06:11 AM
Given that you are requesting /NOOPT, you should not see anything optimized away that the equivalent of /VOLATILE would prevent.
One option that comes to mind though, would be to declare the variable as External. With the external reference, the compiler should not be able to optimize it in any way as it will not know much about it.
Why do you believe that there are out of order execution issues? Can you provide more details about the program environment?
10-31-2012 07:39 AM
I don't think you can get BASIC to be truly thread safe, but if I recall correctly it is relatively safe anyway, since it does not optimize variables into registers and such.
I would stick anything that can change behind its back into a MAP or COMMON to be sure.
For more tricky stuff you may need to protect with $SETAST (yuck) or a wrapper function with a sync mechanism.
What problem are you trying to solve?
What are you observing that makes you conclude the order is not as expected/intended?
Itanium? OpenVMS version? Basic version?
10-31-2012 08:13 AM
I think I remember being told that BASIC is thread-safe but not AST-reentrant. Or, in other words, it depends on the threading model being used, which I don't see specified yet.
10-31-2012 11:24 AM
OpenVMS 8.3 Alpha and BASIC 1.7.
Sorry for my English in advance :)
I started to think a bit, yay :) And made myself some test programs; see test.zip.
The "run_counter1" program just simulated the sys$enqw, but did not generate what I expected when forcing the processes onto different CPUs... so skip that for now.
The run_counter2 program simulates my problem.
Edit build.com to match where you place your source; see the MY_EXEC define.
Then run run_odd.com and run_even.com in 2 different screens.
Check which CPUs you have on your system, and clear and set the ones you have in run_even.com/run_odd.com.
The idea is simple:
map (counter_common) byte in_use, &
Loop over every odd element in counter_array and write a value known only locally to the program, then loop again and check that every odd element in the array holds the right value.
While the odd process is doing this, another process does the same thing for the even elements.
In theory they access different parts of shared memory. You could expect problems if you force them onto different CPUs, due to cache updates etc., but not if you force them onto the same CPU.
My test bench is a 2-CPU system; see the "set aff" in run_even/run_odd.
I was running both on the same CPU, but they still messed up.
$set process /affinity /permanent /set=0 /clear=(1)
Interrupt - I interrupted with Ctrl-C because EVEN broke
$set process /affinity /permanent /set=0 /clear=(1)
Process 2: Counter_array[ 19892 ]; 3753 <> last_counter: 3764
Even should find the same value all the way through the loop.
It expected to find 3764, but found 3753... how come???
It looks like a typical CPU cache problem, and it's repeatable for me...
In real life I force those old VAX programs onto one CPU, but from time to time they still "crash" because the shared array gets mixed data.
So is it an OpenVMS bug, where under heavy load it migrates processes to another CPU even though it should not?
Or a compiler bug?
10-31-2012 12:14 PM
Fixed so that the value I write to the array is ODD when written to an ODD array element and EVEN when written to an EVEN one.
I also print the value of array element i-2, and I don't exit the program; I just let it run through the rest of the array.
Start value for this process? 1
Process 2: Counter_array[ 12011 ]; 2297 <> last_counter: 2305
Prev: Counter_array[ 12009 ]; 2305
Process 2: Counter_array[ 12013 ]; 2297 <> last_counter: 2305
Prev: Counter_array[ 12011 ]; 2297
Process 2: Counter_array[ 19459 ]; 3737 <> last_counter: 3759
Prev: Counter_array[ 19457 ]; 3759
Process 2: Counter_array[ 19461 ]; 3737 <> last_counter: 3759
Prev: Counter_array[ 19459 ]; 3737
Process 2: Counter_array[ 337 ]; 4333 <> last_counter: 4339
Start value for this process? 0
Process 2: Counter_array[ 30340 ]; 5276 <> last_counter: 5304
Prev: Counter_array[ 30338 ]; 5304
Process 2: Counter_array[ 7234 ]; 7690 <> last_counter: 7714
Prev: Counter_array[ 7232 ]; 7714
Process 2: Counter_array[ 24178 ]; 8220 <> last_counter: 8274
Prev: Counter_array[ 24176 ]; 8274
Process 2: Counter_array[ 25090 ]; 9170 <> last_counter: 9198
Prev: Counter_array[ 25088 ]; 9198
The thing with this is that ODD always seems to see only a wrong "ODD" value, like the value from many cycles ago; the same goes for EVEN.
When running alone it's OK, but when I start the other process, each one seems to read OLD data...
input "Start value for this process";start
last_counter = start
while (1=1)    ! loop forever
    ! write pass: fill this process's elements with the new value
    i = start
    last_counter = last_counter + 2
    while (i < 32000)
        counter_array(i) = last_counter
        i = i + 2
    next
    ! verify pass: every element written above should still hold that value
    i = start
    while (i < 32000)
        if counter_array(i) <> last_counter then
            print "Process 2: Counter_array[";i;"];";counter_array(i);" <> last_counter:";last_counter
            print "Prev: Counter_array[";i-2;"];";counter_array(i-2)
        end if
        i = i + 2
    next
next
10-31-2012 02:32 PM
I don't think this has anything to do with thread safety, AST reentrancy or stale CPU caches, it's a simple case of unsynchronised access to a shared write access data structure.
There's no point in arguing about theory, or trying to figure out what might be happening. IF you have a shared data structure that will be written by multiple processes, ESPECIALLY when the unit of reference is smaller than a quadword, THEN you need some form of synchronisation. No ifs, ands, buts or maybes. I present your posting as evidence. If you find you need to lock a process to a CPU in order for your algorithm to work, then IT'S WRONG! Locking multiple processes to a single CPU is basically using the CPU itself as a very expensive lock.
What you need to work out is the required granularity of locking and choose an appropriate mechanism. $ENQ is general and fairly simple. Depending on the frequency of access and the performance requirements you may be able to lock the whole table. Typically your writers would $ENQ a null mode lock, then convert to EX to write. Readers would $ENQ a null mode lock and convert to CR when they want to read.
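To make the choice of lock modes above concrete, here is a small sketch in C of the lock manager's mode compatibility rules as I understand them. The enum names and the `can_grant` helper are made up for illustration; they are not the `LCK$K_*` constants or any system service. The point is that a writer holding EX excludes CR readers and other writers, while NL locks and multiple CR readers can coexist.

```c
#include <stdbool.h>

/* Illustrative names for the six lock modes; these are NOT the
   LCK$K_* constants from the system definitions. */
enum lock_mode { NL, CR, CW, PR, PW, EX, MODE_COUNT };

/* compat[held][requested] is true when both modes can be granted
   on the same resource at the same time. */
static const bool compat[MODE_COUNT][MODE_COUNT] = {
    /*            NL     CR     CW     PR     PW     EX   */
    /* NL */ { true,  true,  true,  true,  true,  true  },
    /* CR */ { true,  true,  true,  true,  true,  false },
    /* CW */ { true,  true,  true,  false, false, false },
    /* PR */ { true,  true,  false, true,  false, false },
    /* PW */ { true,  true,  false, false, false, false },
    /* EX */ { true,  false, false, false, false, false },
};

/* Hypothetical helper: could `requested` be granted while `held` is held? */
bool can_grant(enum lock_mode held, enum lock_mode requested)
{
    return compat[held][requested];
}
```

With this table, `can_grant(NL, EX)` is true (parking at null mode blocks nobody), while `can_grant(EX, CR)` and `can_grant(EX, EX)` are both false, so readers and other writers wait until the writer converts back down.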
Don't even think about trying to hack around this fundamental principle. Without proper synchronisation your code WILL FAIL somewhere, sometime, and most likely in a manner that's very difficult to reproduce or diagnose.
Remember that the underlying hardware accesses memory in whole quadwords. So, to update a longword, the hardware reads the surrounding quadword, masks in the updated longword and writes back the whole quadword. If two processes attempt to update adjacent longwords simultaneously, only one will win. This is known as "word tearing".
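A toy model in C of the read, mask and write-back sequence described above, with one unlucky interleaving of two writers. It is only a simulation under that description: the function names are invented, nothing here is Alpha code, and whether a real store is done this way depends on the data size, the CPU and the compiler.

```c
#include <stdint.h>

/* Read the 32-bit longword in slot 0 (low half) or slot 1 (high half). */
uint32_t longword_at(uint64_t quad, unsigned slot)
{
    return (uint32_t)(quad >> (slot * 32u));
}

/* Mask a new longword into a private copy of the quadword. */
uint64_t mask_in(uint64_t quad, unsigned slot, uint32_t value)
{
    uint64_t mask = (uint64_t)0xFFFFFFFFu << (slot * 32u);
    return (quad & ~mask) | ((uint64_t)value << (slot * 32u));
}

/* Two writers update adjacent longwords with this interleaving:
   A loads, B loads, A stores, B stores. */
uint64_t torn_update(uint64_t memory, uint32_t a_value, uint32_t b_value)
{
    uint64_t a_copy = memory;              /* A reads the quadword            */
    uint64_t b_copy = memory;              /* B reads it before A writes back */
    memory = mask_in(a_copy, 0, a_value);  /* A writes back: slot 0 updated   */
    memory = mask_in(b_copy, 1, b_value);  /* B writes back its stale slot 0  */
    return memory;
}
```

Starting from a quadword holding 3753 and 3754, `torn_update(..., 3764, 3765)` leaves slot 0 at the old 3753 and slot 1 at 3765: writer A's store is silently lost even though the two writers never touched the same longword.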
11-01-2012 12:56 AM
Thank you for your reply.
It was most helpful.
In the real application we synchronize with enq, ASTs etc., but that doesn't help.
The thing you said about quadwords sounded like the right thing.
So I tested it.
I started the processes with "0" and "2" (because I have 4-byte longs).
Then it works perfectly!!!
So now I have a solution:
To pad the struct/record with 1-8 bytes (8 bytes being safest) at the start and add a
STRING FILL = 0 at the end to bring it to an even boundary.
Just one question remains... just because I'm a n00b on Alpha.
If SET AFF really works and the kernel doesn't have a bug that starts to migrate processes under load, then:
What I don't understand is that both processes sit on the same CPU and share the same cache.
I would expect that we have only one copy of the memory area in the CPU, and therefore the QUADWORD problem should not exist.
But it does exist, so either the memory area exists in 2 instances in the CPU or SET AFF is broken...
Anyway, that discussion is perhaps pointless if I can get my hands on a good document about Alpha's internal memory/cache handling. Does anyone have a link?
11-01-2012 04:57 AM
11-01-2012 11:36 AM
>> If SET AFF really works and the kernel doesn't have a bug that starts to migrate processes under load, then:
Yeah right. You, as a self-professed n00b, think you are going to find a kernel bug on your first day. NFW!
More likely: if it does not work as you expect, then your understanding is not yet adequate.
That was proven by the opening line... this was not at all about thread safety (which is understood to be within a process), but about shared data access from multiple processes (kernel threads, I give you that).
You may want to go more aggressive on the alignment/separation.
256 bits (32 bytes) would be my first choice.
Is this application guaranteed never ever to be ported to Itanium?
11-01-2012 02:25 PM
On Alpha (and Itanium) systems, with more than one writer (process), you need to synchronize access to shared data, no matter how many CPUs your application is running on and whatever the size of the data and its alignment (or separation) are. The load-locked/store-conditional instructions can be used to construct such synchronization, but high-level OS or language synchronization methods are recommended.
Your examples show more than one writer.
11-01-2012 04:39 PM
> What I don't understand is that both processes sit on the same CPU and share the same cache.
> I would expect that we have only one copy of the memory area in the CPU, and therefore
> the QUADWORD problem should not exist.
What you expect has very little to do with a modern CPU and its behaviour related to shared memory access. You DON'T have one copy of memory, nor do you share caches. You can't assume anything about the order of memory accesses. Unless you're designing the microprocessor, or writing operating system kernels, don't try to think at the hardware level.
THE biggest cost in processor speed in a multiprocessing environment is synchronising memory states across multiple processes. Advanced processors like Alpha and Itanium in essence assume that you don't care if an update to memory location X in process A is synchronously visible to process B. This allows many optimisations and significantly increases the speed of the processor. But, it means you need to explicitly create synchronisation points in your instruction stream to make it clear where you care about memory ordering. Look up "Alpha memory barriers" if you're interested in the hardware details.
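As a rough illustration of what an explicit synchronisation point looks like in source code, here is a sketch using C11 atomics. This is a portable stand-in, not something HP BASIC provides and not Alpha-specific code; the `shared_slot` record and both function names are hypothetical. The writer publishes a value with a release store, and the reader picks it up with an acquire load, which is the point where ordering is requested.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared record: a payload plus a "ready" flag. */
struct shared_slot {
    int payload;
    atomic_bool ready;
};

/* Writer: store the payload, then publish it. The release store keeps
   the payload write from being reordered after the flag write. */
void publish(struct shared_slot *s, int value)
{
    s->payload = value;
    atomic_store_explicit(&s->ready, true, memory_order_release);
}

/* Reader: the acquire load keeps the payload read from being reordered
   before the flag read. Returns false if nothing has been published. */
bool try_consume(struct shared_slot *s, int *out)
{
    if (!atomic_load_explicit(&s->ready, memory_order_acquire))
        return false;
    *out = s->payload;
    return true;
}
```

Everything outside the release/acquire pair is left free for the compiler and processor to reorder, which matches the idea above: you pay for ordering only where you say you care about it. For shared tables with several writers, a lock such as $ENQ is still the simpler tool.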
Most memory accesses are in private memory, so ordering and synchronisation don't matter. Where you are accessing common memory, you need to do proper synchronisation. If you're having trouble with $ENQ, then you must be doing it incorrectly. $ENQ works.
>So now I have a solution:
>To pad the struct/record with 1-8 bytes (8 bytes being safest) at the start
>and add a STRING FILL = 0 at the end to bring it to an even boundary.
This is NOT a solution. It's a way of hiding the potential problems from your particular test program.
As I said, don't try to hack around it. Do it right. Shared memory MUST be properly synchronised.
11-02-2012 03:54 AM
Let's take a different approach. As indicated above, there are methods to ensure that there is no tearing of the data item. You indicate that $ENQ did not work for you. Along with John, I too have used $ENQ without problems. Perhaps we can help fix the problems you are having with $ENQ. Can you post a snippet of code that shows how you are using the $ENQ services? Maybe a simple change to that will resolve the problem you are seeing.
11-02-2012 02:28 PM
>ESPECIALLY when the unit of reference is smaller than a quadword, THEN you need some form of synchronisation
The latest C and C++ Standards threading model basically says this type of hardware would be unsupported, unless the software automatically provided this synchronization. Also, the software couldn't optimize by loading multiple fields.
>Remember that the underlying hardware accesses memory in whole quadwords. ... the hardware reads the surrounding quadword, masks in the updated longword and writes back ...
The hardware? Or the multiple generated instructions since there isn't just one?
11-02-2012 03:09 PM
If that's the case then I wonder what happens if counter_array is declared first in counter_common.
11-02-2012 03:23 PM
It sounds odd to me to find a value that is 20+ iterations old. I can't make any sense out of this. Not yet.
11-04-2012 07:52 PM
> The hardware?
Yes, the hardware. It can only read and write memory in whole (aligned) quadwords - it's an optimisation for speed, BUT it means if you want to update a smaller unit of memory, the hardware has to read the containing quadword, mask in the smaller field, then write back the quadword. That's why there's more than one operation involved to do such an update, and why they're vulnerable to word tearing if shared memory is updated with inadequate synchronisation. I think some of the later Alphas added instructions to do smaller granularity writes.
>The latest C and C++ Standards threading model basically says this type of hardware would be
>unsupported, unless the software automatically provided this synchronization
So, they're mandating full memory synchronisation on all instructions just in case some bozo hasn't designed their threaded code correctly? The hardware CAN provide synchronisation, if required, but it's unnecessary and expensive to do it for all operations. That's how Alpha and Itanium work, and it's one of the reasons they're so fast. Please don't penalise my code just because you're too lazy to engineer yours properly!
11-05-2012 12:34 AM
>if you want to update a smaller unit of memory, the hardware has to read the containing quadword, mask in the smaller field, then write back the quadword.
You're saying there IS a byte store instruction but it has to be done in non-atomic steps?
>they're mandating full memory synchronisation on all instructions just in case some bozo hasn't designed their threaded code correctly?
They do require 1, 2 and 4 byte aligned stores that are atomic. The ordering probably requires a special qualifier like volatile.
>That's how Alpha and Itanium work, and it's one of the reasons they're so fast.
Yes, but you expect atomic sub-word stores, and that is now required.
11-05-2012 04:59 AM
In the beginning, Alpha had load and store instructions only for (in VMS terms) quadwords and longwords (which are 64-bit and 32-bit entities). With EV56 the architecture was extended with word and byte (16-bit and 8-bit) load and store instructions. Each (load and) store instruction requires aligned data and is atomic.
That is, until EV56 (and code generators using the then-new instructions), you could not expect atomic sub-longword stores.
Even with current compilers, the code generator usually generates code compatible with all CPUs, including pre-EV56 ones. That results in a store-byte operation being non-atomic. You have to tell the compiler to make use of the "new" instructions. For example, the C compiler accepts an /ARCHITECTURE=EV56 switch.