There is IR generation for these opcodes but it ignores this little tidbit:

  The fetch of the second operand for purposes of loading and the store
  into the second-operand location appear to be a block-concurrent
  interlocked-update reference as observed by other CPUs. A
  specific-operand-serialization operation is performed.

This needs to be modelled correctly.
These instructions need to be fixed with CAS IR:

LOAD AND ADD (LAA, LAAG)
LOAD AND ADD LOGICAL (LAAL, LAALG)
LOAD AND AND (LAN, LANG)
LOAD AND EXCLUSIVE OR (LAX, LAXG)
LOAD AND OR (LAO, LAOG)
Created attachment 73698 [details]
initial patch for the LAA insn, using CAS IR
I need to understand this better... This is what is happening:

  orig = memory[r2 + d2]
  memory[r2 + d2] = memory[r2 + d2] + r3
  r1 = orig

Now the tricky bit (from POP):

  The fetch of the second operand for purposes of loading and the store
  into the second-operand location appear to be a block-concurrent
  interlocked-update reference as observed by other CPUs.

The way I understand this is: the fetch from memory and the subsequent store are a single atomic operation. This implies that the addition is part of the atomic operation, because it is the sum that is being stored. Correct?

If so, I don't think this can be done correctly with IRCAS. But before we go further, let's first clarify this bit.
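The three steps above can be sketched as a plain C reference model (hypothetical helper name; the required atomicity is deliberately ignored here, since that is exactly the part under discussion):

```c
#include <stdint.h>
#include <assert.h>

/* Non-atomic reference model for LAA (hypothetical helper):
   load the old second operand, store old + r3 back, and
   return the old value, which LAA places in r1. */
static int32_t laa_reference(int32_t *second_operand, int32_t r3)
{
    int32_t orig = *second_operand;   /* orig = memory[r2 + d2]      */
    *second_operand = orig + r3;      /* memory[r2 + d2] = orig + r3 */
    return orig;                      /* r1 = orig                   */
}
```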
> orig = memory[r2 + d2]
> memory[r2 + d2] = memory[r2 + d2] + r3
> r1 = orig

> If so, I don't think this can be done correctly with IRCAS.

Generating IR like this might work:

  restart:
    ea = r2 + d2
    old = *ea
    new = old + r3
    success = CAS(ea, old -> new)
    if (!success) goto restart
    r1 = old

The trick in this case is that you can't modify guest state or memory before the (IR)CAS, since if the CAS fails then guest state isn't the same as when the insn was started. But in this case I think you are OK.
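The proposed retry loop can be sketched in C11 atomics (hypothetical helper name; the real lowering emits VEX IR, not C, but the control flow is the same):

```c
#include <stdint.h>
#include <stdatomic.h>
#include <assert.h>

/* CAS-loop sketch of LAA: no guest state is modified before the
   CAS succeeds, mirroring the constraint described above. */
static int32_t laa_via_cas(_Atomic int32_t *ea, int32_t r3)
{
    int32_t old, new_val;
    do {
        old = atomic_load(ea);                 /* old = *ea               */
        new_val = old + r3;                    /* new = old + r3          */
        /* success = CAS(ea, old -> new); on failure, goto restart */
    } while (!atomic_compare_exchange_weak(ea, &old, new_val));
    return old;                                /* r1 = old                */
}
```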
Yes, this would work. However, tool instrumenters would possibly see more memory loads than actually happen. I always thought that was forbidden (which makes sense). Some time ago you sent me a mail about the guarantees IR makes, but I haven't been able to locate it. I seem to remember you stated that IR is precise when it comes to memory accesses -- that would include the frequency of accesses.
(In reply to comment #5)

Well, hmm, I suspect in the x86/amd64/arm front ends there might be cases where we emit more loads than strictly necessary. IIRC there are some nasty hacks to do with the BSF/BSR instructions, for one thing. Whilst not ideal, I don't think it is a big deal if these insns get load counts which overstate the number of loads, provided they are not common. Of course for the normal, boring, 99% of loads we want to have the correct numbers, but this load-atomic-op category does not seem to me to be common. So I think it's fine.
Yeah, it seems reasonable given that those insns will be rare, I suspect. The alternative of dreaming up and implementing yet another IROp for this is certainly much less attractive. We'll put a comment in the code about the possible multiple reads so this won't be forgotten.
Created attachment 75411 [details]
laa patch

Hi Florian,

Please check if my IR code matches Julian's steps above.
(In reply to comment #8)
> Created attachment 75411 [details]
> laa patch
>
> Hi Florian,
>
> Please check if my IR code matches Julian's steps above.

I've applied the patch (VEX r2576 and valgrind r13146) with the following mods:

- I added tests such that the values of the 1st and 2nd operands are
  different from the values they will be assigned after the operation.
  For instance, with

    test(0, 0, 0, 0);

  there is no way to tell that op1 was assigned. But if I change it to

    test(15, 0, 0, 0);

  then I can tell. A similar argument holds for op2.

- I included a test for cc=3.

- In s390_irgen_LAA, the comment for the condition code was not quite
  correct:

    op2 + op3 < 0 (overflow)   cc = 1

  There is no overflow in this case.

- I changed

    iterate_if(binop(Iop_CmpNE32, mkexpr(op2), mkexpr(old_mem)));

  to a yield_if. If the CAS failed, then this thread will not make
  progress, in the sense that next time the CAS will fail again. So
  let's tell the scheduler to run another thread for a while.
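The point about choosing distinct initial operand values can be illustrated with a minimal sketch (hypothetical model and harness; the real test's signature may differ): if the destination already holds the value it is supposed to receive, a missing assignment goes unnoticed.

```c
#include <stdint.h>
#include <assert.h>

/* A deliberately broken LAA model (hypothetical): it updates the
   second operand but forgets to write the old value back to r1. */
static void laa_broken(int32_t *r1, int32_t *op2, int32_t op3)
{
    int32_t orig = *op2;
    *op2 = orig + op3;
    /* bug: the required "*r1 = orig;" is missing */
    (void)r1;
}
```

With r1 initialized to 0 and op2 starting at 0, r1 "happens" to end up with the correct value, so the bug hides; initializing r1 to 15 exposes it.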