Version: 3.6.0 OS: other I'm seeing Memcheck reports in a very simple piece of code (attached), on the last conditional branch instruction of the hand-written assembly. It seems like some undefined value, probably through APSR shadow state, leak from the part of the assembly above the commented out "b QQQ:" to the part after it. AFAIK, as a result of "tst" in the second part N flag must be completely defined. What makes this even stranger is the fact that the report disappears when I force a new BB by uncommenting the branch and the label (this might also need a --vex-guest-something option). It looks as if saving the state of the conditionals (CC/DEP1/DEP2/NDEP) to APSR and back prevents some incorrect optimization. Disabling IR optimization with a vex option does not help. Reproducible: Didn't try
Created attachment 58807 [details] The source code
The false report appears when the sample code is compiled with -mthumb -march=armv7-a. It does not appear if the code is compiled as -marm. g++ (Ubuntu 4.4.1-4ubuntu8) 4.4.1 Valgrind r11691 (current trunk). $ cat /proc/cpuinfo Processor : ARMv7 Processor rev 3 (v7l) BogoMIPS : 484.21 Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x1 CPU part : 0xc08 CPU revision : 3 Hardware : OMAP3 Beagle Board Revision : 0020 Serial : 0000000000000000
The problem has something to do with IT handling (guaranteedUnconditional). If I add enough nops after the first bne.n so that the ITxxx optimization analysis decides that "tst" is definitely unconditional - the false report goes away. In this case, one "nop.w" is enough. Ok, this seems more or less clear now. Memcheck interprets this ITxxx handling IR as "tst" instruction result having a (possible) dependency on IT condition. I'm not sure I understand Memcheck enough to fix it. Could someone give me a pointer here? This issue can be worked around by compiling as ARM. It can be partially fixed (I think it should be enough on practice) by making ITxxx handling a bit more sophisticated.
I have a different theory. First off, here's a reduced test case that also shows the problem, but only with --vex-guest-max-insns=1. At =2 or above it doesn't complain. extern "C" { int ZZZ(int); } asm( ".type ZZZ, %function" "\n\t" "ZZZ:" "\n\t" ".fnstart" "\n\t" "mov r2, #0" "\n\t" "mov r3, r0" "\n\t" "eors r3, r2" "\n\t" "mov r5, #0" "\n\t" "tst r5, #15" "\n\t" "bne.n ZZZexit" "\n\t" "ZZZexit: \n\t" "bx lr" "\n\t" ".fnend" "\n\t" ); int main(void) { int* undef = new(int); volatile int zz = ZZZ(*undef); delete undef; return 0; } It complains on the bne. Problem is that the bne is guarded like this armg_calculate_condition[mcx=0x9]{0x38131324}( Or32(GET:I32(64),0x10:I32), // (ARMCondNE << 4) | ARMG_CC_OP_blah GET:I32(68), // DEP1 GET:I32(72), // DEP2 GET:I32(76)) // NDEP and memcheck requires DEP1 and DEP2 to be defined, as specified by mcx=0x9. Problem is if you look at the previous instruction tst.w r5, #15, the flag thunk is set like this PUT(64) = 0x5:I32 PUT(68) = t4 PUT(72) = armg_calculate_flag_c[mcx=0x9]{0x381312e8}(t8,t9,t10,t11):I32 PUT(76) = armg_calculate_flag_v[mcx=0x9]{0x381312fc}(t8,t9,t10,t11):I32 and t4 is defined, so DEP1 is defined, but DEP2 is a copy of the old C flag and is undefined. The call to armg_calculate_condition only uses DEP1, because ARMG_CC_OP_blah is in fact ARMG_CC_OP_LOGIC (0x5). If the two instructions are together in one SB, then iropt would have constant propagated that value into "Or32(GET:I32(64),0x10:I32)" producing 0x15, which would then have triggered the spec-helper rule if (isU32(cond_n_op, (ARMCondNE << 4) | ARMG_CC_OP_LOGIC)) { /* NE after LOGIC --> test res != 0 */ return unop(Iop_1Uto32, binop(Iop_CmpNE32, cc_dep1, mkU32(0))); } which would have removed the call to armg_calculate_condition and replaced it with a direct dependency on the DEP1 value only (t4). Since t4 = R5 & 0xF and we have R5 == 0, Memcheck would then not complain. However, since the two instructions are not in the same SB, the constant propagation doesn't happen, so the helper is not specialised, and the dependency on DEP2 is not removed. Fun, isn't it! I can't think how to fix this right now. The x86 and amd64 front ends suffer from similar problems, but they occur only extremely rarely in practice.
Wow, this is tricky. So, usually it works only because of the optimization that propagates a constant and replaces rough and slow armg_calculate_condition by a fast and precise specialization. Maybe we could split armg_calculate_condition into several routines by ARMCond* (which is always known), and somehow rearrange DEP1/DEP2/NDEP in a way that will allow more precise mcx flags for the helpers? This will probably require additional DEP* fields and slow down the whole thing. What if we add an armg_calculate_condition_shadow helper? It would calculate precise shadow value for (Cond, OP) pair. It seems like it would be possible to have it optimized out the same way as armg_calculate_condition is in the typical case, when iropt knows enough about its arguments.
Any updates on this? This happens fairly often on WebKit on x64: http://code.google.com/p/chromium/issues/detail?id=101755