Bug 270709 - False memcheck reports on ARM (tst does not reset flags?)
Summary: False memcheck reports on ARM (tst does not reset flags?)
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.6.0
Platform: Compiled Sources Other
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-11 21:15 UTC by Evgeniy Stepanov
Modified: 2011-12-16 11:07 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
The source code (667 bytes, patch)
2011-04-11 21:17 UTC, Evgeniy Stepanov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Evgeniy Stepanov 2011-04-11 21:15:13 UTC
Version:           3.6.0
OS:                other

I'm seeing Memcheck reports in a very simple piece of code (attached), on the last conditional branch instruction of the hand-written assembly. It seems like some undefined value, probably through APSR shadow state, leak from the part of the assembly above the commented out "b QQQ:" to the part after it. AFAIK, as a result of "tst" in the second part N flag must be completely defined.

What makes this even stranger is the fact that the report disappears when I force a new BB by uncommenting the branch and the label (this might also need a --vex-guest-something option). It looks as if saving the state of the conditionals (CC/DEP1/DEP2/NDEP) to APSR and back prevents some incorrect optimization. Disabling IR optimization with a vex option does not help.


Reproducible: Didn't try
Comment 1 Evgeniy Stepanov 2011-04-11 21:17:41 UTC
Created attachment 58807 [details]
The source code
Comment 2 Evgeniy Stepanov 2011-04-12 15:16:41 UTC
The false report appears when the sample code is compiled with -mthumb -march=armv7-a. It does not appear if the code is compiled as -marm.

g++ (Ubuntu 4.4.1-4ubuntu8) 4.4.1
Valgrind r11691 (current trunk).


$ cat /proc/cpuinfo 
Processor	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 484.21
Features	: swp half thumb fastmult vfp edsp thumbee neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x1
CPU part	: 0xc08
CPU revision	: 3

Hardware	: OMAP3 Beagle Board
Revision	: 0020
Serial		: 0000000000000000
Comment 3 Evgeniy Stepanov 2011-04-12 16:42:54 UTC
The problem has something to do with IT handling (guaranteedUnconditional).
If I add enough nops after the first bne.n so that the ITxxx optimization analysis decides that "tst" is definitely unconditional - the false report goes away. In this case, one "nop.w" is enough.

Ok, this seems more or less clear now. Memcheck interprets this ITxxx handling IR as "tst" instruction result having a (possible) dependency on IT condition.

I'm not sure I understand Memcheck enough to fix it. Could someone give me a pointer here?

This issue can be worked around by compiling as ARM.
It can be partially fixed (I think it should be enough on practice) by making ITxxx handling a bit more sophisticated.
Comment 4 Julian Seward 2011-04-14 13:50:16 UTC
I have a different theory.  First off, here's a reduced test case that
also shows the problem, but only with --vex-guest-max-insns=1.  At =2
or above it doesn't complain.

extern "C" {
  int ZZZ(int);
}

asm(
      ".type ZZZ, %function" "\n\t"
      "ZZZ:" "\n\t"
      ".fnstart" "\n\t"

      "mov    r2, #0"   "\n\t"
      "mov    r3, r0"   "\n\t"
      "eors   r3, r2"   "\n\t"
      "mov    r5, #0"   "\n\t"
      "tst    r5, #15"  "\n\t"
      "bne.n  ZZZexit"  "\n\t"

      "ZZZexit: \n\t"
      "bx      lr" "\n\t"
      ".fnend" "\n\t"
    );

int main(void) {
  int* undef = new(int);
  volatile int zz = ZZZ(*undef);
  delete undef;
  return 0;
}

It complains on the bne.  Problem is that the bne is guarded like this

armg_calculate_condition[mcx=0x9]{0x38131324}(
    Or32(GET:I32(64),0x10:I32), // (ARMCondNE << 4) | ARMG_CC_OP_blah
    GET:I32(68),                // DEP1
    GET:I32(72),                // DEP2
    GET:I32(76))                // NDEP

and memcheck requires DEP1 and DEP2 to be defined, as specified by
mcx=0x9.

Problem is if you look at the previous instruction
tst.w r5, #15,   the flag thunk is set like this

   PUT(64) = 0x5:I32
   PUT(68) = t4
   PUT(72) = armg_calculate_flag_c[mcx=0x9]{0x381312e8}(t8,t9,t10,t11):I32
   PUT(76) = armg_calculate_flag_v[mcx=0x9]{0x381312fc}(t8,t9,t10,t11):I32

and t4 is defined, so DEP1 is defined, but DEP2 is a copy of the old C
flag and is undefined.

The call to armg_calculate_condition only uses DEP1, because
ARMG_CC_OP_blah is in fact ARMG_CC_OP_LOGIC (0x5).  If the two
instructions are together in one SB, then iropt would have constant
propagated that value into "Or32(GET:I32(64),0x10:I32)" producing
0x15, which would then have triggered the spec-helper rule

      if (isU32(cond_n_op, (ARMCondNE << 4) | ARMG_CC_OP_LOGIC)) {
         /* NE after LOGIC --> test res != 0 */
         return unop(Iop_1Uto32,
                     binop(Iop_CmpNE32, cc_dep1, mkU32(0)));
      }

which would have removed the call to armg_calculate_condition and
replaced it with a direct dependency on the DEP1 value only (t4).

Since t4 = R5 & 0xF and we have R5 == 0, Memcheck would then not
complain.

However, since the two instructions are not in the same SB, the
constant propagation doesn't happen, so the helper is not specialised,
and the dependency on DEP2 is not removed.

Fun, isn't it!  I can't think how to fix this right now.  The x86 and
amd64 front ends suffer from similar problems, but they occur only
extremely rarely in practice.
Comment 5 Evgeniy Stepanov 2011-04-14 20:29:04 UTC
Wow, this is tricky. So, usually it works only because of the optimization that propagates a constant and replaces rough and slow armg_calculate_condition by a fast and precise specialization.

Maybe we could split armg_calculate_condition into several routines by ARMCond* (which is always known), and somehow rearrange DEP1/DEP2/NDEP in a way that will allow more precise mcx flags for the helpers? This will probably require additional DEP* fields and slow down the whole thing.

What if we add an armg_calculate_condition_shadow helper? It would calculate precise shadow value for (Cond, OP) pair. It seems like it would be possible to have it optimized out the same way as armg_calculate_condition is in the typical case, when iropt knows enough about its arguments.
Comment 6 Timur Iskhodzhanov 2011-12-16 11:07:42 UTC
Any updates on this?
This happens fairly often on WebKit on x64:
http://code.google.com/p/chromium/issues/detail?id=101755