| Summary: | False memcheck reports on ARM (tst does not reset flags?) | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Evgeniy Stepanov <eugeni.stepanov> |
| Component: | memcheck | Assignee: | Julian Seward <jseward> |
| Status: | REPORTED --- | ||
| Severity: | normal | CC: | timurrrr |
| Priority: | NOR | ||
| Version First Reported In: | 3.6.0 | ||
| Target Milestone: | --- | ||
| Platform: | Compiled Sources | ||
| OS: | Other | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: | The source code | ||
|
Description
Evgeniy Stepanov
2011-04-11 21:15:13 UTC
Created attachment 58807 [details]
The source code
The false report appears when the sample code is compiled with -mthumb -march=armv7-a. It does not appear if the code is compiled as -marm. g++ (Ubuntu 4.4.1-4ubuntu8) 4.4.1 Valgrind r11691 (current trunk). $ cat /proc/cpuinfo Processor : ARMv7 Processor rev 3 (v7l) BogoMIPS : 484.21 Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 CPU implementer : 0x41 CPU architecture: 7 CPU variant : 0x1 CPU part : 0xc08 CPU revision : 3 Hardware : OMAP3 Beagle Board Revision : 0020 Serial : 0000000000000000 The problem has something to do with IT handling (guaranteedUnconditional). If I add enough nops after the first bne.n so that the ITxxx optimization analysis decides that "tst" is definitely unconditional - the false report goes away. In this case, one "nop.w" is enough. Ok, this seems more or less clear now. Memcheck interprets this ITxxx handling IR as "tst" instruction result having a (possible) dependency on IT condition. I'm not sure I understand Memcheck enough to fix it. Could someone give me a pointer here? This issue can be worked around by compiling as ARM. It can be partially fixed (I think it should be enough on practice) by making ITxxx handling a bit more sophisticated. I have a different theory. First off, here's a reduced test case that
also shows the problem, but only with --vex-guest-max-insns=1. At =2
or above it doesn't complain.
extern "C" {
int ZZZ(int);
}
asm(
".type ZZZ, %function" "\n\t"
"ZZZ:" "\n\t"
".fnstart" "\n\t"
"mov r2, #0" "\n\t"
"mov r3, r0" "\n\t"
"eors r3, r2" "\n\t"
"mov r5, #0" "\n\t"
"tst r5, #15" "\n\t"
"bne.n ZZZexit" "\n\t"
"ZZZexit: \n\t"
"bx lr" "\n\t"
".fnend" "\n\t"
);
int main(void) {
int* undef = new(int);
volatile int zz = ZZZ(*undef);
delete undef;
return 0;
}
It complains on the bne. Problem is that the bne is guarded like this
armg_calculate_condition[mcx=0x9]{0x38131324}(
Or32(GET:I32(64),0x10:I32), // (ARMCondNE << 4) | ARMG_CC_OP_blah
GET:I32(68), // DEP1
GET:I32(72), // DEP2
GET:I32(76)) // NDEP
and memcheck requires DEP1 and DEP2 to be defined, as specified by
mcx=0x9.
Problem is if you look at the previous instruction
tst.w r5, #15, the flag thunk is set like this
PUT(64) = 0x5:I32
PUT(68) = t4
PUT(72) = armg_calculate_flag_c[mcx=0x9]{0x381312e8}(t8,t9,t10,t11):I32
PUT(76) = armg_calculate_flag_v[mcx=0x9]{0x381312fc}(t8,t9,t10,t11):I32
and t4 is defined, so DEP1 is defined, but DEP2 is a copy of the old C
flag and is undefined.
The call to armg_calculate_condition only uses DEP1, because
ARMG_CC_OP_blah is in fact ARMG_CC_OP_LOGIC (0x5). If the two
instructions are together in one SB, then iropt would have constant
propagated that value into "Or32(GET:I32(64),0x10:I32)" producing
0x15, which would then have triggered the spec-helper rule
if (isU32(cond_n_op, (ARMCondNE << 4) | ARMG_CC_OP_LOGIC)) {
/* NE after LOGIC --> test res != 0 */
return unop(Iop_1Uto32,
binop(Iop_CmpNE32, cc_dep1, mkU32(0)));
}
which would have removed the call to armg_calculate_condition and
replaced it with a direct dependency on the DEP1 value only (t4).
Since t4 = R5 & 0xF and we have R5 == 0, Memcheck would then not
complain.
However, since the two instructions are not in the same SB, the
constant propagation doesn't happen, so the helper is not specialised,
and the dependency on DEP2 is not removed.
Fun, isn't it! I can't think how to fix this right now. The x86 and
amd64 front ends suffer from similar problems, but they occur only
extremely rarely in practice.
Wow, this is tricky. So, usually it works only because of the optimization that propagates a constant and replaces rough and slow armg_calculate_condition by a fast and precise specialization. Maybe we could split armg_calculate_condition into several routines by ARMCond* (which is always known), and somehow rearrange DEP1/DEP2/NDEP in a way that will allow more precise mcx flags for the helpers? This will probably require additional DEP* fields and slow down the whole thing. What if we add an armg_calculate_condition_shadow helper? It would calculate precise shadow value for (Cond, OP) pair. It seems like it would be possible to have it optimized out the same way as armg_calculate_condition is in the typical case, when iropt knows enough about its arguments. Any updates on this? This happens fairly often on WebKit on x64: http://code.google.com/p/chromium/issues/detail?id=101755 |