| Summary: | ppc64: --expensive-definedness-checks=yes is not quite working here | | |
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Florian Krohm <flo2030> |
| Component: | memcheck | Assignee: | Julian Seward <jseward> |
| Status: | CONFIRMED | | |
| Severity: | normal | CC: | cel, mark, will_schmidt |
| Priority: | NOR | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Platform: | Other | | |
| OS: | Linux | | |
| See Also: | https://bugs.kde.org/show_bug.cgi?id=340392 | | |
| Latest Commit: | | Version Fixed In: | |
| Sentry Crash Report: | | | |
Description
Florian Krohm
2015-09-06 21:14:56 UTC
(In reply to Florian Krohm from comment #0)
> I'd bet we need a more precise version of CmpORD64S.

I agree. There is in fact special-case handling already for CmpORD{32,64}{S,U} for the case where the second argument is a constant zero, but in this case it is a constant 0xCD, so it doesn't apply.

Some more analysis:

typedef struct {
   unsigned char c;
   int i;
   void *foo;
} S;

int main (int argc, char **argv)
{
   S x;
   S *s = &x;
   s->c = 1;
   if (s->c == 0 && s->i == 205 && s->foo == s)
      return 1;
   return 0;
}

/*
00000000100003a0 <.main>:
100003a0:  39 20 00 01   li      r9,1
100003a4:  38 60 00 00   li      r3,0
100003a8:  99 21 ff f0   stb     r9,-16(r1)
100003ac:  60 42 00 00   ori     r2,r2,0
100003b0:  e9 21 ff f0   ld      r9,-16(r1)
100003b4:  79 29 46 00   rldicl  r9,r9,8,24
100003b8:  79 29 c0 02   rotldi  r9,r9,56
100003bc:  2b a9 00 cd   cmpldi  cr7,r9,205
100003c0:  4c 9e 00 20   bnelr   cr7        <------- here
100003c4:  e9 21 ff f8   ld      r9,-8(r1)
100003c8:  38 61 ff f0   addi    r3,r1,-16
100003cc:  7c 63 4a 78   xor     r3,r3,r9
100003d0:  7c 63 00 74   cntlzd  r3,r3
100003d4:  78 63 d1 82   rldicl  r3,r3,58,6
100003d8:  7c 63 07 b4   extsw   r3,r3
100003dc:  4e 80 00 20   blr

------ IMark(0x100003B0, 4, 0) ------
t33 = LDbe:I64(Add64(t29,0xFFFFFFFFFFFFFFF0:I64))
------ IMark(0x100003B4, 4, 0) ------
t34 = And64(Or64(Shl64(t33,0x8:I8),Shr64(t33,0x38:I8)),0xFFFFFFFFFF:I64)
------ IMark(0x100003B8, 4, 0) ------
t48 = Or64(Shl64(t34,0x38:I8),Shr64(t34,0x8:I8))
PUT(88) = t48
------ IMark(0x100003BC, 4, 0) ------
t54 = 64to8(CmpORD64U(t48,0xCD:I64))

Simplified:
t33 = loaded value
t34 = Rol64(t33, 8) & 0xFF_FFFF_FFFF
t48 = Ror64(t34, 8)
t54 = 64to8(CmpORD64U(t48,0xCD:I64))
*/

The t33 -> t48 transformation is (MSB on the left):

t33               = A B C D E F G H
Rol(t33, 8)       = B C D E F G H A
t34               = 0 0 0 E F G H A    (Rol(t33, 8) & 0xFF_FFFF_FFFF)
t48 = Ror(t34, 8) = A 0 0 0 E F G H

So the two rotates in opposite directions serve only to put the to-be-zeroed-out area temporarily at the top of the word, since rldicl's immediate mask can only clear a bit field that is "anchored" at one end of the word or the other. Clearly A is 'c' in the struct and 'E F G H' must be 'i'. (A standalone C illustration of this rotate-and-mask dance follows at the end of this comment.)

After 100003a8, memory at -16(r1) is

-16 -15 -14 -13 -12 -11 -10  -9
  1   U   U   U   U   U   U   U

When loaded (big-endian) we get a 64-bit word, with MSB on the left here:

1 U U U U U U U

After masking per the above it is

1 0 0 0 U U U U

and we are comparing it with

0 0 0 0 0 0 0 205

This seems to me to be a case that would be correctly handled by expensiveCmpEQorNE() (mc_translate.c:983, see the comment preceding it): the top byte is defined in both operands and differs, so the result of the comparison is defined despite the undefined bytes elsewhere. (A C sketch of that scheme also follows below.) But that handling applies only to Cmp{EQ,NE}{32,64}, and the ppc front end doesn't produce those. Instead it produces CmpORD{U,S}{32,64}.

In effect the amd64/x86 (and, to some extent, other) front ends together with ir_opt.c succeed in recovering, from the original instruction stream, the actual condition (==, !=, >, >=, <=, <, signed or unsigned) that gcc/clang compiled. In this case

100003bc:  2b a9 00 cd   cmpldi  cr7,r9,205
100003c0:  4c 9e 00 20   bnelr   cr7        <------- here

we know that this is really a != comparison, and if the front end could have recovered that, creating a CmpNE64, then it would have been instrumented by expensiveCmpEQorNE and we wouldn't have had the false positive.

I guess we could redo doCmpORD so that it performs the "expensive" interpretation for EQ# rather than the "standard interp", per the comments in it. But that would slow down all comparisons, including the 99.99% that don't need this level of accuracy.
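For concreteness, here is the t33 -> t48 rotate-and-mask sequence transcribed into plain C. This is a standalone illustration mirroring the IR above, not Valgrind code; the test value is made up.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the IR: t34 = Rol64(t33,8) & 0xFF_FFFF_FFFF; t48 = Ror64(t34,8).
   With bytes A B C D E F G H (MSB first) this yields A 0 0 0 E F G H. */
static uint64_t mask_like_rldicl(uint64_t t33)
{
   uint64_t t34 = ((t33 << 8) | (t33 >> 56)) & 0xFFFFFFFFFFULL;
   uint64_t t48 = (t34 << 56) | (t34 >> 8);
   return t48;
}

int main(void)
{
   /* A = 0x01, E F G H = 0x000000CD, and junk in bytes B, C, D */
   uint64_t v = 0x01AABBCC000000CDULL;
   printf("%016" PRIx64 "\n", mask_like_rldicl(v));
   /* prints 01000000000000cd: B, C, D cleared; A and E..H preserved */
   return 0;
}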
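And to make the expensiveCmpEQorNE() point concrete, here is a bit-level C paraphrase of the scheme described in the comment preceding it. This is a sketch of the idea only; the real instrumentation emits shadow IR rather than computing anything at C level.

#include <stdint.h>
#include <stdio.h>

/* V-bit convention as in Memcheck: a 1 bit in vx means the corresponding
   bit of x is undefined.  Returns 1 if the result of comparing x and y
   for equality is definitely defined. */
static int eq_result_is_defined(uint64_t x, uint64_t vx,
                                uint64_t y, uint64_t vy)
{
   /* If some bit is defined in both operands and differs, the operands
      provably compare unequal, so the EQ/NE result is defined no matter
      what the undefined bits hold. */
   if ((x ^ y) & ~vx & ~vy)
      return 1;
   /* Otherwise fall back to the naive rule: defined only if no input
      bit is undefined at all. */
   return (vx | vy) == 0;
}

int main(void)
{
   /* The masked word from above: 1 0 0 0 U U U U, compared with 205.
      Byte A is defined in both and differs (1 vs 0), so the comparison
      is a defined "not equal" and no error should be reported. */
   uint64_t x  = 0x0100000000000000ULL;   /* undefined bytes shown as 0 */
   uint64_t vx = 0x00000000FFFFFFFFULL;   /* low 4 bytes ('i') undefined */
   printf("%d\n", eq_result_is_defined(x, vx, 0xCD, 0));  /* prints 1 */
   return 0;
}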
A different approach would be to redo how the ppc condition codes are translated into IR, and in particular to attempt to recover the actual compared condition, as the amd64/x86/arm/arm64 front ends do, and to get rid of CmpORD{32,64}{U,S}. That would in a way be better, since it would mean that ppc code benefits from the same improvements in definedness tracking for comparisons that all the other architectures get. This isn't straightforward, though -- it might be a week of work.

In this example, "cmpld ; bne" translates to essentially this:

// the cmpld
t54 = 64to8(CmpORD64U(t48,0xCD:I64))

// the bne
if (CmpEQ32(Xor32(And32(8Uto32(t54),0x2:I32),0x2:I32),0x0:I32)) {
   PUT(1296) = 0x100003C4:I64; exit-Boring
}

CmpORD64U produces a value that is either 8, 4 or 2, with 8 meaning "<", 4 meaning ">" and 2 meaning "=". The bne translation then inspects bit 1 of t54 (intel encoding), hence the use of 0x2. (A small C model of this appears at the end of this report.) This completely obscures the fact that what we really want is simply

if (CmpNE64(t48,0xCD:I64)) {
   PUT(1296) = 0x100003C4:I64; exit-Boring
}

The tricky part is to change the generated IR so that it directly exposes the condition on which the branch is based, yet makes it possible for later instructions to recover a correct value for CR0 .. CR7 should they wish to. Maybe we should change the ppc front end to use a thunk-based representation as most of the other front ends do, since that more or less facilitates all of the above.

On contemplation, it is probably easiest to fix this in ir_opt.c by doing an IR-to-IR transformation pass that, following the example in comment 3, transforms

CmpEQ32( select-bit-1-of( CmpORD64U(x,y) ), 0x0 )

to

CmpNE64(x,y)

where "select-bit-1-of" incorporates all the size-changing and masking gunk (64to8, then 8Uto32, then And32 with 2, then Xor32 with 2).

We could be cleverer about the size-changing and masking bits, for example declaring CmpORD64U to produce an I32-typed result, since matching IR patterns is hard work and it's easy to miss cases. With that change the example simplifies to

t54 = CmpORD64U(t48,0xCD:I64)
if (CmpEQ32(Xor32(And32(t54,0x2:I32),0x2:I32),0x0:I32)) ...

and if we get rid of the Xor32 and instead compare the extracted value with 2:

t54 = CmpORD64U(t48,0xCD:I64)
if (CmpEQ32( And32(t54,0x2:I32), 0x2:I32 )) ...

That would make the matching problem easier. Furthermore, if we normalise And32 and CmpEQ32 nodes so that when just one of the args is a constant it is on the right, then there are no L-vs-R structural variants to consider. (An illustrative sketch of such a fold, under these simplifying assumptions, also appears at the end.)

There is a bunch of infrastructure in ir_opt.c, starting at line 5953, for "MSVC specific transformation hacks", that would form a useful starting point.

We are going to have to defer this till after 3.13. It requires some non-trivial reworking of either the IR optimisation specific to POWER, or of the POWER condition code handling, or both.

Added a prereq to skip memcheck/tests/bug340392.vgtest on power:

commit 94960fae328bc82fb1ea51ffb9273ad5f25936d2 (HEAD -> master)
Author: Mark Wielaard <mark@klomp.org>
Date:   Mon Oct 30 21:41:57 2023 +0100

    bug340392 doesn't work on power because of bug352364

    Add a prereq to bug340392.vgtest to skip on Power for now.
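For reference, here is the CmpORD64U behaviour described above as a small C model (a sketch based on this report: values 8/4/2 for "<"/">"/"=", with bit 1 acting as the EQ bit that the bne translation tests):

#include <stdint.h>
#include <stdio.h>

static uint32_t cmp_ord_64u(uint64_t x, uint64_t y)
{
   if (x < y) return 8;   /* "<" */
   if (x > y) return 4;   /* ">" */
   return 2;              /* "=" */
}

int main(void)
{
   uint64_t t48 = 0x01000000000000CDULL;  /* the masked word from above */
   uint32_t t54 = cmp_ord_64u(t48, 0xCD);
   /* mirrors CmpEQ32(Xor32(And32(t54,2),2),0): true iff the EQ bit is set */
   int equal = ((t54 & 2) ^ 2) == 0;
   printf("t54 = %u, equal = %d\n", t54, equal);  /* t54 = 4, equal = 0 */
   return 0;
}

The condition ((t54 & 2) ^ 2) == 0 is the C analogue of the CmpEQ32/Xor32/And32 nest in the IR, and it is just x == y in disguise -- exactly the condition the proposed fold would expose as a CmpEQ64/CmpNE64.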
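Finally, an illustrative sketch of what such an ir_opt.c fold might look like, loosely in the style of the "MSVC specific transformation hacks" mentioned above. This is not code from the tree: it assumes the simplified shape discussed earlier (CmpORD64U retyped to produce I32, the Xor32 removed, constants normalised to the right), and the helper isU32 is written out just to keep the sketch self-contained.

#include "libvex_ir.h"   /* IRExpr, IRExpr_Binop, Iop_*, Iex_*, Ico_* */

/* True iff e is the I32 constant n. */
static Bool isU32 ( IRExpr* e, UInt n )
{
   return e->tag == Iex_Const
          && e->Iex.Const.con->tag == Ico_U32
          && e->Iex.Const.con->Ico.U32 == n;
}

/* Fold CmpEQ32( And32( CmpORD64U(x,y), 2 ), 2 )  ==>  CmpEQ64(x,y).
   (The variant with the Xor32 still present, compared against 0,
   would fold to CmpNE64(x,y) instead.) */
static IRExpr* fold_CmpORD_eq_bit ( IRExpr* e )
{
   if (e->tag == Iex_Binop
       && e->Iex.Binop.op == Iop_CmpEQ32
       && isU32(e->Iex.Binop.arg2, 2)) {
      IRExpr* and32 = e->Iex.Binop.arg1;
      if (and32->tag == Iex_Binop
          && and32->Iex.Binop.op == Iop_And32
          && isU32(and32->Iex.Binop.arg2, 2)) {
         IRExpr* cmp = and32->Iex.Binop.arg1;
         if (cmp->tag == Iex_Binop
             && cmp->Iex.Binop.op == Iop_CmpORD64U) {
            /* EQ bit set <=> operands equal: expose the real condition,
               so Memcheck can instrument it with expensiveCmpEQorNE. */
            return IRExpr_Binop(Iop_CmpEQ64,
                                cmp->Iex.Binop.arg1,
                                cmp->Iex.Binop.arg2);
         }
      }
   }
   return e;  /* no match: leave the expression unchanged */
}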