memcheck/tests/bug340392.c currently fails on ppc64 (r15629), i.e. it issues a complaint where it should not. A reduced testcase is: typedef struct { unsigned char c; int i; void *foo; } S; int main (int argc, char **argv) { S x; S *s = &x; s->c = 1; if (s->c == 0 && s->i == 205 && s->foo == s) return 1; return 0; } gcc -O2 test.c -o foo ./vg-in-place -q --expensive-definedness-checks=yes ./foo ==61940== Conditional jump or move depends on uninitialised value(s) ==61940== at 0x100003C0: main (in /home/florian/trunk/foo) This is with gcc 4.7.2 The disassembly of main: 00000000100003a0 <.main>: 100003a0: 39 20 00 01 li r9,1 100003a4: 38 60 00 00 li r3,0 100003a8: 99 21 ff f0 stb r9,-16(r1) 100003ac: 60 42 00 00 ori r2,r2,0 100003b0: e9 21 ff f0 ld r9,-16(r1) 100003b4: 79 29 46 00 rldicl r9,r9,8,24 100003b8: 79 29 c0 02 rotldi r9,r9,56 == rldicl r9,r9,56,0 100003bc: 2f a9 00 cd cmpdi cr7,r9,205 == cmpi cr7,1,r9,205 ERROR here 100003c0: 4c 9e 00 20 bnelr cr7 100003c4: e9 21 ff f8 ld r9,-8(r1) 100003c8: 38 61 ff f0 addi r3,r1,-16 100003cc: 7c 63 4a 78 xor r3,r3,r9 100003d0: 7c 63 00 74 cntlzd r3,r3 100003d4: 78 63 d1 82 rldicl r3,r3,58,6 100003d8: 4e 80 00 20 blr and the trace: ==== SB 1237 (evchecks 7458) [tid 1] 0x100003a0 main /home/florian/trunk/foo+0x100003a0 ------------------------ Front end ------------------------ 0x100003A0: li r9,1 ------ IMark(0x100003A0, 4, 0) ------ t0 = GET:I64(16) t1 = GET:I64(16) t2 = 0x1:I64 PUT(88) = t2 PUT(1296) = 0x100003A4:I64 0x100003A4: li r3,0 ------ IMark(0x100003A4, 4, 0) ------ t3 = GET:I64(16) t4 = GET:I64(16) t5 = 0x0:I64 PUT(40) = t5 PUT(1296) = 0x100003A8:I64 0x100003A8: stb r9,-16(r1) ------ IMark(0x100003A8, 4, 0) ------ t7 = GET:I64(264) t6 = GET:I64(88) t8 = Add64(GET:I64(24),0xFFFFFFFFFFFFFFF0:I64) STbe(t8) = 64to8(t6) PUT(1296) = 0x100003AC:I64 0x100003AC: ori r2,r2,0x0 ------ IMark(0x100003AC, 4, 0) ------ t9 = GET:I64(32) t11 = GET:I64(16) t10 = Or64(t9,0x0:I64) PUT(32) = t10 PUT(1296) = 0x100003B0:I64 0x100003B0: ld r9,-16(r1) ------ IMark(0x100003B0, 4, 0) ------ t12 = Add64(GET:I64(24),0xFFFFFFFFFFFFFFF0:I64) PUT(88) = LDbe:I64(t12) PUT(1296) = 0x100003B4:I64 0x100003B4: rldicl r9,r9,8,24 ------ IMark(0x100003B4, 4, 0) ------ t13 = GET:I64(88) t15 = GET:I64(80) t14 = And64(ITE(CmpNE8(And8(0x8:I8,0x3F:I8),0x0:I8),Or64(Shl64(t13,And8(0x8:I8,0x3F:I8)),Shr64(t13,Sub8(0x40:I8,And8(0x8:I8,0x3F:I8)))),t13),0xFFFFFFFFFF:I64) PUT(88) = t14 PUT(1296) = 0x100003B8:I64 0x100003B8: rldicl r9,r9,56,0 ------ IMark(0x100003B8, 4, 0) ------ t17 = GET:I64(88) t19 = GET:I64(208) t18 = And64(ITE(CmpNE8(And8(0x38:I8,0x3F:I8),0x0:I8),Or64(Shl64(t17,And8(0x38:I8,0x3F:I8)),Shr64(t17,Sub8(0x40:I8,And8(0x38:I8,0x3F:I8)))),t17),0xFFFFFFFFFFFFFFFF:I64) PUT(88) = t18 PUT(1296) = 0x100003BC:I64 0x100003BC: cmpi cr7,1,r9,205 ------ IMark(0x100003BC, 4, 0) ------ PUT(1338) = 64to8(CmpORD64S(GET:I64(88),0xCD:I64)) PUT(1339) = GET:I8(1320) PUT(1296) = 0x100003C0:I64 0x100003C0: bclr 0x4, 0x1E ------ IMark(0x100003C0, 4, 0) ------ t25 = 0xFFFFFFFF:I32 t22 = t25 t27 = And32(8Uto32(GET:I8(1338)),0x2:I32) t26 = Xor32(t27,0x2:I32) t23 = t26 t21 = And32(t23,t22) t24 = And64(GET:I64(1304),0xFFFFFFFFFFFFFFFC:I64) if (CmpEQ32(t21,0x0:I32)) { PUT(1296) = 0x100003C4:I64; exit-Boring } PUT(1296) = t24 PUT(1296) = GET:I64(1296); exit-Return GuestBytes 100003A0 36 39 20 00 01 38 60 00 00 99 21 FF F0 60 42 00 00 E9 21 FF F0 79 29 46 00 79 29 C0 02 2F A9 00 CD 4C 9E 00 20 647EDF28 ------------------------ After tree-building ------------------------ IRSB { t0:I64 t1:I64 t2:I64 t3:I64 t4:I64 t5:I64 t6:I64 t7:I64 t8:I64 t9:I64 t10:I64 t11:I64 t12:I64 t13:I64 t14:I64 t15:I64 t16:I64 t17:I64 t18:I64 t19:I64 t20:I64 t21:I32 t22:I32 t23:I32 t24:I64 t25:I32 t26:I32 t27:I32 t28:I64 t29:I64 t30:I8 t31:I64 t32:I64 t33:I64 t34:I64 t35:I64 t36:I1 t37:I8 t38:I64 t39:I64 t40:I8 t41:I64 t42:I8 t43:I8 t44:I64 t45:I64 t46:I1 t47:I8 t48:I64 t49:I64 t50:I8 t51:I64 t52:I8 t53:I8 t54:I8 t55:I64 t56:I64 t57:I8 t58:I32 t59:I32 t60:I8 t61:I64 t62:I64 t63:I1 t64:I64 t65:I64 t66:I64 t67:I64 t68:I64 t69:I64 t70:I64 t71:I64 t72:I64 t73:I64 t74:I64 t75:I64 t76:I64 t77:I64 t78:I1 t79:I64 t80:I32 t81:I64 t82:I64 t83:I64 t84:I64 t85:I64 t86:I64 t87:I64 t88:I64 t89:I64 t90:I64 t91:I64 t92:I64 t93:I64 t94:I64 t95:I64 t96:I1 t97:I64 t98:I64 t99:I64 t100:I64 t101:I1 t102:I64 t103:I64 t104:I64 t105:I64 t106:I64 t107:I1 t108:I64 t109:I64 t110:I64 t111:I64 t112:I64 t113:I64 t114:I64 t115:I64 t116:I64 t117:I64 t118:I64 t119:I64 t120:I64 t121:I64 t122:I64 t123:I64 t124:I64 t125:I64 t126:I64 t127:I64 t128:I64 t129:I1 t130:I64 t131:I64 t132:I64 t133:I64 t134:I64 t135:I1 t136:I64 t137:I64 t138:I64 t139:I64 t140:I64 t141:I64 t142:I64 t143:I64 t144:I64 t145:I64 t146:I64 t147:I64 t148:I64 t149:I64 t150:I64 t151:I8 t152:I8 t153:I8 t154:I32 t155:I32 t156:I32 t157:I32 t158:I32 t159:I32 t160:I32 t161:I32 t162:I32 t163:I32 t164:I32 t165:I64 t166:I64 t167:I64 t168:I64 t169:I64 t170:I64 t171:I64 t172:I64 t173:I1 t174:I32 t175:I32 t176:I32 t177:I32 t178:I32 t179:I32 t180:I1 t181:I32 t182:I32 t183:I1 t184:I1 t185:I1 t186:I64 ------ IMark(0x100003A0, 4, 0) ------ ------ IMark(0x100003A4, 4, 0) ------ PUT(1752) = 0x0:I64 PUT(40) = 0x0:I64 PUT(1296) = 0x100003A8:I64 ------ IMark(0x100003A8, 4, 0) ------ t65 = GET:I64(1736) t29 = GET:I64(24) t28 = Add64(t29,0xFFFFFFFFFFFFFFF0:I64) DIRTY CmpNEZ64(Or64(t65,Xor64(Add64(And64(t29,Not64(t65)),0xFFFFFFFFFFFFFFF0:I64),Add64(Or64(t29,t65),0xFFFFFFFFFFFFFFF0:I64)))) RdFX-gst(24,8) RdFX-gst(1296,8) ::: MC_(helperc_value_check8_fail_no_o){0x380353c0}() DIRTY 1:I1 RdFX-gst(24,8) RdFX-gst(1296,8) ::: MC_(helperc_STOREV8)[rp=2]{0x38035050}(t28,0x0:I64) STbe(t28) = 0x1:I8 ------ IMark(0x100003AC, 4, 0) ------ PUT(1744) = GET:I64(1744) PUT(32) = GET:I64(32) PUT(1296) = 0x100003B0:I64 ------ IMark(0x100003B0, 4, 0) ------ t31 = Add64(t29,0xFFFFFFFFFFFFFFF0:I64) IR-NoOp t98 = DIRTY 1:I1 RdFX-gst(24,8) RdFX-gst(1296,8) ::: MC_(helperc_LOADV64be)[rp=1]{0x380344c0}(t31) t33 = LDbe:I64(t31) ------ IMark(0x100003B4, 4, 0) ------ t100 = Shl64(t98,0x8:I8) t39 = Shl64(t33,0x8:I8) t106 = Shr64(t98,0x38:I8) t41 = Shr64(t33,0x38:I8) t118 = And64(Or64(t100,t106),And64(Or64(Not64(t39),t100),Or64(Not64(t41),t106))) t38 = Or64(t39,t41) t125 = And64(t118,And64(Or64(t38,t118),0xFFFFFFFFFF:I64)) t34 = And64(t38,0xFFFFFFFFFF:I64) ------ IMark(0x100003B8, 4, 0) ------ t128 = Shl64(t125,0x38:I8) t49 = Shl64(t34,0x38:I8) t134 = Shr64(t125,0x8:I8) t51 = Shr64(t34,0x8:I8) t146 = And64(Or64(t128,t134),And64(Or64(Not64(t49),t128),Or64(Not64(t51),t134))) t48 = Or64(t49,t51) PUT(1800) = t146 PUT(88) = t48 ------ IMark(0x100003BC, 4, 0) ------ t152 = 64to8(And64(CmpwNEZ64(t146),0xE:I64)) t54 = 64to8(CmpORD64S(t48,0xCD:I64)) PUT(3050) = t152 PUT(1338) = t54 PUT(3051) = GET:I8(3032) PUT(1339) = GET:I8(1320) PUT(1296) = 0x100003C0:I64 ------ IMark(0x100003C0, 4, 0) ------ t155 = 8Uto32(t152) t59 = 8Uto32(t54) t161 = And32(t155,And32(Or32(t59,t155),0x2:I32)) t26 = Xor32(And32(t59,0x2:I32),0x2:I32) t165 = GET:I64(3016) t62 = GET:I64(1304) t171 = And64(t165,And64(Or64(t62,t165),0xFFFFFFFFFFFFFFFC:I64)) DIRTY CmpNEZ32(And32(CmpwNEZ32(t161),1Sto32(CmpEQ32(Or32(t161,Not32(t26)),0xFFFFFFFF:I32)))) RdFX-gst(24,8) RdFX-gst(1296,8) ::: MC_(helperc_value_check0_fail_no_o){0x38035300}() if (CmpEQ32(t26,0x0:I32)) { PUT(1296) = 0x100003C4:I64; exit-Boring } DIRTY CmpNEZ64(t171) RdFX-gst(24,8) RdFX-gst(1296,8) ::: MC_(helperc_value_check8_fail_no_o){0x380353c0}() PUT(1296) = And64(t62,0xFFFFFFFFFFFFFFFC:I64); exit-Return } VexExpansionRatio 36 664 184 :10 ==24332== Conditional jump or move depends on uninitialised value(s) ==24332== at 0x100003C0: main (in /home/florian/trunk/foo) ==24332== I'd bet we need a more precise version of CmpORD64S.
(In reply to Florian Krohm from comment #0) > I'd bet we need a more precise version of CmpORD64S. I agree. There is in fact special-case handling already for CmpORD{32,64}{S,U} for the case where the second argument is a constant zero, but in this case it is a constant 0xCD, so it doesn't apply.
Some more analysis: typedef struct { unsigned char c; int i; void *foo; } S; int main (int argc, char **argv) { S x; S *s = &x; s->c = 1; if (s->c == 0 && s->i == 205 && s->foo == s) return 1; return 0; } /* 00000000100003a0 <.main>: 100003a0: 39 20 00 01 li r9,1 100003a4: 38 60 00 00 li r3,0 100003a8: 99 21 ff f0 stb r9,-16(r1) 100003ac: 60 42 00 00 ori r2,r2,0 100003b0: e9 21 ff f0 ld r9,-16(r1) 100003b4: 79 29 46 00 rldicl r9,r9,8,24 100003b8: 79 29 c0 02 rotldi r9,r9,56 100003bc: 2b a9 00 cd cmpldi cr7,r9,205 100003c0: 4c 9e 00 20 bnelr cr7 <-------here 100003c4: e9 21 ff f8 ld r9,-8(r1) 100003c8: 38 61 ff f0 addi r3,r1,-16 100003cc: 7c 63 4a 78 xor r3,r3,r9 100003d0: 7c 63 00 74 cntlzd r3,r3 100003d4: 78 63 d1 82 rldicl r3,r3,58,6 100003d8: 7c 63 07 b4 extsw r3,r3 100003dc: 4e 80 00 20 blr ------ IMark(0x100003B0, 4, 0) ------ t33 = LDbe:I64(Add64(t29,0xFFFFFFFFFFFFFFF0:I64)) ------ IMark(0x100003B4, 4, 0) ------ t34 = And64(Or64(Shl64(t33,0x8:I8),Shr64(t33,0x38:I8)),0xFFFFFFFFFF:I64) ------ IMark(0x100003B8, 4, 0) ------ t48 = Or64(Shl64(t34,0x38:I8),Shr64(t34,0x8:I8)) PUT(88) = t48 ------ IMark(0x100003BC, 4, 0) ------ t54 = 64to8(CmpORD64U(t48,0xCD:I64)) Simplified: t33 = loaded value t34 = Rol64(t33, 8) & 0xFF_FFFF_FFFF t48 = Ror64(t34, 8) t54 = 64to8(CmpORD64U(t48,0xCD:I64)) */ The t33 -> t48 transformation, is (MSB on the left): t33 = A B C D E F G H Rol(t33, 8) = B C D E F G H A t34 = Rol(t33, 8) & 0xFF_FFFF_FFFF = 0 0 0 E F G H A t48 = Ror(t34, 8) = A 0 0 0 E F G H So the two rotates in opposite directions serve only to temporarily put the to-be-zeroed-out area at the top of the word, since rldicl can only make a bit field immediate that is "anchored" at one end of the word or the other. Clearly A is 'c' in the struct and 'E F G H' must be 'i'. After 100003a8, memory at -16(r1) is -16 -15 -14 -13 -12 -11 -10 -9 1 U U U U U U U when loaded (bigendian) we get a 64-bit word, with MSB on the left here: 1 U U U U U U U after masking per the above it is 1 0 0 0 U U U U and are comparing it with 0 0 0 0 0 0 0 205 This seems to me to be a case that would be correctly handled by expensiveCmpEQorNE() (mc_translate.c:983, see comment preceding it). But that applies only to Cmp{EQ,NE}{32,64} and the ppc front end doesn't produce those. Instead it produces CmpORD{U,S}{32,64}. In effect the amd64/x86 (and other, to some extent) front ends together with ir_opt.c succeed in recovering, from the original instruction stream, the actual condition (==, !=, >, >=, <=, <, signed or unsigned) that gcc/clang compiled. In this case 100003bc: 2b a9 00 cd cmpldi cr7,r9,205 100003c0: 4c 9e 00 20 bnelr cr7 <-------here we know that really this is a != comparison, and if the front end could have recovered that, creating a CmpNE64, then it would have been instrumented by expensiveCmpEQorNE and we wouldn't have had the false positive. I guess we could redo doCmpORD so that it performs the "expensive" interpretation for EQ# rather than the "standard interp", per the comments in it. But that would slow down all comparisons, including the 99.99% that don't need this level of accuracy. A different approach would be to redo how the ppc condition codes are translated into IR, and in particular to attempt to recover the actual compared condition, like with the amd64/x86/arm/arm64 front ends. And get rid of CmpORD{32,64}{U,S}. That would in a way be better, since it would mean that ppc code benefitted from the same improvements in definedness tracking in comparisons that all the other architectures do. This isn't straightforward, though -- might be a week of work.
In this example, "cmpld ; bne" translates to essentially this // the cmpld t54 = 64to8(CmpORD64U(t48,0xCD:I64)) // the bne if (CmpEQ32(Xor32(And32(8Uto32(t54),0x2:I32),0x2:I32),0x0:I32)) { PUT(1296) = 0x100003C4:I64; exit-Boring } CmpORD64U produces a value that is either 8, 4 or 2, with 8 meaning "<", 4 meaning ">" and 2 meaning "=". The bne translation then inspects t54 bit 1 (intel encoding) (hence the use of 0x2). This completely obscures the fact that what we really want is simply if (CmpNE64(t48,0xCD:I64) { PUT(1296) = 0x100003C4:I64; exit-Boring } The tricky part is to change the generated IR so that it directly exposes the condition on which the branch is based, yet make it possible for later instructions to recover a correct value for CR0 .. CR7 should they wish to. Maybe we should change the ppc front end to use a thunk-based representation as most of the other front ends do, since that does more or less facilitate all of the above.
On contemplation, it is probably easiest to fix this in ir_opt.c by doing an IR-to-IR transformation pass that, following the example in comment 3, transforms CmpEQ32( select-bit-1-of( CmpORD64U(x,y) ), 0x0 ) to CmpNE64(x,y) where "select-bit-1-of" incorporates all the size-changing and masking gunk (64to8, then 8Uto32, then And32 with 2, then Xor32 with 2). We could be cleverer about the size-changing and masking bits, for example declaring CmpORD64U to produce a I32 typed result, since matching IR patterns is hard work and it's easy to miss cases. With that change the example simplifies to t54 = CmpORD64U(t48,0xCD:I64) if (CmpEQ32(Xor32(And32(t54,0x2:I32),0x2:I32),0x0:I32)) ... and if we get rid of the Xor32 and instead compare the extracted value with 2: t54 = CmpORD64U(t48,0xCD:I64) if (CmpEQ32( And32(t54,0x2:I32), 0x2:I32)) ... That would make the matching problem easier. Furthermore if we normalise And32 and CmpEQ32 nodes so that if just one of the args is a constant than it is on the right, then there are no L-vs-R structural variants to consider. There is a bunch of infrastructure in ir_opt.c, starting at line 5953, for "MSVC specific transformation hacks", that would form a useful starting point.
We are going to have to defer this till after 3.13. It requires some non-trivial reworking of either IR optimisation specific to POWER, or to the POWER condition code handling, or both.
Added a prereq to skip memcheck/tests/bug340392.vgtest on power: commit 94960fae328bc82fb1ea51ffb9273ad5f25936d2 (HEAD -> master) Author: Mark Wielaard <mark@klomp.org> Date: Mon Oct 30 21:41:57 2023 +0100 bug340392 doesn't work on power because of bug352364 Add a prereq to bug340392.vgtest to skip on Power for now.