Summary: | Recognize powerpc subfe x, x, x to initialize x to zero or -1 based on CA | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | Mark Wielaard <mark> |
Component: | memcheck | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | cel |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Other | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: |
Description
Mark Wielaard
2019-02-07 10:45:23 UTC
--tool=none --trace-flags=10001000 output: ==== SB 1833 (evchecks 112291) [tid 1] 0x10000740 foo(void**, void*, bool) /home /mark/subfe+0x10000740 ------------------------ Front end ------------------------ 0x10000740: subfic r5,r5,0 ------ IMark(0x10000740, 4, 0) ------ t0 = GET:I64(56) t1 = GET:I64(16) t2 = Sub64(0x0:I64,t0) PUT(1323) = And8(32to8(1Uto32(CmpLE64U(t2,0x0:I64))),0x1:I8) PUT(56) = t2 PUT(1296) = 0x10000744:I64 0x10000744: subfe r9,r9,r9 ------ IMark(0x10000744, 4, 0) ------ t3 = GET:I64(88) t4 = GET:I64(88) t6 = 32Uto64(And32(8Uto32(GET:I8(1323)),0x1:I32)) t5 = Add64(Not64(t3),Add64(t4,t6)) PUT(1323) = And8(32to8(1Uto32(32to1(Or32(1Uto32(CmpLT64U(t5,t4)),1 Uto32(32to1(And32(1Uto32(CmpEQ64(t6,0x1:I64)),1Uto32(CmpEQ64(t5,t4))))))))),0x1: I8) PUT(88) = t5 PUT(1296) = 0x10000748:I64 0x10000748: and r4,r4,r9 ------ IMark(0x10000748, 4, 0) ------ t7 = GET:I64(48) t9 = GET:I64(88) t8 = And64(t7,t9) PUT(48) = t8 PUT(1296) = 0x1000074C:I64 0x1000074C: std r4,0(r3) ------ IMark(0x1000074C, 4, 0) ------ t11 = GET:I64(16) t10 = GET:I64(48) t12 = Add64(GET:I64(40),0x0:I64) STle(t12) = t10 PUT(1296) = 0x10000750:I64 0x10000750: blr ------ IMark(0x10000750, 4, 0) ------ t17 = 0xFFFFFFFF:I32 t14 = t17 t18 = 0x1:I32 t15 = t18 t13 = And32(t15,t14) t16 = And64(GET:I64(1304),0xFFFFFFFFFFFFFFFC:I64) if (CmpEQ32(t13,0x0:I32)) { PUT(1296) = 0x10000754:I64; exit-Boring } ====== AbiHint(Sub64(GET:I64(24),0x120:I64), 288, t16) ====== PUT(1296) = t16 PUT(1296) = GET:I64(1296); exit-Return GuestBytes 10000740 20 00 00 A5 20 10 49 29 7D 38 48 84 7C 00 00 83 F8 20 00 80 4E 01733BAE ------------------------ After tree-building ------------------------ IRSB { t0:I64 t1:I64 t2:I64 t3:I64 t4:I64 t5:I64 t6:I64 t7:I64 t8:I64 t9:I64 t10:I64 t11:I64 t12:I64 t13:I32 t14:I32 t15:I32 t16:I64 t17:I32 t18:I32 t19:I32 t20:I8 t21:I8 t22:I32 t23:I1 t24:I64 t25:I32 t26:I32 t27:I8 t28:I64 t29:I64 t30:I64 t31:I8 t32:I8 t33:I32 t34:I1 t35:I32 t36:I32 t37:I1 t38:I32 t39:I1 t40:I32 t41:I32 t42:I1 t43:I32 t44:I1 t45:I64 t46:I64 t47:I64 t48:I64 t49:I1 t50:I64 t51:I64 t52:I64 ------ IMark(0x10000740, 4, 0) ------ t2 = Sub64(0x0:I64,GET:I64(56)) PUT(56) = t2 ------ IMark(0x10000744, 4, 0) ------ t3 = GET:I64(88) t24 = 32Uto64(And32(8Uto32(And8(32to8(1Uto32(CmpLE64U(t2,0x0:I64))),0x1:I8)),0x1:I32)) t28 = Add64(Not64(t3),Add64(t3,t24)) PUT(1323) = And8(32to8(1Uto32(32to1(Or32(1Uto32(CmpLT64U(t28,t3)),1Uto32(32to1(And32(1Uto32(CmpEQ64(t24,0x1:I64)),1Uto32(CmpEQ64(t28,t3))))))))),0x1:I8) PUT(88) = t28 ------ IMark(0x10000748, 4, 0) ------ t8 = And64(GET:I64(48),t28) PUT(48) = t8 PUT(1296) = 0x1000074C:I64 ------ IMark(0x1000074C, 4, 0) ------ STle(GET:I64(40)) = t8 PUT(1296) = 0x10000750:I64 ------ IMark(0x10000750, 4, 0) ------ t47 = And64(GET:I64(1304),0xFFFFFFFFFFFFFFFC:I64) ====== AbiHint(Sub64(GET:I64(24),0x120:I64), 288, t47) ====== PUT(1296) = t47; exit-Return } VexExpansionRatio 20 260 130 :10 My initial idea was to just recognize subfe x,x,x in the ppc frontend, but Jakub and Julian both pointed out this is a more generic issue. This is really making sure we recognize the pattern for extracting the carry flag for any target that have similar instructions (subtract with borrow/carry). An example for x86 would be sbb edx, edx. Which is edx = edx - (edx + CF) => edx = -CF We can fix this by rewriting the subfe translation from: rD = (log not)rA + rB + XER[CA] to rD = rB - rA - (XER[CA] ^ 1) diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c index e207642..00ae6df 100644 --- a/VEX/priv/guest_ppc_toIR.c +++ b/VEX/priv/guest_ppc_toIR.c @@ -5361,11 +5361,15 @@ static Bool dis_int_arith ( UInt theInstr ) flag_OE ? "o" : "", flag_rC ? ".":"", rD_addr, rA_addr, rB_addr); // rD = (log not)rA + rB + XER[CA] + // ==> + // rD = rB - rA - (XER[CA] ^ 1) assign( old_xer_ca, mkWidenFrom32(ty, getXER_CA_32(), False) ); - assign( rD, binop( mkSzOp(ty, Iop_Add8), - unop( mkSzOp(ty, Iop_Not8), mkexpr(rA)), - binop( mkSzOp(ty, Iop_Add8), - mkexpr(rB), mkexpr(old_xer_ca))) ); + assign( rD, binop( mkSzOp(ty, Iop_Sub8), + binop( mkSzOp(ty, Iop_Sub8), + mkexpr(rB), mkexpr(rA)), + binop(mkSzOp(ty, Iop_Xor8), + mkexpr(old_xer_ca), + mkSzImm(ty, 1))) ); set_XER_CA_CA32( ty, PPCG_FLAG_OP_SUBFE, mkexpr(rD), mkexpr(rA), mkexpr(rB), mkexpr(old_xer_ca) ); This produces: t3 = GET:I64(88) t4 = GET:I64(88) t6 = 32Uto64(And32(8Uto32(GET:I8(1323)),0x1:I32)) t5 = Sub64(Sub64(t4,t3),Xor64(t6,0x1:I64)) Where the Sub64(t4,t3) is recognized as being just zero, so we get: t3 = GET:I64(88) t24 = 32Uto64(And32(8Uto32(And8(32to8(1Uto32(CmpLE64U(t2,0x0:I64))),0x1:I8)),0x1:I32)) t28 = Sub64(0x0:I64,Xor64(t24,0x1:I64)) And nothing relies on the original (potentially) uninitialized register. With this both the reproducer as the larger C++ program using unique_ptr don't produce any errors anymore under memcheck. Julian, Carl, could you take a peek at the proposed patch in comment #3. It would be nicer if we could somehow detect this in the optimizer or memcheck, but I don't think that is easily possible. So I believe this frontend transformation is the best we can do. (In reply to Mark Wielaard from comment #4) > Julian, Carl, could you take a peek at the proposed patch in comment #3. Mark, that looks totally fine to me. If it doesn't break anything (which I think we established on irc at the time you made the patch), then I say land it. It all looks valid to me. I didn't see any issues with the patch. Thanks for the reviews. Pushed now as: commit 256cf43c5eadb28edb45436aca6fda8ee55eb10e Author: Mark Wielaard <mark@klomp.org> Date: Thu Feb 21 17:21:53 2019 +0100 memcheck powerpc subfe x, x, x initializes x to 0 or -1 based on CA GCC might use subfe x, x, x to initialize x to 0 or -1, based on whether the carry flag is set. This happens in some cases when g++ compiles resetting a unique_ptr. The "trick" used by the compiler is that it can AND a pointer with the register x (now 0x0 or 0xffffffff) to set something to NULL or to the given pointer. subfe is implemented as rD = (log not)rA + rB + XER[CA] if we instead implement it as rD = rB - rA - (XER[CA] ^ 1) then memcheck can see that rB and Ra cancel each other out if they are the same. https://bugs.kde.org/show_bug.cgi?id=404054 |