Bug 404054 - Recognize powerpc subfe x, x, x to initialize x to zero or -1 based on CA
Summary: Recognize powerpc subfe x, x, x to initialize x to zero or -1 based on CA
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: unspecified
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-07 10:45 UTC by Mark Wielaard
Modified: 2019-02-21 16:31 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Wielaard 2019-02-07 10:45:23 UTC
Take the following program (this can easily be replicated on gcc112 in the gcc compile farm, a power8 ppc64le setup):

__attribute__((noipa)) void foo (void **q, void *p, bool x)
{
  *q = x ? p : 0;
}

__attribute__((noipa)) int bar (void) { return 0; }

int
main ()
{
  void *q;
  void *z = __builtin_malloc (64);
  if (z == 0)
    return 0;
  register void *p __asm ("r9");
  /* Technically load of uninitialized data, but the data is ignored.  */
  asm volatile ("" : "=r" (p) : "0" (*(void **) z));
  foo (&q, z, true);
  int ret = 1;
  if (q != 0)
    ret = bar ();
  __builtin_free (z);
  return ret;
}

$ /opt/cfarm/gcc-latest/bin/g++ --version
g++ (GCC) 8.2.0

$ /opt/cfarm/gcc-latest/bin/g++ -O2 -g -o subfe subfe.cpp

$ objdump -d subfe

0000000010000740 <_Z3fooPPvS_b>:
    10000740:   00 00 a5 20     subfic  r5,r5,0
    10000744:   10 49 29 7d     subfe   r9,r9,r9
    10000748:   38 48 84 7c     and     r4,r4,r9
    1000074c:   00 00 83 f8     std     r4,0(r3)
    10000750:   20 00 80 4e     blr
    10000754:   00 00 00 00     .long 0x0
    10000758:   00 09 00 00     .long 0x900
    1000075c:   00 00 00 00     .long 0x0

$ valgrind/vg-in-place -q ./subfe
==133088== Conditional jump or move depends on uninitialised value(s)
==133088==    at 0x10000584: main (subfe.cpp:20)
==133088== 

This is the if (q != 0) check.

subfic = Subtract From Immediate Carrying (setting CA)
subfe = Subtract From Extended (using CA)

So first it makes sure to set the carry flag depending on x.
Then it does a subtract of r9 with itself, plus the carry flag...
so now r9 is either zero or -1
then it ands p with r9
and assigns that to *q

But memcheck doesn't recognize that subfe r9,r9,r9 initializes r9 to zero or -1 (based on the CA set by subfic r5,r5,0).

A similar pattern was originally found in how g++ compiles resetting a unique_ptr.

$ valgrind/vg-in-place --vex-guest-chase-thresh=0 --trace-flags=11100000 --trace-notbelow=1710 ./subfe

==== SB 1710 (evchecks 112394) [tid 1] 0x10000740 foo(void**, void*, bool) /home/mark/subfe+0x10000740

------------------------ Front end ------------------------

        0x10000740:  subfic r5,r5,0

              ------ IMark(0x10000740, 4, 0) ------
              t0 = GET:I64(56)
              t1 = GET:I64(16)
              t2 = Sub64(0x0:I64,t0)
              PUT(1323) = And8(32to8(1Uto32(CmpLE64U(t2,0x0:I64))),0x1:I8)
              PUT(56) = t2
              PUT(1296) = 0x10000744:I64

        0x10000744:  subfe r9,r9,r9

              ------ IMark(0x10000744, 4, 0) ------
              t3 = GET:I64(88)
              t4 = GET:I64(88)
              t6 = 32Uto64(And32(8Uto32(GET:I8(1323)),0x1:I32))
              t5 = Add64(Not64(t3),Add64(t4,t6))
              PUT(1323) = And8(32to8(1Uto32(32to1(Or32(1Uto32(CmpLT64U(t5,t4)),1Uto32(32to1(And32(1Uto32(CmpEQ64(t6,0x1:I64)),1Uto32(CmpEQ64(t5,t4))))))))),0x1:I8)
              PUT(88) = t5
              PUT(1296) = 0x10000748:I64

        0x10000748:  and r4,r4,r9

              ------ IMark(0x10000748, 4, 0) ------
              t7 = GET:I64(48)
              t9 = GET:I64(88)
              t8 = And64(t7,t9)
              PUT(48) = t8
              PUT(1296) = 0x1000074C:I64

        0x1000074C:  std r4,0(r3)

              ------ IMark(0x1000074C, 4, 0) ------
              t11 = GET:I64(16)
              t10 = GET:I64(48)
              t12 = Add64(GET:I64(40),0x0:I64)
              STle(t12) = t10
              PUT(1296) = 0x10000750:I64

        0x10000750:  blr

              ------ IMark(0x10000750, 4, 0) ------
              t17 = 0xFFFFFFFF:I32
              t14 = t17
              t18 = 0x1:I32
              t15 = t18
              t13 = And32(t15,t14)
              t16 = And64(GET:I64(1304),0xFFFFFFFFFFFFFFFC:I64)
              if (CmpEQ32(t13,0x0:I32)) { PUT(1296) = 0x10000754:I64; exit-Boring } 
              ====== AbiHint(Sub64(GET:I64(24),0x120:I64), 288, t16) ======
              PUT(1296) = t16
              PUT(1296) = GET:I64(1296); exit-Return

GuestBytes 10000740 20  00 00 A5 20 10 49 29 7D 38 48 84 7C 00 00 83 F8 20 00 80 4E  01733BAE


------------------------ After pre-instr IR optimisation ------------------------

IRSB {
   t0:I64   t1:I64   t2:I64   t3:I64   t4:I64   t5:I64   t6:I64   t7:I64
   t8:I64   t9:I64   t10:I64   t11:I64   t12:I64   t13:I32   t14:I32   t15:I32
   t16:I64   t17:I32   t18:I32   t19:I32   t20:I8   t21:I8   t22:I32   t23:I1
   t24:I64   t25:I32   t26:I32   t27:I8   t28:I64   t29:I64   t30:I64   t31:I8
   t32:I8   t33:I32   t34:I1   t35:I32   t36:I32   t37:I1   t38:I32   t39:I1
   t40:I32   t41:I32   t42:I1   t43:I32   t44:I1   t45:I64   t46:I64   t47:I64
   t48:I64   t49:I1   t50:I64   t51:I64   t52:I64   

   ------ IMark(0x10000740, 4, 0) ------
   t0 = GET:I64(56)
   IR-NoOp
   t2 = Sub64(0x0:I64,t0)
   t23 = CmpLE64U(t2,0x0:I64)
   t22 = 1Uto32(t23)
   t21 = 32to8(t22)
   t20 = And8(t21,0x1:I8)
   PUT(56) = t2
   ------ IMark(0x10000744, 4, 0) ------
   t3 = GET:I64(88)
   t26 = 8Uto32(t20)
   t25 = And32(t26,0x1:I32)
   t24 = 32Uto64(t25)
   t29 = Not64(t3)
   t30 = Add64(t3,t24)
   t28 = Add64(t29,t30)
   t37 = CmpLT64U(t28,t3)
   t36 = 1Uto32(t37)
   t42 = CmpEQ64(t24,0x1:I64)
   t41 = 1Uto32(t42)
   t44 = CmpEQ64(t28,t3)
   t43 = 1Uto32(t44)
   t40 = And32(t41,t43)
   t39 = 32to1(t40)
   t38 = 1Uto32(t39)
   t35 = Or32(t36,t38)
   t34 = 32to1(t35)
   t33 = 1Uto32(t34)
   t32 = 32to8(t33)
   t31 = And8(t32,0x1:I8)
   PUT(1323) = t31
   PUT(88) = t28
   ------ IMark(0x10000748, 4, 0) ------
   t7 = GET:I64(48)
   t8 = And64(t7,t28)
   PUT(48) = t8
   PUT(1296) = 0x1000074C:I64
   ------ IMark(0x1000074C, 4, 0) ------
   t46 = GET:I64(40)
   STle(t46) = t8
   PUT(1296) = 0x10000750:I64
   ------ IMark(0x10000750, 4, 0) ------
   t48 = GET:I64(1304)
   t47 = And64(t48,0xFFFFFFFFFFFFFFFC:I64)
   t51 = GET:I64(24)
   t50 = Sub64(t51,0x120:I64)
   ====== AbiHint(t50, 288, t47) ======
   PUT(1296) = t47; exit-Return
}


------------------------ After instrumentation ------------------------

IRSB {
   t0:I64   t1:I64   t2:I64   t3:I64   t4:I64   t5:I64   t6:I64   t7:I64
   t8:I64   t9:I64   t10:I64   t11:I64   t12:I64   t13:I32   t14:I32   t15:I32
   t16:I64   t17:I32   t18:I32   t19:I32   t20:I8   t21:I8   t22:I32   t23:I1
   t24:I64   t25:I32   t26:I32   t27:I8   t28:I64   t29:I64   t30:I64   t31:I8
   t32:I8   t33:I32   t34:I1   t35:I32   t36:I32   t37:I1   t38:I32   t39:I1
   t40:I32   t41:I32   t42:I1   t43:I32   t44:I1   t45:I64   t46:I64   t47:I64
   t48:I64   t49:I1   t50:I64   t51:I64   t52:I64   t53:I64   t54:I64   t55:I64
   t56:I64   t57:I1   t58:I64   t59:I1   t60:I32   t61:I32   t62:I8   t63:I8
   t64:I8   t65:I8   t66:I8   t67:I8   t68:I8   t69:I8   t70:I8   t71:I64
   t72:I32   t73:I32   t74:I32   t75:I32   t76:I32   t77:I32   t78:I32   t79:I32
   t80:I32   t81:I64   t82:I64   t83:I64   t84:I64   t85:I64   t86:I64   t87:I64
   t88:I64   t89:I64   t90:I1   t91:I64   t92:I1   t93:I32   t94:I32   t95:I1
   t96:I64   t97:I64   t98:I64   t99:I64   t100:I64   t101:I64   t102:I64   t103:I64
   t104:I1   t105:I32   t106:I32   t107:I1   t108:I64   t109:I64   t110:I64   t111:I64
   t112:I64   t113:I64   t114:I64   t115:I64   t116:I1   t117:I32   t118:I32   t119:I32
   t120:I32   t121:I32   t122:I32   t123:I32   t124:I32   t125:I32   t126:I1   t127:I1
   t128:I32   t129:I32   t130:I32   t131:I32   t132:I32   t133:I32   t134:I32   t135:I32
   t136:I32   t137:I32   t138:I32   t139:I1   t140:I1   t141:I32   t142:I32   t143:I8
   t144:I8   t145:I8   t146:I8   t147:I8   t148:I8   t149:I8   t150:I8   t151:I8
   t152:I64   t153:I64   t154:I64   t155:I64   t156:I64   t157:I64   t158:I64   t159:I64
   t160:I64   t161:I1   t162:I64   t163:I64   t164:I64   t165:I64   t166:I64   t167:I64
   t168:I64   t169:I64   t170:I64   t171:I64   t172:I64   t173:I64   t174:I64   t175:I1
   t176:I64   

   ------ IMark(0x10000740, 4, 0) ------
   t53 = GET:I64(1800)
   t0 = GET:I64(56)
   IR-NoOp
   t55 = Or64(0x0:I64,t53)
   t56 = Left64(t55)
   t54 = t56
   t2 = Sub64(0x0:I64,t0)
   t58 = Or64(t54,0x0:I64)
   t59 = CmpNEZ64(t58)
   t57 = t59
   t23 = CmpLE64U(t2,0x0:I64)
   t61 = 1Uto32(t57)
   t60 = t61
   t22 = 1Uto32(t23)
   t63 = 32to8(t60)
   t62 = t63
   t21 = 32to8(t22)
   t65 = Or8(t62,0x0:I8)
   t66 = Or8(t21,t62)
   t67 = Or8(0x1:I8,0x0:I8)
   t68 = And8(t66,t67)
   t69 = And8(t65,t68)
   t70 = t69
   t64 = t70
   t20 = And8(t21,0x1:I8)
   PUT(1800) = t54
   PUT(56) = t2
   ------ IMark(0x10000744, 4, 0) ------
   t71 = GET:I64(1832)
   t3 = GET:I64(88)
   t73 = 8Uto32(t64)
   t72 = t73
   t26 = 8Uto32(t20)
   t75 = Or32(t72,0x0:I32)
   t76 = Or32(t26,t72)
   t77 = Or32(0x1:I32,0x0:I32)
   t78 = And32(t76,t77)
   t79 = And32(t75,t78)
   t80 = t79
   t74 = t80
   t25 = And32(t26,0x1:I32)
   t82 = 32Uto64(t74)
   t81 = t82
   t24 = 32Uto64(t25)
   t83 = t71
   t29 = Not64(t3)
   t85 = Or64(t71,t81)
   t86 = Left64(t85)
   t84 = t86
   t30 = Add64(t3,t24)
   t88 = Or64(t83,t84)
   t89 = Left64(t88)
   t87 = t89
   t28 = Add64(t29,t30)
   t91 = Or64(t87,t71)
   t92 = CmpNEZ64(t91)
   t90 = t92
   t37 = CmpLT64U(t28,t3)
   t94 = 1Uto32(t90)
   t93 = t94
   t36 = 1Uto32(t37)
   t96 = Or64(t81,0x0:I64)
   t97 = Xor64(t24,0x1:I64)
   t98 = Not64(t97)
   t99 = Or64(t96,t98)
   t100 = Shr64(t99,0x1:I8)
   t101 = Sub64(t99,t100)
   t102 = Sar64(t101,0x3F:I8)
   t103 = And64(t96,t102)
   t104 = CmpNEZ64(t103)
   t95 = t104
   t42 = CmpEQ64(t24,0x1:I64)
   t106 = 1Uto32(t95)
   t105 = t106
   t41 = 1Uto32(t42)
   t108 = Or64(t87,t71)
   t109 = Xor64(t28,t3)
   t110 = Not64(t109)
   t111 = Or64(t108,t110)
   t112 = Shr64(t111,0x1:I8)
   t113 = Sub64(t111,t112)
   t114 = Sar64(t113,0x3F:I8)
   t115 = And64(t108,t114)
   t116 = CmpNEZ64(t115)
   t107 = t116
   t44 = CmpEQ64(t28,t3)
   t118 = 1Uto32(t107)
   t117 = t118
   t43 = 1Uto32(t44)
   t120 = Or32(t105,t117)
   t121 = Or32(t41,t105)
   t122 = Or32(t43,t117)
   t123 = And32(t121,t122)
   t124 = And32(t120,t123)
   t125 = t124
   t119 = t125
   t40 = And32(t41,t43)
   t127 = 32to1(t119)
   t126 = t127
   t39 = 32to1(t40)
   t129 = 1Uto32(t126)
   t128 = t129
   t38 = 1Uto32(t39)
   t131 = Or32(t93,t128)
   t132 = Not32(t36)
   t133 = Or32(t132,t93)
   t134 = Not32(t38)
   t135 = Or32(t134,t128)
   t136 = And32(t133,t135)
   t137 = And32(t131,t136)
   t138 = t137
   t130 = t138
   t35 = Or32(t36,t38)
   t140 = 32to1(t130)
   t139 = t140
   t34 = 32to1(t35)
   t142 = 1Uto32(t139)
   t141 = t142
   t33 = 1Uto32(t34)
   t144 = 32to8(t141)
   t143 = t144
   t32 = 32to8(t33)
   t146 = Or8(t143,0x0:I8)
   t147 = Or8(t32,t143)
   t148 = Or8(0x1:I8,0x0:I8)
   t149 = And8(t147,t148)
   t150 = And8(t146,t149)
   t151 = t150
   t145 = t151
   t31 = And8(t32,0x1:I8)
   PUT(3067) = t145
   PUT(1323) = t31
   PUT(1832) = t87
   PUT(88) = t28
   ------ IMark(0x10000748, 4, 0) ------
   t152 = GET:I64(1792)
   t7 = GET:I64(48)
   t154 = Or64(t152,t87)
   t155 = Or64(t7,t152)
   t156 = Or64(t28,t87)
   t157 = And64(t155,t156)
   t158 = And64(t154,t157)
   t159 = t158
   t153 = t159
   t8 = And64(t7,t28)
   PUT(1792) = t153
   PUT(48) = t8
   PUT(1296) = 0x1000074C:I64
   ------ IMark(0x1000074C, 4, 0) ------
   t160 = GET:I64(1784)
   t46 = GET:I64(40)
   t161 = CmpNEZ64(t160)
   DIRTY t161 RdFX-gst(24,8) RdFX-gst(1296,8) ::: MC_(helperc_value_check8_fail_no_o){0x58057890}()
   t162 = 0x0:I64
   DIRTY 1:I1 RdFX-gst(24,8) RdFX-gst(1296,8) ::: MC_(helperc_STOREV64le)[rp=1]{0x58056a80}(t46,t153)
   STle(t46) = t8
   PUT(1296) = 0x10000750:I64
   ------ IMark(0x10000750, 4, 0) ------
   t163 = GET:I64(3048)
   t48 = GET:I64(1304)
   t165 = Or64(t163,0x0:I64)
   t166 = Or64(t48,t163)
   t167 = Or64(0xFFFFFFFFFFFFFFFC:I64,0x0:I64)
   t168 = And64(t166,t167)
   t169 = And64(t165,t168)
   t170 = t169
   t164 = t170
   t47 = And64(t48,0xFFFFFFFFFFFFFFFC:I64)
   t171 = GET:I64(1768)
   t51 = GET:I64(24)
   t173 = Or64(t171,0x0:I64)
   t174 = Left64(t173)
   t172 = t174
   t50 = Sub64(t51,0x120:I64)
   DIRTY 1:I1 ::: MC_(helperc_MAKE_STACK_UNINIT_no_o)[rp=2]{0x58055e30}(t50,0x120:I64)
   ====== AbiHint(t50, 288, t47) ======
   t175 = CmpNEZ64(t164)
   DIRTY t175 RdFX-gst(24,8) RdFX-gst(1296,8) ::: MC_(helperc_value_check8_fail_no_o){0x58057890}()
   t176 = 0x0:I64
   PUT(1296) = t47; exit-Return
}

VexExpansionRatio 20 728   364 :10
Comment 1 Mark Wielaard 2019-02-07 10:57:21 UTC
 --tool=none --trace-flags=10001000 output:

==== SB 1833 (evchecks 112291) [tid 1] 0x10000740 foo(void**, void*, bool) /home
/mark/subfe+0x10000740

------------------------ Front end ------------------------

        0x10000740:  subfic r5,r5,0

              ------ IMark(0x10000740, 4, 0) ------
              t0 = GET:I64(56)
              t1 = GET:I64(16)
              t2 = Sub64(0x0:I64,t0)
              PUT(1323) = And8(32to8(1Uto32(CmpLE64U(t2,0x0:I64))),0x1:I8)
              PUT(56) = t2
              PUT(1296) = 0x10000744:I64

        0x10000744:  subfe r9,r9,r9

              ------ IMark(0x10000744, 4, 0) ------
              t3 = GET:I64(88)
              t4 = GET:I64(88)
              t6 = 32Uto64(And32(8Uto32(GET:I8(1323)),0x1:I32))
              t5 = Add64(Not64(t3),Add64(t4,t6))
              PUT(1323) = And8(32to8(1Uto32(32to1(Or32(1Uto32(CmpLT64U(t5,t4)),1
Uto32(32to1(And32(1Uto32(CmpEQ64(t6,0x1:I64)),1Uto32(CmpEQ64(t5,t4))))))))),0x1:
I8)
              PUT(88) = t5
              PUT(1296) = 0x10000748:I64

        0x10000748:  and r4,r4,r9

              ------ IMark(0x10000748, 4, 0) ------
              t7 = GET:I64(48)
              t9 = GET:I64(88)
              t8 = And64(t7,t9)
              PUT(48) = t8
              PUT(1296) = 0x1000074C:I64

        0x1000074C:  std r4,0(r3)

              ------ IMark(0x1000074C, 4, 0) ------
              t11 = GET:I64(16)
              t10 = GET:I64(48)
              t12 = Add64(GET:I64(40),0x0:I64)
              STle(t12) = t10
              PUT(1296) = 0x10000750:I64

        0x10000750:  blr

              ------ IMark(0x10000750, 4, 0) ------
              t17 = 0xFFFFFFFF:I32
              t14 = t17
              t18 = 0x1:I32
              t15 = t18
              t13 = And32(t15,t14)
              t16 = And64(GET:I64(1304),0xFFFFFFFFFFFFFFFC:I64)
              if (CmpEQ32(t13,0x0:I32)) { PUT(1296) = 0x10000754:I64; exit-Boring } 
              ====== AbiHint(Sub64(GET:I64(24),0x120:I64), 288, t16) ======
              PUT(1296) = t16
              PUT(1296) = GET:I64(1296); exit-Return

GuestBytes 10000740 20  00 00 A5 20 10 49 29 7D 38 48 84 7C 00 00 83 F8 20 00 80 4E  01733BAE

------------------------  After tree-building ------------------------

IRSB {
   t0:I64   t1:I64   t2:I64   t3:I64   t4:I64   t5:I64   t6:I64   t7:I64
   t8:I64   t9:I64   t10:I64   t11:I64   t12:I64   t13:I32   t14:I32   t15:I32
   t16:I64   t17:I32   t18:I32   t19:I32   t20:I8   t21:I8   t22:I32   t23:I1
   t24:I64   t25:I32   t26:I32   t27:I8   t28:I64   t29:I64   t30:I64   t31:I8
   t32:I8   t33:I32   t34:I1   t35:I32   t36:I32   t37:I1   t38:I32   t39:I1
   t40:I32   t41:I32   t42:I1   t43:I32   t44:I1   t45:I64   t46:I64   t47:I64
   t48:I64   t49:I1   t50:I64   t51:I64   t52:I64   

   ------ IMark(0x10000740, 4, 0) ------
   t2 = Sub64(0x0:I64,GET:I64(56))
   PUT(56) = t2
   ------ IMark(0x10000744, 4, 0) ------
   t3 = GET:I64(88)
   t24 = 32Uto64(And32(8Uto32(And8(32to8(1Uto32(CmpLE64U(t2,0x0:I64))),0x1:I8)),0x1:I32))
   t28 = Add64(Not64(t3),Add64(t3,t24))
   PUT(1323) = And8(32to8(1Uto32(32to1(Or32(1Uto32(CmpLT64U(t28,t3)),1Uto32(32to1(And32(1Uto32(CmpEQ64(t24,0x1:I64)),1Uto32(CmpEQ64(t28,t3))))))))),0x1:I8)
   PUT(88) = t28
   ------ IMark(0x10000748, 4, 0) ------
   t8 = And64(GET:I64(48),t28)
   PUT(48) = t8
   PUT(1296) = 0x1000074C:I64
   ------ IMark(0x1000074C, 4, 0) ------
   STle(GET:I64(40)) = t8
   PUT(1296) = 0x10000750:I64
   ------ IMark(0x10000750, 4, 0) ------
   t47 = And64(GET:I64(1304),0xFFFFFFFFFFFFFFFC:I64)
   ====== AbiHint(Sub64(GET:I64(24),0x120:I64), 288, t47) ======
   PUT(1296) = t47; exit-Return
}

VexExpansionRatio 20 260   130 :10
Comment 2 Mark Wielaard 2019-02-07 11:27:30 UTC
My initial idea was to just recognize subfe x,x,x in the ppc frontend, but Jakub and Julian both pointed out this is a more generic issue.

This is really making sure we recognize the pattern for extracting the carry flag for any target that have similar instructions (subtract with borrow/carry).

An example for x86 would be sbb edx, edx.
Which is edx = edx - (edx + CF) => edx = -CF
Comment 3 Mark Wielaard 2019-02-07 13:09:57 UTC
We can fix this by rewriting the subfe translation from:

rD = (log not)rA + rB + XER[CA]

 to

rD = rB - rA - (XER[CA] ^ 1)

diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c
index e207642..00ae6df 100644
--- a/VEX/priv/guest_ppc_toIR.c
+++ b/VEX/priv/guest_ppc_toIR.c
@@ -5361,11 +5361,15 @@ static Bool dis_int_arith ( UInt theInstr )
              flag_OE ? "o" : "", flag_rC ? ".":"",
              rD_addr, rA_addr, rB_addr);
          // rD = (log not)rA + rB + XER[CA]
+         //    ==>
+         // rD = rB - rA - (XER[CA] ^ 1)
          assign( old_xer_ca, mkWidenFrom32(ty, getXER_CA_32(), False) );
-         assign( rD, binop( mkSzOp(ty, Iop_Add8),
-                            unop( mkSzOp(ty, Iop_Not8), mkexpr(rA)),
-                            binop( mkSzOp(ty, Iop_Add8),
-                                   mkexpr(rB), mkexpr(old_xer_ca))) );
+         assign( rD, binop( mkSzOp(ty, Iop_Sub8),
+                            binop( mkSzOp(ty, Iop_Sub8),
+                                   mkexpr(rB), mkexpr(rA)),
+                            binop(mkSzOp(ty, Iop_Xor8),
+                                  mkexpr(old_xer_ca),
+                                  mkSzImm(ty, 1))) );
          set_XER_CA_CA32( ty, PPCG_FLAG_OP_SUBFE,
                           mkexpr(rD), mkexpr(rA), mkexpr(rB),
                           mkexpr(old_xer_ca) );

This produces:

 t3 = GET:I64(88)
 t4 = GET:I64(88)
 t6 = 32Uto64(And32(8Uto32(GET:I8(1323)),0x1:I32))
 t5 = Sub64(Sub64(t4,t3),Xor64(t6,0x1:I64))

Where the Sub64(t4,t3) is recognized as being just zero, so we get:

 t3 = GET:I64(88)
 t24 = 32Uto64(And32(8Uto32(And8(32to8(1Uto32(CmpLE64U(t2,0x0:I64))),0x1:I8)),0x1:I32))
 t28 = Sub64(0x0:I64,Xor64(t24,0x1:I64))

And nothing relies on the original (potentially) uninitialized register.

With this both the reproducer as the larger C++ program using unique_ptr don't produce any errors anymore under memcheck.
Comment 4 Mark Wielaard 2019-02-21 14:55:40 UTC
Julian, Carl, could you take a peek at the proposed patch in comment #3.

It would be nicer if we could somehow detect this in the optimizer or memcheck, but I don't think that is easily possible. So I believe this frontend transformation is the best we can do.
Comment 5 Julian Seward 2019-02-21 15:54:38 UTC
(In reply to Mark Wielaard from comment #4)
> Julian, Carl, could you take a peek at the proposed patch in comment #3.

Mark, that looks totally fine to me.  If it doesn't break anything (which
I think we established on irc at the time you made the patch), then I say
land it.
Comment 6 Carl Love 2019-02-21 16:11:18 UTC
It all looks valid to me.  I didn't see any issues with the patch.
Comment 7 Mark Wielaard 2019-02-21 16:31:16 UTC
Thanks for the reviews. Pushed now as:

commit 256cf43c5eadb28edb45436aca6fda8ee55eb10e
Author: Mark Wielaard <mark@klomp.org>
Date:   Thu Feb 21 17:21:53 2019 +0100

    memcheck powerpc subfe x, x, x initializes x to 0 or -1 based on CA
    
    GCC might use subfe x, x, x to initialize x to 0 or -1, based on
    whether the carry flag is set. This happens in some cases when g++
    compiles resetting a unique_ptr. The "trick" used by the compiler is
    that it can AND a pointer with the register x (now 0x0 or 0xffffffff)
    to set something to NULL or to the given pointer.
    
    subfe is implemented as rD = (log not)rA + rB + XER[CA]
    if we instead implement it as rD = rB - rA - (XER[CA] ^ 1)
    then memcheck can see that rB and Ra cancel each other out if they
    are the same.
    
    https://bugs.kde.org/show_bug.cgi?id=404054