| Summary: | ppc64: --expensive-definedness-checks=yes is not quite working here | | |
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Florian Krohm <flo2030> |
| Component: | memcheck | Assignee: | Julian Seward <jseward> |
| Status: | CONFIRMED | | |
| Severity: | normal | CC: | cel, mark, will_schmidt |
| Priority: | NOR | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Platform: | Other | | |
| OS: | Linux | | |
| See Also: | https://bugs.kde.org/show_bug.cgi?id=340392 | | |
| Latest Commit: | | Version Fixed In: | |
| Sentry Crash Report: | | | |
Description
Florian Krohm
2015-09-06 21:14:56 UTC
(In reply to Florian Krohm from comment #0)
> I'd bet we need a more precise version of CmpORD64S.

I agree. There is in fact special-case handling already for CmpORD{32,64}{S,U} for the case where the second argument is a constant zero, but in this case it is a constant 0xCD, so it doesn't apply.

Some more analysis:

typedef struct {
   unsigned char c;
   int i;
   void *foo;
} S;

int main (int argc, char **argv)
{
   S x;
   S *s = &x;
   s->c = 1;
   if (s->c == 0 && s->i == 205 && s->foo == s)
      return 1;
   return 0;
}

/*
00000000100003a0 <.main>:
100003a0:  39 20 00 01   li      r9,1
100003a4:  38 60 00 00   li      r3,0
100003a8:  99 21 ff f0   stb     r9,-16(r1)
100003ac:  60 42 00 00   ori     r2,r2,0
100003b0:  e9 21 ff f0   ld      r9,-16(r1)
100003b4:  79 29 46 00   rldicl  r9,r9,8,24
100003b8:  79 29 c0 02   rotldi  r9,r9,56
100003bc:  2b a9 00 cd   cmpldi  cr7,r9,205
100003c0:  4c 9e 00 20   bnelr   cr7        <------- here
100003c4:  e9 21 ff f8   ld      r9,-8(r1)
100003c8:  38 61 ff f0   addi    r3,r1,-16
100003cc:  7c 63 4a 78   xor     r3,r3,r9
100003d0:  7c 63 00 74   cntlzd  r3,r3
100003d4:  78 63 d1 82   rldicl  r3,r3,58,6
100003d8:  7c 63 07 b4   extsw   r3,r3
100003dc:  4e 80 00 20   blr

------ IMark(0x100003B0, 4, 0) ------
t33 = LDbe:I64(Add64(t29,0xFFFFFFFFFFFFFFF0:I64))
------ IMark(0x100003B4, 4, 0) ------
t34 = And64(Or64(Shl64(t33,0x8:I8),Shr64(t33,0x38:I8)),0xFFFFFFFFFF:I64)
------ IMark(0x100003B8, 4, 0) ------
t48 = Or64(Shl64(t34,0x38:I8),Shr64(t34,0x8:I8))
PUT(88) = t48
------ IMark(0x100003BC, 4, 0) ------
t54 = 64to8(CmpORD64U(t48,0xCD:I64))

Simplified:
t33 = loaded value
t34 = Rol64(t33, 8) & 0xFF_FFFF_FFFF
t48 = Ror64(t34, 8)
t54 = 64to8(CmpORD64U(t48,0xCD:I64))
*/

The t33 -> t48 transformation is (MSB on the left):

t33               = A B C D E F G H
Rol(t33, 8)       = B C D E F G H A
t34               = 0 0 0 E F G H A    (Rol(t33, 8) & 0xFF_FFFF_FFFF)
t48 = Ror(t34, 8) = A 0 0 0 E F G H

So the two rotates in opposite directions serve only to put the to-be-zeroed-out area temporarily at the top of the word, since rldicl's immediate mask can only clear a bit field that is "anchored" at one end of the word or the other. Clearly A is 'c' in the struct and 'E F G H' must be 'i'. (A standalone C illustration of this rotate-and-mask dance follows at the end of this comment.)

After 100003a8, memory at -16(r1) is

-16 -15 -14 -13 -12 -11 -10  -9
  1   U   U   U   U   U   U   U

When loaded (big-endian) we get a 64-bit word, with MSB on the left here:

1 U U U U U U U

After masking per the above it is

1 0 0 0 U U U U

and we are comparing it with

0 0 0 0 0 0 0 205

This seems to me to be a case that would be correctly handled by expensiveCmpEQorNE() (mc_translate.c:983, see the comment preceding it): the top byte is defined in both operands and differs, so the result of the comparison is defined despite the undefined bytes elsewhere. (A C sketch of that scheme also follows below.) But that handling applies only to Cmp{EQ,NE}{32,64}, and the ppc front end doesn't produce those. Instead it produces CmpORD{U,S}{32,64}.

In effect the amd64/x86 (and, to some extent, other) front ends together with ir_opt.c succeed in recovering, from the original instruction stream, the actual condition (==, !=, >, >=, <=, <, signed or unsigned) that gcc/clang compiled. In this case

100003bc:  2b a9 00 cd   cmpldi  cr7,r9,205
100003c0:  4c 9e 00 20   bnelr   cr7        <------- here

we know that this is really a != comparison, and if the front end could have recovered that, creating a CmpNE64, then it would have been instrumented by expensiveCmpEQorNE and we wouldn't have had the false positive.

I guess we could redo doCmpORD so that it performs the "expensive" interpretation for EQ# rather than the "standard interp", per the comments in it. But that would slow down all comparisons, including the 99.99% that don't need this level of accuracy.
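For concreteness, here is the t33 -> t48 rotate-and-mask sequence transcribed into plain C. This is a standalone illustration mirroring the IR above, not Valgrind code; the test value is made up.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the IR: t34 = Rol64(t33,8) & 0xFF_FFFF_FFFF; t48 = Ror64(t34,8).
   With bytes A B C D E F G H (MSB first) this yields A 0 0 0 E F G H. */
static uint64_t mask_like_rldicl(uint64_t t33)
{
   uint64_t t34 = ((t33 << 8) | (t33 >> 56)) & 0xFFFFFFFFFFULL;
   uint64_t t48 = (t34 << 56) | (t34 >> 8);
   return t48;
}

int main(void)
{
   /* A = 0x01, E F G H = 0x000000CD, and junk in bytes B, C, D */
   uint64_t v = 0x01AABBCC000000CDULL;
   printf("%016" PRIx64 "\n", mask_like_rldicl(v));
   /* prints 01000000000000cd: B, C, D cleared; A and E..H preserved */
   return 0;
}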
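And to make the expensiveCmpEQorNE() point concrete, here is a bit-level C paraphrase of the scheme described in the comment preceding it. This is a sketch of the idea only; the real instrumentation emits shadow IR rather than computing anything at C level.

#include <stdint.h>
#include <stdio.h>

/* V-bit convention as in Memcheck: a 1 bit in vx means the corresponding
   bit of x is undefined.  Returns 1 if the result of comparing x and y
   for equality is definitely defined. */
static int eq_result_is_defined(uint64_t x, uint64_t vx,
                                uint64_t y, uint64_t vy)
{
   /* If some bit is defined in both operands and differs, the operands
      provably compare unequal, so the EQ/NE result is defined no matter
      what the undefined bits hold. */
   if ((x ^ y) & ~vx & ~vy)
      return 1;
   /* Otherwise fall back to the naive rule: defined only if no input
      bit is undefined at all. */
   return (vx | vy) == 0;
}

int main(void)
{
   /* The masked word from above: 1 0 0 0 U U U U, compared with 205.
      Byte A is defined in both and differs (1 vs 0), so the comparison
      is a defined "not equal" and no error should be reported. */
   uint64_t x  = 0x0100000000000000ULL;   /* undefined bytes shown as 0 */
   uint64_t vx = 0x00000000FFFFFFFFULL;   /* low 4 bytes ('i') undefined */
   printf("%d\n", eq_result_is_defined(x, vx, 0xCD, 0));  /* prints 1 */
   return 0;
}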
A different approach would be to redo how the ppc condition codes are translated into IR, and in particular to attempt to recover the actual compared condition, as the amd64/x86/arm/arm64 front ends do, and to get rid of CmpORD{32,64}{U,S}. That would in a way be better, since it would mean that ppc code benefits from the same improvements in definedness tracking for comparisons that all the other architectures get. This isn't straightforward, though -- it might be a week of work.

In this example, "cmpld ; bne" translates to essentially this:

// the cmpld
t54 = 64to8(CmpORD64U(t48,0xCD:I64))

// the bne
if (CmpEQ32(Xor32(And32(8Uto32(t54),0x2:I32),0x2:I32),0x0:I32)) {
   PUT(1296) = 0x100003C4:I64; exit-Boring
}

CmpORD64U produces a value that is either 8, 4 or 2, with 8 meaning "<", 4 meaning ">" and 2 meaning "=". The bne translation then inspects bit 1 of t54 (intel encoding), hence the use of 0x2. (A small C model of this appears at the end of this report.) This completely obscures the fact that what we really want is simply

if (CmpNE64(t48,0xCD:I64)) {
   PUT(1296) = 0x100003C4:I64; exit-Boring
}

The tricky part is to change the generated IR so that it directly exposes the condition on which the branch is based, yet makes it possible for later instructions to recover a correct value for CR0 .. CR7 should they wish to. Maybe we should change the ppc front end to use a thunk-based representation as most of the other front ends do, since that more or less facilitates all of the above.

On contemplation, it is probably easiest to fix this in ir_opt.c by doing an IR-to-IR transformation pass that, following the example in comment 3, transforms

CmpEQ32( select-bit-1-of( CmpORD64U(x,y) ), 0x0 )

to

CmpNE64(x,y)

where "select-bit-1-of" incorporates all the size-changing and masking gunk (64to8, then 8Uto32, then And32 with 2, then Xor32 with 2).

We could be cleverer about the size-changing and masking bits, for example declaring CmpORD64U to produce an I32-typed result, since matching IR patterns is hard work and it's easy to miss cases. With that change the example simplifies to

t54 = CmpORD64U(t48,0xCD:I64)
if (CmpEQ32(Xor32(And32(t54,0x2:I32),0x2:I32),0x0:I32)) ...

and if we get rid of the Xor32 and instead compare the extracted value with 2:

t54 = CmpORD64U(t48,0xCD:I64)
if (CmpEQ32( And32(t54,0x2:I32), 0x2:I32 )) ...

That would make the matching problem easier. Furthermore, if we normalise And32 and CmpEQ32 nodes so that when just one of the args is a constant it is on the right, then there are no L-vs-R structural variants to consider. (An illustrative sketch of such a fold, under these simplifying assumptions, also appears at the end.)

There is a bunch of infrastructure in ir_opt.c, starting at line 5953, for "MSVC specific transformation hacks", that would form a useful starting point.

We are going to have to defer this till after 3.13. It requires some non-trivial reworking of either the IR optimisation specific to POWER, or of the POWER condition code handling, or both.

Added a prereq to skip memcheck/tests/bug340392.vgtest on power:

commit 94960fae328bc82fb1ea51ffb9273ad5f25936d2 (HEAD -> master)
Author: Mark Wielaard <mark@klomp.org>
Date:   Mon Oct 30 21:41:57 2023 +0100

    bug340392 doesn't work on power because of bug352364

    Add a prereq to bug340392.vgtest to skip on Power for now.
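For reference, here is the CmpORD64U behaviour described above as a small C model (a sketch based on this report: values 8/4/2 for "<"/">"/"=", with bit 1 acting as the EQ bit that the bne translation tests):

#include <stdint.h>
#include <stdio.h>

static uint32_t cmp_ord_64u(uint64_t x, uint64_t y)
{
   if (x < y) return 8;   /* "<" */
   if (x > y) return 4;   /* ">" */
   return 2;              /* "=" */
}

int main(void)
{
   uint64_t t48 = 0x01000000000000CDULL;  /* the masked word from above */
   uint32_t t54 = cmp_ord_64u(t48, 0xCD);
   /* mirrors CmpEQ32(Xor32(And32(t54,2),2),0): true iff the EQ bit is set */
   int equal = ((t54 & 2) ^ 2) == 0;
   printf("t54 = %u, equal = %d\n", t54, equal);  /* t54 = 4, equal = 0 */
   return 0;
}

The condition ((t54 & 2) ^ 2) == 0 is the C analogue of the CmpEQ32/Xor32/And32 nest in the IR, and it is just x == y in disguise -- exactly the condition the proposed fold would expose as a CmpEQ64/CmpNE64.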
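Finally, an illustrative sketch of what such an ir_opt.c fold might look like, loosely in the style of the "MSVC specific transformation hacks" mentioned above. This is not code from the tree: it assumes the simplified shape discussed earlier (CmpORD64U retyped to produce I32, the Xor32 removed, constants normalised to the right), and the helper isU32 is written out just to keep the sketch self-contained.

#include "libvex_ir.h"   /* IRExpr, IRExpr_Binop, Iop_*, Iex_*, Ico_* */

/* True iff e is the I32 constant n. */
static Bool isU32 ( IRExpr* e, UInt n )
{
   return e->tag == Iex_Const
          && e->Iex.Const.con->tag == Ico_U32
          && e->Iex.Const.con->Ico.U32 == n;
}

/* Fold CmpEQ32( And32( CmpORD64U(x,y), 2 ), 2 )  ==>  CmpEQ64(x,y).
   (The variant with the Xor32 still present, compared against 0,
   would fold to CmpNE64(x,y) instead.) */
static IRExpr* fold_CmpORD_eq_bit ( IRExpr* e )
{
   if (e->tag == Iex_Binop
       && e->Iex.Binop.op == Iop_CmpEQ32
       && isU32(e->Iex.Binop.arg2, 2)) {
      IRExpr* and32 = e->Iex.Binop.arg1;
      if (and32->tag == Iex_Binop
          && and32->Iex.Binop.op == Iop_And32
          && isU32(and32->Iex.Binop.arg2, 2)) {
         IRExpr* cmp = and32->Iex.Binop.arg1;
         if (cmp->tag == Iex_Binop
             && cmp->Iex.Binop.op == Iop_CmpORD64U) {
            /* EQ bit set <=> operands equal: expose the real condition,
               so Memcheck can instrument it with expensiveCmpEQorNE. */
            return IRExpr_Binop(Iop_CmpEQ64,
                                cmp->Iex.Binop.arg1,
                                cmp->Iex.Binop.arg2);
         }
      }
   }
   return e;  /* no match: leave the expression unchanged */
}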