Bug 287260

Summary:	Incorrect Conditional jump or move depends on uninitialised value(s)
Product:	[Developer tools] valgrind	Reporter:	M Welinder <mwelinder>
Component:	memcheck	Assignee:	Julian Seward <jseward>
Status:	RESOLVED FIXED
Severity:	normal	CC:	flo2030
Priority:	NOR
Version First Reported In:	3.6.0
Target Milestone:	---
Platform:	openSUSE
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:
Attachments:	Sample program rewrite algebraic transformations

Description M Welinder 2011-11-22 14:15:27 UTC

Valgrind is valgrind-3.6.1 from OpenSuSE 11.4

Comment 1 M Welinder 2011-11-22 14:23:31 UTC

Created attachment 65934 [details]
Sample program

Comment 2 M Welinder 2011-11-22 14:23:57 UTC

Valgrind is valgrind-3.6.1 from OpenSuSE 11.4
Compiler is gcc version 4.5.1 20101208 [gcc-4_5-branch revision 167585] (SUSE Linux) 
Target: x86_64-suse-linux


Test program being attached.

welinder@sequoia:~> valgrind ./a.out 45
==23829== Memcheck, a memory error detector
==23829== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==23829== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==23829== Command: ./a.out 45
==23829== 
==23829== Conditional jump or move depends on uninitialised value(s)
==23829==    at 0x4E703C0: vfprintf (in /lib64/libc-2.11.3.so)
==23829==    by 0x4E74A3F: buffered_vfprintf (in /lib64/libc-2.11.3.so)
==23829==    by 0x4E6F85D: vfprintf (in /lib64/libc-2.11.3.so)
==23829==    by 0x4E7A287: fprintf (in /lib64/libc-2.11.3.so)
==23829==    by 0x4005B4: main (vvv.c:14)
[...]

The generated assembler code for line 14 is...

        .loc 1 14 0
        movq    stderr@GOTPCREL(%rip), %rax
        movq    (%rsp), %rdx
        movq    (%rax), %rdi
.LVL2:
        xorl    %eax, %eax
        addl    %edx, %edx
        sarl    $22, %edx
        call    fprintf@PLT

Wild guess as to cause: "addl %edx, %edx" doesn't track bit-level
initialization information correctly.  It is used here in place of
a one-bit shift.

Comment 3 Florian Krohm 2012-01-22 22:26:55 UTC

(In reply to comment #2)
> 
> Wild guess as to cause: "addl %edx, %edx" doesn't track bit-level
> initialization information correctly.  It is used here in place of
> a one-bit shift.

It is correct that valgrind does not track defined-ness at the bit
level through addition. Any undefined input bit will cause all output
bits to be undefined. That is the default behaviour. It's a 
performance / precision trade-off which works well in practice. Whether 
the imprecision is the cause of what you observe, I do not know. You 
can apply the patch below and see whether it helps. It will use a 
more expensive (precise) propagation scheme for some (but not all) 
operations.

Index: memcheck/mc_translate.c
===================================================================
--- memcheck/mc_translate.c	(revision 12348)
+++ memcheck/mc_translate.c	(working copy)
@@ -4944,7 +4944,7 @@
 
    }
 
-   mce.bogusLiterals = bogus;
+   mce.bogusLiterals = True;
 
    /* Copy verbatim any IR preamble preceding the first IMark */

Comment 4 Julian Seward 2012-01-23 10:17:21 UTC

Florian: what's wierd about this is, it's a known problem, and 
there are a bunch of folding rules in ir_opt.c (fold_Expr) that
do Add32(t,t) ==> t << 1 etc.  Yet it does not appear to have had
any effect here:

        0x80484BB:  movl 28(%esp,,),%eax

              ------ IMark(0x80484BB, 4, 0) ------
              PUT(60) = 0x80484BB:I32
              t18 = Add32(GET:I32(16),0x1C:I32)
              PUT(0) = LDle:I32(t18)

        0x80484BF:  addl %eax,%eax

              ------ IMark(0x80484BF, 2, 0) ------
              PUT(60) = 0x80484BF:I32
              t21 = GET:I32(0)
              t20 = GET:I32(0)
              t19 = Add32(t21,t20)
              PUT(32) = 0x3:I32
              PUT(36) = t21
              PUT(40) = t20
              PUT(44) = 0x0:I32
              PUT(0) = t19

becomes

   ------ IMark(0x80484BB, 4, 0) ------
   PUT(60) = 0x80484BB:I32
   t47 = LDle:I32(Add32(t5,0x1C:I32))

   ------ IMark(0x80484BF, 2, 0) ------
   t19 = Add32(t47,t47)

That should have been transformed into "t19 = Shl32(t47, 1)".  I have
no idea offhand why that didn't happen.  FTR, the relevant folding
rule is at ir_opt.c:1628.

Maybe it's a phase ordering problem, in the sense that the relevant
preconditions to make the folding rule work only appear too late in
the IR optimisation process.  One way to check that is to ask for
an extra, final run of the relevant folding pass (cprop_BB) by adding
a final call to it in cheap_transformations().

I suspect that won't help though.  It would be good to get to the
bottom of this.  It makes me nervous when the IR optimiser doesn't
work like how I expect.

Comment 5 Florian Krohm 2012-01-24 04:17:13 UTC

(In reply to comment #4)

> Maybe it's a phase ordering problem,

No it's not. The control flow in fold_Expr has issues :)
Cut'n pasted form ir_opt.c:

-->     if (e->Iex.Binop.op == Iop_Add32
             || e->Iex.Binop.op == Iop_Or32 || e->Iex.Binop.op == Iop_Max32U) {
            if (isZeroU32(e->Iex.Binop.arg2))
               e2 = e->Iex.Binop.arg1;
            else if (isZeroU32(e->Iex.Binop.arg1))
               e2 = e->Iex.Binop.arg2;
         } else

         /* Sub64(x,0) ==> x */
-->      if (e->Iex.Binop.op == Iop_Sub64 && isZeroU64(e->Iex.Binop.arg2)) {
            e2 = e->Iex.Binop.arg1;
         } else

         /* Add32(t,t) ==> t << 1.  Memcheck doesn't understand that
            x+x produces a defined least significant bit, and it seems
            simplest just to get rid of the problem by rewriting it
            out, since the opportunity to do so exists. */
-->      if (e->Iex.Binop.op == Iop_Add32

The three conditions (-->) appear chained, but they are not. It's 
the indentation that is fooling us. In fact they are nested

  if (e->Iex.Binop.op == Iop_Add32 ....) {
     ...
  } else 
      if (e->Iex.Binop.op == Iop_Sub64 && isZeroU64(e->Iex.Binop.arg2)) {
         ...
      } else 
          if (e->Iex.Binop.op == Iop_Add32 ...

Since the 1st condition is true we never do the optimization And32(t,t) -> t

Stared at it for a while before I saw it. I guess you only see what you 
want to see. We should rewrite the code and use a switch stmt instead.

Comment 6 Julian Seward 2012-01-24 10:17:59 UTC

> Stared at it for a while before I saw it. I guess you only see what you 
> want to see. We should rewrite the code and use a switch stmt instead.

Oh, well spotted.  Yep, I never noticed this before :-(  Basically
a stupidity in the pattern matching "logic".

If we don't want to rewrite this with a switch (which I suspect is
a bit more complex than it sounds) then another alternative is to
transform the current scheme


   if (pattern1 applies) {
      if (further checks on pattern1)
         e2 = replacement expression;
   } else
      if (pattern2 applies) {
         e2 = replacement expression;
      }

with

   if (pattern1 applies) {
      if (further checks on patter1) {
         e2 = replacement expression; goto out;
      }
   }
   if (pattern2 applies) {
      e2 = replacement expression; goto out;
   }

where out: is a label near the end of the procedure.  Why this has
failed is that we have a "pattern1 applies == True" but "further
checks pattern1 == False" (e->Iex.Binop.op == Iop_Add32, but neither
of the args are zero), which stops it trying later to match pattern2
(Add32(x,x)).  The version with "goto out;" and no else-chaining would
avoid this problem.

Well, not bad.  This code has been in production use for .. 7 ish
years before anybody noticed the problem :-)

Comment 7 Florian Krohm 2012-01-24 14:29:54 UTC

(In reply to comment #6)

> 
> If we don't want to rewrite this with a switch (which I suspect is
> a bit more complex than it sounds) then another alternative is to
> transform the current scheme
> 

[snip]

>  The version with "goto out;" and no else-chaining would
> avoid this problem.
> 

Yes, it would. And transforming the status quo into what you propose
would be more mechanical than the conversion to a switch stmt. 
Using a switch has two advantages, though:
(1) All rules related to a certain IROp are grouped together in a 
    single spot. That is good for extensibility. You only have to
    look in one place to see whether you already covered a certain
    pattern. It's also easier to proofread and understand.
(2) Avoid repeated checking of the same condition. If we transformed
    the snippet from comment #5 we'd be checking for Add32 twice (if
    the first pattern does not evaluate to true). Since the IRops we
    deal with here are encoded with contiguous integers the compiler
    can use a branch table to optimize the switch. Should help squeeze
    a cycle here and there.

I could cook up a patch using a switch, if you agree we should take
that route.

Comment 8 Julian Seward 2012-01-24 18:28:58 UTC

(In reply to comment #7)
> I could cook up a patch using a switch, if you agree we should take
> that route.

Well, if you have enthusiasm, yes definitely.  Obviously the switch
thing is the right way to go -- my proposal in comment #6 is at best a
half-hearted kludge.

(random comment)
One way to sanity check that at least as much folding is going on
afterwards compared to before is to run --tool=none --stats=yes and
look at these lines

--13925--  transtab: new        3,165 (72,315 -> 470,579; ratio 65:10) [0 scs]

and check that the number after the arrow (the size of generated code)
is the same or smaller than when you started.

Comment 9 Florian Krohm 2012-01-28 19:23:44 UTC

(In reply to comment #8)
> (In reply to comment #7)
>
> One way to sanity check that at least as much folding is going on
> afterwards compared to before is to run --tool=none --stats=yes and
> look at these lines
> 
> --13925--  transtab: new        3,165 (72,315 -> 470,579; ratio 65:10) [0 scs]
> 
> and check that the number after the arrow (the size of generated code)
> is the same or smaller than when you started.

Thanks for the tip. Below is a patch. Here are the transtab results. The
number right from ==> is the new number and as expected is always smaller.

none:
-----
bz2:        3,883 (110,915 -> 543,146 ==> 543,127);
bigcode1:  29,942 (376,394 -> 2,472,272 ==> 2,470,386;
bigcode2: 114,315 (1,392,631 -> 9,267,107 ==> 9,259,596;
fbench:     2,309 (50,316 -> 294,823 ==> 294,810;
ffbench:    1,987 (41,529 -> 233,856 ==> 233,831;
heap:       1,908 (39,245 -> 216,857 ==> 216,850;
sarp:       1,536 (32,130 -> 176,665 ==> 176,658;
tinycc:     5,168 (121,998 -> 669,416 ==> 669,393;

memcheck:
---------
bz2:        3,879 (110,539 -> 1,471,728 ==> 1,471,618;
bigcode1:  30,104 (379,444 -> 5,755,550 ==> 5,751,738;
bigcode2: 114,477 (1,395,681 -> 21,212,777 ==> 21,197,715;
fbench:     2,402 (52,278 -> 795,091 ==> 794,997;
ffbench:    2,048 (42,412 -> 637,931 ==> 637,827;
heap:       1,902 (38,812 -> 581,547 ==> 581,485;
sarp:       1,754 (36,172 -> 540,144 ==> 540,082;
tinycc:     4,766 (110,618 -> 1,609,684 ==> 1,609,502;

I could observe the OPs problem on my machine as well and with this patch memcheck no longer reports anything.

The patch introduces a new function isZeroU which allows the grouping of
like IROps. That makes sense because we're dealing with algebraic transformations here where the width of operands is irrelevant. 
The patch is careful to not introduce any new patterns or apply patterns to new IROPs. That is something for a followup patch.
I've tested the patch on x96-linux with no new regressions.

Comment 10 Florian Krohm 2012-01-28 19:24:37 UTC

Created attachment 68282 [details]
rewrite algebraic transformations

patch

Comment 11 M Welinder 2012-01-28 21:04:11 UTC

Thanks for looking into this.

Just out of curiosity, why does And32 get checks for and(x,0) and
and(x,-1) when the other sizes do not?

And why is there a or(x,0), but no xor(x,0) check?

Ditto shl(0,x) vs shr(0,x).

Comment 12 Julian Seward 2012-01-28 22:15:41 UTC

(In reply to comment #10)
> Created an attachment (id=68282) [details]
> rewrite algebraic transformations

Excellent!  Thanks for doing that.  Please commit.

Comment 13 Julian Seward 2012-01-28 22:18:01 UTC

(In reply to comment #11)
> Just out of curiosity, why does And32 get checks for and(x,0) and
> and(x,-1) when the other sizes do not?
> 
> And why is there a or(x,0), but no xor(x,0) check?
> 
> Ditto shl(0,x) vs shr(0,x).

Because the addition of the folding rules was done on an ad-hoc basis,
by profiling a run (--profile-flags=10001000), looking for missing foldings
in the output, and adding rules.  That has the advantage (from a safety
point of view) of only adding rules when (in effect) a test case for them
is known to exist.  I'm sure there are missing folding opportunities, but
also I'd bet they are pretty rare.

Comment 14 Florian Krohm 2012-01-29 02:28:17 UTC

(In reply to comment #12)
> (In reply to comment #10)
> > Created an attachment (id=68282) [details] [details]
> > rewrite algebraic transformations
> 
> Excellent!  Thanks for doing that.  Please commit.

Done in VEX r2244 and valgrind r12359.

Comment 15 Florian Krohm 2012-02-13 00:12:15 UTC

For some systems  VEX r2244 and valgrind r12359 was not good enough and the memcheck messages were still issued. This is now fixed for good in VEX r2246..
That's what we think at least....

See also the thread "valgrind: r12359 - in trunk: .	memcheck/tests"
on valgrind-developers for more info.