Bug 213685 - Undefined value propagates past dependency breaking instruction (pcmpeq)
Summary: Undefined value propagates past dependency breaking instruction (pcmpeq)
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (other bugs)
Version First Reported In: 3.4.1
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-08 14:18 UTC by Timothy B. Terriberry
Modified: 2010-10-01 16:07 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
Test case for pcmpeqb (256 bytes, text/x-csrc)
2009-11-08 14:18 UTC, Timothy B. Terriberry
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Timothy B. Terriberry 2009-11-08 14:18:24 UTC
Created attachment 38186 [details]
Test case for pcmpeqb

I ran across a case where valgrind reported an undefined value that I
was sure was a false positive, but after investigating more closely, it
appears valgrind was simply propagating the validity of an actual
undefined value to the wrong place. Instructions such as xor, sub, pxor, psub, and pcmpeq with identical source and destination arguments are common idioms used to generate simple constants (0 in the first four cases, -1 in the last case), and the output bits are defined, regardless of whether the input bits are.

I've attached a simple test case which exhibits the problem using pcmpeq. It produces multiple "Use of uninitialized value of size 4/Conditional jump or move depends on uninitialized value(s)" warnings inside the printf. A similar example using xor does not exhibit the problem.
Comment 1 Julian Seward 2010-09-25 02:06:17 UTC
would be easy to fix.  The constant folder knows how to fold out various flavours of cmp(x,x) -> 1 (also xors) but Iop_CmpEQ8x8 is not one of them
Comment 2 Timothy B. Terriberry 2010-09-25 02:11:17 UTC
The official list of dependency-breaking instructions from Section 3.5.1.6 of the Intel 64 and IA-32 Architectures Optimization Reference Manual (i.e., the ones with hardware support in the out-of-order execution engine) is:

xor reg,reg
sub reg,reg
xorps/pd xmmreg,xmmreg
pxor xmmreg,xmmreg
subps/pd xmmreg,xmmreg
psubb/w/d/q xmmreg,xmmreg
Comment 3 Christian Borntraeger 2010-09-25 11:42:28 UTC
Interesting. we have a similar issue on s390x for XC (xor on memory field mem1=mem1 xor mem2 with variable length) Thats why I used a Mux that uses a zero constant if both addresses are the same. Would be cool to get rid of the slow Mux.
Comment 4 Julian Seward 2010-09-25 12:37:58 UTC
(In reply to comment #3)
> Interesting. we have a similar issue on s390x for XC (xor on memory field
> mem1=mem1 xor mem2 

Maybe: if the addresses are the same, load the memory value into an IRTemp
(t) and then create Xor64(t,t) (or whatever width is correct).  Then the
constant folder will (or can easily be made to) convert the Xor into a
suitably-typed zero value.
Comment 5 Julian Seward 2010-10-01 01:23:21 UTC
(In reply to comment #2)
> The official list of dependency-breaking instructions [...] is:

Hmm, it's not so simple though.  You want dependency breaking on the
pcmpeqb insn in comment 1, yet it isn't part of the list.  The set of
dependency-breaking operations that we actually need is a superset of
the official list, and it is phrased in terms of VEX IR primops, not
in terms of Intel instructions.

To find potential opportunities to find such IR primop applications,
I put the following patch in the IR constant folder

Index: priv/ir_opt.c
===================================================================
--- priv/ir_opt.c       (revision 2055)
+++ priv/ir_opt.c       (working copy)
@@ -1618,6 +1618,12 @@
       }
    }
 
+   if (e == e2 && e->tag == Iex_Binop 
+       && sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) {
+      vex_printf("IDENT: ");
+      ppIRExpr(e); vex_printf("\n");
+   }
+
    if (DEBUG_IROPT && e2 != e) {
       vex_printf("FOLD: "); 
       ppIRExpr(e); vex_printf("  ->  ");

It finds 'binop(t,t)' patterns that might be foldable.  On a run
of Firefox on x86_64-linux it picked up the following

IDENT: CmpEQ64(t24,t24)
IDENT: CmpEQ8x16(t13,t13)
IDENT: CmpF64(t10,t10)
IDENT: OrV128(t101,t101)
IDENT: Sub64F0x2(t30,t30)
IDENT: Sub64(t13,t13)

and from example 1 we have CmpEQ8x8 too.  I don't think I can be
bothered with the floating point ones (CmpF64, Sub64F0x2) as that
tends to be more complex, but the others might be foldable.  A good
thing to do would be to run a bunch of codec-y code with the above
patch in place, and see if any more cases pop out, so they can all
be fixed at once.

FTR, the constant folder already has rules of the form

   op(t,t) ==> t 
   for op in And64, And32, And8, Or64, Or32, Or16, Or8, Max32U

and

   op(t,t) ==> 0
   for op in Xor64, Xor32, Xor16, Xor8, XorV128, Sub32

which is probably why you were unable to exhibit the problem using
any xor instructions.
Comment 6 Julian Seward 2010-10-01 16:07:14 UTC
The pcmpeqb case for mmx and xmm registers has been fixed in r2058.