| Summary: | Undefined value propagates past dependency breaking instruction (pcmpeq) | ||
|---|---|---|---|
| Product: | [Developer tools] valgrind | Reporter: | Timothy B. Terriberry <vmst8um02> |
| Component: | memcheck | Assignee: | Julian Seward <jseward> |
| Status: | REPORTED --- | ||
| Severity: | normal | CC: | borntraeger |
| Priority: | NOR | ||
| Version First Reported In: | 3.4.1 | ||
| Target Milestone: | --- | ||
| Platform: | Compiled Sources | ||
| OS: | Linux | ||
| Latest Commit: | Version Fixed/Implemented In: | ||
| Sentry Crash Report: | |||
| Attachments: | Test case for pcmpeqb | ||
would be easy to fix. The constant folder knows how to fold out various flavours of cmp(x,x) -> 1 (also xors) but Iop_CmpEQ8x8 is not one of them The official list of dependency-breaking instructions from Section 3.5.1.6 of the Intel 64 and IA-32 Architectures Optimization Reference Manual (i.e., the ones with hardware support in the out-of-order execution engine) is: xor reg,reg sub reg,reg xorps/pd xmmreg,xmmreg pxor xmmreg,xmmreg subps/pd xmmreg,xmmreg psubb/w/d/q xmmreg,xmmreg Interesting. we have a similar issue on s390x for XC (xor on memory field mem1=mem1 xor mem2 with variable length) Thats why I used a Mux that uses a zero constant if both addresses are the same. Would be cool to get rid of the slow Mux. (In reply to comment #3) > Interesting. we have a similar issue on s390x for XC (xor on memory field > mem1=mem1 xor mem2 Maybe: if the addresses are the same, load the memory value into an IRTemp (t) and then create Xor64(t,t) (or whatever width is correct). Then the constant folder will (or can easily be made to) convert the Xor into a suitably-typed zero value. (In reply to comment #2) > The official list of dependency-breaking instructions [...] is: Hmm, it's not so simple though. You want dependency breaking on the pcmpeqb insn in comment 1, yet it isn't part of the list. The set of dependency-breaking operations that we actually need is a superset of the official list, and it is phrased in terms of VEX IR primops, not in terms of Intel instructions. To find potential opportunities to find such IR primop applications, I put the following patch in the IR constant folder Index: priv/ir_opt.c =================================================================== --- priv/ir_opt.c (revision 2055) +++ priv/ir_opt.c (working copy) @@ -1618,6 +1618,12 @@ } } + if (e == e2 && e->tag == Iex_Binop + && sameIRTemps(e->Iex.Binop.arg1, e->Iex.Binop.arg2)) { + vex_printf("IDENT: "); + ppIRExpr(e); vex_printf("\n"); + } + if (DEBUG_IROPT && e2 != e) { vex_printf("FOLD: "); ppIRExpr(e); vex_printf(" -> "); It finds 'binop(t,t)' patterns that might be foldable. On a run of Firefox on x86_64-linux it picked up the following IDENT: CmpEQ64(t24,t24) IDENT: CmpEQ8x16(t13,t13) IDENT: CmpF64(t10,t10) IDENT: OrV128(t101,t101) IDENT: Sub64F0x2(t30,t30) IDENT: Sub64(t13,t13) and from example 1 we have CmpEQ8x8 too. I don't think I can be bothered with the floating point ones (CmpF64, Sub64F0x2) as that tends to be more complex, but the others might be foldable. A good thing to do would be to run a bunch of codec-y code with the above patch in place, and see if any more cases pop out, so they can all be fixed at once. FTR, the constant folder already has rules of the form op(t,t) ==> t for op in And64, And32, And8, Or64, Or32, Or16, Or8, Max32U and op(t,t) ==> 0 for op in Xor64, Xor32, Xor16, Xor8, XorV128, Sub32 which is probably why you were unable to exhibit the problem using any xor instructions. The pcmpeqb case for mmx and xmm registers has been fixed in r2058. |
Created attachment 38186 [details] Test case for pcmpeqb I ran across a case where valgrind reported an undefined value that I was sure was a false positive, but after investigating more closely, it appears valgrind was simply propagating the validity of an actual undefined value to the wrong place. Instructions such as xor, sub, pxor, psub, and pcmpeq with identical source and destination arguments are common idioms used to generate simple constants (0 in the first four cases, -1 in the last case), and the output bits are defined, regardless of whether the input bits are. I've attached a simple test case which exhibits the problem using pcmpeq. It produces multiple "Use of uninitialized value of size 4/Conditional jump or move depends on uninitialized value(s)" warnings inside the printf. A similar example using xor does not exhibit the problem.