Bug 308627 - pmovmskb validity bit propagation is imprecise
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.9.0.SVN
Platform: Compiled Sources  OS: Linux
Priority: NOR  Severity: normal
Target Milestone: ---
Assignee: Julian Seward
Duplicates: 308626 (view as bug list)
 
Reported: 2012-10-18 21:44 UTC by Patrick J. LoPresti
Modified: 2013-03-03 11:06 UTC (History)
CC List: 0 users



Attachments
Test case illustrating pmovmskb validity bit propagation failure (258 bytes, application/octet-stream)
2012-10-18 21:45 UTC, Patrick J. LoPresti
Details
somewhat out of date fix, vex side (18.65 KB, patch)
2012-10-19 10:13 UTC, Julian Seward
Details
.. and the fixes for the memcheck side (3.04 KB, patch)
2012-10-19 10:13 UTC, Julian Seward
Details
VEX patch retargeted to current SVN (18.64 KB, patch)
2012-11-04 21:27 UTC, Patrick J. LoPresti
Details
memcheck patch rebased to current SVN (3.19 KB, patch)
2012-11-05 01:18 UTC, Patrick J. LoPresti
Details
VEX patch rebased against current SVN (v2) (20.32 KB, patch)
2012-11-05 17:41 UTC, Patrick J. LoPresti
Details
memcheck patch rebased to current SVN (3.70 KB, patch)
2012-11-05 17:42 UTC, Patrick J. LoPresti
Details
memcheck patch rebased to current SVN (v3) (3.88 KB, patch)
2012-11-05 19:51 UTC, Patrick J. LoPresti
Details
memcheck patch rebased to current SVN (v4) (3.82 KB, patch)
2012-11-06 01:45 UTC, Patrick J. LoPresti
Details
Patch adding regtest for bsfl (5.16 KB, patch)
2012-11-13 23:18 UTC, Patrick J. LoPresti
Details
Regression tests for bsfl/pmovmskb (18.66 KB, patch)
2012-11-14 03:21 UTC, Patrick J. LoPresti
Details
Regression tests for bsfl/pmovmskb validity bit tracking (20.18 KB, patch)
2012-11-15 21:10 UTC, Patrick J. LoPresti
Details

Description Patrick J. LoPresti 2012-10-18 21:44:08 UTC
(This is related to, but distinct from, bug 308626)

This is with current SVN (revision 13057).

To reproduce, compile the attached program and run it under Valgrind.

I expect this to produce no warnings, because even though the SSE2 vector starts as an undefined value, the code explicitly sets its low 16 bits to zero and only examines data depending on those bits.

Thus the behavior of this program is perfectly deterministic; it is guaranteed to return zero from main.

The problem is that memcheck is not propagating the validity bits from the SSE2 register for the PMOVMSKB instruction.

Similar code shows up in certain SSE2 optimized string routines.


Reproducible: Always

Steps to Reproduce:
1. Compile the attached program with "gcc -O3 -Wall -o test test.c"
2. Run "valgrind ./test"

Actual Results:  
Valgrind produces one warning.

Expected Results:  
Valgrind should produce no warning.
Comment 1 Patrick J. LoPresti 2012-10-18 21:45:11 UTC
Created attachment 74640 [details]
Test case illustrating pmovmskb validity bit propagation failure
Comment 2 Julian Seward 2012-10-19 10:13:02 UTC
Created attachment 74643 [details]
somewhat out of date fix, vex side
Comment 3 Julian Seward 2012-10-19 10:13:43 UTC
Created attachment 74644 [details]
.. and the fixes for the memcheck side
Comment 4 Julian Seward 2012-10-19 10:25:05 UTC
These patches are somewhat out of date, but easy enough to rebase.
They improve definedness handling both for PMOVMSKB and for the BSF
that typically follows it.

The significant changes are in the front ends (xx_toIR.c).  pmovmskb
is currently done by calling a clean helper function.  This is opaque
to Memcheck so it assumes worst-case undefinedness propagation from
args to results, producing a completely undefined result byte if any
of the bits in the original vector are undefined.  The patch replaces
the call with the use of a new IROp, Iop_GetMSBs8x8, and Memcheck
instruments that with itself, leading to exact handling of the pmovmskb
case.

The patches also change the implementation of BSF.  BSF compares the
value with zero, and if it is nonzero, uses Iop_Ctz32 to count
trailing zeroes.  The changes are: (1) use new Iop_ExpCmpEQ/NE to do
the zero check, which Memcheck treats exactly, and (2) handle Ctz32
more accurately in Memcheck; see expensiveCountTrailingZeroes32().

All of this needs checking and testing.  In particular,
expensiveCountTrailingZeroes32() could do with a look over to see
whether it makes sense.

I don't know if the last hunk in the Memcheck patch is necessary:

@@ -3916,7 +3988,8 @@
             case Iex_Const:
                return isBogusAtom(e);
             case Iex_Unop: 
-               return isBogusAtom(e->Iex.Unop.arg);
+               return isBogusAtom(e->Iex.Unop.arg)
+                      || e->Iex.Unop.op == Iop_GetMSBs8x8;
             case Iex_GetI:
                return isBogusAtom(e->Iex.GetI.ix);
             case Iex_Binop: 

and I would prefer not to have that, unless it is shown to be
necessary.
Comment 5 Patrick J. LoPresti 2012-11-04 21:27:40 UTC
Created attachment 75011 [details]
VEX patch retargeted to current SVN

I have retargeted your VEX patch to current SVN.  I even included the random comment fix in host_amd64_isel.c :-)

The retargeting was not too hard.  I did have to change AMD64Instr_MovZLQ(...) to AMD64Instr_movxLQ(False,...).  (I believe that is correct?)

The biggest difference between the original patch and this is in priv/guest_amd64_isel.c, which appears to have been refactored a bit to support AVX (in particular, see guest_amd64_toIR.c:dis_PMOVMSKB_128() ).  You might want to examine that diff carefully.

I will take a crack at retargeting the memcheck patch next.

Other than these retargeted patches themselves -- and doing what testing I can -- what would you like from me in order to commit them?
Comment 6 Patrick J. LoPresti 2012-11-05 01:18:32 UTC
Created attachment 75018 [details]
memcheck patch rebased to current SVN

I had trouble proving that your expensiveCountTrailingZeroes32 would always work, so I wrote it a little differently.  (In retrospect, I think yours probably worked fine, but I believe mine is simpler.  I based it on a suggestion from the valgrind-users mailing list.)

I also made it apply to Ctz64, since that is what the guest_amd64 IR generator produces.

Together these patches compile and pass all existing regression tests.  

Unfortunately, this does not (yet) fix the test case in bug 308628, since the IR generator emits a "cmpNE64" against the value, and memcheck does not handle that precisely.   By hacking my memcheck to always call expensiveCmpEQorNE for CmpNE64, I got everything to work.  But that is probably not what you want to do.  I could use some guidance here.
Comment 7 Patrick J. LoPresti 2012-11-05 17:31:00 UTC
OK, I figured out that I need VEX to generate the "expensive" 64-bit compare (Iop_ExpCmpNE64) for bsfl and then memcheck needs to handle it.  Updated patches forthcoming...
Comment 8 Patrick J. LoPresti 2012-11-05 17:41:25 UTC
Created attachment 75030 [details]
VEX patch rebased against current SVN (v2)

Generate Iop_ExpCmpNE64 for AMD64 bsfl, and handle it on AMD64 hosts.
Comment 9 Patrick J. LoPresti 2012-11-05 17:42:12 UTC
Created attachment 75031 [details]
memcheck patch rebased to current SVN

Handle Iop_ExpCmpNE64.
Comment 10 Patrick J. LoPresti 2012-11-05 19:51:19 UTC
Created attachment 75034 [details]
memcheck patch rebased to current SVN (v3)

Previous memcheck logic computed vbits as (ctz(vatom) <= ctz(atom) ? -1 : 0).  But this is incorrect when vatom is zero (i.e. all input bits are valid).

I fixed the logic to be (vatom & (ctz(vatom) <= ctz(atom))), and now all regression tests pass :-)

I have also confirmed that these patches fix both the test case in this bug and the one in bug 308626, at least on AMD64 host+guest.
Comment 11 Patrick J. LoPresti 2012-11-06 01:45:10 UTC
Created attachment 75046 [details]
memcheck patch rebased to current SVN (v4)

OK, I yield.  (I should know better than to try to improve on Julian's code...)  The problem with the ctz(vbits) <= ctz(atom) formulation is that it gives the wrong answer when vbits=0 (i.e. the value is fully-defined).  I could not find an efficient way to fix that up.  So I went back to the "pcast(vbits & (atom ^ (atom-1)))" formulation from the original patch, suitably extended to handle both 32- and 64-bit.

This is (I hope) the final version.  I have confirmed that they pass all regression tests on AMD64 and actually handle the test cases both in this bug and in bug 308626.
Comment 12 Julian Seward 2012-11-08 10:59:28 UTC
Committed, r2559, r13108.  Thanks.
Comment 13 Julian Seward 2012-11-08 11:00:02 UTC
*** Bug 308626 has been marked as a duplicate of this bug. ***
Comment 14 Patrick J. LoPresti 2012-11-13 23:16:22 UTC
Reopening bug so I can attach patches with test cases.
Comment 15 Patrick J. LoPresti 2012-11-13 23:18:41 UTC
Created attachment 75233 [details]
Patch adding regtest for bsfl

This patch adds a regression test for memcheck's bsfl validity bit propagation.  It is only for AMD64, but duplicating it for x86 should be pretty easy.
Comment 16 Patrick J. LoPresti 2012-11-14 03:21:45 UTC
Created attachment 75235 [details]
Regression tests for bsfl/pmovmskb

This patch supersedes the previous one, and includes tests for validity bit tracking on both bsfl and pmovmskb.
Comment 17 Patrick J. LoPresti 2012-11-15 21:10:24 UTC
Created attachment 75290 [details]
Regression tests for bsfl/pmovmskb validity bit tracking

Adopt Julian's suggestion from chat.  These regression tests supersede the previous ones.
Comment 18 Julian Seward 2012-11-19 15:12:45 UTC
(In reply to comment #17)
> Created attachment 75290 [details]
> Regression tests for bsfl/pmovmskb validity bit tracking

Committed, r13129.  Thanks.
Comment 19 Julian Seward 2013-03-03 11:06:09 UTC
Closing as fixed, since AFAICS we are done with this one.  LMK if
that's not correct.