(This is related to, but distinct from, bug 308626.) This is with current SVN (revision 13057). To reproduce, compile the attached program and run it under Valgrind. I expect this to produce no warnings, because even though the SSE2 vector starts as an undefined value, the code explicitly sets its low 16 bits to zero and only examines data depending on those bits. The behavior of this program is therefore perfectly deterministic; it is guaranteed to return zero from main. The problem is that memcheck does not propagate the validity bits from the SSE2 register through the PMOVMSKB instruction. Similar code shows up in certain SSE2-optimized string routines.

Reproducible: Always

Steps to Reproduce:
1. Compile the attached program with "gcc -O3 -Wall -o test test.c"
2. Run "valgrind ./test"

Actual Results:
Valgrind produces one warning.

Expected Results:
Valgrind should produce no warning.
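A minimal sketch of the kind of program being described (the actual attached test case may differ in detail; a real test may also need tricks to stop the compiler warning about the uninitialized vector):

#include <emmintrin.h>

int main(void)
{
    __m128i v;                          /* starts out undefined */
    v = _mm_insert_epi16(v, 0, 0);      /* explicitly zero the low 16 bits */
    /* pmovmskb gathers the most significant bit of each byte; result
       bits 0 and 1 depend only on the two bytes zeroed above */
    int mask = _mm_movemask_epi8(v);
    return mask & 3;                    /* always 0, deterministically */
}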
Created attachment 74640 [details]
Test case illustrating pmovmskb validity bit propagation failure
Created attachment 74643 [details]
somewhat out of date fix, vex side
Created attachment 74644 [details]
.. and the fixes for the memcheck side
These patches are somewhat out of date but easy enough to rebase. They improve definedness handling both for PMOVMSKB and for the BSF that typically follows it. The significant changes are in the front ends (xx_toIR.c).

pmovmskb is currently implemented by calling a clean helper function. That is opaque to Memcheck, so it assumes worst-case undefinedness propagation from args to results, producing a completely undefined result byte if any of the bits in the original vector are undefined. The patch replaces the call with the use of a new IROp, Iop_GetMSBs8x8, and Memcheck instruments that with itself, leading to exact handling of the pmovmskb case.

The patches also change the implementation of BSF. BSF compares the value with zero and, if it is nonzero, uses Iop_Ctz32 to count trailing zeroes. The changes are: (1) use the new Iop_ExpCmpEQ/NE to do the zero check, which Memcheck treats exactly, and (2) handle Ctz32 more accurately in Memcheck; see expensiveCountTrailingZeroes32().

All of this needs checking and testing. Especially expensiveCountTrailingZeroes32() could do with looking over to see if it makes sense.

I don't know if the last hunk in the Memcheck patch is necessary:

@@ -3916,7 +3988,8 @@
       case Iex_Const:
          return isBogusAtom(e);
       case Iex_Unop:
-         return isBogusAtom(e->Iex.Unop.arg);
+         return isBogusAtom(e->Iex.Unop.arg)
+                || e->Iex.Unop.op == Iop_GetMSBs8x8;
       case Iex_GetI:
          return isBogusAtom(e->Iex.GetI.ix);
       case Iex_Binop:

and I would prefer not to have it, unless it is shown to be necessary.
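"Instruments that with itself" works because each result bit of Iop_GetMSBs8x8 depends on exactly one input bit. A scalar model of the rule (illustrative C, not the actual VEX/Memcheck code):

#include <stdint.h>

/* Model of Iop_GetMSBs8x8: collect the MSB of each of the 8 bytes of x
   into one 8-bit result. */
static uint8_t get_msbs_8x8(uint64_t x)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
        r |= ((x >> (8*i + 7)) & 1) << i;
    return r;
}

/* Result bit i depends only on input bit 8*i+7, so its definedness
   equals the definedness of that one input bit.  Hence the shadow rule
   is the operation applied to the V-bits themselves (1 = undefined). */
static uint8_t shadow_get_msbs_8x8(uint64_t vbits)
{
    return get_msbs_8x8(vbits);
}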
Created attachment 75011 [details]
VEX patch retargeted to current SVN

I have retargeted your VEX patch to current SVN. I even included the random comment fix in host_amd64_isel.c :-) The retargeting was not too hard. I did have to change AMD64Instr_MovZLQ(...) to AMD64Instr_movxLQ(False,...). (I believe that is correct?)

The biggest difference between the original patch and this one is in priv/guest_amd64_isel.c, which appears to have been refactored a bit to support AVX (in particular, see guest_amd64_toIR.c:dis_PMOVMSKB_128()). You might want to examine that diff carefully.

I will take a crack at retargeting the memcheck patch next. Other than these retargeted patches themselves -- and doing what testing I can -- what would you like from me in order to commit them?
Created attachment 75018 [details]
memcheck patch rebased to current SVN

I had trouble proving that your expensiveCountTrailingZeroes32 would always work, so I wrote it a little differently. (In retrospect, I think yours probably worked fine, but I believe mine is simpler. I based it on a suggestion from the valgrind-users mailing list.) I also made it apply to Ctz64, since that is what the guest_amd64 IR generator produces. Together these patches compile and pass all existing regression tests.

Unfortunately, this does not (yet) fix the test case in bug 308628, since the IR generator emits a "cmpNE64" against the value, and memcheck does not handle that precisely. By hacking my memcheck to always call expensiveCmpEQorNE for CmpNE64, I got everything to work. But that is probably not what you want to do. I could use some guidance here.
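For reference, the "expensive" EQ/NE scheme decides the comparison from the defined bits alone whenever possible. A scalar model of the idea (illustrative only; the real code is expensiveCmpEQorNE in memcheck/mc_translate.c):

#include <stdint.h>

/* Returns 1 if the EQ/NE comparison of a and b has a defined result,
   given their V-bits (1 = undefined).  The result is defined when the
   operands are fully defined, or when some bit position that is
   defined in BOTH operands differs -- the values are then provably
   unequal no matter what the undefined bits hold. */
static int cmp_eq_result_defined(uint64_t a, uint64_t va,
                                 uint64_t b, uint64_t vb)
{
    uint64_t undef = va | vb;          /* undefined in either operand */
    if (undef == 0)
        return 1;                      /* fully defined inputs        */
    return ((a ^ b) & ~undef) != 0;    /* a provably differing bit    */
}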
OK, I figured out that I need VEX to generate the "expensive" 64-bit compare (Iop_ExpCmpNE64) for bsfl and then memcheck needs to handle it. Updated patches forthcoming...
Created attachment 75030 [details]
VEX patch rebased against current SVN (v2)

Generate Iop_ExpCmpNE64 for AMD64 bsfl. Handle it on AMD64 hosts.
Created attachment 75031 [details]
memcheck patch rebased to current SVN

Handle Iop_ExpCmpNE64.
Created attachment 75034 [details]
memcheck patch rebased to current SVN (v3)

Previous memcheck logic computed vbits as (ctz(vatom) <= ctz(atom) ? -1 : 0). But this is incorrect when vatom is zero (i.e. all input bits are valid), since ctz of a zero input is unspecified. I fixed the logic to be (vatom & (ctz(vatom) <= ctz(atom))), and now all regression tests pass :-)

I have also confirmed that these patches fix both the test case in this bug and the one in bug 308626, at least on AMD64 host+guest.
Created attachment 75046 [details]
memcheck patch rebased to current SVN (v4)

OK, I yield. (I should know better than to try to improve on Julian's code...) The problem with the ctz(vbits) <= ctz(atom) formulation is that it gives the wrong answer when vbits=0 (i.e. the value is fully defined). I could not find an efficient way to fix that up. So I went back to the "pcast(vbits & (atom ^ (atom-1)))" formulation from the original patch, suitably extended to handle both 32- and 64-bit.

This is (I hope) the final version. I have confirmed that the patches pass all regression tests on AMD64 and actually handle the test cases in both this bug and bug 308626.
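A scalar model of that final formulation, for the 64-bit case (illustrative, not the patch code):

#include <stdint.h>

/* Shadow rule for Iop_Ctz64, modeled: ctz(atom) is determined by the
   bits of atom from bit 0 up to and including its lowest set bit, and
   atom ^ (atom - 1) is a mask of exactly those bits (all ones when
   atom == 0, where every bit matters).  The result is defined iff no
   undefined bit falls inside that mask. */
static uint64_t shadow_ctz64(uint64_t atom, uint64_t vbits)
{
    uint64_t improver = atom ^ (atom - 1);
    /* "pcast": smear any undefined relevant bit across the result */
    return (vbits & improver) ? ~0ULL : 0ULL;
}

Note that with vbits == 0 the result is fully defined whatever atom holds, which is exactly the case the ctz-comparison version got wrong.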
Committed, r2559, r13108. Thanks.
*** Bug 308626 has been marked as a duplicate of this bug. ***
Reopening bug so I can attach patches with test cases.
Created attachment 75233 [details]
Patch adding regtest for bsfl

This patch adds a regression test for memcheck's bsfl validity bit propagation. It is only for AMD64, but duplicating it for x86 should be pretty easy.
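The attached patch is authoritative, but a test of this shape (a hypothetical sketch, not the attachment) shows the property being checked: give bsfl an input whose defined low bits contain a set bit, with undefined bits only above it, and branch on the result.

#include <stdio.h>

int main(void)
{
    /* An uninitialized value; a real test would more likely pull
       undefined bits from malloc'd memory to keep the compiler quiet. */
    unsigned int junk;
    unsigned int val, res;
    val = (junk << 8) | 0x10;        /* bits 0..7 defined; bit 4 set */
    __asm__ ("bsfl %1, %0" : "=r"(res) : "r"(val));
    /* bsf must report bit 4; the undefined high bits cannot change
       that, so this branch should draw no Memcheck warning */
    if (res == 4)
        fprintf(stderr, "OK\n");
    return 0;
}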
Created attachment 75235 [details]
Regression tests for bsfl/pmovmskb

This patch supersedes the previous one, and includes tests for validity bit tracking on both bsfl and pmovmskb.
Created attachment 75290 [details]
Regression tests for bsfl/pmovmskb validity bit tracking

Adopt Julian's suggestion from chat. These regression tests supersede the previous ones.
(In reply to comment #17)
> Created attachment 75290 [details]
> Regression tests for bsfl/pmovmskb validity bit tracking

Committed, r13129. Thanks.
Closing as fixed, since AFAICS we are done with this one. LMK if that's not correct.