The Intel C compiler uses the (hideously-complex-whatever-happened-to-RISC) PCMPISTRI instruction in various optimized string routines. I am attaching an illustrative test case. Hopefully, there are only a few permutations of this instruction that actually need precise validity bit tracking... Reproducible: Always Steps to Reproduce: 1. gcc -O3 -o test test.c 2. valgrind --partial-loads-ok=yes ./test 3. :-( Actual Results: "Syscall param exit_group(status) contains uninitialised byte(s)" Expected Results: No warnings, since this program always exits with status 12, because the bytes beyond the end of the (aligned) load are never used.
Created attachment 75181 [details] Test case illustrating pcmpistri validity bit propagation failure
Created attachment 75187 [details] Refactoring PCMPISTRI 0x3A This patch factors out the PCMPISTRI 0x3A case (and 0x38, which is equivalent) and simplifies the code. This C reference code can serve as a basis, I hope, for generating expanded IR. Actually if you trace the code paths though, it is not too bad... A bunch of the code is just marshalling data through the dirty helper interface; the actual computation can all be expressed as straightforward vector and scalar operations, I think. Note that this patch does not break any regression tests, so I am pretty confident it is a valid refactoring.
Created attachment 75190 [details] Fix (harmless) bug in test case PCMPISTRI clobbers condition codes
Created attachment 75191 [details] [VEX] Fix PCMPISTRI 0x3A (and 0x38) I decided to go ahead and "just do it". This patch replaces the call to a "dirty helper" with memcheck-friendly IR for PCMPISTRI 0x3A/0x38 (aka. "strlen"). It also includes the refactored C code so you can compare the IR to the original C... Note that all of this code is dead except for the diffs against guest_amd64_toIR.c, and part of that is #ifdef'd out (the "dirty helper" version). All regression tests still pass, plus the test case attached to this bug now works.
Created attachment 75204 [details] [VEX] Simplified (and corrected) patch This is nearly identical to patch 75191, but without all of the dead code (which was only useful as a reference anyway). Also it fixes a bug: Shl32(1, Ctz32(0)) is undefined. This had the effect of causing the regression tests to pass on one run and then fail on another... Just renaming the source code directory could make a difference. Undefined behavior is nasty. Anyway, this patch now passes the regression tests consistently.
Created attachment 75205 [details] [VEX] Yet another version of patch OK, so Mux0X works but kills memcheck's validity bit tracking. This final (?) version of the patch fixes that. It still passes all regression tests; it fixes the test case on this bug; and, together with the patches on bug 294285, it seems to do very, very well on the optimized output from the Intel C compiler. Finally!!
Created attachment 75363 [details] Patch to add regtest for pcmpistri This patch adds a regression test "insn-pcmpistri" to memcheck/tests/amd64. It fails on the current Valgrind but succeeds with patch 75205 applied.
Committed, r2562, r13132. Thanks.