Bug 309921 - PCMPISTRI validity bit propagation is imprecise
Summary: PCMPISTRI validity bit propagation is imprecise
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.9.0.SVN
Platform: Compiled Sources Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-11 20:37 UTC by Patrick J. LoPresti
Modified: 2012-11-20 15:25 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Test case illustrating pcmpistri validity bit propagation failure (318 bytes, application/octet-stream)
2012-11-11 20:38 UTC, Patrick J. LoPresti
Details
Refactoring PCMPISTRI 0x3A (10.15 KB, patch)
2012-11-12 01:19 UTC, Patrick J. LoPresti
Details
Fix (harmless) bug in test case (324 bytes, patch)
2012-11-12 04:59 UTC, Patrick J. LoPresti
Details
[VEX] Fix PCMPISTRI 0x3A (and 0x38) (14.59 KB, patch)
2012-11-12 05:15 UTC, Patrick J. LoPresti
Details
[VEX] Simplified (and corrected) patch (7.35 KB, patch)
2012-11-12 23:43 UTC, Patrick J. LoPresti
Details
[VEX] Yet another version of patch (7.70 KB, patch)
2012-11-13 00:05 UTC, Patrick J. LoPresti
Details
Patch to add regtest for pcmpistri (19.25 KB, patch)
2012-11-19 22:05 UTC, Patrick J. LoPresti
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Patrick J. LoPresti 2012-11-11 20:37:06 UTC
The Intel C compiler uses the (hideously-complex-whatever-happened-to-RISC) PCMPISTRI instruction in various optimized string routines.

I am attaching an illustrative test case.  Hopefully, there are only a few permutations of this instruction that actually need precise validity bit tracking...

Reproducible: Always

Steps to Reproduce:
1. gcc -O3 -o test test.c
2. valgrind --partial-loads-ok=yes ./test
3. :-(
Actual Results:  
"Syscall param exit_group(status) contains uninitialised byte(s)"

Expected Results:  
No warnings, since this program always exits with status 12, because the bytes beyond the end of the (aligned) load are never used.
Comment 1 Patrick J. LoPresti 2012-11-11 20:38:10 UTC
Created attachment 75181 [details]
Test case illustrating pcmpistri validity bit propagation failure
Comment 2 Patrick J. LoPresti 2012-11-12 01:19:06 UTC
Created attachment 75187 [details]
Refactoring PCMPISTRI 0x3A

This patch factors out the PCMPISTRI 0x3A case (and 0x38, which is equivalent) and simplifies the code.  This C reference code can serve as a basis, I hope, for generating expanded IR.

Actually if you trace the code paths though, it is not too bad...  A bunch of the code is just marshalling data through the dirty helper interface; the actual computation can all be expressed as straightforward vector and scalar operations, I think.

Note that this patch does not break any regression tests, so I am pretty confident it is a valid refactoring.
Comment 3 Patrick J. LoPresti 2012-11-12 04:59:23 UTC
Created attachment 75190 [details]
Fix (harmless) bug in test case

PCMPISTRI clobbers condition codes
Comment 4 Patrick J. LoPresti 2012-11-12 05:15:30 UTC
Created attachment 75191 [details]
[VEX] Fix PCMPISTRI 0x3A (and 0x38)

I decided to go ahead and "just do it".

This patch replaces the call to a "dirty helper" with memcheck-friendly IR for PCMPISTRI 0x3A/0x38 (aka. "strlen").  It also includes the refactored C code so you can compare the IR to the original C...  Note that all of this code is dead except for the diffs against guest_amd64_toIR.c, and part of that is #ifdef'd out (the "dirty helper" version).

All regression tests still pass, plus the test case attached to this bug now works.
Comment 5 Patrick J. LoPresti 2012-11-12 23:43:21 UTC
Created attachment 75204 [details]
[VEX] Simplified (and corrected) patch

This is nearly identical to patch 75191, but without all of the dead code (which was only useful as a reference anyway).

Also it fixes a bug: Shl32(1, Ctz32(0)) is undefined.  This had the effect of causing the regression tests to pass on one run and then fail on another...  Just renaming the source code directory could make a difference.  Undefined behavior is nasty.

Anyway, this patch now passes the regression tests consistently.
Comment 6 Patrick J. LoPresti 2012-11-13 00:05:30 UTC
Created attachment 75205 [details]
[VEX] Yet another version of patch

OK, so Mux0X works but kills memcheck's validity bit tracking.

This final (?) version of the patch fixes that.  It still passes all regression tests; it fixes the test case on this bug; and, together with the patches on bug 294285, it seems to do very, very well on the optimized output from the Intel C compiler.  Finally!!
Comment 7 Patrick J. LoPresti 2012-11-19 22:05:14 UTC
Created attachment 75363 [details]
Patch to add regtest for pcmpistri

This patch adds a regression test "insn-pcmpistri" to memcheck/tests/amd64.  It fails on the current Valgrind but succeeds with patch 75205 applied.
Comment 8 Julian Seward 2012-11-20 15:25:59 UTC
Committed, r2562, r13132.  Thanks.