Description
Patrick J. LoPresti
2012-02-17 05:44:25 UTC
Created attachment 68862 [details]
SSE4.2 assembly code for open-coded "strlen" generated by ICC
Created attachment 75166 [details]
[VEX] Support V128 returns from dirty helpers (AMD64 only)
This patch to host_amd64_isel.c adds support for "dirty" helper functions that return a V128 union.
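For readers not familiar with the dirty-helper mechanism, the idea can be modelled in plain C roughly as follows. This is only a sketch: the union layout and the helper name are illustrative, not the actual VEX/Memcheck declarations.

    #include <stdint.h>
    #include <string.h>

    /* A 128-bit value viewed as bytes, halfwords, words or doublewords,
       along the lines of VEX's V128 union (the exact layout here is an
       assumption, not copied from libvex). */
    typedef union {
       uint8_t  w8[16];
       uint16_t w16[8];
       uint32_t w32[4];
       uint64_t w64[2];
    } my_v128;

    /* Hypothetical helper: fetch 16 bytes starting at 'addr' and return
       them by value as a 128-bit union.  The real Memcheck V-bit helpers
       have different names and calling conventions. */
    static my_v128 my_loadv128le ( const void* addr )
    {
       my_v128 res;
       memcpy(&res, addr, sizeof res);  /* stand-in for the real V-bit lookup */
       return res;
    }

The point of the patch is that the instruction selector must know how to accept such a 128-bit return value from a helper call, which previously was not supported.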
Created attachment 75167 [details]
memcheck patch to use V128 helper for expr2vbits_Load
Together with the VEX patch, this patch fixes the test case attached to this bug without introducing new regressions on AMD64.
Of course, it will also break all other architectures until VEX is updated to support V128 dirty helpers on all of them.
Also it leaves intact the "four I64 loads" approach for V256 (AVX).
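The "four I64 loads" approach amounts to reading a 256-bit quantity as four independent 64-bit pieces and stitching them together, roughly like the standalone sketch below (types and names are mine, not Memcheck's). The drawback for partial-loads-ok is that each 8-byte piece is checked on its own, so an aligned vector load shows up as several smaller reads.

    #include <stdint.h>

    typedef struct { uint64_t w64[4]; } my_v256;   /* 256 bits as 4 x 64 */

    /* Read a 32-byte value as four 64-bit loads, little-endian lane order.
       This mirrors the existing V256 handling: no single 256-bit helper
       call, just four 64-bit ones. */
    static my_v256 load_v256_as_four_i64 ( const uint64_t* p )
    {
       my_v256 r;
       r.w64[0] = p[0];
       r.w64[1] = p[1];
       r.w64[2] = p[2];
       r.w64[3] = p[3];
       return r;
    }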
Created attachment 75168 [details]
memcheck patch to use V128 helper for expr2vbits_Load (v2)
Previous patch had a bug. Fixed. Better test case coming soon, too...
Created attachment 75169 [details]
Better main() for test
To compile, run
gcc -o test-my-strlen test-my-strlen.c my_strlen.s
With existing Valgrind, this reports four "Invalid read of size 8" errors. With the patches on this bug, it reports four "Invalid read of size 16" errors. With the patches plus "--partial-loads-ok=yes", it reports none, which is correct :-)
(This assumes strdup() returns a 16-byte-aligned memory block, which is likely since malloc() returns storage with suitable alignment for anything. This test case also prints out the pointers so you can see that the low nibble is zero.)
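If you want to double-check that alignment assumption on your own platform, a tiny probe along these lines will do; this is not part of the attached test case, just an illustration.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
       char* p = strdup("hello, world");
       /* For the SSE test case we only care whether the low 4 address
          bits are zero, i.e. whether the block is 16-byte aligned. */
       printf("p = %p, 16-byte aligned: %s\n",
              (void*)p, ((uintptr_t)p & 0xF) == 0 ? "yes" : "no");
       free(p);
       return 0;
    }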
*** This bug has been confirmed by popular vote. ***

I ran into this bug with the GMP library, which uses a 16-byte SSE memory copy. The problem is being discussed on the GMP bugs list; see http://gmplib.org/list-archives/gmp-bugs/2013-February/002959.html
Patrick's patch applied to Valgrind 3.8.1 results in the expected behaviour.

On the whole, this isn't very difficult. We'll have to monkey around with the insn selectors for all the back ends, but that's mostly tedious. The real difficulty is (as usual) writing a comprehensive, portable test case to verify that nothing got screwed up.

Created attachment 81508 [details]
Patch for 128/256 bit returns, vex side, 01 Aug 2013, x86/amd64/arm/ppc32/ppc64
Created attachment 81509 [details]
Patch for 128/256 bit returns, valgrind side, 01 Aug 2013, x86/amd64/arm/ppc32/ppc64
Includes Patrick's original core 128-bit PLO patch for Memcheck,
and a comprehensive test case in memcheck/tests/test-plo-vec.c
including reference 32/64-little-endian outputs
(not wired up currently)
Created attachment 81529 [details]
MIPS part of the patch
Apply patches in the comments #9 and #10 first, then this one.
(In reply to comment #11)
> Created attachment 81529 [details]
> MIPS part of the patch

Petar, thanks for that. I just had a look; it seems OK. One thing, though: the _toIR.c bit removes IR generation (afaics, a pass-through to the host) for a SYNC instruction. Is that intended to be part of the patch? (And, unrelatedly, is it safe not to pass the sync through to the host?)

(In reply to comment #12)
> Petar, thanks for that. I just had a look; it seems OK. One thing, though:
> the _toIR.c bit removes IR generation (afaics, a pass-through to the host)
> for a SYNC instruction. Is that intended to be part of the patch?
> (and, unrelatedly, is it safe, not to pass the sync through to the host?)

@Julian,
yes, removing the SYNC dirty helper is an intended part of the patch. It may not have been thoroughly thought through; we previously ignored the SYNCI instruction, and now we ignore SYNC as well. SYNC orders load/store accesses to shared memory (completion barriers and ordering barriers). If it is not safe, we would really benefit from seeing a Valgrind issue in which it is needed. All the tests pass without it, though.

(In reply to comment #13)
> yes, removing the SYNC dirty helper is an intended part of the patch. [...]
> If it is not safe, we would really benefit from seeing a Valgrind issue in
> which it is needed. All the tests pass without it, though.

Just to clarify the previous comment: we do plan to run additional tests and investigate further whether it is required or not, but the approach in this change is to ignore the sync/synci instructions (as you do not need them) and to try to trigger an error with additional test examples.

Created attachment 81604 [details]
s390 part of the patch
The patch includes
1) Introduction of the RetLoc framework to s390
2) Moving the insn selection that copies the return value to the dst register out of the helper-call insn selection
3) s390 support for IRExprP__BBPTR and IRExprP__VECRET
Committed:

  2739   IR and JIT front/middle/backend changes
  13488  The basic Memcheck fix, from Patrick
  13489  Comprehensive test case (not properly connected up)

Thanks Patrick, Petar, Maran.

Do we want to also fix the 256-bit return case, so we don't have to come back to this swamp later? Or wait for another day? The infrastructure to do it is all in place, so it should be easy.

Added proper test cases for amd64: r13491.

Created attachment 81686 [details]
Generic mc_LOADVx implementation
This patch creates a generic mc_LOADVx() and mc_LOADVx_slow(), taking the number of bits as an argument. By itself, this patch adds no functionality, but with it applied, helperc_LOADV256be, helperc_LOADV256le, helperc_LOADV512be and so forth become one-line additions.
I have verified that the regression tests on AMD64 still pass after applying this.
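To make the intent concrete, a generic loader could take roughly the shape below, written as standalone C rather than the actual Memcheck code (names, types and the shadow-memory lookup are hypothetical). Note that the loop runs over 64-bit chunks, not over bytes, which is exactly the point raised in the next comment.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical stand-in for Memcheck's per-64-bit-chunk V-bit lookup. */
    extern uint64_t get_vbits64 ( uintptr_t a );

    /* Generic vector V-bit loader: 'nBits' is 128, 256, 512, ...
       'res' points to nBits/8 bytes of output.  Each iteration handles
       one 64-bit chunk, so the loop count is nBits/64, not nBits/8. */
    static void generic_loadVxx ( /*OUT*/uint64_t* res, uintptr_t a, size_t nBits )
    {
       size_t nChunks = nBits / 64;
       for (size_t j = 0; j < nChunks; j++) {
          res[j] = get_vbits64(a + 8*j);
       }
    }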
Seems to confuse iterating over bytes with iterating over ULongs?

   -   for (j=0 ; j<2 ; ++j) {
   +   for (j=0 ; j<nBytes ; ++j) {
          sm       = get_secmap_for_reading_low(a + 8*j);
          sm_off16 = SM_OFF_16(a + 8*j);

Segfaults for me on Fedora 17.

Created attachment 81711 [details]
Add support for V256 partial-loads-ok (memcheck)
This patch adds support for partial-loads-ok on 32-byte aligned accesses. Includes an update to the test case.
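As a reminder, the partial-loads-ok rule for such an access can be paraphrased in plain C as below. This is a simplification of the behaviour described in the Valgrind manual, with made-up predicate names, not the Memcheck source.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical predicate standing in for Memcheck's addressability check. */
    extern bool byte_is_addressable ( uintptr_t a );

    /* For one naturally aligned load of 'szB' bytes (32 for a V256 access):
       if the access is aligned and genuinely partial (some bytes addressable,
       some not), no addressing error is reported; bytes from unaddressable
       locations are instead marked undefined in the shadow result. */
    static bool plo_applies ( uintptr_t a, size_t szB )
    {
       if (a % szB != 0)            /* not naturally aligned: normal rules */
          return false;
       size_t nOK = 0;
       for (size_t i = 0; i < szB; i++)
          if (byte_is_addressable(a + i))
             nOK++;
       return nOK > 0 && nOK < szB; /* genuinely partial access */
    }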
Created attachment 81712 [details]
Add support for V256 partial-loads-ok (VEX)
VEX part of V256 partial-loads-ok patch. Adds support for V256 * arguments to dirty helpers. Only for AMD64 (do any other architectures have 256-bit registers?)
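Since there is no convenient way to return a 256-bit aggregate from a C helper, the natural shape is an out-parameter, along these lines. Again a standalone sketch with made-up names, not the VEX signatures; contrast it with the by-value return used for V128 above.

    #include <stdint.h>
    #include <string.h>

    typedef union {
       uint8_t  w8[32];
       uint64_t w64[4];
    } my_v256;

    /* Hypothetical 32-byte load helper: the 256-bit result is written
       through an out-pointer rather than returned by value, matching the
       "V256 * argument to a dirty helper" idea described above. */
    static void my_loadv256le ( /*OUT*/my_v256* res, const void* addr )
    {
       memcpy(res, addr, sizeof *res);  /* stand-in for the real V-bit lookup */
    }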
(In reply to comment #20)
> Created attachment 81711 [details]
> Add support for V256 partial-loads-ok (memcheck)

Committed, r13500, r13501.

(In reply to comment #21)
> Created attachment 81712 [details]
> Add support for V256 partial-loads-ok (VEX)

Committed, r2743. Thanks for the patches.

Thank you!