Created attachment 68967 [details]
Test case demonstrating false negative with --partial-loads-ok=yes

Version:           3.7 SVN (using Devel)
OS:                Linux

The current implementation of "--partial-loads-ok=yes" suppresses the error message when an aligned load reads memory outside of an addressable block, provided that some portion of the loaded word is addressable. But it also marks all of the loaded data as "valid", which causes real errors to be missed. I am attaching a test case with representative code. Later today I will post a patch with my proposed fix for the problem.

Reproducible: Always

Steps to Reproduce:
1. Download attachment test-plo.c.
2. gcc -g -O3 -Wall -o test-plo test-plo.c
3. valgrind --partial-loads-ok=no ./test-plo
4. valgrind --partial-loads-ok=yes ./test-plo

Actual Results:
Valgrind with "--partial-loads-ok=no" produces two errors, as expected. Valgrind with "--partial-loads-ok=yes" produces NO errors, even though the second call to aligned_strlen() is clearly erroneous.

Expected Results:
I expect Valgrind with "--partial-loads-ok=no" to generate two errors, one for each call to aligned_strlen() (where it reads past the end of the allocated block). I expect Valgrind with "--partial-loads-ok=yes" to produce one error, corresponding to the second (buggy) call to aligned_strlen(), and no error for the first (correct) call.
Created attachment 68968 [details]
Patch to eliminate false negatives with --partial-loads-ok=yes

With this patch, "valgrind --partial-loads-ok=yes ./test-plo" emits a single error:

==14997== Conditional jump or move depends on uninitialised value(s)
==14997==    at 0x400662: main (test-plo.c:41)

Note that this error corresponds to the faulty invocation of aligned_strlen(), while no error occurs for the correct invocation.
This puts another function call (to mc_LOADV64_common) and another conditional branch

+   ULong result = mc_LOADV64(a, bigendian, &addrvbits);
+
+   if (LIKELY(addrvbits == V_BITS64_DEFINED)) {

on the fast path. Is that correct?
Yes. We can easily avoid the extra function call by lifting the fast path into the caller, i.e. only making the function call for the slower cases. (Or we could simply force GCC to inline mc_LOADV64_common; it is quite short.) The conditional branch should be predicted correctly, since its operand will be hot in the L1 cache, and thus adds very little overhead.

I wanted to lift the error logic out of the lowest-level routines because that approach extends naturally to 128-bit loads and beyond.

If "zero additional overhead for the fast path" really is a requirement, I will find a way to make that happen, but only if you think it would make the patch acceptable. I am happy to do extra work, as long as it is not wasted.
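The "lift the fast path into the caller" restructuring could be sketched as follows (all names and the toy shadow state are invented for illustration; this is not the actual memcheck code):

```c
#include <stdint.h>

#define LIKELY(x) __builtin_expect(!!(x), 1)
#define V_BITS64_DEFINED 0ULL

/* Toy stand-in for memcheck's shadow state: the V bits the fast-path
 * lookup would return for the current address. */
static uint64_t g_shadow_vbits = V_BITS64_DEFINED;

/* Common case: a single shadow lookup, no addressability bookkeeping. */
static uint64_t load64_fast(uint64_t a)
{
    (void)a;
    return g_shadow_vbits;
}

/* Rare case: out-of-line routine that also computes which bytes were
 * addressable -- the extra work the fast path wants to avoid. */
static uint64_t load64_slow(uint64_t a, uint8_t *addrbits)
{
    (void)a;
    *addrbits = 0xff;
    return g_shadow_vbits;
}

/* The restructuring: keep the cheap check inline in the caller, so
 * fully-defined loads pay no extra function call. */
static uint64_t load64(uint64_t a)
{
    uint64_t v = load64_fast(a);
    if (LIKELY(v == V_BITS64_DEFINED))
        return v;                      /* fast path unchanged */
    uint8_t addrbits;
    return load64_slow(a, &addrbits);  /* slow cases only */
}
```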
So actually .. I didn't understand the need for changes outside mc_LOADVn_slow. IIUC you want to change the V-status of bytes resulting from aligned loads across an accessibility boundary from defined to undefined -- which can be done entirely in mc_LOADVn_slow, can't it?
The answer comes when you try to extend this to 128-bit loads, which is how this patch began its life. If you want to extend this logic to 128 bits while still composing the 128-bit load out of two 64-bit loads, then you need information from _both_ 64-bit loads before you can determine the 128-bit result. (You need to know whether there is a mixture of accessible and inaccessible memory among those 128 bits, which you cannot determine by examining just one of the 64-bit loads.)

The information you need from the multiple loads in order to make a decision is:
 - the validity bits as computed now; and
 - which byte(s) were addressable.

My patch introduces an "addrvbits" result pointer in the load routines to hold a simple representation of the latter. This mechanism can be used to create a 128-bit load out of two 64-bit loads, a 256-bit load out of four 64-bit loads, and so forth.

But perhaps this is a discussion for another time. If you like, I will rework the patch as a change solely to the guts of mc_LOADVn_slow, and then we can have a separate discussion about extending to 128 bits and beyond.
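The composition argument could be sketched like this (function names, the bit-per-byte addressability encoding, and the demo stub are all hypothetical, not the patch's actual code):

```c
#include <stdint.h>

#define V_BITS64_DEFINED 0ULL   /* V bit 0 = defined, 1 = undefined */

typedef struct { uint64_t w64[2]; } V128;

/* Hypothetical 64-bit load: returns the V bits for 8 bytes and reports,
 * via *addrbits, which of those bytes were addressable (bit i = byte i). */
typedef uint64_t (*load64_fn)(uint64_t a, uint8_t *addrbits);

/* A 128-bit load built from two 64-bit loads can only apply the
 * partial-loads-ok rule once the addressability of BOTH halves is known. */
static V128 load128(uint64_t a, load64_fn load64)
{
    uint8_t addr_lo, addr_hi;
    V128 r;
    r.w64[0] = load64(a,     &addr_lo);
    r.w64[1] = load64(a + 8, &addr_hi);

    /* Mixture of accessible and inaccessible bytes among the 16:
     * force the V bits of the inaccessible bytes to "undefined"
     * rather than letting them pass as defined. */
    int all  = (addr_lo == 0xff) && (addr_hi == 0xff);
    int none = (addr_lo == 0) && (addr_hi == 0);
    if (!all && !none) {
        for (int i = 0; i < 8; i++) {
            if (!((addr_lo >> i) & 1)) r.w64[0] |= 0xffULL << (8 * i);
            if (!((addr_hi >> i) & 1)) r.w64[1] |= 0xffULL << (8 * i);
        }
    }
    return r;
}

/* Demo stub: pretend only addresses below 12 are addressable and the
 * underlying data is fully defined. */
static uint64_t demo_load64(uint64_t a, uint8_t *addrbits)
{
    uint8_t m = 0;
    for (int i = 0; i < 8; i++)
        if (a + (uint64_t)i < 12) m |= (uint8_t)(1u << i);
    *addrbits = m;
    return V_BITS64_DEFINED;
}
```

The same shape extends to a 256-bit load over four 64-bit loads: gather four addrbits values, then decide once.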
Created attachment 68987 [details] Simpler patch Here is a simpler patch to do the same thing with no impact on the fast path. Small moves are good.
The new patch is so small that it obviously: (a) does not even touch the fast path; and (b) does not alter the behavior for "--partial-loads-ok=no". The patch adds a total of two new code paths for "--partial-loads-ok=yes", and the single test case in this bug already exercises both of them. Could you give me some guidance on what additional tests you would like to see? Thanks.
Committed, with mods, r12430. Thanks for the patch and test case.
Patrick, can this be closed now? My impression is that it can.
Yes, this is fixed. Thanks, Julian.