Created attachment 64374 [details] patch LibVEX_Alloc: align to 8 bytes if implied by nbytes Version: 3.7 SVN (using Devel) OS: Linux Because typical usage is: typedef struct { ... } struct_S; struct_S *p = LibVEX_Alloc( sizeof struct_S ); then LibVEX_Alloc must align its result to __alignof__(struct_S). Only nbytes is known, so LibVEX_Alloc must infer the required aligment from nbytes. On armv5tel, __alignof__(unsigned long long) is 8, even though armv5tel is a "32-bit" machine. The opcodes ldrd and strd require an address which is 8-byte aligned. Reproducible: Always Steps to Reproduce: Examine source to LibVEX_Alloc in VEX/pub/libvex.h. Actual Results: ALIGN = sizeof(void*)-1; which gives only 4-byte alignment on a "32-bit" machine, regardless of nbytes. Expected Results: 8-byte alignment if 8 divides nbytes.
Created attachment 64376 [details] patch LibVEX_Alloc: align to 8 bytes if implied by nbytes Align the result, not the increment!
Created attachment 64377 [details] patch LibVEX_Alloc: align to 8 bytes if implied by nbytes ... and side-step the g++ complaint: ../../VEX/pub/libvex.h:357:72: error: invalid conversion from ‘void*’ to ‘HChar* {aka char*}’ [-fpermissive]
Instead of guessing the alignment based on the number of bytes we could just ask GCC what the max alignment is: union variant { char c; short s; int i; long l; long long ll; float f; double d; long double ld; void *pto; void (*ptf)(void); } var; __alignof__(union variant) should give the correct answer. Then change the original code of ALIGN = sizeof(void*)-1; to ALIGN = __alignof__(union variant) - 1;
(In reply to comment #3) > Instead of guessing the alignment based on the number of bytes we could just > ask GCC what the max alignment is: __alignof__ is a gcc extension. Not every instance of valgrind is compiled with gcc.
(In reply to comment #4) > (In reply to comment #3) > > Instead of guessing the alignment based on the number of bytes we could just > > ask GCC what the max alignment is: > > __alignof__ is a gcc extension. Not every instance of valgrind is compiled > with gcc. struct align { char c; union variant x; } s; ALIGN = ((char *)&s.x - (char *)&s)) - 1;
(In reply to comment #5) > (In reply to comment #4) > > (In reply to comment #3) > > > Instead of guessing the alignment based on the number of bytes we could just > > > ask GCC what the max alignment is: > > > > __alignof__ is a gcc extension. Not every instance of valgrind is compiled > > with gcc. > > struct align { > char c; > union variant x; > } s; > > ALIGN = ((char *)&s.x - (char *)&s)) - 1; Make that ALIGN = ((UInt) ((UChar *)&s.x - (UChar *)&s)) - 1;
(In reply to comment #3) > Instead of guessing the alignment based on the number of bytes we could just > ask GCC what the max alignment is: > > union variant { > ... > void (*ptf)(void); void (*ptfu)(); > } var; It's entirely possible that the alignment of a pointer to function which takes indefinite number of arguments may be different from the alignment of a pointer to function which takes no arguments. [Especially 64-bit PowerPC users may not complain of the possibility of such a quirk!]
(In reply to comment #7) > (In reply to comment #3) > > Instead of guessing the alignment based on the number of bytes we could just > > ask GCC what the max alignment is: > > > > union variant { > > ... > > void (*ptf)(void); > void (*ptfu)(); > > } var; > > It's entirely possible that the alignment of a pointer to function which takes > indefinite number of arguments may be different from the alignment of a pointer > to function which takes no arguments. [Especially 64-bit PowerPC users may not > complain of the possibility of such a quirk!] Interesting. So replacing the ptf member with your ptfu member fixes the potential problem you describe (although it isn't really an issue in VEX as we don't muck with function pointers there).
Hmm. In your initial description you said that __alignof__(unsigned long long) is 8. I noticed that the value returned by alignof is not the value that the compiler uses for aligning data structures. struct align { char c; long long x; } s; int main() { printf("Computing alignment of long long x:\n"); printf("- using address difference = %d\n", (int)((char *)&s.x - (char *)&s)); printf("- using offsetof = %d\n", (int)offsetof(struct align, x)); printf("- using __alignof__ = %d\n", (int)__alignof__(long long)); return 0; } On my 32-bit x86 machine that program produces: Computing alignment of long long x: - using address difference = 4 - using offsetof = 4 - using __alignof__ = 8 That seems busted. According to http://gcc.gnu.org/onlinedocs/gcc/Alignment.html __alignof__ should report 4 here because that is clearly what GCC believes is the alignment requirement. Anyhow below is a patch that I propose to robustize the alignment computation. Note, that it does not use a pointer to a function with an unknown number of arguments as discussed in comments #7 and #8 because GCC will complain about it.
Created attachment 68811 [details] robustize alignment computation
I prefer to use 'unsigned' when dealing with bit-wise boolean operations, to initialize within the declaration when possible, and also to use 'const'. Thus I would write: unsigned const ALIGN = ((unsigned) ((UChar *)&s.x - (UChar *)&s)) - 1; If gcc complains about void (*ptfu)(); then instead use: void (*ptfu)(...); Yes, on i686 [tested via "gcc -m32" on x86_64: gcc 4.5.1] I do see: Computing alignment of long long x: - using address difference = 4 - using offsetof = 4 - using __alignof__ = 8 However, the Alignment.html seems to allow that. Notice the words "or the minimum alignment" and the sentence "Some machines never actually require alignment; they allow reference to any data type even at an odd address. For these machines, __alignof__ reports the smallest alignment that GCC will give the data type, usually as mandated by the target ABI." I believe this applies to i686. i386 never requires alignment. On i686, only some recent SSE* features require alignment, and then only to 16-bytes (MOVDQA, etc.) or 32-bytes (some AVX features which use 256-bit registers %ymmN.) An 8-byte quantity may be referenced at any address; due to very good implementation of the data cache, the speed penalty for unaligned access (when all bytes are present in cache) is at most 1 cycle: usually only when crossing a cache-line boundary.
What specific problem (crash, assertion, other failure) does this increased alignment solve? I'm reluctant to increase alignment here without clear cause, since this spreads out the data over a larger area and will make the already slow JIT even slower. In any case, we never use long double in the code, so that can be removed from the union. For sure I don't want to wind up with a 16-alignment requirement as a result.
(In reply to comment #12) > What specific problem (crash, assertion, other failure) does this > increased alignment solve? On ARM, gcc assumes that it can generate 'ldrd' and 'strd' when 8==alignof(struct foo), and in fact it does so for callgrind. The ldrd and strd instructions require 8-byte alignment (and even+odd register pair) even though ARM is only a 32-bit machine. See my message to [Valgrind-developers] on 10/08/2011, "LibVEX_Alloc alignment can be (4 mod 8)". Specifically, the code in VEX/priv/main_util.c has intentions for 8-aligned: static HChar temporary[N_TEMPORARY_BYTES] __attribute__((aligned(8))); static HChar permanent[N_PERMANENT_BYTES] __attribute__((aligned(8))); and for callgrind, gcc generates: (gdb) x/23i IRStmt_IMark ... 0x380e8824 <IRStmt_IMark+32>: strd r0, [r12, #8] ; garbage unless 8-aligned which requires 8-aligned. However, the previous code in LibVEX_Alloc did not guarantee an 8-aligned result. If the alignment is only 4 bytes, then callgrind crashes [depending on history of sizes in calls to LibVEX_Alloc.] If gcc is going to assume 16-byte alignment of anything, then an allocator must allow for this, or else the valgrind programmer must guarantee no such requirement at runtime. On x86 with SSE, today's gcc *does* assume 16-byte alignment of stack frames, so that MOVQDA opcode can be used for copying local structs.
The absolute minimum required change is exactly one line: ALIGN = 8 -1; else callgrind crashes on ARM because gcc generates 'strd' into a struct that was allocated by LibVEX_Alloc. strd requires 8-byte alignment, and LibVEX_Alloc did not provide this if the immediately-preceding call was for a size of (4 mod 8). Florian's changes try to document, generalize, and make it more robust. Implicitly, this is a battle against the compiler: source code might not be able to express all the requirements that the compiler faces.
Created attachment 68825 [details] alignment patch version 2 Here's an updated version without the long double member. That gives us 4 byte alignment on x86 and x86_64 with -m32. On s390x and x86_64 it will compute 8 byte alignment as we had before. John, if valgrind still crashes with this patch, we should overwrite the alignment in a platform specific way. Perhaps add some configury to stick in a -DVEX_ALIGNMENT_OVERRIDE=8 or something along those lines and then observe that in libvex.h I'm with Julian. We should not penalize other platforms unnecessarily.
Patch id=68825 fixes the crash in callgrind on ARM. One actual instance of calling inlined LibVEX_Alloc expands to: ----- 0x381feef0 <getAllocableRegs_ARM+16>: add r4, r4, #7 0x381feef4 <getAllocableRegs_ARM+20>: ldr r12, [r2] 0x381feef8 <getAllocableRegs_ARM+24>: bic r4, r4, #7 ----- which shows that ALIGN is 7.
Fixed in VEX r2263.