Bug 283671 - LibVEX_Alloc must align result based on nbytes
Summary: LibVEX_Alloc must align result based on nbytes
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.7 SVN
Platform: Compiled Sources Linux
: NOR major
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-09 22:12 UTC by John Reiser
Modified: 2012-02-26 17:06 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
patch LibVEX_Alloc: align to 8 bytes if implied by nbytes (674 bytes, patch)
2011-10-09 22:12 UTC, John Reiser
Details
patch LibVEX_Alloc: align to 8 bytes if implied by nbytes (828 bytes, patch)
2011-10-09 23:35 UTC, John Reiser
Details
patch LibVEX_Alloc: align to 8 bytes if implied by nbytes (829 bytes, patch)
2011-10-10 01:59 UTC, John Reiser
Details
robustize alignment computation (860 bytes, patch)
2012-02-15 04:40 UTC, Florian Krohm
Details
alignment patch version 2 (973 bytes, patch)
2012-02-15 16:17 UTC, Florian Krohm
Details

Note You need to log in before you can comment on or make changes to this bug.
Description John Reiser 2011-10-09 22:12:39 UTC
Created attachment 64374 [details]
patch LibVEX_Alloc: align to 8 bytes if implied by nbytes

Version:           3.7 SVN (using Devel) 
OS:                Linux

Because typical usage is:
   typedef struct { ... } struct_S;
   struct_S *p = LibVEX_Alloc( sizeof struct_S );
then LibVEX_Alloc must align its result to __alignof__(struct_S).  Only nbytes is known, so LibVEX_Alloc must infer the required aligment from nbytes.  On armv5tel, __alignof__(unsigned long long) is 8, even though armv5tel is a "32-bit" machine.  The opcodes ldrd and strd require an address which is 8-byte aligned.

Reproducible: Always

Steps to Reproduce:
Examine source to LibVEX_Alloc in VEX/pub/libvex.h.

Actual Results:  
   ALIGN  = sizeof(void*)-1;
which gives only 4-byte alignment on a "32-bit" machine, regardless of nbytes.


Expected Results:  
8-byte alignment if 8 divides nbytes.
Comment 1 John Reiser 2011-10-09 23:35:29 UTC
Created attachment 64376 [details]
patch LibVEX_Alloc: align to 8 bytes if implied by nbytes

Align the result, not the increment!
Comment 2 John Reiser 2011-10-10 01:59:21 UTC
Created attachment 64377 [details]
patch LibVEX_Alloc: align to 8 bytes if implied by nbytes

... and side-step the g++ complaint:
../../VEX/pub/libvex.h:357:72: error: invalid conversion from ‘void*’ to ‘HChar* {aka char*}’ [-fpermissive]
Comment 3 Florian Krohm 2012-02-04 15:07:15 UTC
Instead of guessing the alignment based on the number of bytes we could just ask GCC what the max alignment is:

union variant {
     char c;
     short s;
     int i;
     long l;
     long long ll;
     float f;
     double d;
     long double ld;
     void *pto;
     void (*ptf)(void);
} var;

__alignof__(union variant)   should give the correct answer.

Then change the original code of  
ALIGN  = sizeof(void*)-1;
to
ALIGN = __alignof__(union variant) - 1;
Comment 4 John Reiser 2012-02-04 15:30:58 UTC
(In reply to comment #3)
> Instead of guessing the alignment based on the number of bytes we could just
> ask GCC what the max alignment is:

__alignof__ is a gcc extension.  Not every instance of valgrind is compiled with gcc.
Comment 5 Florian Krohm 2012-02-04 15:38:43 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Instead of guessing the alignment based on the number of bytes we could just
> > ask GCC what the max alignment is:
> 
> __alignof__ is a gcc extension.  Not every instance of valgrind is compiled
> with gcc.

struct align {
  char c;
  union variant x;
} s;

ALIGN = ((char *)&s.x - (char *)&s)) - 1;
Comment 6 Florian Krohm 2012-02-04 15:48:07 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > Instead of guessing the alignment based on the number of bytes we could just
> > > ask GCC what the max alignment is:
> > 
> > __alignof__ is a gcc extension.  Not every instance of valgrind is compiled
> > with gcc.
> 
> struct align {
>   char c;
>   union variant x;
> } s;
> 
> ALIGN = ((char *)&s.x - (char *)&s)) - 1;

Make that 

ALIGN = ((UInt) ((UChar *)&s.x - (UChar *)&s)) - 1;
Comment 7 John Reiser 2012-02-04 21:31:04 UTC
(In reply to comment #3)
> Instead of guessing the alignment based on the number of bytes we could just
> ask GCC what the max alignment is:
> 
> union variant {
>  ...
>      void (*ptf)(void);
       void (*ptfu)();
> } var;

It's entirely possible that the alignment of a pointer to function which takes
indefinite number of arguments may be different from the alignment of a pointer
to function which takes no arguments.  [Especially 64-bit PowerPC users may not
complain of the possibility of such a quirk!]
Comment 8 Florian Krohm 2012-02-04 23:38:18 UTC
(In reply to comment #7)
> (In reply to comment #3)
> > Instead of guessing the alignment based on the number of bytes we could just
> > ask GCC what the max alignment is:
> > 
> > union variant {
> >  ...
> >      void (*ptf)(void);
>        void (*ptfu)();
> > } var;
> 
> It's entirely possible that the alignment of a pointer to function which takes
> indefinite number of arguments may be different from the alignment of a pointer
> to function which takes no arguments.  [Especially 64-bit PowerPC users may not
> complain of the possibility of such a quirk!]

Interesting. So replacing the ptf member with your ptfu member fixes the
potential problem you describe (although it isn't really an issue in VEX
as we don't muck with function pointers there).
Comment 9 Florian Krohm 2012-02-15 04:39:51 UTC
Hmm. In your initial description you said that __alignof__(unsigned long long) is 8.
I noticed that the value returned by alignof is not the value that the compiler
uses for aligning data structures.

struct align {
  char c;
  long long x;
} s;

int main()
{
   printf("Computing alignment of long long x:\n");
   printf("- using address difference = %d\n", (int)((char *)&s.x - (char *)&s));
   printf("- using offsetof           = %d\n", (int)offsetof(struct align, x));
   printf("- using __alignof__        = %d\n", (int)__alignof__(long long));
   return 0;
}

On my 32-bit x86 machine that program produces:

Computing alignment of long long x:
- using address difference = 4
- using offsetof           = 4
- using __alignof__        = 8

That seems busted. According to http://gcc.gnu.org/onlinedocs/gcc/Alignment.html  __alignof__ should report 4 here because that is clearly what GCC believes is the alignment requirement.

Anyhow below is a patch that I propose to robustize the alignment computation.
Note, that it does not use a pointer to a function with an unknown number of arguments as discussed in comments #7 and #8 because GCC will complain about it.
Comment 10 Florian Krohm 2012-02-15 04:40:39 UTC
Created attachment 68811 [details]
robustize alignment computation
Comment 11 John Reiser 2012-02-15 06:03:53 UTC
I prefer to use 'unsigned' when dealing with bit-wise boolean operations, to initialize within the declaration when possible, and also to use 'const'.  Thus I would write:
    unsigned const ALIGN = ((unsigned) ((UChar *)&s.x - (UChar *)&s)) - 1;

If gcc complains about
         void (*ptfu)();
then instead use:
         void (*ptfu)(...);

Yes, on i686 [tested via "gcc -m32" on x86_64: gcc 4.5.1] I do see:
   Computing alignment of long long x:
   - using address difference = 4
   - using offsetof           = 4
   - using __alignof__        = 8

However, the Alignment.html seems to allow that.  Notice the words "or the minimum alignment" and the sentence "Some machines never actually require alignment; they allow reference to any data type even at an odd address. For these machines, __alignof__ reports the smallest alignment that GCC will give the data type, usually as mandated by the target ABI."  I believe this applies to i686.

i386 never requires alignment.  On i686, only some recent SSE* features require alignment, and then only to 16-bytes (MOVDQA, etc.) or 32-bytes (some AVX features which use 256-bit registers %ymmN.)  An 8-byte quantity may be referenced at any address; due to very good implementation of the data cache, the speed penalty for unaligned access (when all bytes are present in cache) is at most 1 cycle: usually only when crossing a cache-line boundary.
Comment 12 Julian Seward 2012-02-15 08:50:15 UTC
What specific problem (crash, assertion, other failure) does this
increased alignment solve?  I'm reluctant to increase alignment
here without clear cause, since this spreads out the data over
a larger area and will make the already slow JIT even slower.

In any case, we never use long double in the code, so that can be
removed from the union.  For sure I don't want to wind up with
a 16-alignment requirement as a result.
Comment 13 John Reiser 2012-02-15 14:13:17 UTC
(In reply to comment #12)
> What specific problem (crash, assertion, other failure) does this
> increased alignment solve?

On ARM, gcc assumes that it can generate 'ldrd' and 'strd' when 8==alignof(struct foo), and in fact it does so for callgrind.  The ldrd and strd instructions require 8-byte alignment (and even+odd register pair) even though ARM is only a 32-bit machine.  See my message to [Valgrind-developers] on 10/08/2011, "LibVEX_Alloc alignment can be (4 mod 8)".

Specifically, the code in VEX/priv/main_util.c has intentions for 8-aligned:
   static HChar  temporary[N_TEMPORARY_BYTES] __attribute__((aligned(8)));
   static HChar  permanent[N_PERMANENT_BYTES] __attribute__((aligned(8)));
and for callgrind, gcc generates:
(gdb) x/23i IRStmt_IMark
     ...
   0x380e8824 <IRStmt_IMark+32>:	strd	r0, [r12, #8] ; garbage unless 8-aligned
which requires 8-aligned.  However, the previous code in LibVEX_Alloc did not guarantee an 8-aligned result.  If the alignment is only 4 bytes, then callgrind crashes [depending on history of sizes in calls to LibVEX_Alloc.]

If gcc is going to assume 16-byte alignment of anything, then an allocator must allow for this, or else the valgrind programmer must guarantee no such requirement at runtime.  On x86 with SSE, today's gcc *does* assume 16-byte alignment of stack frames, so that MOVQDA opcode can be used for copying local structs.
Comment 14 John Reiser 2012-02-15 15:15:40 UTC
The absolute minimum required change is exactly one line:
    ALIGN = 8 -1;
else callgrind crashes on ARM because gcc generates 'strd' into a struct that was allocated by LibVEX_Alloc.  strd requires 8-byte alignment, and LibVEX_Alloc did not provide this if the immediately-preceding call was for a size of (4 mod 8).

Florian's changes try to document, generalize, and make it more robust.  Implicitly, this is a battle against the compiler: source code might not be able to express all the requirements that the compiler faces.
Comment 15 Florian Krohm 2012-02-15 16:17:06 UTC
Created attachment 68825 [details]
alignment patch version 2

Here's an updated version without the long double member. That gives us 4 byte alignment on x86 and x86_64 with -m32. On s390x and x86_64 it will compute 8 byte alignment as we had before. John, if valgrind still crashes with this patch, we should overwrite the alignment in a platform specific way. Perhaps add some configury to stick in a -DVEX_ALIGNMENT_OVERRIDE=8  or something along those lines and then observe that in libvex.h
I'm with Julian. We should not penalize other platforms unnecessarily.
Comment 16 John Reiser 2012-02-15 21:17:57 UTC
Patch id=68825 fixes the crash in callgrind on ARM.  One actual instance of calling inlined LibVEX_Alloc expands to:
-----
0x381feef0 <getAllocableRegs_ARM+16>:	add	r4, r4, #7
0x381feef4 <getAllocableRegs_ARM+20>:	ldr	r12, [r2]
0x381feef8 <getAllocableRegs_ARM+24>:	bic	r4, r4, #7
-----
which shows that ALIGN is 7.
Comment 17 Florian Krohm 2012-02-26 17:06:10 UTC
Fixed in VEX r2263.