324050 – Helgrind: SEGV because of unaligned stack when using movdqa

Bug 324050 - Helgrind: SEGV because of unaligned stack when using movdqa

Summary: Helgrind: SEGV because of unaligned stack when using movdqa

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	helgrind (show other bugs)
Version:	3.9.0.SVN
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2013-08-26 08:45 UTC by Matthias Schwarzott
Modified:	2014-09-08 10:11 UTC (History)
CC List:	2 users (show)

See Also:
Latest Commit:
Version Fixed In:

Attachments
Improve testcase none/tests/pth_stackalign to check for given alignment (997 bytes, application/octet-stream) 2013-08-26 08:47 UTC, Matthias Schwarzott	Details
Testcase demonstrating crash of movdqa in thread (2.10 KB, text/x-csrc) 2013-10-29 11:19 UTC, Matthias Schwarzott	Details
Realign stack for preloaded libs (1.48 KB, patch) 2013-10-30 10:15 UTC, Matthias Schwarzott	Details
Fix stack alignment for preload libs (19.96 KB, patch) 2014-05-27 08:27 UTC, Matthias Schwarzott	Details
Fix stack alignment for preload libs and add testcase (18.16 KB, patch) 2014-09-04 13:17 UTC, Matthias Schwarzott	Details
Show Obsolete (4) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Matthias Schwarzott 2013-08-26 08:45:26 UTC

helgrind on x86 triggers a crash in my application because the stack in the threads is not 16-bytes aligned.
So the movdqa instruction will get SEGV.

I changed the testcase none/tests/pth_stackalign so that it can check against a given alignment.
Checking alignemnt results:
native: ok
memcheck: ok
helgrind: bad

Checking in gdb shows that mythread_wrapper from hg_intercepts.c is the root cause here.
So gcc generates code that destroys the alignment.
After changing "-mpreferred-stack-boundary=2" to "-mpreferred-stack-boundary=4" in configure.in everything works.

But I am not sure if this is the correct thing to do as it might enlarge stack usage or make some assumptions in other code invalid.

Or it should only be set for code that is running on the virtual cpu.

Comment 1 Matthias Schwarzott 2013-08-26 08:47:50 UTC

Created attachment 81937 [details]
Improve testcase none/tests/pth_stackalign to check for given alignment

With this change, stack alignment can be checked like this:
# valgrind --tool=helgrind ./none/tests/pth_stackalign 16

Comment 2 Matthias Schwarzott 2013-08-26 08:49:19 UTC

This bug may be related to #254646

Comment 3 Matthias Schwarzott 2013-08-26 11:32:18 UTC

Thinking about this topic, I think for code that needs to cooperate with other code, we must keep the setting of preferred-stack-boundary identical to the other code, so most likely just not pass this option to the compiler for code linking against other libs or being executed on the virtual cpu.

Comment 4 Matthias Schwarzott 2013-09-13 06:23:17 UTC

This is what happens when running the modified test none/tests/pth_stackalign.
It does not depend on gentoo, as the results on ubuntu are identical

ubuntu without valgrind: alignment difference = 0
ubuntu with packaged memcheck (vg-3.7.0): alignment difference = 0
ubuntu with packaged helgrind (vg-3.7.0): alignment difference = 8
ubuntu with memcheck (SVN): alignment difference = 0
ubuntu with helgrind (SVN): alignment difference = 8
ubuntu with patch memcheck (SVN): alignment difference = 0
ubuntu with patch helgrind (SVN): alignment difference = 0

gentoo without valgrind: alignment difference = 0
gentoo with packaged memcheck (vg-3.8.1): alignment difference = 0
gentoo with packaged helgrind (vg-3.8.1): alignment difference = 8
gentoo with memcheck (SVN): alignment difference = 0
gentoo with helgrind (SVN): alignment difference = 8
gentoo with patch memcheck (SVN): alignment difference = 0
gentoo with patch helgrind (SVN): alignment difference = 0

If necessary I can create a testcase that calls movdqa on a stack variable

Comment 5 Matthias Schwarzott 2013-09-13 07:46:00 UTC

The gcc docs state that "libraries that use callbacks always use the default setting"
From http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html:

    On Pentium and Pentium Pro, double and long double values should be aligned to an 8-byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 may not work properly if it is not 16-byte aligned.

    To ensure proper alignment of this values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary most likely misaligns the stack. It is recommended that libraries that use callbacks always use the default setting.

    This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.

Comment 6 Matthias Schwarzott 2013-10-29 11:19:53 UTC

Created attachment 83198 [details]
Testcase demonstrating crash of movdqa in thread

Comment 7 Matthias Schwarzott 2013-10-29 11:23:15 UTC

This testcase (pth_stackalign_movdqa.c) demonstrates the problem (the sse part is copied from none/tests/x86/insn_sse2.c).

To test just do:
# gcc -g -msse -o pth_stackalign_movdqa pth_stackalign_movdqa.c -lpthread
# valgrind --tool=helgrind ./pth_stackalign_movdqa
==24650== Helgrind, a thread error detector
==24650== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==24650== Using Valgrind-3.9.0.TEST1 and LibVEX; rerun with -h for copyright info
==24650== Command: ./pth_stackalign_movdqa
==24650== 
==24650== 
==24650== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==24650==  General Protection Fault
==24650==    at 0x8048682: movdqa_2 (pth_stackalign_movdqa.c:43)
==24650==    by 0x8048764: ThreadFunction (pth_stackalign_movdqa.c:80)
==24650==    by 0x400A566: mythread_wrapper (hg_intercepts.c:233)
==24650==    by 0x4DA19F37: start_thread (pthread_create.c:305)
==24650==    by 0x4D945C0D: clone (clone.S:133)
==24650== 
==24650== For counts of detected and suppressed errors, rerun with: -v
==24650== Use --history-level=approx or =none to gain increased speed, at
==24650== the cost of reduced accuracy of conflicting-access information
==24650== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Memcheck is not affected:
# valgrind --tool=memcheck ./pth_stackalign_movdqa 
==25030== Memcheck, a memory error detector
==25030== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==25030== Using Valgrind-3.9.0.TEST1 and LibVEX; rerun with -h for copyright info
==25030== Command: ./pth_stackalign_movdqa
==25030== 
movdqa_2 ... ok
finished
==25030== 
==25030== HEAP SUMMARY:
==25030==     in use at exit: 0 bytes in 0 blocks
==25030==   total heap usage: 1 allocs, 1 frees, 136 bytes allocated
==25030== 
==25030== All heap blocks were freed -- no leaks are possible
==25030== 
==25030== For counts of detected and suppressed errors, rerun with: -v
==25030== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Test works fine without valgrind:
# ./pth_stackalign_movdqa 
movdqa_2 ... ok
finished

Should valgrind really exit, or call the SIGILL handler in case of misalignment?

Comment 8 Matthias Schwarzott 2013-10-30 10:15:36 UTC

Created attachment 83221 [details]
Realign stack for preloaded libs

This patch gets the idea of Philippe Waroquiers to keep the 4 byte stack alignment for valgrind code and only change it for preloaded libraries.
It basically adds "-mincoming-stack-boundary=2 -mpreferred-stack-boundary=4" to the compilation of preload libs.

But if there are no other unaligned calls in valgrind, it may even be enough to have the preload libs compiled without any stack alignment options.

Second issue: the flag is only tested for the primary architecture (configure tests without -m32/-m64 ...)
But this affects whole build system and not only this flag. I should create a new ticket for this.

Comment 9 Gregor Jasny 2014-02-17 12:58:03 UTC

The patch fixes the stack alignment problem for me. Don't forget to call autoregen.sh after applying it.

Comment 10 Matthias Schwarzott 2014-05-27 08:27:30 UTC

Created attachment 86852 [details]
Fix stack alignment for preload libs

Fix stack alignment for preloaded libs in a better way.

The core point is to seperate the core CFLAGS and the ones for preloaded
libs. These need a stack alignment that is compatible to the application
and lib compiler settings.

That means keep the stack settings unchanged for code that could call
back into the application.

Comment 11 Matthias Schwarzott 2014-07-22 06:23:42 UTC

Has anybody had a look at this fix?
Can I do something to get this commited?
Is there anything to be improved?

Comment 12 Matthias Schwarzott 2014-09-04 13:01:24 UTC

Is it planned to merge this for valgrind-3.10?

Comment 13 Matthias Schwarzott 2014-09-04 13:17:13 UTC

Created attachment 88564 [details]
Fix stack alignment for preload libs and add testcase

This is the updated patch to compile again (based on revision 14458 from 2014/09/04).

Comment 14 Julian Seward 2014-09-04 13:28:04 UTC

(In reply to Matthias Schwarzott from comment #12)
> Is it planned to merge this for valgrind-3.10?

Yes.  It's next on my todo list.  This evening.

Comment 15 Julian Seward 2014-09-05 20:03:25 UTC

Committed (modified version of the patch, without the tests) as r14471.
Thank you for the patch.  I think it is correct but is hard for me to test.
Can you test the result?

Comment 16 Julian Seward 2014-09-08 10:11:50 UTC

Fixed, r14471.