Bug 267997 - MacOSX: 64-bit valgrind segfaults on launch when built with Xcode 4.0.1
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 3.6.0
Platform: Unlisted Binaries macOS
Importance: NOR crash
Target Milestone: ---
Assignee: Julian Seward
Duplicates: 267342 267769 268792 269641 270309 270311 271337 274784 276637 283325
Reported: 2011-03-08 20:11 UTC by Sascha Kratky
Modified: 2011-10-12 16:09 UTC (History)

Attachments
program that does the transformation listed in comment #8 (12.34 KB, text/plain)
2011-03-31 19:45 UTC, Julian Seward
proposed patch (21.05 KB, patch)
2011-04-04 22:08 UTC, Julian Seward

Description Sascha Kratky 2011-03-08 20:11:01 UTC
Version:           3.6.0 (using Devel) 
OS:                OS X

valgrind immediately crashes upon startup.

Crash log:


Process:         valgrind [5461]
Path:            /usr/local/Cellar/valgrind/3.6.1/bin/valgrind
Identifier:      valgrind
Version:         ??? (???)
Code Type:       X86-64 (Native)
Parent Process:  tcsh [5459]

Date/Time:       2011-03-08 19:47:47.568 +0100
OS Version:      Mac OS X Server 10.6.6 (10J567)
Report Version:  6

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000134000000
Crashed Thread:  Unknown

Backtrace not available

Unknown thread crashed with X86 Thread State (64-bit):
  rax: 0x000000000000000e  rbx: 0x0000000134000000  rcx: 0x0000000000000000  rdx: 0x0000000000000000
  rdi: 0x0000000134000000  rsi: 0x0000000000000000  rbp: 0x0000000138429a90  rsp: 0x0000000138429a80
   r8: 0x0000000000000000   r9: 0x0000000000000000  r10: 0x0000000000000000  r11: 0x0000000000000000
  r12: 0x0000000000000000  r13: 0x0000000000000000  r14: 0x0000000000000000  r15: 0x0000000000000000
  rip: 0x000000013803f39a  rfl: 0x0000000000010206  cr2: 0x0000000134000000

Binary images description not available


Reproducible: Didn't try
Comment 1 Jeremy Lavergne 2011-03-22 20:54:18 UTC
Also occurs on Mac OS X Server 10.6.7 (10J869)
Comment 2 Jeremy Lavergne 2011-03-22 21:00:55 UTC
I was able to work around the issue by building 32-bit valgrind, instead of the default 64-bit for my system.
Comment 3 Jeremy Lavergne 2011-03-22 21:06:40 UTC
According to raim, a 64-bit build of valgrind runs fine on a 32-bit kernel.

Perhaps the issue only occurs when building 64-bit on a 64-bit kernel.
Comment 4 Mike McQuaid 2011-03-29 09:49:52 UTC
Looks like a dupe of http://bugs.kde.org/show_bug.cgi?id=267997, can you reproduce the same output with the same flags?
Comment 5 Julian Seward 2011-03-31 13:53:22 UTC
*** Bug 269641 has been marked as a duplicate of this bug. ***
Comment 6 Julian Seward 2011-03-31 14:22:59 UTC
Some initial results:

* I can reproduce this with Xcode 4.0.1.


* AFAICS it only affects the valgrinding of 64-bit processes; 32-bits is OK


* The tool executables (big files of the form memcheck-amd64-darwin, etc)
  segfault within a few instructions of gaining control from the kernel.


* My initial impression is that this is due to a bug in the linker
  (/usr/bin/ld), which is perhaps a new implementation in 4.0.x ?

  $ /usr/bin/ld -v
  @(#)PROGRAM:ld  PROJECT:ld64-123.2
  llvm version 2.9svn, from Apple Clang 2.0 (build 138)

  Comparing the MachO load commands vs a (working) tool executable that
  was created by Xcode 3.2.x, it appears that the new linker has partially
  ignored the build system's request to place the tool executable's stack
  at a non standard location.  The build system tells the linker
  "-stack_addr 0x134000000 -stack_size 0x800000".

  With the Xcode 3.2 linker those flags produce two results:

  (1) A load command to allocate the stack at the said location:
         Load command 3
               cmd LC_SEGMENT_64
           cmdsize 72
           segname __UNIXSTACK
            vmaddr 0x0000000133800000
            vmsize 0x0000000000800000
           fileoff 2285568
          filesize 0
           maxprot 0x00000007
          initprot 0x00000003
            nsects 0
             flags 0x0

  (2) A request (in LC_UNIXTHREAD) to set %rsp to the correct value
      at process startup, 0x134000000.

  With Xcode 4.0.1, (1) is missing but (2) is still present.  The
  tool executable therefore starts up with %rsp pointing to unmapped
  memory and faults almost instantly.


* Xcode 4.0.1 linking a 32 bit tool executable does not omit (1),
  and so works correctly.
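As an illustration of the analysis above, the relationship between the
linker flags and the two load commands can be checked with a little
arithmetic.  This is only a sketch: the names Segment, expected_unixstack
and stack_top_is_mapped are made up for the example, and treating %rsp as
"mapped" when it is the top of, or inside, a segment is a simplifying
assumption, not a description of what the kernel actually does.

```python
from typing import NamedTuple, Sequence


class Segment(NamedTuple):
    """An address range, as described by an LC_SEGMENT_64 load command."""
    name: str
    vmaddr: int
    vmsize: int


def expected_unixstack(stack_addr: int, stack_size: int) -> Segment:
    """Segment implied by "-stack_addr A -stack_size S".

    The stack grows downwards, so the mapping ends at stack_addr.
    """
    return Segment("__UNIXSTACK", stack_addr - stack_size, stack_size)


def stack_top_is_mapped(rsp: int, segments: Sequence[Segment]) -> bool:
    """True if rsp is the top of, or lies inside, one of the segments."""
    return any(s.vmaddr < rsp <= s.vmaddr + s.vmsize for s in segments)


STACK_ADDR = 0x134000000  # value given to -stack_addr and found in LC_UNIXTHREAD
STACK_SIZE = 0x800000     # value given to -stack_size

# Xcode 3.2.x: the __UNIXSTACK segment is present.
xcode32 = [expected_unixstack(STACK_ADDR, STACK_SIZE)]
# Xcode 4.0.1: no __UNIXSTACK; only the __LINKEDIT segment quoted in this
# bug is listed here, the other segments are omitted for brevity.
xcode40 = [Segment("__LINKEDIT", 0x138dea000, 0xad000)]

print(hex(xcode32[0].vmaddr))                    # 0x133800000
print(stack_top_is_mapped(STACK_ADDR, xcode32))  # True
print(stack_top_is_mapped(STACK_ADDR, xcode40))  # False
```

The computed vmaddr matches the 0x0000000133800000 shown in the Xcode 3.2
load command, and the second check mirrors the failure: %rsp is set to an
address that no segment covers.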
Comment 7 Mike McQuaid 2011-03-31 14:24:21 UTC
I also see the same situation with it only affecting 64-bit binaries.
Comment 8 Julian Seward 2011-03-31 14:43:25 UTC
One really sick workaround is to observe that the executables contain
a redundant MachO load command:

Load command 2
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __LINKEDIT
   vmaddr 0x0000000138dea000
   vmsize 0x00000000000ad000
  fileoff 2658304
 filesize 705632
  maxprot 0x00000007
 initprot 0x00000001
   nsects 0
    flags 0x0

The described segment presumably contains information intended for the
dynamic linker, but is irrelevant because this is a statically linked
executable.  Hence it might be possible to postprocess the executables
after linking, to overwrite this entry with the information that would
have been in the missing __UNIXSTACK entry.  I tried this by hand
(with a binary editor) earlier and got something that worked.
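A rough sketch of what such an overwrite amounts to, working on an
in-memory copy of a single load command rather than a real file.  The
72-byte layout is that of struct segment_command_64 with no sections; the
helper names unixstack_cmd and replace_linkedit are hypothetical, and the
fileoff passed in is simply copied from the Xcode 3.2 dump for
illustration, so the right value for a given binary is an assumption.

```python
import struct

# struct segment_command_64, little-endian, 72 bytes when nsects == 0:
# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags
SEG64 = struct.Struct("<II16sQQQQiiII")
LC_SEGMENT_64 = 0x19


def unixstack_cmd(stack_addr: int, stack_size: int, fileoff: int) -> bytes:
    """Build a __UNIXSTACK command like the one the Xcode 3.2 linker emits."""
    return SEG64.pack(
        LC_SEGMENT_64, SEG64.size, b"__UNIXSTACK",
        stack_addr - stack_size,  # vmaddr: stack grows down from stack_addr
        stack_size,               # vmsize
        fileoff, 0,               # filesize 0: nothing is read from the file
        0x7, 0x3,                 # maxprot 0x7, initprot 0x3 (read/write)
        0, 0)                     # nsects, flags


def replace_linkedit(image: bytearray, offset: int, new_cmd: bytes) -> None:
    """Overwrite the __LINKEDIT command at offset, refusing anything else."""
    cmd, cmdsize, segname, *_ = SEG64.unpack_from(image, offset)
    if cmd != LC_SEGMENT_64 or cmdsize != len(new_cmd):
        raise ValueError("not a 72-byte LC_SEGMENT_64 command")
    if segname.rstrip(b"\0") != b"__LINKEDIT":
        raise ValueError("refusing to overwrite segment %r" % segname)
    image[offset:offset + cmdsize] = new_cmd


# The redundant __LINKEDIT command quoted above, rebuilt as raw bytes.
image = bytearray(SEG64.pack(LC_SEGMENT_64, 72, b"__LINKEDIT",
                             0x138dea000, 0xad000, 2658304, 705632,
                             0x7, 0x1, 0, 0))
replace_linkedit(image, 0, unixstack_cmd(0x134000000, 0x800000, 2285568))

_, _, name, vmaddr, vmsize, *_ = SEG64.unpack_from(image, 0)
print(name.rstrip(b"\0").decode(), hex(vmaddr), hex(vmsize))
# __UNIXSTACK 0x133800000 0x800000
```

Because both commands are exactly 72 bytes, the replacement leaves the
offsets of all later load commands unchanged, which is what makes editing
the file in place feasible.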
Comment 9 Julian Seward 2011-03-31 19:45:48 UTC
Created attachment 58477 [details]
program that does the transformation listed in comment #8

Here's a program that does the transformation listed in comment 8.
Using it I can transform my segfaulting tool executables (eg,
memcheck-amd64-darwin) into ones that work properly.

I would be interested to hear whether it works for other people.

WARNING: this program will silently and irreversibly modify 64-bit
Mach-O executables, in a way that will cause ordinary ones to no
longer work.  Do not use it unless you understand the discussion
above.

Program is pretty rough, magic values are hardwired, error checking
is inadequate, etc, but it seems to work.  It will refuse to modify
a 32 bit executable on the basis that the xcode 4.0.1 linker doesn't
have problems with them (so it's not necessary).

How to use (eg):



$ ./vg-in-place date
./vg-in-place: line 31: 82073 Segmentation fault      VALGRIND_LIB="$vgbasedir/.in_place" VALGRIND_LIB_INNER="$vgbasedir/.in_place" "$vgbasedir/coregrind/valgrind" "$@"


$ gcc -m64 -Wall -g -O -o fixup_macho_loadcmds fixup_macho_loadcmds.c


$ ./fixup_macho_loadcmds ./memcheck/memcheck-amd64-darwin
size 3580824 fd 3
load cmd: offset   32   size 392   kind 25 = LC_SEGMENT_64
load cmd: offset  424   size 472   kind 25 = LC_SEGMENT_64
load cmd: offset  896   size  72   kind 25 = LC_SEGMENT_64
modification begins
modification done
load cmd: offset  968   size  24   kind  2 = LC_SYMTAB
load cmd: offset  992   size  24   kind 27 = LC_UUID
load cmd: offset 1016   size 184   kind  5 = LC_UNIXTHREAD
UnixThread: flavor 4 = x86_THREAD_STATE64
rsp = 0x134000000


$ ./vg-in-place date
==82086== Memcheck, a memory error detector
[... it at least starts up without dying ...]



Note that this example is for modifying the tool executables in the
build tree.  Normally you'd want to modify them in the installation
tree, eg, $prefix/lib/valgrind/memcheck-amd64-darwin, etc.
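For readers who want to inspect an executable without modifying it, the
"load cmd:" listing above can be approximated with a short read-only
walk over the Mach-O header.  This is a sketch and not the attached
program: load_commands, describe and fake_image are hypothetical names,
only the four command kinds seen in this bug are given names, and only
little-endian 64-bit images are handled.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
HEADER = struct.Struct("<IiiIIIII")  # struct mach_header_64, 32 bytes
LOAD_CMD = struct.Struct("<II")      # common prefix of every command: cmd, cmdsize
NAMES = {0x2: "LC_SYMTAB", 0x5: "LC_UNIXTHREAD",
         0x19: "LC_SEGMENT_64", 0x1B: "LC_UUID"}


def load_commands(image: bytes):
    """Yield (offset, cmdsize, cmd) for each load command in the image."""
    magic, _cpu, _sub, _ftype, ncmds, _sizeofcmds, _flags, _rsvd = \
        HEADER.unpack_from(image, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O image")
    offset = HEADER.size  # load commands start right after the header
    for _ in range(ncmds):
        cmd, cmdsize = LOAD_CMD.unpack_from(image, offset)
        if cmdsize < LOAD_CMD.size:
            raise ValueError("corrupt load command at offset %d" % offset)
        yield offset, cmdsize, cmd
        offset += cmdsize


def describe(image: bytes) -> list:
    """Format the load commands in the style of the listing above."""
    return ["load cmd: offset %4d   size %3d   kind %2d = %s"
            % (off, size, cmd, NAMES.get(cmd, "?"))
            for off, size, cmd in load_commands(image)]


def fake_image(cmds) -> bytes:
    """Build a header plus zero-filled commands, for demonstration only."""
    body = b"".join(LOAD_CMD.pack(cmd, size) + bytes(size - LOAD_CMD.size)
                    for size, cmd in cmds)
    # cputype 0x01000007 (x86_64), cpusubtype 3, filetype 2 (MH_EXECUTE)
    return HEADER.pack(MH_MAGIC_64, 0x01000007, 3, 2,
                       len(cmds), len(body), 0, 0) + body


# Sizes and kinds taken from the listing above.
demo = fake_image([(392, 25), (472, 25), (72, 25), (24, 2), (24, 27), (184, 5)])
for line in describe(demo):
    print(line)
```

To try it on a real tool executable, pass the file contents instead of
the synthetic image, for example describe(open(path, "rb").read()), where
path is whichever binary you want to inspect.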
Comment 10 Greg Parker 2011-03-31 19:55:56 UTC
I can't reproduce this with a test app. What exactly is the compile or link line for an executable that gets the bad stack?
Comment 11 Julian Seward 2011-03-31 20:08:55 UTC
(In reply to comment #10)
Standard checkout and build of valgrind-trunk on 10.6.x with Xcode 4.0.1

Then:


cd none


/usr/bin/ld -static -arch x86_64 -macosx_version_min 10.5 -o none-amd64-darwin -u __start -e __start -image_base 0x138000000 -stack_addr 0x134000000 -stack_size 0x800000 none_amd64_darwin-nl_main.o ../coregrind/libcoregrind-amd64-darwin.a ../VEX/libvex-amd64-darwin.a


gdb ./none-amd64-darwin
(gdb) run
Starting program: /Users/macuser/VgTRUNK/trunk/none/none-amd64-darwin 
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000134000000
_start_in_C_darwin (pArgc=0x134000000) at m_main.c:3107
3107	   Int     argc = *(Int *)pArgc;  // not pArgc[0] on LP64
(gdb) quit


../fixup_macho_loadcmds ./none-amd64-darwin


gdb ./none-amd64-darwin
(gdb) run
Starting program: /Users/macuser/VgTRUNK/trunk/none/none-amd64-darwin 
valgrind: You cannot run '/Users/macuser/VgTRUNK/trunk/none/none-amd64-darwin' directly.
valgrind: You should use $prefix/bin/valgrind.
Comment 12 Greg Parker 2011-03-31 21:02:44 UTC
Reproduced. Looks like it fails in the presence of -static.

A cleaner workaround would be to use -sectcreate to insert the stack data, rather than hijacking the __LINKEDIT.
Comment 13 Julian Seward 2011-04-01 18:25:41 UTC
(In reply to comment #12)
> Reproduced. Looks like it fails in the presence of -static.

Thanks for the confirmation.

Can this get fixed for xcode 4.0.2 ?  If so, is there some kind of
bug number or tag that we can track it via?  As per discussion above
I can work around it in the build system for the time being, but
it's not a good permanent solution.
Comment 14 Greg Parker 2011-04-01 19:39:10 UTC
I filed rdar://9216420. I don't know what the fix schedule will be.
Comment 15 Julian Seward 2011-04-04 22:08:25 UTC
Created attachment 58577 [details]
proposed patch

Here's a complete proposed patch.  It's against the SVN trunk but will
probably apply and work for 3.6.x as well.

After applying it, you will need to rebuild Valgrind from distclean
(iow, make distclean ; ./autogen.sh ; then configure and build as
normal.)  The build system then automatically post-processes the tool
executables as discussed above, so they should Just Work (tm).

This works for me for OSX 10.6.x using Xcode 4.0.1.  I would
appreciate people testing the following two combinations

  OSX 10.6.x,  Xcode 3.2.x   (to check it doesn't break w/ the old
                             Xcode)

  OSX 10.5.x,  Xcode 3.2.x   (to check it doesn't break Leopard)

since I don't want to check in something that breaks older setups, but
I can't check either of those easily myself.
Comment 16 Julian Seward 2011-04-06 13:13:42 UTC
Committed, r11686.  I am assuming it does not cause breakage
for the untested combinations listed in comment #15.  If it does,
please re-open.
Comment 17 Mike McQuaid 2011-04-06 13:22:32 UTC
I'll assume the same and add it to Homebrew. If it makes things break, I'll be sure to push the issues upstream.
Comment 18 Mike McQuaid 2011-04-06 14:19:07 UTC
Fails to build against 3.6.1 with:
ranlib: file: libcoregrind-amd64-darwin.a(libcoregrind_amd64_darwin_a-elf.o) has no symbols
"my" variable $r masks earlier declaration in same scope at ../coregrind/link_tool_exe_darwin line 181.
Can't exec "../coregrind/fixup_macho_loadcmds": No such file or directory at ../coregrind/link_tool_exe_darwin line 181.
make[3]: *** [memcheck-amd64-darwin] Error 1
make[2]: *** [install-recursive] Error 1
make[1]: *** [install-recursive] Error 1
make: *** [install] Error 2
Comment 19 Mike McQuaid 2011-04-06 14:20:16 UTC
And non-parallel version:

link_tool_exe_darwin: /usr/bin/ld -static -arch x86_64 -macosx_version_min 10.5 -o memcheck-amd64-darwin -u __start -e __start -image_base 0x138000000 -stack_addr 0x134000000 -stack_size 0x800000 memcheck_amd64_darwin-mc_leakcheck.o memcheck_amd64_darwin-mc_malloc_wrappers.o memcheck_amd64_darwin-mc_main.o memcheck_amd64_darwin-mc_translate.o memcheck_amd64_darwin-mc_machine.o memcheck_amd64_darwin-mc_errors.o ../coregrind/libcoregrind-amd64-darwin.a ../VEX/libvex-amd64-darwin.a
link_tool_exe_darwin: ../coregrind/fixup_macho_loadcmds 0x134000000 0x800000 memcheck-amd64-darwin
Can't exec "../coregrind/fixup_macho_loadcmds": No such file or directory at ../coregrind/link_tool_exe_darwin line 181.
Comment 20 Julian Seward 2011-04-06 14:26:48 UTC
Comment #18 and #19: are these from-distclean builds?  You need to
make distclean, since the change updates the Makefile.am's.
Comment 21 Mike McQuaid 2011-04-06 14:28:17 UTC
These are from clean builds from applying that revision's patch to the 3.6.1 tarball and doing configure;make; make install. There's no autogen.sh in the release tarballs, should I run autoreconf or something instead? Sorry for failing :(
Comment 22 Julian Seward 2011-04-07 12:41:26 UTC
*** Bug 270309 has been marked as a duplicate of this bug. ***
Comment 23 Julian Seward 2011-04-20 12:59:04 UTC
*** Bug 271337 has been marked as a duplicate of this bug. ***
Comment 24 Julian Seward 2011-04-22 09:37:53 UTC
*** Bug 267342 has been marked as a duplicate of this bug. ***
Comment 25 Sylvain FAY-CHATELARD 2011-04-22 09:39:23 UTC
Nice! Thanks a lot!


Comment 26 Julian Seward 2011-06-06 17:31:53 UTC
*** Bug 274784 has been marked as a duplicate of this bug. ***
Comment 27 Julian Seward 2011-06-10 21:58:43 UTC
*** Bug 267769 has been marked as a duplicate of this bug. ***
Comment 28 Julian Seward 2011-06-10 23:52:27 UTC
*** Bug 268792 has been marked as a duplicate of this bug. ***
Comment 29 Julian Seward 2011-06-11 01:06:14 UTC
*** Bug 270311 has been marked as a duplicate of this bug. ***
Comment 30 Julian Seward 2011-10-07 09:07:33 UTC
*** Bug 283325 has been marked as a duplicate of this bug. ***
Comment 31 Julian Seward 2011-10-12 16:09:53 UTC
*** Bug 276637 has been marked as a duplicate of this bug. ***