Bug 275168

Summary: Make Valgrind work for MacOSX 10.7 Lion
Product: [Developer tools] valgrind Reporter: WSK <Wolf.St.Kappesser>
Component: general Assignee: Julian Seward <jseward>
Status: RESOLVED FIXED    
Severity: normal CC: ajgilbert, benjamin, cwatson, DonaldEGrimes, glider, jackjost, othiman, oystein, peter, siegel, tim, wilane, Wolf.St.Kappesser
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Compiled Sources   
OS: macOS   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: output of make
Adapted configure file
Patch for configure.in
Patch to implement 2 missing taskmsgs, a syscall and __pthread_sigmask
WIP patch, 27 Aug 2011
Programs using getaddrinfo() cause warning. Suppression required?
WIP patch, vs current trunk revision 12025
debug log of valgrind memcheck on 10.7.1

Description WSK 2011-06-07 23:50:06 UTC
Created attachment 60767 [details]
output of make

Version:           unspecified
OS:                OS X

After editing the ./configure to run on Darwin 11.0.0 instead of 10.x.x and running "make" I get the attached error (second build without clean).

Reproducible: Always

Steps to Reproduce:
edit ./configure to something like the attachment
./configure
make


Actual Results:  
see make.log

Expected Results:  
clean build

Using OS X 10.7 "Lion" newest developer-preview.
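
For context, here is a minimal sketch of the kernel-to-release mapping behind the configure edit described above: Darwin 10.x kernels correspond to OS X 10.6 and Darwin 11.x to OS X 10.7, so a check that only accepts 10.x rejects Lion. The function name and the table are illustrative assumptions and are not taken from Valgrind's configure script.

```python
import platform

# Darwin kernel major version -> OS X release (illustrative table, not
# copied from Valgrind's configure script).
DARWIN_TO_OSX = {9: "10.5", 10: "10.6", 11: "10.7"}


def osx_release_for_kernel(kernel_version):
    """Return the OS X release for a Darwin kernel version string such as
    '11.0.0', or None if the major version is not in the table."""
    major_text = kernel_version.split(".", 1)[0]
    if not major_text.isdigit():
        return None
    return DARWIN_TO_OSX.get(int(major_text))


if __name__ == "__main__":
    # platform.release() reports the kernel version, like `uname -r`.
    kernel = platform.release()
    print(kernel, "->", osx_release_for_kernel(kernel))
```

A version check written this way fails safely for unknown kernels (it returns None) instead of silently treating them as a supported release.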
Comment 1 WSK 2011-06-07 23:51:44 UTC
Created attachment 60768 [details]
Adapted configure file
Comment 2 Peter Le Bek 2011-07-24 16:08:11 UTC
After hacking the config to accept darwin 11 I get the same build error. This is on the release version of 10.7, using i686-apple-darwin11-llvm-gcc-4.2 (i.e. the gcc that comes with Xcode 4.2).

I was able to "fix" the build error by specifying --enable-only64bit. Adding '-lgcc -L/usr/lib/gcc/i686-apple-darwin11/4.2.1/' to the failing linker command also seems to work.

I think this will need some interest from people more familiar with the valgrind build process.

The built valgrind appears to work. I've got no idea how reliable it actually is at this point; presumably it will require a similar effort to the 10.5 -> 10.6 work (#205241).
Comment 3 Peter Le Bek 2011-07-25 10:01:31 UTC
The XNU sources came out a few days ago http://opensource.apple.com/tarballs/xnu/xnu-1699.22.73.tar.gz

I'm starting to come across unhandled syscalls - better that these are posted as separate bugs I assume.
Comment 4 justincase 2011-08-05 04:23:11 UTC
*** This bug has been confirmed by popular vote. ***
Comment 5 Julian Seward 2011-08-23 07:41:47 UTC
> After editing the ./configure to run on Darwin 11.0.0 instead of 10.x.x and
> running "make" I get the attached error (second build without clean).
> [link error w.r.t. __fixunsdfdi for the 32-bit builds]

Fixed, r12000.
Comment 6 Julian Seward 2011-08-23 07:43:12 UTC
(In reply to comment #3)
> I'm starting to come across unhandled syscalls - better that these are posted
> as separate bugs I assume.

Did any of these get posted?  As a minimum, I see this even with the
simplest programs (eg, /bin/date):

--71249-- WARNING: unhandled syscall: unix:357
Comment 7 Peter Le Bek 2011-08-23 08:32:06 UTC
(In reply to comment #5)
> > After editing the ./configure to run on Darwin 11.0.0 instead of 10.x.x and
> > running "make" I get the attached error (second build without clean).
> > [link error w.r.t. __fixunsdfdi for the 32-bit builds]
> 
> Fixed, r12000.

Thanks.

(In reply to comment #6)
> Did any of these get posted?  As a minimum, I see this even with the
> simplest programs (eg, /bin/date):
> 
> --71249-- WARNING: unhandled syscall: unix:357

No, I got puzzled after reading the contents of coregrind/m_syswrap/priv_syswrap-darwin.h. 357, for example, is listed there as "wrapper not yet implemented in Valgrind" - suggesting this call isn't new to darwin11. Is it the case that this call just wasn't used very often in past versions of darwin? Is it appropriate to report syscalls missing even when priv_syswrap-darwin.h indicates they're known missing?
Comment 8 Julian Seward 2011-08-23 08:44:01 UTC
(In reply to comment #7)
> > --71249-- WARNING: unhandled syscall: unix:357
> 
> No, I got puzzled after reading the contents of
> coregrind/m_syswrap/priv_syswrap-darwin.h. 357, for example, is listed there as
> "wrapper not yet implemented in Valgrind" - suggesting this call isn't new to
> darwin11. Is it the case that this call just wasn't used very often in past
> versions of darwin?

Either it was never used, or it was used and Valgrind complained, but
nobody reported this, or at least it never got fixed.

> Is it appropriate to report syscalls missing even when
> priv_syswrap-darwin.h indicates they're known missing?

Yes.  It's appropriate to report them whenever you get the "WARNING:
unhandled syscall" message.

I fixed this just now, but haven't committed the fix yet.  Am looking
at some Memcheck-related issues at the moment.
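
To make reporting these easier, here is a rough sketch of a helper that collects the "WARNING: unhandled syscall" lines from a saved Valgrind log. The regular expression is modelled only on the single line quoted in this report; the "class:number" shape, the function name and the script name are assumptions.

```python
import re
from collections import Counter

# Matches lines shaped like the one quoted in this report:
#   --71249-- WARNING: unhandled syscall: unix:357
# The "class:number" shape is assumed from that single example.
UNHANDLED_RE = re.compile(
    r"^--\d+--\s+WARNING: unhandled syscall: (\w+):(\d+)\s*$"
)


def count_unhandled_syscalls(log_lines):
    """Count each (class, number) pair seen in 'unhandled syscall' warnings."""
    counts = Counter()
    for line in log_lines:
        match = UNHANDLED_RE.match(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python unhandled.py valgrind.log  (file names are hypothetical)
    with open(sys.argv[1]) as log:
        for (klass, number), seen in sorted(count_unhandled_syscalls(log).items()):
            print(f"{klass}:{number} seen {seen} time(s)")
```

The log can be produced with --log-file=<path>, as in the command line shown later in this report.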
Comment 9 Benjamin Poulain 2011-08-24 15:47:18 UTC
Created attachment 63118 [details]
Patch for configure.in
Comment 10 Benjamin Poulain 2011-08-24 15:50:28 UTC
I patched configure.in and everything seems to build fine in 64-bit mode for me.
Next, fix those warnings, I guess :)
Comment 11 Jack Jost 2011-08-27 19:57:46 UTC
Created attachment 63181 [details]
Patch to implement 2 missing taskmsgs, a syscall and __pthread_sigmask

This patch implements syscall 357: getaudit_addr(), taskmsg 3414: task_get_exception_ports(), taskmsg 3229: mach_port_set_context(), and extends __pthread_sigmask().
Comment 12 Jack Jost 2011-08-27 20:00:34 UTC
With the last patch valgrind seems to work for simple programs. However, there are many false positives. All ncurses-based programs cause trouble (including a simple /bin/ls), and in programs using pthreads, many memory segments allocated with calloc() are reported as "uninitialized" when they are used. I don't know whether these two problems are related.
Comment 13 Julian Seward 2011-08-27 21:21:22 UTC
Created attachment 63186 [details]
WIP patch, 27 Aug 2011

Here's my current work-in-progress patch.  It's not pretty, it only
works for 64 bit processes, and the resulting tree won't work on any
other platform.  However, it does work well enough to run Firefox on
Memcheck, that is to say, you can run at least one complex threaded
application on it.

It needs to be applied to Valgrind trunk of a few minutes ago, that is
to say, valgrind >= r12003 and vex >= r2197.  There's some debug
printing to do with wqthread_hijack that you'll probably want to
comment out.
Comment 14 Julian Seward 2011-09-03 12:46:21 UTC
*** Bug 275165 has been marked as a duplicate of this bug. ***
Comment 15 Julian Seward 2011-09-03 12:47:05 UTC
*** Bug 281241 has been marked as a duplicate of this bug. ***
Comment 16 Julian Seward 2011-09-03 12:49:48 UTC
(In reply to comment #13)
> Created an attachment (id=63186) [details]
> WIP patch, 27 Aug 2011

Feedback on this patch is welcomed.  AFAIK it makes 64-bit 10.7
support work at least approximately as well as it does on 10.6.
32-bit is still broken.
Comment 17 Jack Jost 2011-09-03 18:38:08 UTC
Created attachment 63360 [details]
Programs using getaddrinfo() cause warning. Suppression required?

Programs using getaddrinfo() show leaks when using the reference code from the manpage (added as attachment). If these are not false alerts, they may have to be suppressed.
Comment 18 Jack Jost 2011-09-03 18:46:17 UTC
I am not sure whether this is a problem in valgrind or the tested code, but calling pthread_mutex_destroy() while some other thread may be waiting causes this warning:

--27446:0:schedule VG_(sema_down): read returned -4

valgrind on Linux won't print a warning, and my intuition says that destroying a semaphore while someone is waiting for it should be allowed. However, I cannot say for sure what the correct behavior is without reading the spec.
Comment 19 Jack Jost 2011-09-03 19:38:56 UTC
Clarification: the "read returned -4" message occurs right after calling pthread_cancel() on a running thread, some time before any mutex is actually destroyed. (By the way, -EINTR = -4; if that's what the message means, the warning is probably harmless, but annoying.)
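
A quick way to check that reading is to decode the value as a negated errno. This is a minimal sketch that assumes the -4 really is -errno, as guessed above; the function name is made up for illustration.

```python
import errno
import os


def describe_negated_errno(value):
    """Interpret a negative return value as -errno and return a tuple of
    (symbolic name, description), or None if it is not a known errno."""
    if value >= 0 or -value not in errno.errorcode:
        return None
    return errno.errorcode[-value], os.strerror(-value)


if __name__ == "__main__":
    # EINTR is 4 on both Mac OS X and Linux.
    print(describe_negated_errno(-4))
```

If the name comes back as EINTR, the read was interrupted by a signal, which fits the timing right after pthread_cancel().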
Comment 20 Julian Seward 2011-09-10 12:53:00 UTC
Created attachment 63545 [details]
WIP patch, vs current trunk revision 12025

Revised patch for current svn trunk; no functional changes.

One thing I forgot to point out is that you need to build this
with gcc-4.2 on Xcode 4.1, not with plain "gcc".  The latter will
appear to work, but the resulting Valgrind asserts in complex
threaded code, when worker queue threads exit (or something like that).

All you need to do is set CC=gcc-4.2 and do a from-distclean build.
Comment 21 Julian Seward 2011-09-10 12:55:11 UTC
Re comment 20, see comment 13 for expectations of what this patch
can/can't do.
Comment 22 Julian Seward 2011-09-21 08:52:12 UTC
Initial support for OSX 10.7 (an enhanced version of the comment 20
patch) was committed in r12043.  Both 32- and 64-bit processes are
now supported.  You still need to set CC=gcc-4.2 before building,
though.  Feedback is welcomed, as these changes have only been lightly
tested so far.
Comment 23 Tim Jarratt 2011-09-26 18:52:12 UTC
Really great to see some 10.7 support 

I checked out revision 12043, configured and built valgrind (with CC=gcc-4.2), and had no problems running memcheck on most simple processes (such as `ls -l` or `cat`, or even Firefox). But when I tried it with some other programs (that worked fine with valgrind on 10.6.8), the valgrind process would hang and not produce any output. Additionally, trying to kill the process in certain ways fails: `kill -9 <pid>` works, but most other signals fail.

dtruss did not produce much meaningful output, other than the following:

tjarratt:~ tjarratt$ ps aux | grep valgrind
tjarratt       86342  88.7  3.6  2639824 151060 s000  R+   11:39AM   0:03.75 valgrind --log-file=/tmp/testrunner.valgrind --error-exitcode=1 --suppressions=.suppressions --gen-suppressions=all --show-possibly-lost=yes ./testrunner
tjarratt:~ tjarratt$ sudo dtruss -p 86342
SYSCALL(args) 		 = return

I'm not entirely sure where to start debugging this issue. The executable I'm trying to run memcheck against is a fairly simple C++ process - a testrunner that forks off some child processes to run some unit tests. I wouldn't be entirely surprised if this were related to my executable, but what other information can I gather that would illustrate the cause of this bug?
Comment 24 Jack Jost 2011-09-26 19:38:17 UTC
Hello Tim,
does 'valgrind --leak-check=full --show-reachable=yes ls' report any leaks for you?
Comment 25 Tim Jarratt 2011-09-26 20:07:24 UTC
Created attachment 63991 [details]
debug log of valgrind memcheck on 10.7.1

Built valgrind from trunk revision 12043; simple programs still show reachable blocks, and more complicated programs hang.
Comment 26 Tim Jarratt 2011-09-26 20:12:10 UTC
(Sorry for the spam, I didn't realize that attaching a file would use the description field to post a new comment). 

Yes, running 'valgrind --leak-check=full --show-reachable=yes ls' does report a lot of leaks. Am I right in assuming that the following are the correct steps for running valgrind on 10.7 right now?

export CC=gcc-4.2
build valgrind from trunk, revision 12043 as usual (autoconf, configure, make, make install)
valgrind <options> <executable>
Comment 27 Julian Seward 2011-09-26 22:22:34 UTC
(In reply to comment #23)
> I'm not entirely sure where to start debugging this issue. The executable I'm
> trying to run memcheck against is a fairly simple C++ process - a testrunner
> that forks off some child processes to run some unit tests.

From the look of the log file you attached, I'd guess that SIGCHLD is not 
getting back to the parent, or some such.  So, the signal handling in 10.6
Valgrind support was pretty borked, so I am totally not surprised to hear
it might be borked in 10.7 too.

I think the most effective thing you can do is to cut this down into the
smallest possible test case, and attach it here.  You're of course welcome
to chase this directly, but the signals and syscalls stuff is complex and,
well, not a lot of fun.  If you choose the latter route, it might be a good
idea to compare the output of "--trace-syscalls=yes --trace-signals=yes"
on 10.6 and 10.7, to see where things diverge.

Oh .. and is this a 32- or 64-bit process?
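
As a starting point for that comparison, here is a rough sketch that finds the first line where two saved trace logs differ. It only strips the leading "--PID--" / "==PID==" prefix, modelled on the "--71249--" lines quoted earlier in this report; the function names and the script name are assumptions, and nothing here depends on the exact trace format.

```python
import re
from itertools import zip_longest

# Valgrind prefixes its own messages with the process ID, as in
# "--71249--" or "==71249=="; strip it so two runs can be compared.
PID_PREFIX_RE = re.compile(r"^(?:--|==)\d+(?:--|==)\s?")


def normalize(line):
    """Drop the trailing newline and the leading PID prefix, if present."""
    return PID_PREFIX_RE.sub("", line.rstrip("\n"))


def first_divergence(lines_a, lines_b):
    """Return (index, line_a, line_b) for the first position where the two
    traces differ after normalization, or None if they are identical.
    A trace that ends early yields None for its missing line."""
    for index, (a, b) in enumerate(zip_longest(lines_a, lines_b)):
        norm_a = None if a is None else normalize(a)
        norm_b = None if b is None else normalize(b)
        if norm_a != norm_b:
            return index, norm_a, norm_b
    return None


if __name__ == "__main__":
    import sys

    # Usage: python tracediff.py trace-10.6.log trace-10.7.log
    # (file names are hypothetical)
    with open(sys.argv[1]) as log_a, open(sys.argv[2]) as log_b:
        print(first_divergence(log_a, log_b))
```

Real traces will also differ in addresses, thread IDs and other run-specific values, so further normalization would likely be needed before the first reported difference is meaningful.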
Comment 28 Tim Jarratt 2011-09-26 22:29:57 UTC
Thanks for responding so swiftly, Julian.

It is indeed a 64-bit process. When I was reading through comments here earlier, I noticed that the original patch you were working on was for 64-bit only, so it's hopefully not that.

Sounds like chasing this down directly is the way of madness, so I'll see if I can't find a small reproducible test case. Either way, I'll keep watching the progress on this bug, hoping it will improve :-D
Comment 29 Julian Seward 2011-10-05 08:15:52 UTC
Status update as per current svn trunk (rev 12101):

* some further fixes for 32 bit apps were committed, so it no longer
  asserts in wqthread_hijack for complex threaded apps using worker
  thread queues.

* building with the default gcc in Xcode 4.1 is now supported, so you
  no longer need "CC=gcc-4.2" at configure time (or at any other time).

* handling of signals may or may not have improved; I don't know.

* 32-bit complex threaded apps using worker thread queues sometimes
  appear to hang whilst a 64-bit build of them works ok.  I don't
  know why this is.
Comment 30 Julian Seward 2011-10-12 11:00:17 UTC
I'm inclined to close this now, since it at least minimally works
on 10.7.  Any followup problems should be filed as new bugs.
Comment 31 Julian Seward 2011-10-13 09:05:36 UTC
*** Bug 281304 has been marked as a duplicate of this bug. ***
Comment 32 Julian Seward 2011-10-13 09:07:06 UTC
*** Bug 281305 has been marked as a duplicate of this bug. ***