Summary: | Make Valgrind work for MacOSX 10.7 Lion | ||
---|---|---|---|
Product: | [Developer tools] valgrind | Reporter: | WSK <Wolf.St.Kappesser> |
Component: | general | Assignee: | Julian Seward <jseward> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | ajgilbert, benjamin, cwatson, DonaldEGrimes, glider, jackjost, othiman, oystein, peter, siegel, tim, wilane, Wolf.St.Kappesser |
Priority: | NOR | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Platform: | Compiled Sources | ||
OS: | macOS | ||
Attachments: |
output of make
Adapted configure file
Patch for configure.in
Patch to implement 2 missing taskmsgs, a syscall and __pthread_sigmask
WIP patch, 27 Aug 2011
Programs using getaddrinfo() cause warning. Suppression required?
WIP patch, vs current trunk revision 12025
debug log of valgrind memcheck on 10.7.1 |
Created attachment 60768 [details]
Adapted configure file
After hacking the configure script to accept Darwin 11, I get the same build error. This is on the release version of 10.7, using i686-apple-darwin11-llvm-gcc-4.2 (i.e. the gcc that comes with Xcode 4.2). I was able to "fix" the build error by specifying --enable-only64bit. Adding '-lgcc -L/usr/lib/gcc/i686-apple-darwin11/4.2.1/' to the failing linker command also seems to work. I think this will need some attention from people more familiar with the Valgrind build process.

The built valgrind appears to work. I've got no idea how reliable it actually is at this point; presumably it will require a similar effort to the 10.5 to 10.6 work (bug #205241).

The XNU sources came out a few days ago: http://opensource.apple.com/tarballs/xnu/xnu-1699.22.73.tar.gz

I'm starting to come across unhandled syscalls - better that these are posted as separate bugs I assume.

*** This bug has been confirmed by popular vote. ***

> After editing the ./configure to run on Darwin 11.0.0 instead of 10.x.x and
> running "make" I get the attached error (second build without clean).
> [link error w.r.t. __fixunsdfdi for the 32-bit builds]
Fixed, r12000.
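For context on the 32-bit link error quoted above: __fixunsdfdi is the libgcc helper that converts a double to an unsigned 64-bit integer. The sketch below shows the kind of source that can produce a reference to it; the function name double_to_u64 is hypothetical, and whether a compiler inlines the conversion or calls the helper depends on target and flags. Assuming the failing link line omits the default libraries (which the '-lgcc' workaround suggests), such a call stays unresolved until libgcc is added.

```c
#include <stdint.h>

/* Hypothetical example: convert a double to an unsigned 64-bit integer.
   x86-64 compilers can emit an inline instruction sequence for this, but
   on 32-bit x86 a call to the libgcc helper __fixunsdfdi is common, so the
   final link then needs libgcc (e.g. -lgcc) to resolve it. */
uint64_t double_to_u64(double d)
{
    return (uint64_t)d;
}
```

Compiling a file like this with -m32 and inspecting the object with nm is one way to check whether the helper is actually referenced.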
(In reply to comment #3)
> I'm starting to come across unhandled syscalls - better that these are posted
> as separate bugs I assume.

Did any of these get posted? As a minimum, I see this even with the simplest programs (e.g., /bin/date):

--71249-- WARNING: unhandled syscall: unix:357

(In reply to comment #5)
> > After editing the ./configure to run on Darwin 11.0.0 instead of 10.x.x and
> > running "make" I get the attached error (second build without clean).
> > [link error w.r.t. __fixunsdfdi for the 32-bit builds]
>
> Fixed, r12000.

Thanks.

(In reply to comment #6)
> Did any of these get posted? As a minimum, I see this even with the
> simplest programs (e.g., /bin/date):
>
> --71249-- WARNING: unhandled syscall: unix:357

No, I got puzzled after reading the contents of coregrind/m_syswrap/priv_syswrap-darwin.h. 357, for example, is listed there as "wrapper not yet implemented in Valgrind" - suggesting this call isn't new to darwin11. Is it the case that this call just wasn't used very often in past versions of darwin? Is it appropriate to report syscalls missing even when priv_syswrap-darwin.h indicates they're known missing?

(In reply to comment #7)
> > --71249-- WARNING: unhandled syscall: unix:357
>
> No, I got puzzled after reading the contents of
> coregrind/m_syswrap/priv_syswrap-darwin.h. 357, for example, is listed there as
> "wrapper not yet implemented in Valgrind" - suggesting this call isn't new to
> darwin11. Is it the case that this call just wasn't used very often in past
> versions of darwin?

Either it was never used, or it was used and Valgrind complained, but nobody reported this, or at least it never got fixed.

> Is it appropriate to report syscalls missing even when
> priv_syswrap-darwin.h indicates they're known missing?

Yes. It's appropriate to report them whenever you get the "WARNING: unhandled syscall" message. I fixed this just now, but haven't committed the fix yet. Am looking at some Memcheck-related issues at the moment.
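The "unix:357" in the warning names the syscall by class and number. On 64-bit Darwin, XNU packs a class (Mach trap, Unix/BSD call, machine-dependent call) into the high bits of the syscall number, so Unix syscall 357 (0x165) would be issued as 0x2000165. The sketch below decodes such a raw number. The constant names mirror XNU's SYSCALL_CLASS_* macros, but treat the exact layout as an assumption to check against the XNU sources linked above; the helper function names are hypothetical.

```c
#include <stdint.h>

/* Assumed XNU x86-64 encoding: bits 24..31 hold the syscall class,
   bits 0..23 hold the number within that class. */
#define SYSCALL_CLASS_SHIFT 24
#define SYSCALL_CLASS_MASK  (0xFFu << SYSCALL_CLASS_SHIFT)
#define SYSCALL_NUMBER_MASK (~SYSCALL_CLASS_MASK)

enum syscall_class {
    SYSCALL_CLASS_NONE = 0,
    SYSCALL_CLASS_MACH = 1,   /* Mach traps */
    SYSCALL_CLASS_UNIX = 2,   /* BSD/Unix syscalls, printed as "unix:N" */
    SYSCALL_CLASS_MDEP = 3,   /* machine-dependent calls */
    SYSCALL_CLASS_DIAG = 4    /* diagnostics */
};

/* Hypothetical helper: extract the class from a raw syscall number. */
unsigned raw_syscall_class(uint32_t raw)
{
    return (raw & SYSCALL_CLASS_MASK) >> SYSCALL_CLASS_SHIFT;
}

/* Hypothetical helper: extract the per-class number (e.g. 357). */
unsigned raw_syscall_number(uint32_t raw)
{
    return raw & SYSCALL_NUMBER_MASK;
}
```

Under this encoding, raw_syscall_class(0x2000165) is SYSCALL_CLASS_UNIX and raw_syscall_number(0x2000165) is 357, the getaudit_addr() call implemented by the patch below.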
Created attachment 63118 [details]
Patch for configure.in
I patched configure.in and everything seems to build fine in 64 bits for me. Next, fix those warnings I guess :)

Created attachment 63181 [details]
Patch to implement 2 missing taskmsgs, a syscall and __pthread_sigmask
This patch implements syscall 357: getaudit_addr(), taskmsg 3414: task_get_exception_ports(), taskmsg 3229: mach_port_set_context(), and extends __pthread_sigmask().
With the last patch, valgrind seems to work for simple programs. However, there are many false positives. All ncurses-based programs cause trouble (including a simple /bin/ls), and in programs using pthreads, many memory blocks allocated with calloc() are reported as "uninitialized" when they are used. I don't know whether these two problems are related.

Created attachment 63186 [details]
WIP patch, 27 Aug 2011
Here's my current work-in-progress patch. It's not pretty, it only
works for 64 bit processes, and the resulting tree won't work on any
other platform. However, it does work well enough to run Firefox on
Memcheck, that is to say, you can run at least one complex threaded
application on it.
It needs to be applied to Valgrind trunk of a few minutes ago, that is
to say, valgrind >= r12003 and vex >= r2197. There's some debug
printing to do with wqthread_hijack that you'll probably want to
comment out.
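The calloc() false positives reported above for threaded programs are the kind of problem a tiny standalone test case helps pin down. A minimal sketch, with hypothetical names: one thread zero-fills a buffer with calloc(), a worker thread branches on every element, and since calloc() memory is fully defined, Memcheck should report nothing.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical test case: data shared between the main thread and a worker. */
struct job {
    const int *buf;   /* zero-filled by calloc() in the main thread */
    size_t n;
    size_t nonzero;   /* number of elements the worker found non-zero */
};

static void *check_worker(void *arg)
{
    struct job *j = arg;
    j->nonzero = 0;
    for (size_t i = 0; i < j->n; i++)
        if (j->buf[i] != 0)   /* conditional jump on calloc()'d data */
            j->nonzero++;
    return NULL;
}

/* Returns the number of non-zero elements seen by the worker (expected 0),
   or -1 if allocation or thread creation fails. */
long calloc_check_in_thread(size_t n)
{
    int *buf = calloc(n, sizeof *buf);
    if (buf == NULL)
        return -1;

    struct job j = { buf, n, 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, check_worker, &j) != 0) {
        free(buf);
        return -1;
    }
    pthread_join(t, NULL);
    free(buf);
    return (long)j.nonzero;
}
```

Built with -pthread and run under Memcheck on both 10.6 and 10.7, any "conditional jump depends on uninitialised value" report from check_worker would point at the port rather than the program.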
*** Bug 275165 has been marked as a duplicate of this bug. ***
*** Bug 281241 has been marked as a duplicate of this bug. ***

(In reply to comment #13)
> Created an attachment (id=63186) [details]
> WIP patch, 27 Aug 2011

Feedback on this patch is welcomed. AFAIK it makes 64-bit 10.7 support work at least approximately as well as it does on 10.6. 32-bit is still broken.

Created attachment 63360 [details]
Programs using getaddrinfo() cause warning. Suppression required?
Programs using getaddrinfo() expose leaks when using the reference code from the manpage (added as attachment). If these are not false alarms, they may have to be suppressed.
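A minimal sketch of the getaddrinfo()/freeaddrinfo() pattern from the manpage can help separate leaks in the calling code from memory the system resolver keeps internally: every list returned by getaddrinfo() is released with freeaddrinfo(), so anything Memcheck still reports afterwards more likely comes from the library and is a suppression candidate. The helper name is hypothetical, and AI_NUMERICHOST is used only so the example does not depend on DNS.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve a numeric host/service pair and return how
   many addresses came back, or -1 on failure. The result list is always
   released with freeaddrinfo(), so the caller itself leaks nothing. */
int count_addrinfo(const char *host, const char *service)
{
    struct addrinfo hints, *res, *p;
    int n = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address, not per socktype */
    hints.ai_flags = AI_NUMERICHOST;  /* no DNS lookup */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next)
        n++;
    freeaddrinfo(res);
    return n;
}
```

If a run with --leak-check=full still shows blocks allocated under getaddrinfo(), --gen-suppressions=all will print a ready-made suppression entry for them.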
I am not sure whether this is a problem in valgrind or in the tested code, but calling pthread_mutex_destroy() on a mutex while some other thread may be waiting on it causes this warning:

--27446:0:schedule VG_(sema_down): read returned -4

Valgrind on Linux doesn't print a warning, and my intuition says it should be allowed to destroy a semaphore while someone is waiting on it... however, I cannot say for sure what the correct behavior is without reading the spec.

Clarification: the "read returned -4" message occurs right after calling pthread_cancel() on a running thread, some time before any mutex is actually destroyed. (By the way, -EINTR = -4; if that's what the message means, the warning is probably harmless, but annoying.)

Created attachment 63545 [details]
WIP patch, vs current trunk revision 12025
Revised patch for current svn trunk; no functional changes.
One thing I forgot to point out is that you need to build this
with gcc-4.2 on Xcode 4.1, not with plain "gcc". The latter will
appear to work, but the resulting Valgrind asserts in complex
threaded code, when worker queue threads exit (or something like that).
All you need to do is set CC=gcc-4.2 and do a from-distclean build.
Re comment 20, see comment 13 for expectations of what this patch can/can't do.

Initial support for OSX 10.7 (an enhanced version of the comment 20 patch) was committed in r12043. Both 32- and 64-bit processes are now supported. You still need to set CC=gcc-4.2 before building, though. Feedback is welcomed, as these changes have only been lightly tested so far.

Really great to see some 10.7 support! I checked out revision 12043, configured and built valgrind (with CC=gcc-4.2), and had no problems running memcheck on most simple processes (such as `ls -l`, `cat`, or even Firefox), but when I tried it with some other programs (that worked fine with valgrind on 10.6.8), the valgrind process would hang and not produce any output. Additionally, trying to kill the process in certain ways fails: `kill -9 <pid>` works, but most other signals fail. dtruss did not produce much meaningful output, other than the following:

tjarratt:~ tjarratt$ ps aux | grep valgrind
tjarratt 86342 88.7 3.6 2639824 151060 s000 R+ 11:39AM 0:03.75 valgrind --log-file=/tmp/testrunner.valgrind --error-exitcode=1 --suppressions=.suppressions --gen-suppressions=all --show-possibly-lost=yes ./testrunner
tjarratt:~ tjarratt$ sudo dtruss -p 86342
SYSCALL(args) = return

I'm not entirely sure where to start debugging this issue. The executable I'm trying to run memcheck against is a fairly simple C++ process - a testrunner that forks off some child processes to run some unit tests. I wouldn't be entirely surprised if this were related to my executable, but what other information can I gather that would illustrate the cause of this bug?

Hello Tim, does 'valgrind --leak-check=full --show-reachable=yes ls' report any leaks for you?

Created attachment 63991 [details]
debug log of valgrind memcheck on 10.7.1
Built valgrind from trunk revision 12043, simple programs still show reachable blocks, more complicated programs hang.
(Sorry for the spam, I didn't realize that attaching a file would use the description field to post a new comment.)

Yes, running 'valgrind --leak-check=full --show-reachable=yes ls' does report a lot of leaks.

Am I right in assuming that the following steps are correct for running valgrind on 10.7 right now?
1. export CC=gcc-4.2
2. build valgrind from trunk, revision 12043, as usual (autoconf, configure, make, make install)
3. valgrind <options> <executable>

(In reply to comment #23)
> I'm not entirely sure where to start debugging this issue. The executable I'm
> trying to run memcheck against is a fairly simple C++ process - a testrunner
> that forks off some child processes to run some unit tests.

From the look of the log file you attached, I'd guess that SIGCHLD is not getting back to the parent, or some such. The signal handling in the 10.6 Valgrind support was pretty borked, so I am totally not surprised to hear it might be borked in 10.7 too.

I think the most effective thing you can do is to cut this down into the smallest possible test case, and attach it here. You're of course welcome to chase this directly, but the signals and syscalls stuff is complex and, well, not a lot of fun. If you choose the latter route, it might be a good idea to compare the output of "--trace-syscalls=yes --trace-signals=yes" on 10.6 and 10.7, to see where things diverge.

Oh, and is this a 32- or 64-bit process?

Thanks for responding so swiftly, Julian. It is indeed a 64-bit process. When I was reading through comments here earlier, I noticed that the original patch you were working on was for 64-bit only, so it's hopefully not that. Sounds like chasing this down directly is the way of madness, so I'll see if I can't find a small reproducible test case.
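Following Julian's guess that SIGCHLD is not reaching the parent, a reduced test case might look like the sketch below (names hypothetical). The parent blocks SIGCHLD, forks a child that exits immediately, then waits in sigsuspend() until the handler runs. Natively it returns at once; if SIGCHLD delivery is broken under Valgrind, it would be expected to hang in sigsuspend().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;
}

/* Hypothetical reduced test case: returns 1 if SIGCHLD was delivered to the
   parent and the child was reaped with exit status 0, otherwise 0. */
int sigchld_roundtrip(void)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    pid_t pid;
    int status;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return 0;

    /* Block SIGCHLD before fork() so it cannot arrive before we wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return 0;
    got_sigchld = 0;

    pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return 0;
    }
    if (pid == 0)
        _exit(0);                  /* child: exit immediately */

    /* Parent: sleep with SIGCHLD unblocked until the handler has run.
       If SIGCHLD never reaches the parent, this loop never ends. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (!got_sigchld)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Running a small main() that calls this both natively and under valgrind --trace-signals=yes, then comparing the traces, is one way to see where delivery goes wrong.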
Either way, I'll keep watching the progress on this bug, hoping it will improve :-D

Status update as per current svn trunk (rev 12101):

* some further fixes for 32-bit apps were committed, so it no longer asserts in wqthread_hijack for complex threaded apps using worker thread queues.
* building with the default gcc in Xcode 4.1 is now supported, so you no longer need "CC=gcc-4.2" at configure time (or at any other time).
* handling of signals may or may not have improved; I don't know.
* 32-bit complex threaded apps using worker thread queues sometimes appear to hang whilst a 64-bit build of them works ok. I don't know why this is.

I'm inclined to close this now, since it at least minimally works on 10.7. Any followup problems should be filed as new bugs.

*** Bug 281304 has been marked as a duplicate of this bug. ***
*** Bug 281305 has been marked as a duplicate of this bug. ***
Created attachment 60767 [details]
output of make

Version: unspecified
OS: OS X

After editing the ./configure to run on Darwin 11.0.0 instead of 10.x.x and running "make" I get the attached error (second build without clean).

Reproducible: Always

Steps to Reproduce:
1. edit ./configure to something like the attachment
2. ./configure
3. make

Actual Results:
see make.log

Expected Results:
clean build

Using OS X 10.7 "Lion", newest developer preview.