I sent this to the mailing list. It was reviewed by John Reiser (thanks John!) and he asked that I file this report. Here are the details I supplied:

Running callgrind on my Qt-based app on macOS 10.12.6 crashes.

Running callgrind on "/bin/date" works (which is not surprising as I think workq_ops is related to threads?)

- in Qt Creator, add a new project
- select "Qt Console Application"
- edit its qmake file to remove "CONFIG += console" (this shouldn't be added on the Mac)
- build "Profile" version

The .pro looks like this:

QT += core
QT -= gui
CONFIG += c++11
TARGET = valgrind-test2
CONFIG -= app_bundle
TEMPLATE = app
SOURCES += main.cpp

And main.cpp looks like this:

#include <QCoreApplication>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    return a.exec();
}

Run valgrind --tool=callgrind <path-to-command-line-executable>

Results:

==35785== Callgrind, a call-graph generating cache profiler
==35785== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==35785== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==35785== Command: /Users/maloney/dev/build-valgrind-test2-Qt_5_x-Profile/valgrind-test2
==35785==
==35785== For interactive control, run 'callgrind_control -h'.
--35785-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--35785-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--35785-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
UNKNOWN workq_ops option 128
==35785== valgrind: Unrecognised instruction at address 0x103b0fb50.
==35785==    at 0x103B0FB50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0D8B8: dispatch_once_f (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0FA90: _dispatch_kq_update (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B110CD: _dispatch_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B1103D: _dispatch_source_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B10E85: _dispatch_source_kevent_register (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B20651: _dispatch_queue_resume_finalize_activation (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103E603C0: _notify_lib_init (in /usr/lib/system/libsystem_notify.dylib)
==35785==    by 0x103E609AB: notify_register_dispatch (in /usr/lib/system/libsystem_notify.dylib)
==35785==    by 0x1027E8916: CFUniCharMapTo (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==35785==    by 0x103B0D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==35785== Your program just tried to execute an instruction that Valgrind
==35785== did not recognise. There are two possible reasons for this.
==35785== 1. Your program has a bug and erroneously jumped to a non-code
==35785==    location. If you are running Memcheck and you just saw a
==35785==    warning about a bad jump, it's probably your program's fault.
==35785== 2. The instruction is legitimate but Valgrind doesn't handle it,
==35785==    i.e. it's Valgrind's fault. If you think this is the case or
==35785==    you are not sure, please let us know and we'll try to fix it.
==35785== Either way, Valgrind will now raise a SIGILL signal which will
==35785== probably kill your program.
==35785==
==35785== Process terminating with default action of signal 4 (SIGILL)
==35785==  Illegal opcode at address 0x103B0FB50
==35785==    at 0x103B0FB50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0D8B8: dispatch_once_f (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0FA90: _dispatch_kq_update (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B110CD: _dispatch_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B1103D: _dispatch_source_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B10E85: _dispatch_source_kevent_register (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B20651: _dispatch_queue_resume_finalize_activation (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103E603C0: _notify_lib_init (in /usr/lib/system/libsystem_notify.dylib)
==35785==    by 0x103E609AB: notify_register_dispatch (in /usr/lib/system/libsystem_notify.dylib)
==35785==    by 0x1027E8916: CFUniCharMapTo (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==35785==    by 0x103B0D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==35785==
==35785== Events    : Ir
==35785== Collected : 188406082
==35785==
==35785== I   refs:      188,406,082
Illegal instruction: 4

John's response and analysis:

I was able to reproduce the problem using --tool=none, so it is not specific to memcheck, callgrind, etc. I am running MacOS Sierra Version 10.12.6.

The code in system library libdispatch.dylib expects there to be a trap handler for opcode 'ud2' (0f 0b) [generates SIGILL] which the valgrind emulator has disabled through some means, perhaps unknowing or inadvertent. [Or, perhaps some even-more-global protocol (that would have avoided the 'ud2') has been violated.]
=====
$ valgrind --tool=none ~jreiser/build-valgrind_test2-Desktop_Qt_5_9_1_clang_64bit-Profile/valgrind_test2
==43499== Nulgrind, the minimal Valgrind tool
==43499== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==43499== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==43499== Command: /Users/jreiser/build-valgrind_test2-Desktop_Qt_5_9_1_clang_64bit-Profile/valgrind_test2
==43499==
--43499-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--43499-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--43499-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
UNKNOWN workq_ops option 128
==43499== valgrind: Unrecognised instruction at address 0x103b1fb50.
==43499==    at 0x103B1FB50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==43499==    by 0x103B1D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
   [[snip]]
=====

Running valgrind under lldb, and disassembling after the SIGILL:

=====
(lldb) x/12i 0x103b1fb1f
    0x103b1fb1f: e8 2e 48 02 00        callq  0x103b44352
    0x103b1fb24: 83 f8 ff              cmpl   $-0x1, %eax
    0x103b1fb27: 0f 85 a1 00 00 00     jne    0x103b1fbce
    0x103b1fb2d: e8 e8 46 02 00        callq  0x103b4421a
    0x103b1fb32: 48 63 00              movslq (%rax), %rax
    0x103b1fb35: 48 83 f8 04           cmpq   $0x4, %rax
    0x103b1fb39: 74 bf                 je     0x103b1fafa
    0x103b1fb3b: 48 8d 0d dd 71 02 00  leaq   0x271dd(%rip), %rcx
    0x103b1fb42: 48 89 0d f7 cc 04 00  movq   %rcx, 0x4ccf7(%rip)
    0x103b1fb49: 48 89 05 20 cd 04 00  movq   %rax, 0x4cd20(%rip)
 => 0x103b1fb50: 0f 0b                 ud2
    0x103b1fb52: f6 03 01              testb  $0x1, (%rbx)
=====

Obviously %rax and %rcx (and/or 64-bit memory locations (0x4ccf7+0x103b1fb49) and (0x4cd20+0x103b1fb50)) contain two parameters to some subroutine that is invoked by the signal handler for the 'ud2' opcode (which generates SIGILL or its MacOS equivalent).
So perhaps valgrind should restore the original signal handler for SIGILL during the single instruction 'ud2'; or, libdispatch.dylib may be assuming some other protocol that valgrind does not know about, etc. Details: I had only XCode already installed. It took a couple hours to download and install the free version of QtCreator (default version 5.9.1), then install MacPorts and homebrew (following https://paolozaino.wordpress.com/2015/05/05/how-to-install-and-use-autotools-on-mac-os-x/ which aroused suspicion because the most recent update was a couple years old) so that I could run autogen.sh to build valgrind from current git source. But I did manage to reproduce the problem, so enough of everything probably worked.
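For readers who want to check the two memory locations named in the analysis: the stores just before 'ud2' use RIP-relative addressing, where the target is the address of the next instruction plus the displacement. The following is a minimal sketch of that arithmetic; the helper name rip_relative_target is made up for this illustration, and the addresses, lengths, and displacements are read off the lldb listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for this note (not from Valgrind or lldb).
 * Effective address of a RIP-relative operand on x86-64:
 * address of the NEXT instruction plus the signed 32-bit displacement. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp)
{
    /* x86 instructions are at most 15 bytes long. */
    assert(insn_len > 0 && insn_len <= 15);
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

With the 7-byte instructions from the listing, rip_relative_target(0x103b1fb42, 7, 0x4ccf7) gives 0x103b6c840 (where %rcx is stored) and rip_relative_target(0x103b1fb49, 7, 0x4cd20) gives 0x103b6c870 (where %rax is stored), which matches the sums written out in the analysis.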
/bin/date gets some of the "prelude" notices under --tool=none, so investigating that could be a warm-up for working on the "UNKNOWN workq_ops option 128".

$ valgrind --tool=none /bin/date
==44124== Nulgrind, the minimal Valgrind tool
==44124== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==44124== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==44124== Command: /bin/date
==44124==
--44124-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--44124-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--44124-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
Mon Aug 21 09:59:20 PDT 2017
==44124==
Is there anything else I can provide to help with this? I'm afraid actually fixing it is beyond my capabilities. (It's a blocker for me - and anyone else trying to use valgrind for Qt-based apps on macOS it seems.) Thanks!
On 09/01/2017 09:18 AM, Andy wrote:
> https://bugs.kde.org/show_bug.cgi?id=383723
>
> --- Comment #2 from Andy <imol00+kde@gmail.com> ---
> Is there anything else I can provide to help with this? I'm afraid actually
> fixing it is beyond my capabilities.
>
> (It's a blocker for me - and anyone else trying to use valgrind for Qt-based
> apps on macOS it seems.)
>
> Thanks!
>

Locate some good documentation on MacOS workq. Specifically, find the MacOS source code which handles all workq options, including those that correspond to the cases in PRE(workq_ops) in coregrind/m_syswrap/syswrap-darwin.c .

The closest I could find after modest searching is
https://opensource.apple.com/source/xnu/xnu-3789.51.2/bsd/kern/pthread_shims.c.auto.html
Apparently there used to be a file pthread_synch.c but I cannot find it.

I did find
https://opensource.apple.com/source/libpthread/libpthread-137.1.1/kern/workqueue_internal.h
which does have

#define WQOPS_SET_EVENT_MANAGER_PRIORITY 0x80 /* max() in the provided priority in the the priority of the event manager */

and looks like a clue. If so, then option 128 could be a no-op for valgrind. Try that?
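To make the "no-op" suggestion concrete, here is a minimal standalone sketch of the idea. It is not Valgrind's PRE(workq_ops) code: the enum wq_action and the function classify_workq_option are hypothetical names for this illustration, and the only value taken from the thread is WQOPS_SET_EVENT_MANAGER_PRIORITY (0x80, i.e. the "option 128" in the warning).

```c
#include <assert.h>
#include <stdio.h>

/* Value quoted from workqueue_internal.h in the comment above. */
#define WQOPS_SET_EVENT_MANAGER_PRIORITY 0x80

/* 0x80 is the "option 128" printed in the UNKNOWN workq_ops warning. */
static_assert(WQOPS_SET_EVENT_MANAGER_PRIORITY == 128,
              "0x80 and 128 are the same option");

/* Hypothetical result type for this sketch; not a Valgrind type. */
typedef enum { WQ_IGNORED, WQ_UNKNOWN } wq_action;

/* Hypothetical classifier: accept the priority option silently,
 * warn about anything this sketch does not know. */
static wq_action classify_workq_option(int option)
{
    switch (option) {
    case WQOPS_SET_EVENT_MANAGER_PRIORITY:
        /* Only adjusts a scheduling priority, so there is no memory
         * or thread state to track: treat it as a no-op. */
        return WQ_IGNORED;
    default:
        fprintf(stderr, "UNKNOWN workq_ops option %d\n", option);
        return WQ_UNKNOWN;
    }
}
```

A real wrapper would of course list the other workq options it already handles; they are left out here because the thread does not quote their values.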
Great - thanks John!

Those MACH_SEND_TRAILER warnings you mentioned earlier were reported a couple of years ago:
https://bugs.kde.org/show_bug.cgi?id=343306

Like you I cannot find any documentation on Darwin's workq except some source code.

I found the most recent (released) version of workqueue_internal.h - for macOS 10.12.4:
https://opensource.apple.com/source/libpthread/libpthread-218.1.3/kern/workqueue_internal.h.auto.html
and where the WQOPS_SET_EVENT_MANAGER_PRIORITY case is processed (see _workq_kernreturn):
https://opensource.apple.com/source/libpthread/libpthread-218.1.3/kern/kern_support.c.auto.html
and where it is called with this value (see _pthread_workqueue_set_event_manager_priority):
https://opensource.apple.com/source/libpthread/libpthread-218.1.3/src/pthread.c.auto.html

Based on my reading I think you are correct and it can be ignored for valgrind's purposes because it's for scheduling priorities.

Assuming I did things correctly to test this (simply adding "case 128: break;" to PRE(workq_ops)), it now crashes with an "Unrecognised instruction" instead:

==57909== Callgrind, a call-graph generating cache profiler
==57909== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==57909== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==57909== Command: /Users/maloney/dev/build-test-valgrind/test-valgrind.app/Contents/MacOS/test-valgrind
==57909==
==57909== For interactive control, run 'callgrind_control -h'.
--57909-- run: /usr/bin/dsymutil "/Users/maloney/dev/build-test-valgrind/test-valgrind.app/Contents/MacOS/test-valgrind"
--57909-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--57909-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--57909-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
==57909== valgrind: Unrecognised instruction at address 0x104018b50.
==57909==    at 0x104018B50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x1040168FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x1040168B8: dispatch_once_f (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104018A90: _dispatch_kq_update (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x10401A0CD: _dispatch_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x10401A03D: _dispatch_source_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104019E85: _dispatch_source_kevent_register (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104029651: _dispatch_queue_resume_finalize_activation (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x105AE43C0: _notify_lib_init (in /usr/lib/system/libsystem_notify.dylib)
==57909==    by 0x105AE49AB: notify_register_dispatch (in /usr/lib/system/libsystem_notify.dylib)
==57909==    by 0x1049FE916: CFUniCharMapTo (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==57909==    by 0x1040168FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==57909== Your program just tried to execute an instruction that Valgrind
==57909== did not recognise. There are two possible reasons for this.
==57909== 1. Your program has a bug and erroneously jumped to a non-code
==57909==    location. If you are running Memcheck and you just saw a
==57909==    warning about a bad jump, it's probably your program's fault.
==57909== 2. The instruction is legitimate but Valgrind doesn't handle it,
==57909==    i.e. it's Valgrind's fault. If you think this is the case or
==57909==    you are not sure, please let us know and we'll try to fix it.
==57909== Either way, Valgrind will now raise a SIGILL signal which will
==57909== probably kill your program.
==57909==
==57909== Process terminating with default action of signal 4 (SIGILL)
==57909==  Illegal opcode at address 0x104018B50
==57909==    at 0x104018B50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x1040168FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x1040168B8: dispatch_once_f (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104018A90: _dispatch_kq_update (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x10401A0CD: _dispatch_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x10401A03D: _dispatch_source_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104019E85: _dispatch_source_kevent_register (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104029651: _dispatch_queue_resume_finalize_activation (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x105AE43C0: _notify_lib_init (in /usr/lib/system/libsystem_notify.dylib)
==57909==    by 0x105AE49AB: notify_register_dispatch (in /usr/lib/system/libsystem_notify.dylib)
==57909==    by 0x1049FE916: CFUniCharMapTo (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==57909==    by 0x1040168FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==57909==
==57909== Events    : Ir
==57909== Collected : 203711568
==57909==
==57909== I   refs:      203,711,568
The program has unexpectedly finished.
** Process crashed **
The crash looks very similar to the 'ud2' diagnosed in the original Description. In particular, the offset 0x....b50 is the same. This probably indicates that valgrind has more-or-less completely missed some aspect of what MacOS is doing. We need advice from an expert. <joke> So, spend two weeks at the bar/pub/tavern/restaurants in Cupertino and Sunnyvale. Buy a beer for the next 10 people who enter. Chat them up. </joke> Apply dtruss and/or dtrace to the original program without valgrind, and to valgrind when running the program. Correlate the system calls between the two runs; try to understand the difference. [Also investigate "valgrind --trace-syscalls=yes ..." as an additional or alternate source of information.] That's quite tedious, but logically should work. Also, contrast with running on Linux (which uses 'strace'). The Qt implementation might be similar enough to provide a clue.
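The correlation step can be partly mechanised. As a rough sketch only, assuming trace rows shaped like "dtruss -e" output (an elapsed column followed by "name(args) = return"), the helpers below pull the syscall name out of each row and check whether a given name occurs in a trace. Both function names are hypothetical and were written for this note.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the syscall name from one trace row such as
 * "   35 open(...) = 3 0" into out. Returns 1 on success, 0 when the
 * line is not a syscall row (e.g. interleaved "==pid==" Valgrind output). */
static int syscall_name(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);
    while (isspace((unsigned char)*line)) line++;
    while (isdigit((unsigned char)*line)) line++;   /* elapsed column */
    while (isspace((unsigned char)*line)) line++;

    const char *paren = strchr(line, '(');
    if (paren == NULL || paren == line) return 0;

    size_t len = (size_t)(paren - line);
    if (len >= outsz) return 0;
    for (size_t i = 0; i < len; i++)                /* identifier chars only */
        if (!isalnum((unsigned char)line[i]) && line[i] != '_') return 0;

    memcpy(out, line, len);
    out[len] = '\0';
    return 1;
}

/* Hypothetical helper: 1 if any row in lines[0..nlines) is a call to name. */
static int trace_has_syscall(const char *const *lines, size_t nlines,
                             const char *name)
{
    char buf[64];
    for (size_t i = 0; i < nlines; i++)
        if (syscall_name(lines[i], buf, sizeof buf) && strcmp(buf, name) == 0)
            return 1;
    return 0;
}
```

Running trace_has_syscall over the plain trace and the under-Valgrind trace for each name seen in the first would list the syscalls that only the native run reaches, which is the kind of difference worth investigating.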
Thanks John.

I must be doing something wrong. I can run dtruss on my example ok ("sudo dtruss -e ./valgrind-test"). I can run valgrind on the example ("valgrind --tool=none ./valgrind-test") and it crashes (as expected).

But when I run dtruss on valgrind on the example ("sudo dtruss -e valgrind --tool=none ./valgrind-test") it doesn't run the test executable. Here's the output:

dtrace: system integrity protection is on, some features will not be available

ELAPSD SYSCALL(args) = return
    35 open("/dev/dtracehelper\0", 0x2, 0x7FFF58BBD930) = 3 0
   346 ioctl(0x3, 0x80086804, 0x7FFF58BBD8B8) = 0 0
     6 close(0x3) = 0 0
     2 thread_selfid(0x3, 0x80086804, 0x7FFF58BBD8B8) = 2058768 0
     4 bsdthread_register(0x7FFFE336D080, 0x7FFFE336D070, 0x2000) = 1073741919 0
     2 ulock_wake(0x1, 0x7FFF58BBD0EC, 0x0) = -1 Err#2
     1 issetugid(0x1, 0x7FFF58BBD0EC, 0x0) = 0 0
     5 mprotect(0x107049000, 0x88, 0x1) = 0 0
     2 mprotect(0x10704B000, 0x1000, 0x0) = 0 0
     1 mprotect(0x107061000, 0x1000, 0x0) = 0 0
     1 mprotect(0x107062000, 0x1000, 0x0) = 0 0
     2 mprotect(0x107078000, 0x1000, 0x0) = 0 0
     2 mprotect(0x107079000, 0x1000, 0x1) = 0 0
     3 mprotect(0x107049000, 0x88, 0x3) = 0 0
     2 mprotect(0x107049000, 0x88, 0x1) = 0 0
     1 getpid(0x107049000, 0x88, 0x1) = 58961 0
     4 stat64("/AppleInternal/XBS/.isChrooted\0", 0x7FFF58BBCFA8, 0x1) = -1 Err#2
     1 stat64("/AppleInternal\0", 0x7FFF58BBD040, 0x1) = -1 Err#2
     3 csops(0xE651, 0x7, 0x7FFF58BBCAD0) = -1 Err#22
dtrace: error on enabled probe ID 2158 (ID 552: syscall::sysctl:return): invalid kernel access in action #10 at DIF offset 40
     1 ulock_wake(0x1, 0x7FFF58BBD050, 0x0) = -1 Err#2
     2 csops(0xE651, 0x7, 0x7FFF58BBC3B0) = -1 Err#22
     6 stat64("./valgrind-test\0", 0x7FFF58BBDB40, 0x7FFF58BBC3B0) = 0 0
    24 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-x86-darwin.so\0", 0x5, 0x7FFF58BBC3B0) = 0 0
    15 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-amd64-darwin.so\0", 0x5, 0x7FFF58BBC3B0) = 0 0
     3 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-arm-darwin.so\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
     2 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-ppc32-darwin.so\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
     2 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-ppc64be-darwin.so\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
==58961== Nulgrind, the minimal Valgrind tool
==58961== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
     3 access("/Users/maloney/brew/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
==58961== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
     2 access("/Users/maloney/research/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
==58961== Command: ./valgrind-test
==58961==
     2 access("/Users/maloney/dev/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
     3 access("/usr/local/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
     2 access("/usr/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
     2 access("/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
     2 access("/usr/sbin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
     2 access("/sbin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0) = -1 Err#2
     7 open("./valgrind-test\0", 0x0, 0xFFFFFFFFFFFFFFFF) = 3 0
dtrace: error on enabled probe ID 2134 (ID 154: syscall::read:return): invalid kernel access in action #12 at DIF offset 92
     5 close(0x3) = 0 0
     4 open_nocancel(".\0", 0x0, 0x7FB9AB800000) = 3 0
     2 fstat64(0x3, 0x7FFF58BBD940, 0x7FB9AB800000) = 0 0
     6 fcntl_nocancel(0x3, 0x32, 0x7FB9AB801000) = 0 0
     2 close_nocancel(0x3) = 0 0
     3 stat64("/Users/maloney/dev/build-valgrind-test-Qt_5_x-Profile\0", 0x7FFF58BBD8B0, 0x7FB9AB801000) = 0 0
     1 getppid(0x7FB9AB801000, 0x7FFF58BBD8B0, 0x7FB9AB801000) = 58958 0
    14 access("/Users/maloney/dev/lib/valgrind/none-amd64-darwin\0", 0x5, 0x7FB9AB801000) = 0 0

It looks like it's checking each path for the executable ("access"), then tries to open the executable and ... fails?

I tried it with a full path and get the same result.

Am I doing that correctly?
(Apologies - never used dtruss before, so I'm relying on online info and experimentation.)
> dtrace: system integrity protection is on, some features will not be
> available

You must disable that system integrity protection (that's why the spawn/"execve"/whatever failed.) It's a minor hassle. Search the net, I don't remember the recipe.
Ok I did that but it's still giving me roughly the same thing:

ELAPSD SYSCALL(args) = return
    35 open("/dev/dtracehelper\0", 0x2, 0x7FFF5834D930) = 3 0
   320 ioctl(0x3, 0x80086804, 0x7FFF5834D8B8) = 0 0
     6 close(0x3) = 0 0
     2 thread_selfid(0x3, 0x80086804, 0x7FFF5834D8B8) = 4487 0
     4 bsdthread_register(0x7FFF9894B080, 0x7FFF9894B070, 0x2000) = 1073741919 0
     2 ulock_wake(0x1, 0x7FFF5834D0EC, 0x0) = -1 Err#2
     1 issetugid(0x1, 0x7FFF5834D0EC, 0x0) = 0 0
     4 mprotect(0x1078B9000, 0x88, 0x1) = 0 0
     2 mprotect(0x1078BB000, 0x1000, 0x0) = 0 0
     1 mprotect(0x1078D1000, 0x1000, 0x0) = 0 0
     1 mprotect(0x1078D2000, 0x1000, 0x0) = 0 0
     1 mprotect(0x1078E8000, 0x1000, 0x0) = 0 0
     2 mprotect(0x1078E9000, 0x1000, 0x1) = 0 0
     3 mprotect(0x1078B9000, 0x88, 0x3) = 0 0
     2 mprotect(0x1078B9000, 0x88, 0x1) = 0 0
     1 getpid(0x1078B9000, 0x88, 0x1) = 440 0
     4 stat64("/AppleInternal/XBS/.isChrooted\0", 0x7FFF5834CFA8, 0x1) = -1 Err#2
     1 stat64("/AppleInternal\0", 0x7FFF5834D040, 0x1) = -1 Err#2
     3 csops(0x1B8, 0x7, 0x7FFF5834CAD0) = -1 Err#22
    33 sysctl([CTL_KERN, 14, 1, 440, 0, 0] (4), 0x7FFF5834CC28, 0x7FFF5834CC20, 0x0, 0x0) = 0 0
     1 ulock_wake(0x1, 0x7FFF5834D050, 0x0) = -1 Err#2
     2 csops(0x1B8, 0x7, 0x7FFF5834C3B0) = -1 Err#22
    33 stat64("./valgrind-test\0", 0x7FFF5834DB40, 0x7FFF5834C3B0) = 0 0
    35 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-x86-darwin.so\0", 0x5, 0x7FFF5834C3B0) = 0 0
==440== Nulgrind, the minimal Valgrind tool
    25 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-amd64-darwin.so\0", 0x5, 0x7FFF5834C3B0) = 0 0
==440== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
     9 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-arm-darwin.so\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
==440== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
     5 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-ppc32-darwin.so\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
==440== Command: ./valgrind-test
==440==
     5 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-ppc64be-darwin.so\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
     8 access("/Users/maloney/brew/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
     2 access("/Users/maloney/research/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
     5 access("/Users/maloney/dev/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
     6 access("/usr/local/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
     7 access("/usr/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
     6 access("/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
     6 access("/usr/sbin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
     5 access("/sbin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0) = -1 Err#2
     6 open("./valgrind-test\0", 0x0, 0xFFFFFFFFFFFFFFFF) = 3 0
   233 read(0x3, "\317\372\355\376\a\0", 0x1000) = 4096 0
     7 close(0x3) = 0 0
     7 open_nocancel(".\0", 0x0, 0x7F87E6800000) = 3 0
     3 fstat64(0x3, 0x7FFF5834D940, 0x7F87E6800000) = 0 0
     7 fcntl_nocancel(0x3, 0x32, 0x7F87E6801000) = 0 0
     3 close_nocancel(0x3) = 0 0
     4 stat64("/Users/maloney/dev/build-valgrind-test-Qt_5_x-Profile\0", 0x7FFF5834D8B0, 0x7F87E6801000) = 0 0
     1 getppid(0x7F87E6801000, 0x7FFF5834D8B0, 0x7F87E6801000) = 437 0
   279 access("/Users/maloney/dev/lib/valgrind/none-amd64-darwin\0", 0x5, 0x7F87E6801000) = 0 0

I see that it opened and read from the target executable successfully. It's not reporting any errors, just terminating. Nothing else here looks problematic to me - am I missing something?

I'd like to understand why that isn't working, but looking through the non-valgrind trace, I see something related to QOS which sounds like our culprit - kevent_qos().
Looking at the valgrind source (priv_syswrap-darwin.h):

#if DARWIN_VERS >= DARWIN_10_11
// NYI kevent_qos // 374
#endif /* DARWIN_VERS >= DARWIN_10_11 */

(Thank you for stepping me through this and for your patience...)
> #if DARWIN_VERS >= DARWIN_10_11
> // NYI kevent_qos // 374
> #endif /* DARWIN_VERS >= DARWIN_10_11 */

Yes, that looks like a promising clue.

You have now reached the point of knowing as much as I do about what is going on here. I may not be able to lead anymore. I hope your work provides motivation and a head start for someone who knows more about MacOS.

Thank you for being a good student, Andy.
*bows to Sensei* My guess would be that given that kevent_qos is related to QOS/priorities it probably doesn't carry any interesting info for valgrind and should just be ignored? I will wait for someone else to weigh in. I'd be happy to continue to learn about this with some guidance and see it through to a patch!
Created attachment 107858
{darwin} Accepts and ignores WQOPS_SET_EVENT_MANAGER_PRIORITY

This patch fixes the original issue in this bug report. It recognizes the WQOPS_SET_EVENT_MANAGER_PRIORITY case and ignores it.

This does not address the second issue in this report (the kevent_qos issue).
Would it help move this issue along if I create and attach a minimal Qt .app?
There are really two different bug reports here, so I've gone about cleaning this up by addressing the workq_ops warning first.

Valgrind master has a fix, ed6ad13bc8f2b33c493a72db9915f3681002e8d0, which should mean the warning no longer occurs. Thanks for the initial patch, Andy Maloney.

I've updated the bug report to reflect that this now just tracks the 'ud2' issue. Given that this is likely architecture dependent, not macOS 10.12 dependent, I've unlinked this as a blocker to bz#365327.

A minimal test case that reproduces the remaining ud2 opcode issue would be very helpful in getting it resolved.
The easiest way I could think of to create a minimal Qt example was to create a .app for it and include the Qt libs so it's all self-contained. I've included the source and a README with what I did and how I run valgrind on it. Because it is too big to attach (23M), I put it on Google Drive here: https://drive.google.com/open?id=0BxjC6Z37KBFvS0ZlSnJadDZkSWM If you need anything else (or that example isn't helpful and I can do something differently) please let me know.
*** Bug 385604 has been marked as a duplicate of this bug. ***
I ran into this bug today and have a small non-qt program that reproduces the same error as well. It's a simple cli tool that prints out some OpenCL information; basically just wrapping stock OpenCL functions. Tool: https://github.com/rhardih/opencl_util/blob/master/src/oclinf.c Source of interest: https://github.com/rhardih/opencl_util/blob/master/src/opencl_util.c#L554 Output with error: https://gist.github.com/rhardih/939ebfdc6b10acf732b62a805bd7ea93
John Reiser suggested using this bug as a reference in NEWS for n-i-bz "Fix missing workq_ops operations (macOS)".

Rhys, can you tell whether it is appropriate to reference this bug and close it?
Phillipe, it is fine to reference this bug in NEWS as being related, but please don't close this bug. The current underlying issue remains unresolved.

Per my commit message at the time:

> commit ed6ad13bc8f2b33c493a72db9915f3681002e8d0
> Author: Rhys Kidd <rhyskidd@gmail.com>
> Date: Sun Oct 1 18:56:05 2017 -0400
>
> Fix missing workq_ops operations (macOS)
>
> Related to discussion in bz#383723. Patch based upon one provided by
> Andy Maloney.
(In reply to Rhys Kidd from comment #18)
> Phillipe, it is fine to reference this bug in NEWS as being related, but
> please don't close this bug. The current underlying issue remains unresolved.

Ok. Then I think it is better to keep NEWS as is (i.e. not listing this bug as fixed).

Thanks
Created attachment 109078
Minimal example to reproduce issue

Attaching a minimal example to reproduce the crash (2 lines of code really).
The source code for the top-most symbol in the crash backtrace, _dispatch_kq_init, can be found at https://opensource.apple.com/source/libdispatch/libdispatch-703.50.37/src/source.c.auto.html .

Correlating the disassembly at https://gist.github.com/Placinta/208f706f6bdefb0e6706a741ceedc271 with the linked source code shows that the ud2 instruction is executed as the result of calling DISPATCH_CLIENT_CRASH after a failed kevent_qos call.

Under normal execution (no valgrind or lldb), the ud2 instruction would cause the macOS crash reporter to launch and print out the "Failed to initalize workqueue kevent" message.

Thus the ud2 instruction is a red herring, and someone needs to figure out why the kevent_qos call fails.
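The crash-on-failure pattern described above can be illustrated with a small standalone sketch. This is not libdispatch code: client_crash, init_event_queue, signal_from_init, and the two globals are hypothetical stand-ins, the message text is invented, and the sketch assumes GCC or Clang on a POSIX system. On x86-64 those compilers emit ud2 for __builtin_trap(); other architectures use a different instruction and may deliver a different signal.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the two globals written just before the trap. */
static const char *volatile crash_message;
static volatile long crash_value;

/* Record context for a crash reporter, then execute a trap instruction. */
static void client_crash(long value, const char *message)
{
    assert(message != NULL);
    crash_message = message;
    crash_value = value;
    __builtin_trap();
}

/* Hypothetical init step: rc == -1 simulates a failed system call. */
static void init_event_queue(int rc, int err)
{
    if (rc == -1)
        client_crash(err, "event queue initialisation failed (sketch)");
}

/* Run the init step in a child process so the trap cannot kill the caller.
 * Returns the terminating signal, 0 for a normal exit, -1 on error. */
static int signal_from_init(int rc, int err)
{
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        init_event_queue(rc, err);
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The point of the sketch is that the trap is a deliberate abort, so the failure to investigate is the call that returned -1, not the trap instruction itself.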
Ok, so the issue seems to be that the kevent_qos syscall is not implemented in syswrap-darwin.c.
Created attachment 109081
Patch implementing kevent_qos

Attaching patch that implements the kevent_qos syscall. I'm not certain that everything is correct (never worked on valgrind), but using existing syscalls as a guidance, the README, and checking the xnu source code, this is what I came up with.

Using the minimal test case I attached, this gets past the ud2 crash, and gives another crash which I think is the same as https://bugs.kde.org/show_bug.cgi?id=380269

==75877== Thread 2:
==75877== Invalid read of size 4
==75877==    at 0x1014B62B1: _pthread_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==75877==    by 0x1014B607C: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==75877==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==75877==
==75877==
==75877== Process terminating with default action of signal 11 (SIGSEGV)
==75877==  Access not within mapped region at address 0x18
==75877==    at 0x1014B62B1: _pthread_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==75877==    by 0x1014B607C: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)

On an unrelated note, I think that the code for kevent64 is incorrect, due to it having 7 arguments as per https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man2/kevent.2.html whereas the valgrind code only reads / processes 6 arguments (PRE_REG_READ6).
Thanks Alexandru -- I'll take a look at your proposed patch for kevent_qos
Just wanted to ping to see if there's been any progress on this. Still can't use valgrind on Qt applications on macOS. Thanks.
Hi Andy,

I have been slowly making my way through the outstanding valgrind issues on modern macOS, ensuring patches are clean to merge upstream and sufficient regression tests are in place so we don't unwittingly regress in future. I too would like to see it go faster, so I share your frustration.

It would be great if you could confirm that the proposed patch to valgrind here [0] does actually fix the bug you reported. Note: you may well hit another subsequent bug, perhaps even one already reported here on bugs.kde.org, but if you no longer see the existing error message, that would be a great help to know.

[0] https://bugs.kde.org/attachment.cgi?id=109081
Rhys: Thank you for your work on this. Very much appreciated!

I applied Alex's patch and ran (1) nulgrind against my test example and (2) nulgrind against a small Qt app. It does indeed get past this issue and on to others.

On my test example (1), it can't find the dSYM for some reason (or it isn't generated properly):

$ valgrind --tool=none ./valgrind-test2.app/Contents/MacOS/valgrind-test2
==9410== Nulgrind, the minimal Valgrind tool
==9410== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==9410== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==9410== Command: ./valgrind-test2.app/Contents/MacOS/valgrind-test2
==9410==

valgrind: m_debuginfo/debuginfo.c:452 (void discard_or_archive_DebugInfo(DebugInfo *)): Assertion 'is_DebugInfo_active(di)' failed.

host stacktrace:
==9410==    at 0x258010AFB: ???
==9410==    by 0x258010E9C: ???
==9410==    by 0x258010E73: ???
...

Running nulgrind on a small Qt program (2) gives a new error:

$ valgrind --tool=none ./some_app.app/Contents/MacOS/some_app
==9519== Nulgrind, the minimal Valgrind tool
==9519== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==9519== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==9519== Command: ./some_app.app/Contents/MacOS/some_app
==9519==
QML debugging is enabled. Only use this in a safe environment.
--9519-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--9519-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--9519-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
==9519==
==9519== Process terminating with default action of signal 11 (SIGSEGV)
==9519==  Access not within mapped region at address 0x18
==9519==    at 0x10677D2B1: ??? (in /usr/lib/system/libsystem_pthread.dylib)
==9519==    by 0x10677D07C: ??? (in /usr/lib/system/libsystem_pthread.dylib)
==9519==  If you believe this happened as a result of a stack
==9519==  overflow in your program's main thread (unlikely but
==9519==  possible), you can try to increase the size of the
==9519==  main thread stack using the --main-stacksize= flag.
==9519==  The main thread stack size used in this run was 8388608.
--9519:0:schedule VG_(sema_down): read returned -4
==9519==
Segmentation fault: 11
I just ran into this _dispatch_kq_init bug too (in valgrind 3.13 on macOS 10.12.6). I don't have much to add except: you don't need anything as complex as a Qt app, you can reproduce trying valgrind against TextEdit.app, which is a pretty basic Cocoa app, i.e.: $ valgrind /Applications/TextEdit.app
Rhys: Would it be reasonable to apply that patch and close this issue? Thank you.
This has been fixed in Valgrind git master, as of:

92d6a5388 Fix missing kevent_qos syscall (macOS 10.11). bz#383723

Thanks for the patch Alexandru Croitor!
Confirmed fixed in 3.14rc2. Though it still can't launch TextEdit, for which I've created: https://bugs.kde.org/show_bug.cgi?id=399504