Bug 383723 - [PATCH] Fix missing kevent_qos syscall (macOS 10.11)
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 3.14 SVN
Platform: Compiled Sources macOS
Importance: NOR crash
Target Milestone: ---
Assignee: Rhys Kidd
Duplicates: 385604
Blocks: 348909
Reported: 2017-08-20 01:03 UTC by Andy
Modified: 2018-10-08 23:25 UTC (History)


Attachments
{darwin} Accepts and ignores WQOPS_SET_EVENT_MANAGER_PRIORITY (2.04 KB, patch)
2017-09-14 17:13 UTC, Andy
Minimal example to reproduce issue (303 bytes, text/plain)
2017-11-27 10:43 UTC, Alexandru Croitor
Patch implementing kevent_qos (4.29 KB, patch)
2017-11-27 15:43 UTC, Alexandru Croitor

Description Andy 2017-08-20 01:03:02 UTC
I sent this to the mailing list. It was reviewed by John Reiser (thanks John!) and he asked that I file this report.

Here are the details I supplied:

Running callgrind on my Qt-based app on macOS 10.12.6 crashes.

Running callgrind on "/bin/date" works (which is not surprising as I think workq_ops is related to threads?)

- in Qt Creator, add a new project
- select "Qt Console Application"
- edit its qmake file to remove "CONFIG += console" (this shouldn't be added on the Mac)
- build "Profile" version


The .pro looks like this:

QT += core
QT -= gui

CONFIG += c++11

TARGET = valgrind-test2
CONFIG -= app_bundle

TEMPLATE = app

SOURCES += main.cpp


And main.cpp looks like this:

#include <QCoreApplication>

int main(int argc, char *argv[])
{
   QCoreApplication a(argc, argv);

   return a.exec();
}


Run valgrind --tool=callgrind <path-to-command-line-executable>

Results:

==35785== Callgrind, a call-graph generating cache profiler
==35785== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==35785== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==35785== Command: /Users/maloney/dev/build-valgrind-test2-Qt_5_x-Profile/valgrind-test2
==35785==
==35785== For interactive control, run 'callgrind_control -h'.
--35785-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--35785-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--35785-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
UNKNOWN workq_ops option 128
==35785== valgrind: Unrecognised instruction at address 0x103b0fb50.
==35785==    at 0x103B0FB50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0D8B8: dispatch_once_f (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0FA90: _dispatch_kq_update (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B110CD: _dispatch_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B1103D: _dispatch_source_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B10E85: _dispatch_source_kevent_register (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B20651: _dispatch_queue_resume_finalize_activation (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103E603C0: _notify_lib_init (in /usr/lib/system/libsystem_notify.dylib)
==35785==    by 0x103E609AB: notify_register_dispatch (in /usr/lib/system/libsystem_notify.dylib)
==35785==    by 0x1027E8916: CFUniCharMapTo (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==35785==    by 0x103B0D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==35785== Your program just tried to execute an instruction that Valgrind
==35785== did not recognise.  There are two possible reasons for this.
==35785== 1. Your program has a bug and erroneously jumped to a non-code
==35785==    location.  If you are running Memcheck and you just saw a
==35785==    warning about a bad jump, it's probably your program's fault.
==35785== 2. The instruction is legitimate but Valgrind doesn't handle it,
==35785==    i.e. it's Valgrind's fault.  If you think this is the case or
==35785==    you are not sure, please let us know and we'll try to fix it.
==35785== Either way, Valgrind will now raise a SIGILL signal which will
==35785== probably kill your program.
==35785==
==35785== Process terminating with default action of signal 4 (SIGILL)
==35785==  Illegal opcode at address 0x103B0FB50
==35785==    at 0x103B0FB50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0D8B8: dispatch_once_f (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B0FA90: _dispatch_kq_update (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B110CD: _dispatch_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B1103D: _dispatch_source_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B10E85: _dispatch_source_kevent_register (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103B20651: _dispatch_queue_resume_finalize_activation (in /usr/lib/system/libdispatch.dylib)
==35785==    by 0x103E603C0: _notify_lib_init (in /usr/lib/system/libsystem_notify.dylib)
==35785==    by 0x103E609AB: notify_register_dispatch (in /usr/lib/system/libsystem_notify.dylib)
==35785==    by 0x1027E8916: CFUniCharMapTo (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==35785==    by 0x103B0D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==35785==
==35785== Events    : Ir
==35785== Collected : 188406082
==35785==
==35785== I   refs:      188,406,082
Illegal instruction: 4


John's response and analysis:


I was able to reproduce the problem using --tool=none, so it is not specific
to memcheck, callgrind, etc.  I am running MacOS Sierra Version 10.12.6.
The code in system library libdispatch.dylib expects there to be a trap
handler for opcode 'ud2' (0f 0b) [generates SIGILL] which the valgrind
emulator has disabled through some means, perhaps unknowingly or inadvertently.
[Or, perhaps some even-more-global protocol (that would have avoided the 'ud2')
has been violated.]
=====
$ valgrind --tool=none ~jreiser/build-valgrind_test2-Desktop_Qt_5_9_1_clang_64bit-Profile/valgrind_test2
==43499== Nulgrind, the minimal Valgrind tool
==43499== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==43499== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==43499== Command: /Users/jreiser/build-valgrind_test2-Desktop_Qt_5_9_1_clang_64bit-Profile/valgrind_test2
==43499==
--43499-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--43499-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--43499-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
UNKNOWN workq_ops option 128
==43499== valgrind: Unrecognised instruction at address 0x103b1fb50.
==43499==    at 0x103B1FB50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==43499==    by 0x103B1D8FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
   [[snip]]
=====

Running valgrind under lldb, and disassembling after the SIGILL:
=====
(lldb) x/12i 0x103b1fb1f
    0x103b1fb1f: e8 2e 48 02 00        callq  0x103b44352
    0x103b1fb24: 83 f8 ff              cmpl   $-0x1, %eax
    0x103b1fb27: 0f 85 a1 00 00 00     jne    0x103b1fbce
    0x103b1fb2d: e8 e8 46 02 00        callq  0x103b4421a
    0x103b1fb32: 48 63 00              movslq (%rax), %rax
    0x103b1fb35: 48 83 f8 04           cmpq   $0x4, %rax
    0x103b1fb39: 74 bf                 je     0x103b1fafa
    0x103b1fb3b: 48 8d 0d dd 71 02 00  leaq   0x271dd(%rip), %rcx
    0x103b1fb42: 48 89 0d f7 cc 04 00  movq   %rcx, 0x4ccf7(%rip)
    0x103b1fb49: 48 89 05 20 cd 04 00  movq   %rax, 0x4cd20(%rip)
=>  0x103b1fb50: 0f 0b                 ud2
    0x103b1fb52: f6 03 01              testb  $0x1, (%rbx)
=====
Obviously %rax and %rcx (and/or 64-bit memory locations (0x4ccf7+0x103b1fb49)
and (0x4cd20+0x103b1fb50)) contain two parameters to some subroutine
that is invoked by the signal handler for the 'ud2' opcode (which generates
SIGILL or its MacOS equivalent).  So perhaps valgrind should restore
the original signal handler for SIGILL during the single instruction 'ud2';
or, libdispatch.dylib may be assuming some other protocol that valgrind
does not know about, etc.
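
The two sums in the paragraph above follow the x86-64 rule that a RIP-relative displacement is added to the address of the next instruction, not the current one. A minimal C sketch of that arithmetic, using only addresses and instruction lengths from the lldb listing; the helper name rip_relative_target is made up for illustration and is not part of valgrind or lldb:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of an x86-64 RIP-relative operand: the signed 32-bit
 * displacement is added to the address of the NEXT instruction, that is,
 * insn_addr + insn_len.
 */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    assert(insn_len >= 1 && insn_len <= 15);  /* x86 instructions are 1..15 bytes long */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

With the listing's values, rip_relative_target(0x103b1fb42, 7, 0x4ccf7) is 0x103b6c840 and rip_relative_target(0x103b1fb49, 7, 0x4cd20) is 0x103b6c870, so those are the two memory slots written just before the 'ud2'.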



Details:
I had only Xcode installed already.  It took a couple of hours to download
and install the free version of Qt Creator (default version 5.9.1),
then install MacPorts and homebrew (following
https://paolozaino.wordpress.com/2015/05/05/how-to-install-and-use-autotools-on-mac-os-x/
which aroused suspicion because the most recent update was a couple years old)
so that I could run autogen.sh to build valgrind from current git source.
But I did manage to reproduce the problem, so enough of everything probably worked.
Comment 1 John Reiser 2017-08-21 17:03:25 UTC
/bin/date gets some of the "prelude" notices under --tool=none, so investigating that could be a warm-up for working on the "UNKNOWN workq_ops option 128".

$ valgrind --tool=none /bin/date
==44124== Nulgrind, the minimal Valgrind tool
==44124== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==44124== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==44124== Command: /bin/date
==44124== 
--44124-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--44124-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--44124-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
Mon Aug 21 09:59:20 PDT 2017
==44124==
Comment 2 Andy 2017-09-01 16:18:14 UTC
Is there anything else I can provide to help with this? I'm afraid actually fixing it is beyond my capabilities.

(It's a blocker for me - and anyone else trying to use valgrind for Qt-based apps on macOS it seems.)

Thanks!
Comment 3 John Reiser 2017-09-01 17:58:12 UTC
Locate some good documentation on MacOS workq.  Specifically,
find the MacOS source code which handles all workq options, including those
that correspond to the cases in PRE(workq_ops) in coregrind/m_syswrap/syswrap-darwin.c .

The closest I could find after modest searching is
    https://opensource.apple.com/source/xnu/xnu-3789.51.2/bsd/kern/pthread_shims.c.auto.html
Apparently there used to be a file pthread_synch.c but I cannot find it.
I did find
    https://opensource.apple.com/source/libpthread/libpthread-137.1.1/kern/workqueue_internal.h
which does have
    #define WQOPS_SET_EVENT_MANAGER_PRIORITY 0x80	/* max() in the provided priority in the the priority of the event manager */
and looks like a clue.  If so, then option 128 could be a no-op for valgrind.  Try that?
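
To make the "no-op" idea concrete, here is a minimal, self-contained sketch of accepting option 0x80 and doing nothing with it. It is not the code in PRE(workq_ops): the enum and the function classify_workq_option are hypothetical stand-ins, and only the WQOPS_SET_EVENT_MANAGER_PRIORITY value is taken from the header quoted above. It also assumes, as suggested above, that the option needs no tracking at all.

```c
#include <assert.h>

/* Value as quoted above from libpthread's workqueue_internal.h. */
#define WQOPS_SET_EVENT_MANAGER_PRIORITY 0x80

/* Hypothetical result codes, for illustration only. */
enum workq_action { WORKQ_IGNORE, WORKQ_UNKNOWN };

/* Hypothetical stand-in for the option switch in a workq_ops wrapper. */
enum workq_action classify_workq_option(int option)
{
    /* 0x80 is the "option 128" printed in the warning. */
    assert(WQOPS_SET_EVENT_MANAGER_PRIORITY == 128);

    switch (option) {
    case WQOPS_SET_EVENT_MANAGER_PRIORITY:
        /* Assumed to be a scheduling hint only: accept it and do nothing. */
        return WORKQ_IGNORE;
    default:
        /* Anything else would still be reported as an unknown option. */
        return WORKQ_UNKNOWN;
    }
}
```

The point of returning a distinct "ignore" result rather than falling into the default case is that the option is recognised explicitly, so the "UNKNOWN workq_ops option" warning is only printed for values that have genuinely not been looked at.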
Comment 4 Andy 2017-09-01 19:28:19 UTC
Great - thanks John!

Those MACH_SEND_TRAILER warnings you mentioned earlier were reported a couple of years ago:

  https://bugs.kde.org/show_bug.cgi?id=343306
  
Like you, I cannot find any documentation on Darwin's workq except some source code.

I found the most recent (released) version of workqueue_internal.h - for macOS 10.12.4:

  https://opensource.apple.com/source/libpthread/libpthread-218.1.3/kern/workqueue_internal.h.auto.html

and where the WQOPS_SET_EVENT_MANAGER_PRIORITY case is processed (see _workq_kernreturn):

  https://opensource.apple.com/source/libpthread/libpthread-218.1.3/kern/kern_support.c.auto.html

and where it is called with this value (see _pthread_workqueue_set_event_manager_priority):

  https://opensource.apple.com/source/libpthread/libpthread-218.1.3/src/pthread.c.auto.html

Based on my reading I think you are correct and it can be ignored for valgrind's purposes because it's for scheduling priorities.

Assuming I did things correctly to test this (simply adding "case 128: break;" to PRE(workq_ops)), it now crashes with an "Unrecognised instruction" instead:

==57909== Callgrind, a call-graph generating cache profiler
==57909== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==57909== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==57909== Command: /Users/maloney/dev/build-test-valgrind/test-valgrind.app/Contents/MacOS/test-valgrind
==57909== 
==57909== For interactive control, run 'callgrind_control -h'.
--57909-- run: /usr/bin/dsymutil "/Users/maloney/dev/build-test-valgrind/test-valgrind.app/Contents/MacOS/test-valgrind"
--57909-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--57909-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--57909-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
==57909== valgrind: Unrecognised instruction at address 0x104018b50.
==57909==    at 0x104018B50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x1040168FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x1040168B8: dispatch_once_f (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104018A90: _dispatch_kq_update (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x10401A0CD: _dispatch_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x10401A03D: _dispatch_source_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104019E85: _dispatch_source_kevent_register (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104029651: _dispatch_queue_resume_finalize_activation (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x105AE43C0: _notify_lib_init (in /usr/lib/system/libsystem_notify.dylib)
==57909==    by 0x105AE49AB: notify_register_dispatch (in /usr/lib/system/libsystem_notify.dylib)
==57909==    by 0x1049FE916: CFUniCharMapTo (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==57909==    by 0x1040168FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==57909== Your program just tried to execute an instruction that Valgrind
==57909== did not recognise.  There are two possible reasons for this.
==57909== 1. Your program has a bug and erroneously jumped to a non-code
==57909==    location.  If you are running Memcheck and you just saw a
==57909==    warning about a bad jump, it's probably your program's fault.
==57909== 2. The instruction is legitimate but Valgrind doesn't handle it,
==57909==    i.e. it's Valgrind's fault.  If you think this is the case or
==57909==    you are not sure, please let us know and we'll try to fix it.
==57909== Either way, Valgrind will now raise a SIGILL signal which will
==57909== probably kill your program.
==57909== 
==57909== Process terminating with default action of signal 4 (SIGILL)
==57909==  Illegal opcode at address 0x104018B50
==57909==    at 0x104018B50: _dispatch_kq_init (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x1040168FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x1040168B8: dispatch_once_f (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104018A90: _dispatch_kq_update (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x10401A0CD: _dispatch_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x10401A03D: _dispatch_source_kevent_resume (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104019E85: _dispatch_source_kevent_register (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x104029651: _dispatch_queue_resume_finalize_activation (in /usr/lib/system/libdispatch.dylib)
==57909==    by 0x105AE43C0: _notify_lib_init (in /usr/lib/system/libsystem_notify.dylib)
==57909==    by 0x105AE49AB: notify_register_dispatch (in /usr/lib/system/libsystem_notify.dylib)
==57909==    by 0x1049FE916: CFUniCharMapTo (in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation)
==57909==    by 0x1040168FB: _dispatch_client_callout (in /usr/lib/system/libdispatch.dylib)
==57909== 
==57909== Events    : Ir
==57909== Collected : 203711568
==57909== 
==57909== I   refs:      203,711,568
The program has unexpectedly finished.
** Process crashed **
Comment 5 John Reiser 2017-09-01 20:26:15 UTC
The crash looks very similar to the 'ud2' diagnosed in the original Description.  In particular, the offset 0x....b50 is the same.  This probably indicates that valgrind has more-or-less completely missed some aspect of what MacOS is doing.  We need advice from an expert.

<joke> So, spend two weeks at the bar/pub/tavern/restaurants in Cupertino and Sunnyvale.  Buy a beer for the next 10 people who enter.  Chat them up. </joke>

Apply dtruss and/or dtrace to the original program without valgrind, and to valgrind when running the program.  Correlate the system calls between the two runs; try to understand the difference.  [Also investigate "valgrind --trace-syscalls=yes ..." as an additional or alternate source of information.]  That's quite tedious, but logically should work.  Also, contrast with running on Linux (which uses 'strace').  The Qt implementation might be similar enough to provide a clue.
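
One way to do the correlation mechanically, once the syscall names have been extracted from the two traces, is to list the names that occur in the native run but never in the run under valgrind. The sketch below is only an illustration of that comparison: the function names are made up, the extraction of names from dtruss output is left out, and it assumes that a syscall which is absent (rather than merely failing) is the interesting signal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name occurs in names[0..n), else 0. */
static int contains(const char *const *names, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Stores in out[] (at most cap entries) each distinct syscall name that
 * appears in native[] but never in under_vg[]; returns how many were stored.
 */
size_t missing_syscalls(const char *const *native, size_t n_native,
                        const char *const *under_vg, size_t n_vg,
                        const char **out, size_t cap)
{
    size_t found = 0;
    assert(out != NULL || cap == 0);
    for (size_t i = 0; i < n_native && found < cap; i++) {
        if (contains(under_vg, n_vg, native[i]))
            continue;                 /* seen in both runs */
        if (contains(out, found, native[i]))
            continue;                 /* already reported once */
        out[found++] = native[i];
    }
    return found;
}
```

The quadratic search is deliberate: a trace has at most a few hundred distinct syscall names, so a simpler loop is easier to check than a hash table. Comparing return values for the names present in both runs would be a natural next step.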
Comment 6 Andy 2017-09-02 17:21:55 UTC
Thanks John.

I must be doing something wrong.

I can run dtruss on my example ok ("sudo dtruss -e ./valgrind-test").

I can run valgrind on the example ("valgrind --tool=none ./valgrind-test") and it crashes (as expected).

But when I run dtruss on valgrind on the example ("sudo dtruss -e valgrind --tool=none ./valgrind-test") it doesn't run the test executable. Here's the output:

dtrace: system integrity protection is on, some features will not be available

 ELAPSD SYSCALL(args) 		 = return
     35 open("/dev/dtracehelper\0", 0x2, 0x7FFF58BBD930)		 = 3 0
    346 ioctl(0x3, 0x80086804, 0x7FFF58BBD8B8)		 = 0 0
      6 close(0x3)		 = 0 0
      2 thread_selfid(0x3, 0x80086804, 0x7FFF58BBD8B8)		 = 2058768 0
      4 bsdthread_register(0x7FFFE336D080, 0x7FFFE336D070, 0x2000)		 = 1073741919 0
      2 ulock_wake(0x1, 0x7FFF58BBD0EC, 0x0)		 = -1 Err#2
      1 issetugid(0x1, 0x7FFF58BBD0EC, 0x0)		 = 0 0
      5 mprotect(0x107049000, 0x88, 0x1)		 = 0 0
      2 mprotect(0x10704B000, 0x1000, 0x0)		 = 0 0
      1 mprotect(0x107061000, 0x1000, 0x0)		 = 0 0
      1 mprotect(0x107062000, 0x1000, 0x0)		 = 0 0
      2 mprotect(0x107078000, 0x1000, 0x0)		 = 0 0
      2 mprotect(0x107079000, 0x1000, 0x1)		 = 0 0
      3 mprotect(0x107049000, 0x88, 0x3)		 = 0 0
      2 mprotect(0x107049000, 0x88, 0x1)		 = 0 0
      1 getpid(0x107049000, 0x88, 0x1)		 = 58961 0
      4 stat64("/AppleInternal/XBS/.isChrooted\0", 0x7FFF58BBCFA8, 0x1)		 = -1 Err#2
      1 stat64("/AppleInternal\0", 0x7FFF58BBD040, 0x1)		 = -1 Err#2
      3 csops(0xE651, 0x7, 0x7FFF58BBCAD0)		 = -1 Err#22
dtrace: error on enabled probe ID 2158 (ID 552: syscall::sysctl:return): invalid kernel access in action #10 at DIF offset 40
      1 ulock_wake(0x1, 0x7FFF58BBD050, 0x0)		 = -1 Err#2
      2 csops(0xE651, 0x7, 0x7FFF58BBC3B0)		 = -1 Err#22
      6 stat64("./valgrind-test\0", 0x7FFF58BBDB40, 0x7FFF58BBC3B0)		 = 0 0
     24 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-x86-darwin.so\0", 0x5, 0x7FFF58BBC3B0)		 = 0 0
     15 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-amd64-darwin.so\0", 0x5, 0x7FFF58BBC3B0)		 = 0 0
      3 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-arm-darwin.so\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
      2 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-ppc32-darwin.so\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
      2 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-ppc64be-darwin.so\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
==58961== Nulgrind, the minimal Valgrind tool
==58961== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
      3 access("/Users/maloney/brew/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
==58961== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
      2 access("/Users/maloney/research/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
==58961== Command: ./valgrind-test
==58961==
      2 access("/Users/maloney/dev/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
      3 access("/usr/local/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
      2 access("/usr/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
      2 access("/bin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
      2 access("/usr/sbin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
      2 access("/sbin/./valgrind-test\0", 0x5, 0x7FFF58BBC3B0)		 = -1 Err#2
      7 open("./valgrind-test\0", 0x0, 0xFFFFFFFFFFFFFFFF)		 = 3 0
dtrace: error on enabled probe ID 2134 (ID 154: syscall::read:return): invalid kernel access in action #12 at DIF offset 92
      5 close(0x3)		 = 0 0
      4 open_nocancel(".\0", 0x0, 0x7FB9AB800000)		 = 3 0
      2 fstat64(0x3, 0x7FFF58BBD940, 0x7FB9AB800000)		 = 0 0
      6 fcntl_nocancel(0x3, 0x32, 0x7FB9AB801000)		 = 0 0
      2 close_nocancel(0x3)		 = 0 0
      3 stat64("/Users/maloney/dev/build-valgrind-test-Qt_5_x-Profile\0", 0x7FFF58BBD8B0, 0x7FB9AB801000)		 = 0 0
      1 getppid(0x7FB9AB801000, 0x7FFF58BBD8B0, 0x7FB9AB801000)		 = 58958 0
     14 access("/Users/maloney/dev/lib/valgrind/none-amd64-darwin\0", 0x5, 0x7FB9AB801000)		 = 0 0


It looks like it's checking each path for the executable ("access"), then tries to open the executable and ... fails? I tried it with a full path and get the same result.

Am I doing that correctly? (Apologies - never used dtruss before, so I'm relying on online info and experimentation.)
Comment 7 John Reiser 2017-09-02 19:59:32 UTC
> dtrace: system integrity protection is on, some features will not be
> available

You must disable System Integrity Protection (that's why the spawn/"execve"/whatever failed).  It's a minor hassle.  Search the net; I don't remember the recipe.
Comment 8 Andy 2017-09-02 21:14:05 UTC
Ok I did that but it's still giving me roughly the same thing:

 ELAPSD SYSCALL(args) 		 = return
     35 open("/dev/dtracehelper\0", 0x2, 0x7FFF5834D930)		 = 3 0
    320 ioctl(0x3, 0x80086804, 0x7FFF5834D8B8)		 = 0 0
      6 close(0x3)		 = 0 0
      2 thread_selfid(0x3, 0x80086804, 0x7FFF5834D8B8)		 = 4487 0
      4 bsdthread_register(0x7FFF9894B080, 0x7FFF9894B070, 0x2000)		 = 1073741919 0
      2 ulock_wake(0x1, 0x7FFF5834D0EC, 0x0)		 = -1 Err#2
      1 issetugid(0x1, 0x7FFF5834D0EC, 0x0)		 = 0 0
      4 mprotect(0x1078B9000, 0x88, 0x1)		 = 0 0
      2 mprotect(0x1078BB000, 0x1000, 0x0)		 = 0 0
      1 mprotect(0x1078D1000, 0x1000, 0x0)		 = 0 0
      1 mprotect(0x1078D2000, 0x1000, 0x0)		 = 0 0
      1 mprotect(0x1078E8000, 0x1000, 0x0)		 = 0 0
      2 mprotect(0x1078E9000, 0x1000, 0x1)		 = 0 0
      3 mprotect(0x1078B9000, 0x88, 0x3)		 = 0 0
      2 mprotect(0x1078B9000, 0x88, 0x1)		 = 0 0
      1 getpid(0x1078B9000, 0x88, 0x1)		 = 440 0
      4 stat64("/AppleInternal/XBS/.isChrooted\0", 0x7FFF5834CFA8, 0x1)		 = -1 Err#2
      1 stat64("/AppleInternal\0", 0x7FFF5834D040, 0x1)		 = -1 Err#2
      3 csops(0x1B8, 0x7, 0x7FFF5834CAD0)		 = -1 Err#22
     33 sysctl([CTL_KERN, 14, 1, 440, 0, 0] (4), 0x7FFF5834CC28, 0x7FFF5834CC20, 0x0, 0x0)		 = 0 0
      1 ulock_wake(0x1, 0x7FFF5834D050, 0x0)		 = -1 Err#2
      2 csops(0x1B8, 0x7, 0x7FFF5834C3B0)		 = -1 Err#22
     33 stat64("./valgrind-test\0", 0x7FFF5834DB40, 0x7FFF5834C3B0)		 = 0 0
     35 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-x86-darwin.so\0", 0x5, 0x7FFF5834C3B0)		 = 0 0
==440== Nulgrind, the minimal Valgrind tool
     25 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-amd64-darwin.so\0", 0x5, 0x7FFF5834C3B0)		 = 0 0
==440== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
      9 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-arm-darwin.so\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
==440== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
      5 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-ppc32-darwin.so\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
==440== Command: ./valgrind-test
==440==
      5 access("/Users/maloney/dev/lib/valgrind/vgpreload_core-ppc64be-darwin.so\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
      8 access("/Users/maloney/brew/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
      2 access("/Users/maloney/research/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
      5 access("/Users/maloney/dev/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
      6 access("/usr/local/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
      7 access("/usr/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
      6 access("/bin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
      6 access("/usr/sbin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
      5 access("/sbin/./valgrind-test\0", 0x5, 0x7FFF5834C3B0)		 = -1 Err#2
      6 open("./valgrind-test\0", 0x0, 0xFFFFFFFFFFFFFFFF)		 = 3 0
    233 read(0x3, "\317\372\355\376\a\0", 0x1000)		 = 4096 0
      7 close(0x3)		 = 0 0
      7 open_nocancel(".\0", 0x0, 0x7F87E6800000)		 = 3 0
      3 fstat64(0x3, 0x7FFF5834D940, 0x7F87E6800000)		 = 0 0
      7 fcntl_nocancel(0x3, 0x32, 0x7F87E6801000)		 = 0 0
      3 close_nocancel(0x3)		 = 0 0
      4 stat64("/Users/maloney/dev/build-valgrind-test-Qt_5_x-Profile\0", 0x7FFF5834D8B0, 0x7F87E6801000)		 = 0 0
      1 getppid(0x7F87E6801000, 0x7FFF5834D8B0, 0x7F87E6801000)		 = 437 0
    279 access("/Users/maloney/dev/lib/valgrind/none-amd64-darwin\0", 0x5, 0x7F87E6801000)		 = 0 0


I see that it opened and read from the target executable successfully. It's not reporting any errors, just terminating. Nothing else here looks problematic to me - am I missing something?

I'd like to understand why that isn't working, but looking through the non-valgrind trace, I see something related to QOS which sounds like our culprit - kevent_qos(). Looking at the valgrind source (priv_syswrap-darwin.h):

#if DARWIN_VERS >= DARWIN_10_11
// NYI kevent_qos                               // 374
#endif /* DARWIN_VERS >= DARWIN_10_11 */

(Thank you for stepping me through this and for your patience...)
Comment 9 John Reiser 2017-09-02 23:31:09 UTC
> #if DARWIN_VERS >= DARWIN_10_11
> // NYI kevent_qos                               // 374
> #endif /* DARWIN_VERS >= DARWIN_10_11 */

Yes, that looks like a promising clue.

You have now reached the point of knowing as much as I do about what is going on here.  I may not be able to lead anymore.  I hope your work provides motivation and a head start for someone who knows more about MacOS.  Thank you for being a good student, Andy.
Comment 10 Andy 2017-09-02 23:38:25 UTC
*bows to Sensei*

My guess is that, since kevent_qos is related to QoS/priorities, it probably doesn't carry any information of interest to valgrind and could simply be ignored?

I will wait for someone else to weigh in. I'd be happy to continue to learn about this with some guidance and see it through to a patch!
Comment 11 Andy 2017-09-14 17:13:07 UTC
Created attachment 107858 [details]
{darwin} Accepts and ignores WQOPS_SET_EVENT_MANAGER_PRIORITY

This patch fixes the original issue in this bug report.

It recognizes the WQOPS_SET_EVENT_MANAGER_PRIORITY case and ignores it.

This does not address the second issue in this report (the kevent_qos issue).
Comment 12 Andy 2017-09-27 16:23:37 UTC
Would it help move this issue along if I create and attach a minimal Qt .app?
Comment 13 Rhys Kidd 2017-10-01 23:20:33 UTC
There are really two different bug reports here, so I've cleaned this up by addressing the workq_ops warning first. Valgrind master has a fix, ed6ad13bc8f2b33c493a72db9915f3681002e8d0, which should mean the warning no longer occurs.

Thanks for the initial patch Andy Maloney.

I've updated the bug report to reflect that this now just tracks the 'ud2' issue. Given that is likely architecture dependent, not macOS 10.12 dependent, I've unlinked this as a blocker to bz#365327.

A minimal test case that reproduces the remaining ud2 opcode issue would be very helpful in getting it resolved.
Comment 14 Andy 2017-10-02 00:08:04 UTC
The easiest way I could think of to create a minimal Qt example was to create a .app for it and include the Qt libs so it's all self-contained.

I've included the source and a README with what I did and how I run valgrind on it.

Because it is too big to attach (23M), I put it on Google Drive here:

   https://drive.google.com/open?id=0BxjC6Z37KBFvS0ZlSnJadDZkSWM

If you need anything else (or that example isn't helpful and I can do something differently) please let me know.
Comment 15 Rhys Kidd 2017-10-15 18:12:24 UTC
*** Bug 385604 has been marked as a duplicate of this bug. ***
Comment 16 René Hansen 2017-10-25 11:51:49 UTC
I ran into this bug today and have a small non-Qt program that reproduces the same error.

It's a simple CLI tool that prints out some OpenCL information; basically just wrapping stock OpenCL functions.

Tool:

https://github.com/rhardih/opencl_util/blob/master/src/oclinf.c

Source of interest:

https://github.com/rhardih/opencl_util/blob/master/src/opencl_util.c#L554

Output with error:

https://gist.github.com/rhardih/939ebfdc6b10acf732b62a805bd7ea93
Comment 17 Philippe Waroquiers 2017-10-26 19:56:18 UTC
John Reiser suggested using this bug as a reference in NEWS for
  n-i-bz "Fix missing workq_ops operations (macOS)"

Rhys, can you tell whether it is appropriate to reference this bug
and close it?
Comment 18 Rhys Kidd 2017-10-28 19:22:22 UTC
Philippe, it is fine to reference this bug in NEWS as being related, but please don't close this bug. The current underlying issue remains unresolved.

Per my commit message at the time:

> commit ed6ad13bc8f2b33c493a72db9915f3681002e8d0
> Author: Rhys Kidd <rhyskidd@gmail.com>
> Date:   Sun Oct 1 18:56:05 2017 -0400
> 
>    Fix missing workq_ops operations (macOS)
> 
>    Related to discussion in bz#383723. Patch based upon one provided by
>    Andy Maloney.
Comment 19 Philippe Waroquiers 2017-10-29 17:04:54 UTC
(In reply to Rhys Kidd from comment #18)
> Phillipe, it is fine to reference this bug in NEWS as being related, but
> please don't close this bug. The current underlying issue remains unresolved.
Ok. Then I think it is better to keep NEWS as is (i.e. not listing this bug
as fixed).

Thanks
Comment 20 Alexandru Croitor 2017-11-27 10:43:00 UTC
Created attachment 109078 [details]
Minimal example to reproduce issue

Attaching a minimal example to reproduce the crash (2 lines of code really).
Comment 21 Alexandru Croitor 2017-11-27 12:37:41 UTC
The source code for the top-most symbol _dispatch_kq_init present in the backtrace of the crash can be found at https://opensource.apple.com/source/libdispatch/libdispatch-703.50.37/src/source.c.auto.html . 

By correlating the disassembly at https://gist.github.com/Placinta/208f706f6bdefb0e6706a741ceedc271 and the linked source code, the execution of the ud2 instruction is the result of calling DISPATCH_CLIENT_CRASH due to a failed kevent_qos call. 

The ud2 instruction would cause the macOS crash reporter to launch under normal execution (no valgrind or lldb), and print out the "Failed to initalize workqueue kevent" message.

Thus the ud2 instruction is a red herring, and someone needs to figure out why the kevent_qos call fails.
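
For readers following along, the control flow in the lldb listing from the Description appears to be the usual "retry on EINTR, otherwise crash" pattern: compare the result with -1, load errno, loop if it is 4 (EINTR), and otherwise store the message and errno and execute ud2. Below is a minimal sketch of that pattern under those assumptions. It is not libdispatch code: retry_on_eintr, syscall_like_fn and the fake_call test double are hypothetical names, and instead of trapping, the function returns the errno that would reach the crash path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for the real kevent_qos() call. */
typedef int (*syscall_like_fn)(void *ctx);

/*
 * Calls fn until it stops failing with EINTR.
 * Returns 0 on success, otherwise the errno that the real code would
 * pass to its crash macro (the point where ud2 is executed).
 */
int retry_on_eintr(syscall_like_fn fn, void *ctx)
{
    assert(fn != NULL);
    for (;;) {
        if (fn(ctx) != -1)
            return 0;        /* result != -1: success path */
        if (errno == EINTR)
            continue;        /* interrupted: try the call again */
        return errno;        /* any other error: crash path in the real code */
    }
}

/* Test double: fails with EINTR a few times, then succeeds or fails for good. */
struct fake_call {
    int eintr_left;   /* number of EINTR failures to simulate first */
    int final_errno;  /* 0 = succeed afterwards, else fail with this errno */
    int calls;        /* number of invocations seen so far */
};

int fake_syscall(void *ctx)
{
    struct fake_call *f = ctx;
    f->calls++;
    if (f->eintr_left > 0) {
        f->eintr_left--;
        errno = EINTR;
        return -1;
    }
    if (f->final_errno != 0) {
        errno = f->final_errno;
        return -1;
    }
    return 0;
}
```

Running fake_syscall through retry_on_eintr with final_errno set to ENOSYS models what presumably happens under valgrind if the syscall has no wrapper: the call fails with an error other than EINTR on the first real attempt, so the crash path is taken every time.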
Comment 22 Alexandru Croitor 2017-11-27 12:53:20 UTC
Ok, so the issue seems to be that the kevent_qos syscall is not implemented in syswrap-darwin.c.
Comment 23 Alexandru Croitor 2017-11-27 15:43:12 UTC
Created attachment 109081 [details]
Patch implementing kevent_qos

Attaching patch that implements the kevent_qos syscall.

I'm not certain that everything is correct (I've never worked on valgrind), but this is what I came up with, using the existing syscall wrappers, the README, and the xnu source code as guidance.

Using the minimal test case I attached, this gets past the ud2 crash, and gives another crash which I think is the same as https://bugs.kde.org/show_bug.cgi?id=380269

==75877== Thread 2:
==75877== Invalid read of size 4
==75877==    at 0x1014B62B1: _pthread_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==75877==    by 0x1014B607C: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==75877==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==75877==
==75877==
==75877== Process terminating with default action of signal 11 (SIGSEGV)
==75877==  Access not within mapped region at address 0x18
==75877==    at 0x1014B62B1: _pthread_wqthread (in /usr/lib/system/libsystem_pthread.dylib)
==75877==    by 0x1014B607C: start_wqthread (in /usr/lib/system/libsystem_pthread.dylib)

On an unrelated note, I think that the code for kevent64 is incorrect: kevent64 takes 7 arguments as per https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man2/kevent.2.html whereas the valgrind code only reads / processes 6 arguments (PRE_REG_READ6).
Comment 24 Rhys Kidd 2017-11-29 03:20:01 UTC
Thanks Alexandru -- I'll take a look at your proposed patch for kevent_qos
Comment 25 Andy 2018-02-22 22:05:54 UTC
Just wanted to ping to see if there's been any progress on this. Still can't use valgrind on Qt applications on macOS.

Thanks.
Comment 26 Rhys Kidd 2018-02-22 23:32:15 UTC
Hi Andy,
Have been slowly making my way through the outstanding valgrind issues on modern macOS, ensuring patches are clean to merge upstream and sufficient regression tests are in place so we don't unwittingly regress in future. I too would like to see it go faster, so I share your frustration.

One thing that would be great is if you could confirm that the proposed patch to valgrind here [0] does actually fix the bug you reported.

Note: you may well hit another subsequent bug, perhaps even reported already here on bugs.kde.org, but if you no longer see the existing error message that would be a great help to know.

[0] https://bugs.kde.org/attachment.cgi?id=109081
Comment 27 Andy 2018-02-23 12:21:52 UTC
Rhys:

Thank you for your work on this. Very much appreciated!

I applied Alex's patch and ran (1) nulgrind against my test example and (2) nulgrind against a small Qt app. It does indeed get past this issue and on to others.

On my test example (1), it can't find the dSYM for some reason (or it isn't generated properly):

$ valgrind --tool=none ./valgrind-test2.app/Contents/MacOS/valgrind-test2
==9410== Nulgrind, the minimal Valgrind tool
==9410== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==9410== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==9410== Command: ./valgrind-test2.app/Contents/MacOS/valgrind-test2
==9410==

valgrind: m_debuginfo/debuginfo.c:452 (void discard_or_archive_DebugInfo(DebugInfo *)): Assertion 'is_DebugInfo_active(di)' failed.

host stacktrace:
==9410==    at 0x258010AFB: ???
==9410==    by 0x258010E9C: ???
==9410==    by 0x258010E73: ???
...

Running nulgrind on a small Qt program (2) gives a new error:

$ valgrind --tool=none ./some_app.app/Contents/MacOS/some_app
==9519== Nulgrind, the minimal Valgrind tool
==9519== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote.
==9519== Using Valgrind-3.14.0.GIT and LibVEX; rerun with -h for copyright info
==9519== Command: ./some_app.app/Contents/MacOS/some_app
==9519==
QML debugging is enabled. Only use this in a safe environment.
--9519-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--9519-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--9519-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
==9519==
==9519== Process terminating with default action of signal 11 (SIGSEGV)
==9519==  Access not within mapped region at address 0x18
==9519==    at 0x10677D2B1: ??? (in /usr/lib/system/libsystem_pthread.dylib)
==9519==    by 0x10677D07C: ??? (in /usr/lib/system/libsystem_pthread.dylib)
==9519==  If you believe this happened as a result of a stack
==9519==  overflow in your program's main thread (unlikely but
==9519==  possible), you can try to increase the size of the
==9519==  main thread stack using the --main-stacksize= flag.
==9519==  The main thread stack size used in this run was 8388608.
--9519:0:schedule VG_(sema_down): read returned -4
==9519==
Segmentation fault: 11
Comment 28 Sean 2018-04-11 19:58:30 UTC
I just ran into this _dispatch_kq_init bug too (in valgrind 3.13 on macOS 10.12.6).  I don't have much to add except: you don't need anything as complex as a Qt app. You can reproduce it by running valgrind against TextEdit.app, which is a pretty basic Cocoa app, i.e.:

$ valgrind /Applications/TextEdit.app
Comment 29 Andy 2018-04-15 13:54:08 UTC
Rhys:

Would it be reasonable to apply that patch and close this issue?

Thank you.
Comment 30 Rhys Kidd 2018-06-03 16:58:07 UTC
This has been fixed in Valgrind git master, as of:

92d6a5388 Fix missing kevent_qos syscall (macOS 10.11). bz#383723

Thanks for the patch Alexandru Croitor!
Comment 31 Sean 2018-10-08 23:25:27 UTC
Confirmed fixed in 3.14rc2.

Though it still can't launch TextEdit, for which I've created:

https://bugs.kde.org/show_bug.cgi?id=399504