Bug 365327

Summary: Support macOS Sierra (10.12)
Product: [Developer tools] valgrind Reporter: Rhys Kidd <rhyskidd>
Component: generalAssignee: Rhys Kidd <rhyskidd>
Status: CONFIRMED ---    
Severity: normal CC: anton, austinenglish, bartel.eerdekens, bjoern.d.rasmussen, chrischavez, hmijail, ibrahim, jdlferreira90, jseward, louis.brunner.fr, mark.j.abraham, pjfloyd, plaborie, sanssecours, vlasov, zac.oler, zalbiraw, zerderr
Priority: NOR    
Version: 3.12 SVN   
Target Milestone: ---   
Platform: macOS (DMG)   
OS: macOS   
Latest Commit: Version Fixed In:
Bug Depends on: 379893, 380269, 366138, 379371, 379372, 379373, 379390, 379748, 379754, 385279    
Bug Blocks:    
Attachments: Initial support for macOS 10.12 (incomplete) patch set
macOS Sierra incomplete support
attachment-8447-0.html
macOS Sierra incomplete support v2
0001-is_in_syscall-assert.patch
Checks in new syscalls/traps

Description Rhys Kidd 2016-07-10 17:03:44 UTC
Valgrind should support Apple's macOS Sierra (10.12) once a public, general release is made available in late 2016.

This is a meta bug for tracking any issues raised during the developer beta and public beta review process with Valgrind.

Reproducible: Always
Comment 1 Rhys Kidd 2016-09-11 16:47:15 UTC
Created attachment 101036 [details]
Initial support for macOS 10.12 (incomplete) patch set
Comment 2 Rhys Kidd 2016-09-11 16:55:58 UTC
Initial patch set attached that brings up macOS 10.12 support.

Patches against r15949, that allow valgrind and test suite to build on macOS 10.12.

Valgrind binaries do not yet run on macOS 10.12, due to an error mapping the NULL page during load of /usr/lib/dyld, see output below:

$ uname -v
Darwin Kernel Version 16.0.0: Tue Aug 23 17:02:44 PDT 2016; root:xnu-3789.1.31~2/RELEASE_X86_64
$ ./vg-in-place -d -d ls
...
--42669:1: initimg Loading client
--42669:1:     ume load_thin_file: begin:   /bin/ls
--42669:2:     ume   mmap fixed (file) (0x100000000, 20480)
--42669:2:     ume   mmap fixed (file) (0x100005000, 4096)
--42669:2:     ume   mmap fixed (file) (0x100006000, 16384)
--42669:1:     ume load_thin_file: begin:   /usr/lib/dyld
--42669:2:     ume   mmap fixed (file) (0x0, 253952)
valgrind: mmap-FIXED(0x0, 253952) failed in UME (load_segment1) with error 12 (Cannot allocate memory).

On OS X 10.11, that equivalent last mmap is at a higher base address:

$ uname -v
Darwin Kernel Version 15.6.0: Thu Jun 23 18:25:34 PDT 2016; root:xnu-3248.60.10~1/RELEASE_X86_64
$ ./vg-in-place -d -d ls
...
--64933:1: initimg Loading client
--64933:1:     ume load_thin_file: begin:   /bin/ls
--64933:2:     ume   mmap fixed (file) (0x100000000, 20480)
--64933:2:     ume   mmap fixed (file) (0x100005000, 4096)
--64933:2:     ume   mmap fixed (file) (0x100006000, 16384)
--64933:1:     ume load_thin_file: begin:   /usr/lib/dyld
--64933:2:     ume   mmap fixed (file) (0x7fff5fc00000, 229376)
...
Comment 3 Julian Seward 2016-09-20 11:21:04 UTC
The patches look OK to me.  Only one nit:

--- coregrind/m_syswrap/syswrap-amd64-darwin.c	(revision 15949)
+++ coregrind/m_syswrap/syswrap-amd64-darwin.c	(working copy)
+#      elif DARWIN_VERS == DARWIN_10_9 || DARWIN_VERS == DARWIN_10_10 || DARWIN_VERS == DARWIN_10_11 || DARWIN_VERS == DARWIN_10_12

Please break such long lines so as to remain within 80 cols where possible.
Comment 4 Rhys Kidd 2016-09-22 02:55:56 UTC
Preliminary support added in r15976.
Comment 5 Julian Seward 2016-10-18 11:15:54 UTC
(In reply to Rhys Kidd from comment #4)
> Preliminary support added in r15976.

Merged to 3_12_BRANCH in r16071.
Comment 6 joaodlf 2016-11-01 19:38:41 UTC
I'm running macOS (10.12.1) and valgrind-devel @3.13.0-r16104 (devel) and facing the following error when running valgrind --leak-check=yes on a very simple executable:

valgrind: mmap-FIXED(0x0, 253952) failed in UME (load_segment1) with error 12 (Cannot allocate memory).

I'm going to assume "preliminary support" doesn't quite get us there yet :).
Comment 7 Rhys Kidd 2016-11-02 04:05:25 UTC
Thanks for testing the code and reporting here.

Yes, the preliminary support is just that. There's known issues, including this one that appears to relate to the way NULL pages are mapped.

Unfortunately, Apple is still yet to release their APSL-licensed kernel xnu from Sierra as source code (have written their open source team two emails thus far with no response). It makes solving it harder than it should be.
Comment 8 Rhys Kidd 2016-11-30 05:23:32 UTC
*** Bug 372772 has been marked as a duplicate of this bug. ***
Comment 9 Rhys Kidd 2016-11-30 07:23:43 UTC
Apple have just this week finally released the source code for critical system components that shipped in macOS Sierra (10.12). This includes xnu, the kernel.

https://opensource.apple.com/release/macos-1012.html
Comment 10 Bartel 2017-02-17 11:17:03 UTC
Any idea when Sierra support can be expected?

Thanks!
Comment 11 zwing99 2017-03-03 20:32:41 UTC
Re: Bartel's comment is there any progress on this?
Comment 12 Louis Brunner 2017-04-11 10:40:28 UTC
Created attachment 104964 [details]
macOS Sierra incomplete support

I have been working on a patch to support macOS Sierra.
At the moment, it works for a variety of programs (python, GIMP, most basic commands like ls, mkdir) but it is still incomplete (warnings in dyld, libsystem_kernel... crash for some GUI programs).

The patch adds a few required syscalls with placeholder implementations (faccessat, fstatat64, csrctl, getentropy and ulock_wake) and the new way of loading dylib (placing them at the end of the currently loaded segments). The second change means we need to know where the last segment was loaded, which means carrying around one more pointer on pointer in every function (which already have 6-9 arguments), that's why I created a structure (load_info_t) to store all this information and easily carry it around.

It also adds one assert in is_in_syscall in coregrind/m_syswrap/syswrap-main.c to match the other syscall related functions in the same file. I had a difficult to diagnose crash in this function because it didn't check for the existence of the syscall table.

Tell me if you need any change
Comment 13 julien.aubert.mail 2017-04-24 07:25:59 UTC
Created attachment 105168 [details]
attachment-8447-0.html

Thanks. Works fine for me so far!

2017-04-11 12:40 GMT+02:00 Louis Brunner <bugzilla_noreply@kde.org>:

> https://bugs.kde.org/show_bug.cgi?id=365327
>
> Louis Brunner <louis.brunner.fr@gmail.com> changed:
>
>            What    |Removed                     |Added
> ------------------------------------------------------------
> ----------------
>                  CC|                            |
> louis.brunner.fr@gmail.com
>
> --- Comment #12 from Louis Brunner <louis.brunner.fr@gmail.com> ---
> Created attachment 104964 [details]
>   --> https://bugs.kde.org/attachment.cgi?id=104964&action=edit
> macOS Sierra incomplete support
>
> I have been working on a patch to support macOS Sierra.
> At the moment, it works for a variety of programs (python, GIMP, most basic
> commands like ls, mkdir) but it is still incomplete (warnings in dyld,
> libsystem_kernel... crash for some GUI programs).
>
> The patch adds a few required syscalls with placeholder implementations
> (faccessat, fstatat64, csrctl, getentropy and ulock_wake) and the new way
> of
> loading dylib (placing them at the end of the currently loaded segments).
> The
> second change means we need to know where the last segment was loaded,
> which
> means carrying around one more pointer on pointer in every function (which
> already have 6-9 arguments), that's why I created a structure
> (load_info_t) to
> store all this information and easily carry it around.
>
> It also adds one assert in is_in_syscall in coregrind/m_syswrap/syswrap-
> main.c
> to match the other syscall related functions in the same file. I had a
> difficult to diagnose crash in this function because it didn't check for
> the
> existence of the syscall table.
>
> Tell me if you need any change
>
> --
> You are receiving this mail because:
> You voted for the bug.
>
Comment 14 Julian Seward 2017-04-25 06:03:36 UTC
(In reply to Louis Brunner from comment #12)

Louis, thanks for working on this.  A question: with your patch applied,
does it still work on earlier MacOS versions (10.11, 10.10) ?
Comment 15 Austin English 2017-04-26 00:03:28 UTC
(In reply to Louis Brunner from comment #12)
> Created attachment 104964 [details]
> macOS Sierra incomplete support
> 
> I have been working on a patch to support macOS Sierra.
> At the moment, it works for a variety of programs (python, GIMP, most basic
> commands like ls, mkdir) but it is still incomplete (warnings in dyld,
> libsystem_kernel... crash for some GUI programs).
> 
> The patch adds a few required syscalls with placeholder implementations
> (faccessat, fstatat64, csrctl, getentropy and ulock_wake) and the new way of
> loading dylib (placing them at the end of the currently loaded segments).
> The second change means we need to know where the last segment was loaded,
> which means carrying around one more pointer on pointer in every function
> (which already have 6-9 arguments), that's why I created a structure
> (load_info_t) to store all this information and easily carry it around.
> 
> It also adds one assert in is_in_syscall in
> coregrind/m_syswrap/syswrap-main.c to match the other syscall related
> functions in the same file. I had a difficult to diagnose crash in this
> function because it didn't check for the existence of the syscall table.
> 
> Tell me if you need any change

I can confirm that the situation is improved on 10.12 with this patch. E.g., valgrind will at least attempt things now. I noticed that `./vg-in-place ls -al` crashes valgrind, though `./vg-in-place true` works fine (does show some uninitialized memory in OSX though).

A basic sanity check on 10.11 also worked for me.
Comment 16 Louis Brunner 2017-04-27 09:43:04 UTC
Created attachment 105218 [details]
macOS Sierra incomplete support v2

Thank you for the feedback!

Austin, I just added a new patch that solves this issue (I can't believe that I didn't check that...). Thanks for the check on 10.11!

Julian, I don't have access to any 10.10 or 10.11, but I am trying to install one in a VM right now. Technically, most of the changes are scoped in the proper `#if DARWIN_VERS >= DARWIN_10_XX`.
Comment 17 Rhys Kidd 2017-04-27 12:29:45 UTC
I'll test this patch on my 10.10 and 10.9 VMs.
Comment 18 Rhys Kidd 2017-04-29 19:10:18 UTC
With this patch I saw a regression in massif/tests/mmapunmap on macOS 10.11, due to the vg_assert(syscallInfo) added in VG_(is_in_syscall)().

For now I'm going to hold off committing that specific part of your v2 patch, although you could well have uncovered an existing bug that is only being tickled with this change. 

--- mmapunmap.stderr.exp	2016-04-05 21:07:04.000000000 -0400
+++ mmapunmap.stderr.out	2017-04-29 14:42:38.000000000 -0400
@@ -0,0 +1,19 @@
+
+valgrind: m_syswrap/syswrap-main.c:1691 (Bool vgPlain_is_in_syscall(Int)): Assertion 'syscallInfo' failed.
+
+host stacktrace:
+   at 0x238014AA3: ???
+   by 0x238014E4C: ???
+   by 0x238014E2A: ???
+   by 0x2380B1228: ???
+   by 0x238030C77: ???
+   by 0x238031030: ???
+   by 0x238003A7A: ???
+   by 0x2380038AE: ???
+   by 0x23801DADC: ???
+   by 0x23801C943: ???
+
+sched status:
+  running_tid=1
+
+Thread 1: status = VgTs_Init (lwpid 0)
Comment 19 Rhys Kidd 2017-04-29 19:11:18 UTC
There are some new warnings due to scope issues with the patch as is, i.e. see the below warnings that creep in on pre macOS 10.12 systems [0].

> m_syswrap/syswrap-darwin.c:9781:1: warning: no previous prototype for function 'vgSysWrap_darwin_getentropy_before' [-Wmissing-prototypes]
> PRE(getentropy)
> ^
> m_syswrap/syswrap-darwin.c:9788:1: warning: no previous prototype for function 'vgSysWrap_darwin_ulock_wake_before' [-Wmissing-prototypes]
> PRE(ulock_wake)
> ^

Before I commit this, the patch v2 will be broken up into smaller discrete patches to ease bisecting later on.
Comment 20 Rhys Kidd 2017-04-30 01:36:06 UTC
Louis Brunner,

Thank you for the macOS 10.12 patch to improve Valgrind's support on that platform. You may have seen that I've committed most of the changes to the SVN master version today.

A short summary follows, including the items that haven't been committed and why:

Committed to SVN master
1. Improve consistency of VG_(printf)() usage in coregrind/m_ume/macho.c (r16317)
2. New macOS 10.12 way of loading dylib (r16318 - biggest improvement)
3. Add a no-op wrapper for a new-in-10.12 syscall: getentropy (r16319)
4. Add a no-op wrapper for a new-in-10.12 syscall: ulock_wake (r16320)
5. Add a no-op wrapper for a new-in-10.10 syscall: csrctl (r16321)
6. Add a no-op wrapper for a new-in-10.10 syscall: faccessat (r16322)
7. Add a no-op wrapper for a new-in-10.10 syscall: fstatat64 (r16323)

Patches not committed and to discuss:

1. is_in_syscall-assert as noted above this caused one regression. The rebased, standalone patch attached is to be investigated further.

2. host_create_mach_voucher_trap or VG_DARWIN_SYSCALL_CONSTRUCT_MACH(70). Can you provide details of the reproducible bug this is fixing? I was unable to identify a fix this caused. Note we already have a host_create_mach_voucher() implemented.
Comment 21 Rhys Kidd 2017-04-30 01:37:11 UTC
Created attachment 105282 [details]
0001-is_in_syscall-assert.patch
Comment 22 Rhys Kidd 2017-04-30 20:38:31 UTC
Regarding outstanding patch #2 above in my Comment 20, I've now been able to reproduce the missing mach syscall 70 within Valgrind's regression test suite.

bz#379390 is tracking that issue.
Comment 23 Rhys Kidd 2017-05-01 02:18:34 UTC
So some progress has been made. As of r16328 Valgrind SVN trunk on macOS 10.12 is reporting as follows via its regression test suite:

== 629 tests, 450 stderr failures, 79 stdout failures, 1 stderrB failure, 1 stdoutB failure, 31 post failures ==

Whilst this might seem poor, it's only about 200 stderr failures above a similar OS X 10.11 system. And the regressions are predominantly caused solely by bz#379373.

Getting that fixed should greatly improve the support of Valgrind on this modern macOS. Thanks for the patches!
Comment 24 Louis Brunner 2017-05-01 14:37:01 UTC
Hi Rhys,

Thank you for the merge!

While making the changes for the new dylib loading, I encountered a crash coming from is_in_syscall.
I blamed it on the experimental nature of my patch at the time and I can't remember what triggered the crash.

You mentioned that you had older macOS VMs, do you have any advice/guide of any sort to help me set one up for testing?
I could look into the failing test and the warnings linked to `getentropy` and `ulock_wake` (if you didn't fix them already).

I was also working on a new version that included a pselect syscall wrapper (found it missing while testing python with valgrind), I will add it later.
Comment 25 Rhys Kidd 2017-05-01 17:37:29 UTC
I'd strongly suggest running through the regression test suite on your local setup to provide coverage of a reasonable amount of the Darwin code base. It is run with 'make regtest'.

The regression test is not quite automated on Darwin. You will probably need to kill one of the pthread-related tests that hangs at present.

Suggested areas to focus on:
1. pselect syscall
2. Any other missing syscalls that prevent you using a program of interest to you - more motivated to fix bugs in open source you're personally motivated in!
3. ulock_wait syscall - grep through the regression tests *.stderr.diff for the test that triggers it
4. Find a fix for the pthread-related hang in the regression test suite.

Create a separate bug report for each, then attach a patch and make it a blocker for this bug report.

The compiler warnings for 'getentropy' and 'ulock_wake' were fixed before commiting to SVN.
Comment 26 Louis Brunner 2017-05-12 15:22:58 UTC
Created attachment 105483 [details]
Checks in new syscalls/traps

Rhys,

I created new issue for pselect and ulock_wait (couldn't find the failing test though).
Could we simply disable pth_term_signal (the pthread test that hang) while we search for a fix? It makes testing pretty cumbersome. Do you have any idea what the problem could be? Maybe linked to another pthread problem (like 349128)?

I have also included a patch based on the one for ulock_wait (379754) which includes more checks for the newly added syscalls/traps. Do you want me to create an issue per syscall?