Bug 511461

Summary: Darwin 17 (MacOS X 10.13) memcheck issues
Product: [Developer tools] valgrind Reporter: Paul Floyd <pjfloyd>
Component: memcheck Assignee: Paul Floyd <pjfloyd>
Status: REPORTED ---    
Severity: normal    
Priority: NOR    
Version First Reported In: 3.26 GIT   
Target Milestone: ---   
Platform: Other   
OS: Linux   
See Also: https://bugs.kde.org/show_bug.cgi?id=383811
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description Paul Floyd 2025-11-01 12:26:56 UTC
I think that I'm going to close all issues for Darwin 16 (MacOS X 10.12) and earlier. The Macs that I can access are running:
10.7 Intel (but I hardly ever boot that machine, I should take it to be recycled)
10.13 Intel in VirtualBox (this is the last version that we support)
13 Intel
15 arm64 (a new mac mini)

Overall I get about 200 fails on 10.13. Since memcheck is probably what most users want I'll concentrate on that (and 'none' after that).

Here is a list of issues that I see

== 262 tests, 36 stderr failures, 4 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==

Many leaks in libc
==============
Affects at least 8 tests

For instance, the "accounting" test:
--- accounting.stderr.exp       2022-06-06 23:06:24.000000000 +0200
+++ accounting.stderr.out       2025-11-01 07:53:47.000000000 +0100
@@ -5,8 +5,8 @@

 HEAP SUMMARY:
-    in use at exit: 0 bytes in 0 blocks
-  total heap usage: 1 allocs, 1 frees, 1 bytes allocated
+    in use at exit: 17,750 bytes in 151 blocks
+  total heap usage: 173 allocs, 22 frees, 26,199 bytes allocated

The leak-autofreepool* tests also have extra LEAK SUMMARY blocks.

Some tests have differences in leak categories.

Possible fixes:
If the summary is not part of the test, add -q to vgopts
Otherwise a) change the summary to ignore suppressed allocs or b) add expecteds/filters

Aborts in some overlap tests
=======================
Affects 4 tests

I need to look at this a bit more but I think that debug builds use checked str* and mem* that we aren't redirecting.

bug401284 gives

==13887== Process terminating with default action of signal 6 (SIGABRT)
==13887==    at 0x10063BB96: __pthread_sigmask (in /usr/lib/system/libsystem_kernel.dylib)
==13887==    by 0x10031223A: __abort (in /usr/lib/system/libsystem_c.dylib)
==13887==    by 0x1003121BC: abort (in /usr/lib/system/libsystem_c.dylib)
==13887==    by 0x100312320: abort_report_np (in /usr/lib/system/libsystem_c.dylib)
==13887==    by 0x100336BF4: __chk_fail (in /usr/lib/system/libsystem_c.dylib)
==13887==    by 0x100336C04: __chk_fail_overlap (in /usr/lib/system/libsystem_c.dylib)
==13887==    by 0x100336C35: __chk_overlap (in /usr/lib/system/libsystem_c.dylib)
==13887==    by 0x100336D0A: __strncat_chk (in /usr/lib/system/libsystem_c.dylib)
==13887==    by 0x100000F0C: main (bug401284.c:10)

Clearly we aren't redirecting __strncat_chk
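
To make the backtrace above easier to follow, here is a minimal sketch of the kind of overlap test that a checked strncat could perform before copying. It is not Apple's __strncat_chk and not Valgrind's replacement code: ranges_overlap, bounded_len and checked_strncat are hypothetical names, and the exact byte ranges that the real library compares are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if [a, a+alen) and [b, b+blen) share
   at least one byte, 0 otherwise. Empty ranges never overlap. */
static int ranges_overlap(const void *a, size_t alen,
                          const void *b, size_t blen)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;

    if (alen == 0 || blen == 0)
        return 0;
    return pa < pb + blen && pb < pa + alen;
}

/* Length of s, but never looking at more than n characters. */
static size_t bounded_len(const char *s, size_t n)
{
    size_t i = 0;
    while (i < n && s[i] != '\0')
        i++;
    return i;
}

/* Hypothetical stand-in for a checked strncat. Instead of aborting
   (as the libc in the backtrace does) it returns -1 when the bytes
   read from src overlap the bytes written to dst, and 0 on success. */
static int checked_strncat(char *dst, const char *src, size_t n)
{
    size_t dlen = strlen(dst);
    size_t slen = bounded_len(src, n);
    /* Bytes read: slen characters, plus the NUL if it was reached. */
    size_t rlen = (slen < n) ? slen + 1 : slen;

    /* Bytes written: slen characters plus the terminating NUL. */
    if (ranges_overlap(dst + dlen, slen + 1, src, rlen))
        return -1;

    memcpy(dst + dlen, src, slen);
    dst[dlen + slen] = '\0';
    return 0;
}
```

Appending a string literal to a separate buffer succeeds, while appending a tail of the destination to itself is reported as an overlap. memcheck would presumably report that second case as an overlap error itself if the call were redirected rather than reaching the libc abort.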

libc functionality
==============
Darwin doesn't have aligned_alloc (or is it memalign?)

duplicate_align_size_errors needs a Darwin expected

clang DWARF differences
====================
Probably the biggest category (10 or more?). No inline info, variable names not detected.

scalar and scalar_nocancel
======================
These are Darwin-specific. Need to look at what has changed.

Other
=====
gone_abrt_xml extra addr and missing frame
post-syscall fails to interrupt
thread_alloca segfaults

redir failures
==========
static_malloc, wrapmalloc and wrapmallocstatic
wcpncpy

Stack description
==============
descr_belowsp has several diffs as to how Darwin sees the memory below the stack.
Comment 1 Paul Floyd 2025-11-09 20:56:27 UTC
I'm seeing some flaky tests
threadname
leak-cases-possible
leak-delta
leak-tree
nanoleak2

Otherwise, tests consistently failing are now down to 25.

libc leaks
=======
memcheck/tests/accounting                (stderr)
memcheck/tests/big_blocks_freed_list     (stderr)
memcheck/tests/leak-cases-exit-on-definite (stderr)
memcheck/tests/lks                       (stderr)

extra LEAK SUMMARY (all zero???)
============================
memcheck/tests/leak-autofreepool-2       (stderr)
memcheck/tests/leak-autofreepool-6       (stderr)

leak category
===========
memcheck/tests/leak-cases-full           (stderr)
memcheck/tests/leak-cases-summary        (stderr)
memcheck/tests/leak_cpp_interior         (stderr)

several diffs then SIGBUS
=====================
memcheck/tests/descr_belowsp             (stderr)

debuginfo diffs (including inlining)
============================
memcheck/tests/dw4                       (stderr)
memcheck/tests/inlinfo                   (stderr)
memcheck/tests/inlinfosupp               (stderr)
memcheck/tests/inlinfosuppobj            (stderr)
memcheck/tests/origin5-bz2               (stderr)
memcheck/tests/supponlyobj               (stderr)
memcheck/tests/varinfo2                  (stderr)
memcheck/tests/varinfo3                  (stderr)
memcheck/tests/varinfo5                  (stderr)
memcheck/tests/varinfo6                  (stderr)

missing stack
===========
memcheck/tests/gone_abrt_xml             (stderr)

failed to interrupt
===============
memcheck/tests/post-syscall              (stderr)

failed redir in exe
==============
memcheck/tests/static_malloc             (stderr)
memcheck/tests/wrapmalloc                (stdout)
memcheck/tests/wrapmallocstatic          (stdout)

EXEC FAILED
==========
memcheck/tests/execve2                   (stderr)
memcheck/tests/thread_alloca             (stderr)


The ones that I consider important are the exec failures, SIGBUS in descr_belowsp (does that mean that the stack info in Valgrind is wrong?), and the missing interrupt in post-syscall.
Comment 2 Paul Floyd 2025-11-11 15:37:22 UTC
memcheck/tests/leak-autofreepool-2 and 6 don't really have all zero leak summaries. 

==18683==         suppressed: 17,749 bytes in 151 blocks

gets filtered to be zero for other tests, but it is still there.
Comment 3 Paul Floyd 2025-11-11 15:39:42 UTC
And the execve failure in thread_alloca is with errno 2 (ENOENT, no such file or directory).
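
For reference, a minimal sketch of how that errno value shows up when execve is given a path that does not exist. The helper name try_exec and the path used in the usage note are hypothetical; nothing here is taken from the thread_alloca test itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to exec `path` with an empty environment.
   On success the process image is replaced and this never returns.
   On failure it prints the reason and returns the errno value. */
static int try_exec(const char *path)
{
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };
    int err;

    execve(path, argv, envp);

    /* Only reached if execve failed. */
    err = errno;
    fprintf(stderr, "execve(%s) failed: errno %d (%s)\n",
            path, err, strerror(err));
    return err;
}
```

Calling try_exec("/nonexistent/thread_alloca_helper") returns ENOENT, which has the value 2 on both Linux and Darwin.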
Comment 4 Paul Floyd 2025-11-15 13:39:12 UTC
execve2 fixed, will create another item for thread_alloca
Comment 5 Paul Floyd 2025-11-18 21:13:34 UTC
I'm looking at the cases like static_malloc.

This bit of code is not working right

   1424	   vg_assert(!di->have_dinfo);
-> 1425	   if (di->fsm.have_rx_map &&
   1426	       di->fsm.rw_map_count == expected_rw_load_count) {
   1427	      /* Ok, so, finally, we found what we need, and we haven't
   1428	         already read debuginfo for this object.  So let's do so no

expected_rw_load_count we get from the macho header and it is 1.

di->fsm.rw_map_count comes from the segment, which has

(NSegment) $0 = {
  kind = SkFileC
  start = 4294967296
  end = 4294971391
  smode = SmFixed
  dev = 16777220
  ino = 2151778714093
  offset = 0
  mode = 31288
  fnIdx = 4
  hasR = '\x01'
  hasW = '\0'
  hasX = '\x01'
  hasT = '\0'
  isCH = '\0'

-> 1279	   is_rx_map = seg->hasR && seg->hasX && !seg->hasW;
   1280	   is_rw_map = seg->hasR && seg->hasW && !seg->hasX;

So that sets RX but not RW.

The next seg is RO

Next seg is the dynamic loader.

So the macho header reports one RW seg, but neither of the two mapped segments is RW. One of them is wrong.
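
The mismatch can be reproduced in isolation with a minimal sketch. SegPerms, MapCounts, count_maps and ready_to_read_debuginfo are hypothetical stand-ins, not Valgrind types or functions; only the is_rx_map / is_rw_map expressions and the shape of the have_rx_map / rw_map_count test are copied from the listings above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the permission bits of an NSegment. */
typedef struct {
    unsigned char hasR, hasW, hasX;
} SegPerms;

/* Hypothetical counters playing the role of the di->fsm fields. */
typedef struct {
    size_t rx_count;
    size_t rw_count;
} MapCounts;

/* Classify each segment with the same tests as the quoted code:
   RX means readable and executable but not writable,
   RW means readable and writable but not executable. */
static MapCounts count_maps(const SegPerms *segs, size_t n)
{
    MapCounts c = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        int is_rx = segs[i].hasR && segs[i].hasX && !segs[i].hasW;
        int is_rw = segs[i].hasR && segs[i].hasW && !segs[i].hasX;
        c.rx_count += (size_t)is_rx;
        c.rw_count += (size_t)is_rw;
    }
    return c;
}

/* Same shape as the condition that guards reading debuginfo:
   an RX mapping must exist and the RW count must equal the
   number expected from the object header. */
static int ready_to_read_debuginfo(MapCounts c, size_t expected_rw)
{
    return c.rx_count > 0 && c.rw_count == expected_rw;
}
```

With one R+X segment followed by one read-only segment, the counts are rx = 1 and rw = 0, so the test fails when the header-derived expectation is 1 and passes only when it is 0. That matches the symptom: debuginfo is never read for the object.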
Comment 6 Paul Floyd 2025-11-19 08:17:21 UTC
Found and fixed the problem with static_malloc. expected_rw_load_count now counts the same number of RW segments that load_thin_file mmaps and stores in nsegments.
Comment 7 Paul Floyd 2025-11-21 21:47:40 UTC
I'm now down to

== 266 tests, 19 stderr failures, 2 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
memcheck/tests/accounting                (stderr)
memcheck/tests/big_blocks_freed_list     (stderr)
memcheck/tests/descr_belowsp             (stderr)
memcheck/tests/dw4                       (stderr)
memcheck/tests/gone_abrt_xml             (stderr)
memcheck/tests/leak-autofreepool-2       (stderr)
memcheck/tests/leak-autofreepool-6       (stderr)
memcheck/tests/leak-cases-exit-on-definite (stderr)
memcheck/tests/leak-cases-full           (stderr)
memcheck/tests/leak-cases-summary        (stderr)
memcheck/tests/leak_cpp_interior         (stderr)
memcheck/tests/lks                       (stderr)
memcheck/tests/origin5-bz2               (stderr)
memcheck/tests/thread_alloca             (stderr)
memcheck/tests/threadname                (stderr)
memcheck/tests/varinfo2                  (stderr)
memcheck/tests/varinfo3                  (stderr)
memcheck/tests/varinfo5                  (stderr)
memcheck/tests/varinfo6                  (stderr)
memcheck/tests/wrapmalloc                (stdout)
memcheck/tests/wrapmallocstatic          (stdout)

The last two, I need to debug in the macho loading code.

threadname and thread_alloca crash intermittently.

Memcheck: mc_leakcheck.c:1128 (void lc_scan_memory(Addr, SizeT, Bool, Int, Int, Addr, SizeT)): Assertion 'bad_scanned_addr >= VG_ROUNDUP(start, sizeof(Addr))' failed.

An address not 8 byte aligned?
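
A minimal sketch of what the asserted condition compares, going only by the assertion text above. Addr, round_up and scan_assert_holds are local stand-ins written for this illustration; they are assumed to behave like the Valgrind macro but are not copied from it, and the word size is passed explicitly instead of using sizeof(Addr).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Addr; /* local stand-in for Valgrind's Addr */

/* Round p up to the next multiple of a (a must be a power of two). */
static Addr round_up(Addr p, size_t a)
{
    return (p + (a - 1)) & ~((Addr)a - 1);
}

/* The asserted condition: the recorded bad address must not lie
   below the first word-aligned address of the scanned range. */
static int scan_assert_holds(Addr start, Addr bad_scanned_addr, size_t word)
{
    return bad_scanned_addr >= round_up(start, word);
}
```

With 8-byte words and start = 0x1003, the first aligned word is at 0x1008. A bad address of 0x1004 fails the check, while 0x1009 passes even though it is not 8-byte aligned. So under these assumptions the assertion fires when the recorded address is below the aligned start of the range, not merely when it is misaligned.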

descr_belowsp gives a SIGBUS
==88201== Process terminating with default action of signal 10 (SIGBUS)
==88201==  Non-existent physical address at address 0x70000448EE9F
==88201==    at 0x100001D08: bad_things_till_guard_page (descr_belowsp.c:74)
==88201==    by 0x10000187C: child_fn_0 (descr_belowsp.c:113)
==88201==    by 0x100674660: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==88201==    by 0x10067450C: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==88201==    by 0x100673BF8: thread_start (in /usr/lib/system/libsystem_pthread.dylib)

Need to check Valgrind's map of the stack and guard page.

Just 3 more kinds of errors that look fixable.
Comment 8 Paul Floyd 2025-11-30 07:17:14 UTC
The x86 test more_x86_fp fails with

vex x86->IR: unhandled instruction bytes: 0x66 0xF 0x3A 0x16

VEX has

   /* 66 0F 3A 0F = PALIGNR -- Packed Align Right (XMM) */
   if (sz == 2
       && insn[0] == 0x0F && insn[1] == 0x3A && insn[2] == 0x0F) {

Either the instruction bytes are wrong in the message or sz != 2
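
A byte-by-byte comparison may help here. The sketch below mirrors the shape of the quoted check with a hypothetical helper, matches_66_0f3a, assuming that insn points just past the 0x66 prefix and that sz is 2 when that prefix was seen. The last opcode byte in the message is 0x16, whereas the PALIGNR test looks for 0x0F, so that test would not match these bytes even with sz == 2. As far as I know, 66 0F 3A 16 is the SSE4.1 PEXTRD encoding, which would need its own case.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char UChar; /* local stand-in for the VEX type */

/* Hypothetical helper with the same shape as the quoted check:
   operand size 2 (0x66 prefix seen), then 0F 3A, then a given
   third opcode byte. */
static int matches_66_0f3a(const UChar *insn, int sz, UChar third)
{
    return sz == 2
           && insn[0] == 0x0F && insn[1] == 0x3A && insn[2] == third;
}
```

For the bytes from the error message (0F 3A 16 after the prefix), matches_66_0f3a(insn, 2, 0x0F) is false and matches_66_0f3a(insn, 2, 0x16) is true.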