512127 – regtest thread_alloca intermittently fails on OSX 10.13

Status:	RESOLVED UNMAINTAINED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	general (other bugs)
Version First Reported In:	unspecified
Platform:	Compiled Sources macOS

Importance:	NOR major
Target Milestone:	---
Assignee:	Paul Floyd

URL:
Keywords:

Depends on:
Blocks:

Reported:	2025-11-15 13:46 UTC by Paul Floyd
Modified:	2025-12-10 20:28 UTC (History)
CC List:	0 users

See Also:	383811
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description Paul Floyd 2025-11-15 13:46:24 UTC

==33814== Memcheck, a memory error detector
==33814== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==33814== Using Valgrind-3.27.0.GIT and LibVEX; rerun with -h for copyright info
==33814== Command: ./thread_alloca 30
==33814== 
--33814-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--33814-- si_code=1;  Faulting address: 0x70000304E000;  sp: 0x700000ab1bd0

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==33814==    at 0x25800828D: ???
==33814==    by 0x258007915: ???
==33814==    by 0x258117E0E: ???
==33814==    by 0x258116DFD: ???
==33814==    by 0x2581166F8: ???
==33814==    by 0x2581147EE: ???
==33814==    by 0x258112190: ???
==33814==    by 0x258126265: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable syscall mach:12 (lwpid 771)
==33814==    at 0x10063312E: _kernelrpc_mach_vm_deallocate_trap (in /usr/lib/system/libsystem_kernel.dylib)
==33814==    by 0x10063B752: mach_vm_deallocate (in /usr/lib/system/libsystem_kernel.dylib)
==33814==    by 0x100675564: _pthread_deallocate (in /usr/lib/system/libsystem_pthread.dylib)
==33814==    by 0x10067551F: _pthread_join_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==33814==    by 0x10067786F: _pthread_join (in /usr/lib/system/libsystem_pthread.dylib)
==33814==    by 0x100000BE5: main (thread_alloca.c:50)
client stack range: [0x1040A1000 0x1048A0FFF] client SP: 0x1048A07B8
valgrind stack range: [0x7000009B2000 0x700000AB1FFF] top usage: 9816 of 1048576

Thread 4: status = VgTs_WaitSys syscall unix:515 (lwpid 4355)
==33814==    at 0x10063D15A: __ulock_wait (in /usr/lib/system/libsystem_kernel.dylib)
==33814==    by 0x1006646B9: _os_unfair_lock_lock_slow (in /usr/lib/system/libsystem_platform.dylib)
==33814==    by 0x1006735DE: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==33814==    by 0x10067350C: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==33814==    by 0x100672BF8: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
client stack range: ??????? client SP: 0x700003259EB8
valgrind stack range: [0x7000030D7000 0x7000031D6FFF] top usage: 3360 of 1048576

Thread 6: status = VgTs_WaitSys syscall unix:515 (lwpid 4099)
==33814==    at 0x10063D15A: __ulock_wait (in /usr/lib/system/libsystem_kernel.dylib)
==33814==    by 0x1006646B9: _os_unfair_lock_lock_slow (in /usr/lib/system/libsystem_platform.dylib)
==33814==    by 0x1006735DE: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==33814==    by 0x10067350C: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==33814==    by 0x100672BF8: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
client stack range: ??????? client SP: 0x70000356FEB8
valgrind stack range: [0x7000033ED000 0x7000034ECFFF] top usage: 3360 of 1048576

[snip many similar secondary threads]

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.



The first problem here is that the host stacktrace has no symbols: every frame is shown as ???.

That means that Valgrind is failing to read its own debuginfo, probably because of a problem matching memory mappings to Mach-O segments or something similar.

The problem guest code appears to be in _kernelrpc_mach_vm_deallocate_trap.
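
The addresses in the pid 33814 report can be cross-checked against the stack ranges it lists. The sketch below is only illustrative arithmetic: the constants are copied from the report, while the type and function names are made up for this example. It shows that the host sp lies on thread 1's valgrind stack (consistent with running_tid=1) and that the page-aligned faulting address falls in none of the ranges that survive the snip, so it may belong to one of the threads not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, as printed in the "stack range" lines. */
typedef struct { uint64_t lo, hi; } range_t;

static bool in_range(uint64_t a, range_t r)
{
    return a >= r.lo && a <= r.hi;
}

/* Values copied from the pid 33814 report. */
static const range_t t1_client_stack   = { 0x1040A1000ULL,    0x1048A0FFFULL };
static const range_t t1_valgrind_stack = { 0x7000009B2000ULL, 0x700000AB1FFFULL };
static const range_t t4_valgrind_stack = { 0x7000030D7000ULL, 0x7000031D6FFFULL };
static const range_t t6_valgrind_stack = { 0x7000033ED000ULL, 0x7000034ECFFFULL };
static const uint64_t host_sp    = 0x700000AB1BD0ULL;
static const uint64_t fault_addr = 0x70000304E000ULL;

/* True: the SIGSEGV was taken while running on thread 1's valgrind stack. */
bool host_sp_on_thread1_valgrind_stack(void)
{
    return in_range(host_sp, t1_valgrind_stack);
}

/* False: the faulting address is in none of the ranges shown in the report. */
bool fault_in_listed_ranges(void)
{
    return in_range(fault_addr, t1_client_stack)
        || in_range(fault_addr, t1_valgrind_stack)
        || in_range(fault_addr, t4_valgrind_stack)
        || in_range(fault_addr, t6_valgrind_stack);
}

/* True: the faulting address sits exactly on a 4 KiB page boundary. */
bool fault_is_page_aligned(void)
{
    return (fault_addr & 0xFFFULL) == 0;
}
```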

There is a second crash that I sometimes get:

==33824== Process terminating with default action of signal 11 (SIGSEGV)
==33824==  Access not within mapped region at address 0x7000031DB81A
==33824==    at 0x1006754A3: _pthread_join_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==33824==    by 0x10067786F: _pthread_join (in /usr/lib/system/libsystem_pthread.dylib)
==33824==    by 0x100000BE5: main (thread_alloca.c:50)
==33824==  If you believe this happened as a result of a stack
==33824==  overflow in your program's main thread (unlikely but
==33824==  possible), you can try to increase the size of the
==33824==  main thread stack using the --main-stacksize= flag.
==33824==  The main thread stack size used in this run was 8388608.
==33824== 
==33824== HEAP SUMMARY:
==33824==     in use at exit: 17,749 bytes in 151 blocks
==33824==   total heap usage: 172 allocs, 21 frees, 26,197 bytes allocated
==33824== 

Memcheck: mc_leakcheck.c:1128 (void lc_scan_memory(Addr, SizeT, Bool, Int, Int, Addr, SizeT)): Assertion 'bad_scanned_addr >= VG_ROUNDUP(start, sizeof(Addr))' failed.

host stacktrace:
==33824==    at 0x258059519: ???
==33824==    by 0x25805988F: ???
==33824==    by 0x258059864: ???
==33824==    by 0x258003445: ???
==33824==    by 0x258002C64: ???
==33824==    by 0x2580014F2: ???
==33824==    by 0x258016EA0: ???
==33824==    by 0x25815AAA3: ???
==33824==    by 0x258126427: ???

sched status:
  running_tid=1

It would be a big help to get symbolized host stacktraces.

Comment 1 Paul Floyd 2025-11-26 05:54:46 UTC

Host stack traces are now working. Will push soonish.

Comment 2 Paul Floyd 2025-11-26 07:59:24 UTC

The main part of the problem is:

==77391== Process terminating with default action of signal 11 (SIGSEGV)
==77391==  Access not within mapped region at address 0x70000639A41A
==77391==    at 0x1006754A3: _pthread_join_cleanup (in /usr/lib/system/libsystem_pthread.dylib)
==77391==    by 0x10067786F: _pthread_join (in /usr/lib/system/libsystem_pthread.dylib)
==77391==    by 0x100000BE5: main (thread_alloca.c:50)
==77391==  If you believe this happened as a result of a stack
==77391==  overflow in your program's main thread (unlikely but
==77391==  possible), you can try to increase the size of the
==77391==  main thread stack using the --main-stacksize= flag.
==77391==  The main thread stack size used in this run was 8388608.
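
For context, the guest frames above show main calling pthread_join (thread_alloca.c:50), and the first report shows that join path reaching _pthread_join_cleanup, _pthread_deallocate and mach_vm_deallocate. A minimal sketch of that general pattern follows. It is not the actual regtest source: the function names, the block size and the create-all-then-join-all structure are assumptions chosen for illustration.

```c
#include <alloca.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed block size; kept small relative to typical default thread stacks. */
#define BLOCK_SIZE (16 * 1024)

/* Hypothetical worker: allocates a block on its own stack and writes to it. */
static void *worker(void *arg)
{
    (void)arg;
    unsigned char *buf = alloca(BLOCK_SIZE);
    memset(buf, 0xA5, BLOCK_SIZE);
    /* Return a byte from the buffer so the writes are not optimised away. */
    return (void *)(uintptr_t)buf[BLOCK_SIZE - 1];
}

/* Start `count` threads, then join them all. Returns 0 on success, -1 on error.
 * Each pthread_join is the point where the C library may release the joined
 * thread's stack, which is where the crashes in this report occur. */
int run_threads(int count)
{
    if (count <= 0)
        return -1;

    pthread_t *tids = malloc((size_t)count * sizeof *tids);
    if (tids == NULL)
        return -1;

    int started = 0;
    int status = 0;
    for (; started < count; started++) {
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0) {
            status = -1;
            break;
        }
    }

    /* Join every thread that was actually started, even after an error. */
    for (int i = 0; i < started; i++) {
        void *ret = NULL;
        if (pthread_join(tids[i], &ret) != 0 || (uintptr_t)ret != 0xA5)
            status = -1;
    }

    free(tids);
    return status;
}
```

Calling run_threads(30) from a small main and running the result under Memcheck several times would exercise the same join path, although whether it reproduces the intermittent failure is untested.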

The second issue is:

Memcheck: mc_leakcheck.c:1128 (void lc_scan_memory(Addr, SizeT, Bool, Int, Int, Addr, SizeT)): Assertion 'bad_scanned_addr >= VG_ROUNDUP(start, sizeof(Addr))' failed.

host stacktrace:
==77391==    at 0x258059CD9: ??? (m_libcassert.c:426)
==77391==    by 0x25805A04F: ??? (m_libcassert.c:497)
==77391==    by 0x25805A024: ??? (m_libcassert.c:564)
==77391==    by 0x258003C05: ??? (mc_leakcheck.c:1128)
==77391==    by 0x258003424: ??? (mc_leakcheck.c:2028)
==77391==    by 0x258001CB2: ??? (mc_leakcheck.c:2235)
==77391==    by 0x258017660: ??? (mc_main.c:8493)
==77391==    by 0x25815BE13: ??? (m_main.c:2316)
==77391==    by 0x258127774: ??? (syswrap-darwin.c:246)

My guess is that fixing the first segfault will make the leak alignment assert go away.
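
Going only by the assertion text, the condition requires that an address recorded as faulting during the leak scan is not below the scan start rounded up to sizeof(Addr). The sketch below shows that arithmetic. The Addr typedef and the round_up helper are stand-ins written for this example, not copies of Valgrind's definitions, and the helper assumes the alignment is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Addr;  /* stand-in for Valgrind's Addr type */

/* Round p up to the next multiple of a; a must be a power of two.
 * Illustrative helper, not Valgrind's VG_ROUNDUP macro itself. */
static Addr round_up(Addr p, Addr a)
{
    return (p + a - 1) & ~(a - 1);
}

/* The condition named in the assertion: the bad address must not lie
 * below the first aligned word of the range being scanned. */
static bool scan_fault_is_plausible(Addr start, Addr bad_scanned_addr, Addr word)
{
    return bad_scanned_addr >= round_up(start, word);
}
```

With an 8-byte word, a scan starting at 0x1001 begins at the aligned address 0x1008, so a recorded bad address of 0x1000 would violate the condition while 0x1008 would satisfy it.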

Comment 3 Paul Floyd 2025-12-10 20:28:24 UTC

It passes on 10.14. Since 10.13 is long obsolete I'll close this.