Bug 512666

Summary:	Darwin 17 (macOS 10.13) Helgrind/DRD issues
Product:	[Developer tools] valgrind	Reporter:	Paul Floyd <pjfloyd>
Component:	helgrind	Assignee:	Paul Floyd <pjfloyd>
Status:	REPORTED ---
Severity:	normal
Priority:	NOR
Version First Reported In:	3.27 GIT
Target Milestone:	---
Platform:	Compiled Sources
OS:	macOS
See Also:	https://bugs.kde.org/show_bug.cgi?id=383811

Description Paul Floyd 2025-11-27 09:04:10 UTC

There are lots of issues: 80-90% of the test cases are failing. As always, quite a lot of those will just be minor stack trace diffs.

Comment 1 Paul Floyd 2025-11-27 20:59:24 UTC

Helgrind is a bit of a disaster. Quite a few hangs.

== 56 tests, 48 stderr failures, 2 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==

DRD is a bit better:

== 113 tests, 36 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==

A few of those are DWARF errors.

Comment 2 Paul Floyd 2025-11-28 12:50:02 UTC

Helgrind is bad, but not quite as bad as I had initially thought.

About 30 of the tests contain diffs like this:

 Thread #x was created
    ... 
-   by 0x........: pthread_create@* (hg_intercepts.c:...)
    by 0x........: main (annotate_rwlock.c:164)

I.e., missing lines from the stacktrace.

Comment 3 Paul Floyd 2025-11-28 20:30:26 UTC

For the Helgrind stack issue, I had the same problem on FreeBSD, so I reused the same filter. Results with the filter applied:

== 56 tests, 17 stderr failures, 2 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==

About half of those failures are debuginfo-related, like:

- Address 0x........ is 4 bytes inside data symbol "s_rwlock"
+ Address 0x........ is in the Data segment of ./annotate_rwlock

About a quarter are hangs. I debugged one, and it had crashed in the stack-walking code:
-> 696           uregs.xip = (((UWord*)uregs.xbp)[1]);
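
The line above reads the return address through the saved frame pointer, so it faults if uregs.xbp does not point into mapped stack memory. As a rough illustration only (this is not the code in m_stacktrace.c; walk_frames, fp_min and fp_max are hypothetical names), a frame-pointer walk that checks every frame against known stack limits before dereferencing could look like this:

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t UWord;

/* Hypothetical sketch: follow a chain of [saved fp, return address] records.
 * fp_min is the lowest valid stack address and fp_max is one past the
 * highest. Returns the number of return addresses stored in ips. */
size_t walk_frames(UWord fp, UWord fp_min, UWord fp_max,
                   UWord *ips, size_t max_n_ips)
{
    size_t n = 0;

    /* With no usable limits (for example fp_max == 0) nothing is safe to read. */
    if (fp_max <= fp_min || fp_max - fp_min < 2 * sizeof(UWord))
        return 0;

    while (n < max_n_ips) {
        /* Both words of the frame record must lie inside the stack. */
        if (fp < fp_min || fp > fp_max - 2 * sizeof(UWord))
            break;
        if (fp % sizeof(UWord) != 0)
            break;

        const UWord *frame = (const UWord *)fp;
        UWord next_fp = frame[0];   /* saved frame pointer of the caller */
        ips[n++] = frame[1];        /* return address, the read that faulted */

        /* The chain must move towards higher addresses, else stop. */
        if (next_fp <= fp)
            break;
        fp = next_fp;
    }
    return n;
}
```

Called with limits that do not cover the frame pointer, the sketch returns 0 instead of touching memory, which is the behaviour one would want when the thread's stack segment is no longer known.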

The last quarter are errors caused by Darwin pthreads being non-conformant, such as:

+Thread #x: Bug in libpthread: write lock granted on mutex/rwlock which is currently wr-held by a different thread

(fixed in later versions of Darwin I hope).

Comment 4 Paul Floyd 2025-11-28 21:02:59 UTC

* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x700039c43ec8)
  * frame #0: 0x0000000258047647 helgrind-amd64-darwin`vgPlain_get_StackTrace_wrk(tid_if_known=<unavailable>, ips=0x0000700039bbfcd0, max_n_ips=100, sps=0x0000000000000000, fps=0x0000000000000000, startRegs=<unavailable>, fp_max_orig=0) at m_stacktrace.c:696 [opt]
    frame #1: 0x0000000258047cd0 helgrind-amd64-darwin`vgPlain_get_and_pp_StackTrace [inlined] vgPlain_get_StackTrace_with_deltas(tid=2, ips=0x0000700039bbfcd0, n_ips=100, first_ip_delta=0, first_sp_delta=0) at m_stacktrace.c:1698 [opt]
    frame #2: 0x0000000258047c5d helgrind-amd64-darwin`vgPlain_get_and_pp_StackTrace [inlined] vgPlain_get_StackTrace(tid=2, ips=0x0000700039bbfcd0, max_n_ips=100, sps=<unavailable>, first_ip_delta=0) at m_stacktrace.c:1760 [opt]
    frame #3: 0x0000000258047c5d helgrind-amd64-darwin`vgPlain_get_and_pp_StackTrace(tid=2, max_n_ips=100) at m_stacktrace.c:1806 [opt]
    frame #4: 0x000000025802ff6b helgrind-amd64-darwin`print_thread_state(stack_usage='\x01', prefix="", i=2) at m_libcassert.c:374 [opt]
    frame #5: 0x000000025802fa11 helgrind-amd64-darwin`show_sched_status_wrk(host_stacktrace=<unavailable>, stack_usage=<unavailable>, exited_threads='\0', startRegsIN=<unavailable>) at m_libcassert.c:460 [opt]
    frame #6: 0x000000025802fc70 helgrind-amd64-darwin`report_and_quit(report="www.valgrind.org", startRegsIN=<unavailable>) at m_libcassert.c:497 [opt]
    frame #7: 0x000000025802fd0c helgrind-amd64-darwin`panic(name=<unavailable>, report=<unavailable>, str=<unavailable>, startRegs=<unavailable>) at m_libcassert.c:572 [opt]
    frame #8: 0x000000025802fcd3 helgrind-amd64-darwin`vgPlain_core_panic_at(str=<unavailable>, startRegs=<unavailable>) at m_libcassert.c:577 [opt]
    frame #9: 0x0000000258046f2e helgrind-amd64-darwin`sync_signalhandler at m_signals.c:2987 [opt]
    frame #10: 0x0000000258046e62 helgrind-amd64-darwin`sync_signalhandler(sigNo=11, info=0x0000700039bc0ae8, uc=0x0000700039bc0ae8) at m_signals.c:3047 [opt]
    frame #11: 0x0000000258036407 helgrind-amd64-darwin`darwin_signal_demux(a1=0x0000000258046a20, a2=30, a3=11, a4=0x0000700039bc0a80, a5=0x0000700039bc0ae8) at m_libcsignal.c:250 [opt]

This seems to come down to this bit of code:

darwin_signal_demux

      ((void(*)(int,void*,void*))a1) (a3,a4,a5);
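
For readers unfamiliar with the pattern, the call casts a1 to a three-argument handler and invokes it with the signal number and the two context pointers. A minimal standalone sketch of that dispatch follows; the names demux_signal, record_signal and last_signo are invented for illustration, and a2 is simply ignored here, so this is not the actual m_libcsignal.c code.

```c
#include <stddef.h>

/* Handler shape used by the cast in the snippet above. */
typedef void (*sig_handler_fn)(int signo, void *info, void *uc);

/* Illustrative handler: remembers the last signal number it was given. */
int last_signo = 0;

void record_signal(int signo, void *info, void *uc)
{
    (void)info;
    (void)uc;
    last_signo = signo;
}

/* Hypothetical demux: a1 carries the handler address, a3 the signal
 * number, a4 and a5 the info and context pointers. a2 is not used. */
void demux_signal(void *a1, int a2, int a3, void *a4, void *a5)
{
    (void)a2;
    if (a1 == NULL)
        return;                        /* nothing to dispatch to */
    ((sig_handler_fn)a1)(a3, a4, a5);  /* same cast-and-call as above */
}
```

For example, demux_signal((void *)record_signal, 30, 11, NULL, NULL) leaves last_signo set to 11. The handler runs on whatever stack the demux function was entered on, which matters if that stack is no longer registered.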

--11130:2:  stacks   segment for SP 0x700039C43E78 changed stack start limit from 0x0 to 0x700039BC4000
--11130:1:syswrap- thread_wrapper(tid=2,lwpid=3075): done
--11130:1:syswrap- run_a_thread_NORETURN(tid=2): post-thread_wrapper
--11130:2:  stacks   no addressable segment for SP 0x700039C43E78
--11130:2:libcsign   PRE  demux sig, a2 = 30, signo = 11
--11130-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--11130-- si_code=1;  Faulting address: 0x700039C43EC8;  sp: 0x700039bc0ba0

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==11130==    at 0x258047647: ??? (m_stacktrace.c:696)
==11130==    by 0x25804794C: ??? (m_stacktrace.c:1698)
==11130==    by 0x25802A15A: ??? (m_execontext.c:415)
==11130==    by 0x2580264F3: ??? (m_errormgr.c:715)
==11130==    by 0x258001B2A: ??? (hg_errors.c:651)
==11130==    by 0x25800A95C: ??? (hg_main.c:1724)
==11130==    by 0x2580FD33C: ??? (syswrap-darwin.c:213)
==11130==    by 0x2580FD4AA: ??? (syswrap-darwin.c:370)

sched status:
  running_tid=2

Thread 1: status = VgTs_WaitSys syscall unix:305 (lwpid 771)
==11130==    at 0x100701A16: __psynch_cvwait (in /usr/lib/system/libsystem_kernel.dylib)
==11130==    by 0x100739588: _pthread_cond_wait (in /usr/lib/system/libsystem_pthread.dylib)
==11130==    by 0x1000BA445: pthread_cond_wait_WRK (hg_intercepts.c:1362)
==11130==    by 0x1000E5CAF: std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) (in /usr/lib/libc++.1.dylib)
==11130==    by 0x100001BFC: void std::__1::condition_variable::wait<main::$_1>(std::__1::unique_lock<std::__1::mutex>&, main::$_1) (__mutex_base:375)
==11130==    by 0x100001A19: main (bug392331.cpp:52)
client stack range: [0x1040A7000 0x1048A6FFF] client SP: 0x1048A6858
valgrind stack range: [0x70000078E000 0x70000088DFFF] top usage: 6080 of 1048576

Thread 2: status = VgTs_Runnable (lwpid 3075)
--11130:2:  stacks   no addressable segment for SP 0x700039C43E78

No hang with Louis Brunner's repo.

Comment 5 Paul Floyd 2025-12-01 20:57:39 UTC

DRD update. Now getting 21 failures.

11 of them contain:

+parse DIE(readdwarf3.c:3026): confused by:
+ <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
+     DW_AT_producer    : (indirect string, offset: 0x........): Apple LLVM version 10.0.0 (clang-1000.11.45.5)

circular_buffer seems to be generating false positives.

There are issues with recursive locks, like:

+drd: drd_rwlock.c:479 (void vgDrd_rwlock_post_wrlock(const Addr, const RwLockT, const Bool)): Assertion 'q->writer_nesting_count == 0' failed.

(4 tests)

The hold_lock tests are missing errors and pth_once has no summary.

The two semaphore tests have issues.

Comment 6 Paul Floyd 2025-12-11 08:02:22 UTC

The pth_once test is failing with:

drd: drd_load_store.c:382 (void instr_trace_mem_store(IRSB *const, IRExpr *const, IRExpr *, IRExpr *, IRExpr *const)): Assertion '!data_expr_hi || typeOfIRExpr(bb->tyenv, data_expr_hi) == Ity_I32' failed.