Created attachment 183319 [details]
Verbose debug output when reproducing the problem

I'm seeing an assertion failure inside Memcheck when running v3.25.1 and master at 4ecf8d2832530de0904803c772126aabcf8fb075 on Debian 12 i686:

    valgrind: m_execontext.c:471 (record_ExeContext_wrk2): Assertion 'n_ips >= 1 && n_ips <= VG_(clo_backtrace_size)' failed.

when running a test program that uses `popen`:

    #include <assert.h>
    #include <inttypes.h>
    #include <stdio.h>

    int main() {
        /* Read the first number printed by du through a pipe. */
        FILE *fp = popen("du -s .\n", "r");
        assert(fp);
        uint64_t result;
        assert(fscanf(fp, "%" PRIu64, &result) == 1);
        pclose(fp);
    }

with:

    valgrind --tool=memcheck --track-fds=yes ./reproduce

Tweaking the assert showed that n_ips == 0.

After the assertion failure, execution continues and the assert in the test program fails too, because fscanf returns -1. This doesn't happen when the program is run outside Valgrind, so I think that the failing Valgrind assert has lasting effects.

The similar https://bugs.kde.org/show_bug.cgi?id=391861 suggests that I should run with lots of verbosity and debugging. The result of that is attached along with the full reproduction case.

Debian 12's Valgrind 3.19.0 runs the test case successfully. I can try to bisect if that would be useful.
Created attachment 183320 [details] Test program to reproduce
Could reproduce on Debian i686 (but not on any other arch).

This issue seems to have been triggered by this commit:

    commit 41441379baa63b5471385361d08c8df317705b69
    Author: Mark Wielaard <mark@klomp.org>
    Date:   Sun Mar 30 17:38:21 2025 +0200

        Handle top __syscall_cancel frames when getting stack traces

        Since glibc 2.41 there are extra frames inserted before doing a
        syscall to support proper thread cancellation. This breaks various
        suppressions and regtests involving checking syscall arguments.

        Solve this by removing those extra frames from the top of the call
        stack when we are processing a linux system call.

        https://bugs.kde.org/show_bug.cgi?id=502126

This also removed the _dl_sysinfo_int80 call. Looks like for some reason there isn't anything left after that, so n_ips == 0, triggering the assert.
Thank you for investigating. I can confirm that reverting 4ecf8d2832530de0904803c772126aabcf8fb075 resolves the problem on Debian 12 (glibc 2.36), and the OpenEmbedded-based system (glibc 2.39) that I'm using.
(In reply to Mike Crowe from comment #3)
> I can confirm that reverting 4ecf8d2832530de0904803c772126aabcf8fb075
> resolves the problem on Debian 12 (glibc 2.36), and the OpenEmbedded-based
> system (glibc 2.39) that I'm using.

I copied and pasted the wrong commit hash. :( Of course I meant 41441379baa63b5471385361d08c8df317705b69.
In the end this turned out to be a very simple fix:

    -      for (i = 0; i < found; i++) {
    +      /* We want to keep at least one frame.  */
    +      for (i = 0; i < found - 1; i++) {

Sorry this took so long to resolve.

    commit a4593438d9fb95bae841531bd70a9217818c482b
    Author: Mark Wielaard <mark@klomp.org>
    Date:   Fri Oct 17 18:23:58 2025 +0200

        Keep at least one frame while peeling syscall frames

        VG_(get_StackTrace_with_deltas) might peel extra glibc syscall
        (cancel) frames. But if the backtrace failed, or only contains
        such syscall frames then we should keep at least one (the initial
        frame will always be there). Various routines expect n_ips of a
        Stacktrace to be at least 1.

        https://bugs.kde.org/show_bug.cgi?id=507188
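For readers following along, here is a minimal standalone sketch of the "keep at least one frame" idea. It is not the Valgrind source: the Frame type, is_syscall_wrapper() and peel_syscall_frames() are hypothetical names made up for illustration, and matching on the two function names below is an assumption based only on the frames mentioned in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a symbolised stack frame (not a Valgrind type). */
typedef struct {
    const char *fn_name;
} Frame;

/* Hypothetical predicate: treat these two names as glibc syscall wrapper
   frames. The real matching logic in Valgrind may differ. */
static int is_syscall_wrapper(const Frame *f)
{
    return strcmp(f->fn_name, "__syscall_cancel") == 0
        || strcmp(f->fn_name, "_dl_sysinfo_int80") == 0;
}

/* Drop leading wrapper frames from frames[0..n_frames), but only inspect
   the first n_frames - 1 entries so the last frame always survives.
   Returns the new frame count. */
static size_t peel_syscall_frames(Frame *frames, size_t n_frames)
{
    size_t peel = 0;

    /* "peel + 1 < n_frames" plays the role of "i < found - 1" in the fix. */
    while (peel + 1 < n_frames && is_syscall_wrapper(&frames[peel]))
        peel++;

    /* Shift the surviving frames down to the start of the array. */
    for (size_t i = peel; i < n_frames; i++)
        frames[i - peel] = frames[i];

    return n_frames - peel;
}
```

With a trace that consists only of wrapper frames, as seemed to happen on i686, the sketch returns a count of 1 rather than 0, which is the condition the failing assertion checks for.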
a4593438d9fb95bae841531bd70a9217818c482b on top of 3.25.1 fixes the problem for me. Thanks!