[Valgrind 3.22.0] Internal Error: SIGBUS crash in memcheck when running `minishell` on Ubuntu 24.04

📝 Full Bug Report:

### Description

Valgrind crashes with an **internal error** (`SIGBUS`, signal 7) when running my `minishell` program. This appears to be a fatal crash inside Valgrind itself, not a user-code error. It is **100% reproducible**.

The crash happens during execution of a redirection function in `minishell`. Valgrind reports that it received a signal it couldn't handle (SIGBUS with `si_code=2`, which on Linux is `BUS_ADRERR`, a nonexistent physical address, rather than an alignment error) and exits. The stacktrace shows that it occurs deep in `memcheck`, not in the user program itself.

---

### System Information

- **Valgrind Version:** `valgrind-3.22.0`

Run `valgrind ./minishell`, then run `> minishell` (minishell being the executable that is running in valgrind). The issue happens when I try to close the file after editing.

Operating System:
Distributor ID: Ubuntu
Description: Ubuntu 24.04.2 LTS
Release: 24.04
Codename: noble
Kernel: Linux sma3ine-EZbook 6.11.0-26-generic x86_64
Reproducible: Yes - crash happens every time.

==13036== Memcheck, a memory error detector
==13036== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==13036== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==13036== Command: ./minishell
==13036==
--13036-- VALGRIND INTERNAL ERROR: Valgrind received a signal 7 (SIGBUS) - exiting
--13036-- si_code=2; Faulting address: 0x10E469; sp: 0x1002dad580

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==13036==    at 0x581C9AC3: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x581CA8DD: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x581518CB: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x58152099: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x58135614: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x58135F36: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x5805C635: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x5809DB21: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x580EB087: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 13036)
==13036==    at 0x10E469: out_fd (redirections.c:30)
==13036==    by 0x112CE7: redirection (execute_redirections.c:25)
==13036==    by 0x112DD0: execute_redirections (execute_redirections.c:52)
==13036==    by 0x111762: execute_ast (execute_ast.c:65)
==13036==    by 0x10F0C2: execution (execution.c:54)
==13036==    by 0x11374A: read_eval_print_loop (main.c:218)
==13036==    by 0x113876: main (main.c:256)
client stack range: [0x1FFEFE6000 0x1FFF000FFF] client SP: 0x1FFEFFFB40
valgrind stack range: [0x1002CAE000 0x1002DADFFF] top usage: 18232 of 1048576

Does this happen with Valgrind 3.25.1? Can you provide a reproducer?
(In reply to Paul Floyd from comment #1)
> Does this happen with Valgrind 3.25.1?
>
> Can you provide a reproducer?

Yes:

➔ > minishell
--45966-- VALGRIND INTERNAL ERROR: Valgrind received a signal 7 (SIGBUS) - exiting
--45966-- si_code=2; Faulting address: 0x4006331; sp: 0x1002eea5a0

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==45966==    at 0x581DF671: disInstr_AMD64_WRK (guest_amd64_toIR.c:32279)
==45966==    by 0x581E0469: disInstr_AMD64 (guest_amd64_toIR.c:32687)
==45966==    by 0x5816D2F6: disassemble_basic_block_till_stop.constprop.0 (guest_generic_bb_to_IR.c:956)
==45966==    by 0x5816DAE4: bb_to_IR (guest_generic_bb_to_IR.c:1365)
==45966==    by 0x58151515: LibVEX_FrontEnd (main_main.c:611)
==45966==    by 0x58151EF6: LibVEX_Translate (main_main.c:1287)
==45966==    by 0x5805BC15: vgPlain_translate (m_translate.c:1835)
==45966==    by 0x5809C836: handle_tt_miss (scheduler.c:1144)
==45966==    by 0x5809C836: vgPlain_scheduler (scheduler.c:1557)
==45966==    by 0x58106A4D: thread_wrapper (syswrap-linux.c:102)
==45966==    by 0x58106A4D: run_a_thread_NORETURN (syswrap-linux.c:155)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 45966)
==45966==    at 0x4006331: out_fd (redirections.c:21)
==45966==    by 0x400AC1A: redirection (execute_redirections.c:25)
==45966==    by 0x400AD03: execute_redirections (execute_redirections.c:52)
==45966==    by 0x4009577: execute_ast (execute_ast.c:65)
==45966==    by 0x4007124: execution (execution.c:54)
==45966==    by 0x400B880: read_eval_print_loop (main.c:149)
==45966==    by 0x400B9AC: main (main.c:185)
client stack range: [0x1FFEFE6000 0x1FFF000FFF] client SP: 0x1FFEFFFB80
valgrind stack range: [0x1002DEB000 0x1002EEAFFF] top usage: 18104 of 1048576

Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.

➜ MINI-SHELL git:(main) ✗ valgrind --version
valgrind-3.25.1

And where can I get minishell? There are many repos when I search the web.
(In reply to Paul Floyd from comment #3)
> And where can I get minishell? There are many repos when I search the web.

minishell is my own project, not from a public repo. I can provide the source code or the executable if needed. Let me know how you'd like to receive it (email or upload link).
Please try the e-mail that I use with this Bugzilla (clicking on my name will give the address).
(In reply to Paul Floyd from comment #5)
> Please try the e-mail that I use with this Bugzilla (clicking on my name
> will give the address).

I've sent you a Personal Access Token (PAT) for access to the repository. Let me know if there's anything else I can do.
On FreeBSD amd64, other than loads of readline still-reachables, I don't see any problem. I'll try on Fedora 42 amd64 and Ubuntu arm64 over the next couple of days.
I don't have any problems with Fedora 42 amd64. What exactly is the redirection command that you used?
(In reply to Paul Floyd from comment #8)
> I don't have any problems with Fedora 42 amd64.
>
> What exactly is the redirection command that you used?

It happens when I try to redirect to the executable that I'm running under valgrind.
Can you provide (only) the exact commands that you type?

E.g.
valgrind ./minishell ??? some redirection ???
running minishell> some other command ??? some other redirection ???
(In reply to Paul Floyd from comment #10)
> Can you provide (only) the exact commands that you type?
>
> E.g.
> valgrind ./minishell ??? some redirection ???
> running minishell> some other command ??? some other redirection ???

- valgrind ./minishell
- <anything> > minishell (this tries to open the minishell file with the O_TRUNC flag, which causes valgrind to crash)
This isn't specific to your minishell. I can reproduce a similar crash with the Korn shell, like this:

> ./vg-in-place ./ksh
==3492562== Memcheck, a memory error detector
==3492562== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==3492562== Using Valgrind-3.26.0.GIT and LibVEX; rerun with -h for copyright info
==3492562== Command: ./ksh
==3492562==
$ echo foo > ./ksh
--3492562-- VALGRIND INTERNAL ERROR: Valgrind received a signal 7 (SIGBUS) - exiting
--3492562-- si_code=2; Faulting address: 0x4031DE3; sp: 0x1002ece5d0

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==3492562==    at 0x581DD1C5: disInstr_AMD64_WRK (guest_amd64_toIR.c:32279)
==3492562==    by 0x581DE2BD: disInstr_AMD64 (guest_amd64_toIR.c:32687)
==3492562==    by 0x5816A0C1: disassemble_basic_block_till_stop.constprop.0 (guest_generic_bb_to_IR.c:956)
==3492562==    by 0x5816A827: bb_to_IR (guest_generic_bb_to_IR.c:1365)
==3492562==    by 0x5814ECE0: LibVEX_FrontEnd (main_main.c:611)
==3492562==    by 0x5814F62A: LibVEX_Translate (main_main.c:1287)
==3492562==    by 0x5805B2B5: vgPlain_translate (m_translate.c:1835)
==3492562==    by 0x58098EBB: handle_chain_me (scheduler.c:1172)
==3492562==    by 0x5809B3C3: vgPlain_scheduler (scheduler.c:1568)
==3492562==    by 0x58104B3A: thread_wrapper (syswrap-linux.c:102)
==3492562==    by 0x58104B3A: run_a_thread_NORETURN (syswrap-linux.c:155)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 3492562)
==3492562==    at 0x4031DE3: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x4034A2C: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x406D02C: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x40146C6: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x4014FBB: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x55787E4: (below main) (in /usr/lib64/libc-2.28.so)
client stack range: [0x1FFEFF7000 0x1FFF000FFF] client SP: 0x1FFEFFE890
valgrind stack range: [0x1002DCF000 0x1002ECEFFF] top usage: 14456 of 1048576

We should probably be rejecting the attempt to open the guest exe.

If I run just ksh and do the same thing:

echo foo > ./ksh
./ksh: ./ksh: cannot create [Text file busy]

In strace the rejected syscall is

openat(AT_FDCWD, "./ksh", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ETXTBSY (Text file busy)

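The native behaviour described above can be probed from C as well as from a shell. The following is a minimal sketch, not taken from the Valgrind sources: `try_open_for_write` is a hypothetical helper that attempts a write-only open and returns the resulting errno. It deliberately uses only `O_WRONLY`, without `O_CREAT` or `O_TRUNC`, so that it cannot damage the file if the open is allowed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of Valgrind): try to open 'path'
 * write-only and report what happened.
 * Returns 0 if the open succeeded, otherwise the errno value. */
int try_open_for_write(const char *path)
{
    int fd;

    assert(path != NULL);

    /* No O_CREAT and no O_TRUNC: the file is never created or modified. */
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;

    close(fd);
    return 0;
}
```

Calling `try_open_for_write("/proc/self/exe")` from a natively running Linux process asks for write access to the process's own executable. In the strace output above the equivalent shell redirection failed with `ETXTBSY`; other kernels, file permissions or filesystems may give a different result, so no particular return value is claimed here.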
I've shortened the bug title so that it will fit better in NEWS when it gets fixed.
(In reply to Paul Floyd from comment #13)
> I've shortened the bug title so that it will fit better in NEWS when it gets
> fixed.

Thanks, Paul, for your time. I appreciate the detailed investigation and the follow-up.
Created attachment 182663
Initial patch

Patch for FreeBSD openat, generic open and a FreeBSD testcase.

Still need to check Linux and illumos openat, check that VG_(realpath) works, and add Linux and illumos testcases.
The patch doesn't build on Linux because there's no VG_(lstat).

As I wrote back in Jan 2023

#if defined(VGO_freebsd)
/* extend this to other OSes as and when needed */
SysRes VG_(lstat) ( const HChar* file_name, struct vg_stat* vgbuf )

It looks like that time has come. Need to make VG_(lstat) available on all platforms in pub_tool_libcfile.h.

This seems to compile at least:

SysRes VG_(lstat) ( const HChar* file_name, struct vg_stat* vgbuf )
{
   SysRes res;
   VG_(memset)(vgbuf, 0, sizeof(*vgbuf));

#if !defined(VGO_freebsd) || (__FreeBSD_version < 1200031)
#if defined(VGO_freebsd)
   struct vki_freebsd11_stat buf;
#else
   struct vki_stat buf;
#endif
   res = VG_(do_syscall2)(__NR_lstat, (UWord)file_name, (UWord)&buf);
#else
   struct vki_stat buf;
   res = VG_(do_syscall4)(__NR_fstatat, VKI_AT_FDCWD, (UWord)file_name, (UWord)&buf, VKI_AT_SYMLINK_NOFOLLOW);
#endif
   if (!sr_isError(res)) {
      TRANSLATE_TO_vg_stat(vgbuf, &buf);
   }
   return res;
}

(in m_libcfile.c)

There is still a lot to do
- Linux openat
- Linux openat2
- Darwin openat
- Darwin openat_cocancel
- Solaris openat
- handle /proc/self/exe and /proc/[pid]/exe
- adapt the testcase to other platforms
Also need to set up VG_(resolved_exename) on platforms other than FreeBSD.
A first stab for Linux openat:

/* And for /proc/self/exe or /proc/<pid>/exe case. */
VG_(sprintf)(name, "/proc/%d/exe", VG_(getpid)());
vg_assert(VG_(resolved_exename) && VG_(resolved_exename)[0] == '/');
const HChar* path = (const HChar*)ARG2;
if (ML_(safe_to_deref)( path, 1 )) {
   HChar tmp[VKI_PATH_MAX];
   VG_(realpath)(path, tmp);
   if (VG_(strcmp)((HChar *)(Addr)ARG2, name) == 0
       || VG_(strcmp)((HChar *)(Addr)ARG2, "/proc/self/exe") == 0
       || !VG_(strcmp)(tmp, VG_(resolved_exename))) {
      if ((ARG3 & VKI_O_WRONLY) || (ARG3 & VKI_O_RDWR)) {
         SET_STATUS_Failure( VKI_ETXTBSY );
         return;
      }
      sres = VG_(dup)( VG_(cl_exec_fd) );
      SET_STATUS_from_SysRes( sres );
      if (!sr_isError(sres)) {
         OffT off = VG_(lseek)( sr_Res(sres), 0, VKI_SEEK_SET );
         if (off < 0)
            SET_STATUS_Failure( VKI_EMFILE );
      }
      return;
   }
}

I don't like this much. We are just doing a dup of VG_(cl_exec_fd). As far as I can see cl_exec_fd is already a dup (or 2) ultimately originating from VG_(pre_exec_check), which does

res = VG_(open)(exe_name, VKI_O_RDONLY, 0);

Probably mostly harmless, but it does mean that we will lose any O_ flags like O_NOATIME or O_CLOEXEC.

I think that I should make a common function for openat and openat2 for all the flag handling.
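The flag decision in the snippet above (fail with ETXTBSY when the guest executable is opened for writing, otherwise let the open proceed) can be isolated into a small pure function. This is only a user-space sketch under stated assumptions: `opens_for_writing` and `errno_for_guest_exe_open` are hypothetical names, not Valgrind functions, and the access mode is extracted with the standard `O_ACCMODE` mask rather than by testing the `O_WRONLY` and `O_RDWR` bits individually as the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: nonzero if 'flags' requests any access mode
 * other than read-only. O_RDONLY is not a bit that can be tested
 * directly, so the access mode is masked out with O_ACCMODE first. */
int opens_for_writing(int flags)
{
    return (flags & O_ACCMODE) != O_RDONLY;
}

/* Hypothetical helper: the errno that an open() wrapper would report
 * for this request, or 0 if the open should be allowed to proceed.
 * 'is_guest_exe' says whether the path was already found to name
 * the running executable. */
int errno_for_guest_exe_open(int is_guest_exe, int flags)
{
    assert(is_guest_exe == 0 || is_guest_exe == 1);

    if (is_guest_exe && opens_for_writing(flags))
        return ETXTBSY;
    return 0;
}
```

Keeping the decision separate from the syscall wrapper is one way to share it between open, openat and openat2 handlers, which is the direction suggested in the comment above; how the real patch structures this is not shown here.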
(In reply to Paul Floyd from comment #18)
> A first stab for Linux openat:

I also need to handle dirfds other than AT_FDCWD, which means looking up the name of the directory. Then for openat2 there are the RESOLVE_* options.

I'll start by just trying to get all of the name resolution to work.
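Looking up the name of a directory from its file descriptor can be done on Linux by reading the `/proc/self/fd/N` symlink. The sketch below shows that technique in ordinary user-space C; it is an illustration only and does not claim to be how Valgrind resolves a dirfd. Both function names are hypothetical, and the code assumes that `/proc` is mounted.

```c
#define _XOPEN_SOURCE 700  /* for readlink() under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write the path that 'dirfd' refers to into
 * 'buf' by reading the /proc/self/fd/<dirfd> symlink (Linux only).
 * Returns 0 on success, -1 on failure. A result that fills the
 * buffer exactly may have been truncated. */
int dirfd_to_path(int dirfd, char *buf, size_t size)
{
    char link[64];
    ssize_t n;

    assert(buf != NULL && size > 0);

    snprintf(link, sizeof link, "/proc/self/fd/%d", dirfd);
    n = readlink(link, buf, size - 1);  /* readlink does not terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Hypothetical demo wrapper: open 'dir', resolve the descriptor back
 * to a path, and close it again. Returns 0 on success, -1 on failure. */
int dir_path_via_fd(const char *dir, char *buf, size_t size)
{
    int fd, rc;

    assert(dir != NULL);

    fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    rc = dirfd_to_path(fd, buf, size);
    close(fd);
    return rc;
}
```

With the directory name in hand, a relative path passed to openat can be joined to it and then canonicalised before being compared with the executable's path.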
commit 7fb17b67f40eb8197c45b5f575daf4fa77d16faa (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Sat Jul 19 15:10:31 2025 +0200

    Bug 505673 - Valgrind crashes with an internal error and SIGBUS when the guest tries to open its own file with O_WRONLY|O_CREAT|O_TRUNC

    This is all quite messy. It affects open(), openat() and openat2() (the last of which is Linux only). On Linux we also need to check for /proc/self/exe and /proc/PID/exe. On Linux there are also a couple of RESOLVE flags for openat2() that mean _don't_ check /proc magic links.

    In the general case we need to have some reference to check whether the filename matches the guest filename. So I've added that as VG_(resolved_exename) (which I was already using on FreeBSD). The pathname also needs to be canonicalised. It may be a relative path, symlink or use RESOLVE_IN_ROOT. That uses VG_(realpath) (again, which was already present for FreeBSD).

    On illumos the man page says that opening running binaries for writing fails with errno set to ETXTBSY, but that's not what the open functions do - they just open the file. So I've done nothing for illumos or Solaris. Maybe I'll open an illumos ticket.

    I haven't tried on Darwin.

    The Linux open functions with /proc/self/exe and /proc/PID/exe were just calling dup on the fd that we hold for the client exe. That means that we were ignoring any other flags. That has now changed. If the open doesn't fail because the WRONLY/RDWR flags are set, then the syscall gets called from the PRE wrapper using VG_(resolved_exename) instead of the /proc pathname.

    I haven't tried to handle all of the Linux openat2 RESOLVE* flags. RESOLVE_NO_MAGICLINKS is handled and I see the LTP test openat202 now passing, so this should also fix Bug 506910.

    I'm not sure that VG_(realpath) handles all forms of weird path resolution on Linux (on FreeBSD it uses a syscall so that should work OK).
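The two path checks described in the commit message (the literal /proc names, and comparison after canonicalisation) can be illustrated in plain user-space C. This is a hedged sketch, not the code from the commit: `is_proc_exe_name` and `same_canonical_path` are hypothetical helpers, and the standard `realpath(3)` stands in for VG_(realpath).

```c
#define _XOPEN_SOURCE 700  /* for realpath() and PATH_MAX under strict -std modes */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero if 'path' is literally
 * "/proc/self/exe" or "/proc/<pid>/exe" for the given pid. */
int is_proc_exe_name(const char *path, long pid)
{
    char name[64];

    assert(path != NULL);

    snprintf(name, sizeof name, "/proc/%ld/exe", pid);
    return strcmp(path, "/proc/self/exe") == 0 || strcmp(path, name) == 0;
}

/* Hypothetical helper: nonzero if both paths canonicalise to the same
 * absolute path. realpath() resolves relative components and symlinks;
 * if either path cannot be resolved the paths are treated as different. */
int same_canonical_path(const char *a, const char *b)
{
    char ra[PATH_MAX];
    char rb[PATH_MAX];

    assert(a != NULL && b != NULL);

    if (realpath(a, ra) == NULL || realpath(b, rb) == NULL)
        return 0;
    return strcmp(ra, rb) == 0;
}
```

A wrapper would treat a path as naming the guest executable if either check matches, for example `is_proc_exe_name(path, (long)getpid())` or `same_canonical_path(path, exe_path)`, where `exe_path` is an assumed variable holding the already-resolved executable name. The openat2 RESOLVE_* options mentioned in the commit message are not modelled here.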