505673 – Valgrind crashes with an internal error and SIGBUS when the guest tries to open its own file with O_WRONLY|O_CREAT|O_TRUNC


Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	memcheck (other bugs)
Version First Reported In:	3.22.0
Platform:	Ubuntu Linux

Importance:	NOR crash
Target Milestone:	---
Assignee:	Paul Floyd

Reported:	2025-06-17 00:22 UTC by Ismail
Modified:	2025-07-19 15:12 UTC
CC List:	1 user

See Also:	506910

Attachments
Initial patch (7.88 KB, patch) 2025-06-25 18:49 UTC, Paul Floyd	Details

Description Ismail 2025-06-17 00:22:17 UTC

[Valgrind 3.22.0] Internal Error: SIGBUS crash in memcheck when running `minishell` on Ubuntu 24.04
📝 Full Bug Report:

### Description

Valgrind crashes with an **internal error** (`SIGBUS`, signal 7) when running my `minishell` program. This appears to be a fatal crash inside Valgrind itself, not a user-code error. It is **100% reproducible**.

The crash happens during execution of a redirection function in `minishell`. Valgrind reports that it received a signal it couldn't handle (SIGBUS, `si_code=2`, which on Linux is `BUS_ADRERR`, a nonexistent physical address, rather than an alignment error) and exits.

The stacktrace shows it occurs deep in `memcheck`, not in the user program itself.

---

### System Information

- **Valgrind Version:**  
  `valgrind-3.22.0`

  valgrind ./minishell
then run `> minishell` at the prompt, where `minishell` is the executable running under Valgrind. The issue happens when I try to close the file after editing.
Operating System:

Distributor ID: Ubuntu  
Description:    Ubuntu 24.04.2 LTS  
Release:        24.04  
Codename:       noble
Kernel:         Linux sma3ine-EZbook 6.11.0-26-generic x86_64
Reproducible:
Yes – crash happens every time.

==13036== Memcheck, a memory error detector
==13036== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==13036== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==13036== Command: ./minishell
==13036== 

--13036-- VALGRIND INTERNAL ERROR: Valgrind received a signal 7 (SIGBUS) - exiting
--13036-- si_code=2;  Faulting address: 0x10E469;  sp: 0x1002dad580

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==13036==    at 0x581C9AC3: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x581CA8DD: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x581518CB: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x58152099: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x58135614: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x58135F36: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x5805C635: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x5809DB21: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==13036==    by 0x580EB087: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 13036)
==13036==    at 0x10E469: out_fd (redirections.c:30)
==13036==    by 0x112CE7: redirection (execute_redirections.c:25)
==13036==    by 0x112DD0: execute_redirections (execute_redirections.c:52)
==13036==    by 0x111762: execute_ast (execute_ast.c:65)
==13036==    by 0x10F0C2: execution (execution.c:54)
==13036==    by 0x11374A: read_eval_print_loop (main.c:218)
==13036==    by 0x113876: main (main.c:256)

client stack range: [0x1FFEFE6000 0x1FFF000FFF] client SP: 0x1FFEFFFB40  
valgrind stack range: [0x1002CAE000 0x1002DADFFF] top usage: 18232 of 1048576
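
The thread does not spell out the mechanism behind the SIGBUS, so the following is only a plausible illustration, not Valgrind's code. It assumes Linux and a writable /tmp: a file is mapped, then truncated to zero length (which is what O_TRUNC does to an existing file), and the first access to the mapping raises SIGBUS. The helper name `sigbus_code_after_truncate` and the temporary file name are made up for this sketch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;
static volatile sig_atomic_t sigbus_si_code;

/* Record the si_code and jump back out of the faulting access. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    sigbus_si_code = info->si_code;
    siglongjmp(sigbus_jmp, 1);
}

/* Hypothetical helper: returns the si_code of the SIGBUS raised when a
   mapping is touched after its backing file was truncated, 0 if no
   signal arrived, or -1 if the setup failed. */
int sigbus_code_after_truncate(void)
{
    char path[] = "/tmp/bug505673-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (ftruncate(fd, page) != 0) { close(fd); return -1; }

    /* Private read-only file mapping, similar to how program text is mapped. */
    volatile char *p = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int code = 0;
    if (sigsetjmp(sigbus_jmp, 1) == 0) {
        if (ftruncate(fd, 0) == 0)   /* same effect as O_TRUNC on the file */
            (void)p[0];              /* touch a page that no longer has backing data */
    } else {
        code = sigbus_si_code;
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)p, (size_t)page);
    close(fd);
    return code;
}
```

On Linux the returned value can be compared against `BUS_ADRERR` from `<signal.h>`, the constant that corresponds to the `si_code=2` in the log above.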

Comment 1 Paul Floyd 2025-06-17 08:24:20 UTC

Does this happen with Valgrind 3.25.1?

Can you provide a reproducer?

Comment 2 Ismail 2025-06-17 11:29:39 UTC

(In reply to Paul Floyd from comment #1)
> Does this happen with Valgrind 3.25.1?
> 
> Can you provide a reproducer?

Yes, it does:
➔ > minishell
--45966-- VALGRIND INTERNAL ERROR: Valgrind received a signal 7 (SIGBUS) - exiting
--45966-- si_code=2;  Faulting address: 0x4006331;  sp: 0x1002eea5a0

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==45966==    at 0x581DF671: disInstr_AMD64_WRK (guest_amd64_toIR.c:32279)
==45966==    by 0x581E0469: disInstr_AMD64 (guest_amd64_toIR.c:32687)
==45966==    by 0x5816D2F6: disassemble_basic_block_till_stop.constprop.0 (guest_generic_bb_to_IR.c:956)
==45966==    by 0x5816DAE4: bb_to_IR (guest_generic_bb_to_IR.c:1365)
==45966==    by 0x58151515: LibVEX_FrontEnd (main_main.c:611)
==45966==    by 0x58151EF6: LibVEX_Translate (main_main.c:1287)
==45966==    by 0x5805BC15: vgPlain_translate (m_translate.c:1835)
==45966==    by 0x5809C836: handle_tt_miss (scheduler.c:1144)
==45966==    by 0x5809C836: vgPlain_scheduler (scheduler.c:1557)
==45966==    by 0x58106A4D: thread_wrapper (syswrap-linux.c:102)
==45966==    by 0x58106A4D: run_a_thread_NORETURN (syswrap-linux.c:155)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 45966)
==45966==    at 0x4006331: out_fd (redirections.c:21)
==45966==    by 0x400AC1A: redirection (execute_redirections.c:25)
==45966==    by 0x400AD03: execute_redirections (execute_redirections.c:52)
==45966==    by 0x4009577: execute_ast (execute_ast.c:65)
==45966==    by 0x4007124: execution (execution.c:54)
==45966==    by 0x400B880: read_eval_print_loop (main.c:149)
==45966==    by 0x400B9AC: main (main.c:185)
client stack range: [0x1FFEFE6000 0x1FFF000FFF] client SP: 0x1FFEFFFB80
valgrind stack range: [0x1002DEB000 0x1002EEAFFF] top usage: 18104 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

➜  MINI-SHELL git:(main) ✗ valgrind --version
valgrind-3.25.1

Comment 3 Paul Floyd 2025-06-17 13:30:15 UTC

And where can I get minishell? There are many repos when I search the web.

Comment 4 Ismail 2025-06-17 14:00:57 UTC

(In reply to Paul Floyd from comment #3)
> And where can I get minishell? There are many repos when I search the web.

minishell is my own project—not from a public repo. I can provide the source code or executable if needed.

Let me know how you'd like to receive it (email or upload link).

Comment 5 Paul Floyd 2025-06-17 17:59:00 UTC

Please try the e-mail that I use with this Bugzilla (clicking on my name will give the address).

Comment 6 Ismail 2025-06-17 19:31:44 UTC

(In reply to Paul Floyd from comment #5)
> Please try the e-mail that I use with this Bugzilla (clicking on my name
> will give the address).

I've sent you a Personal Access Token (PAT) to access the repository. Let me know if there's anything else I can do.

Comment 7 Paul Floyd 2025-06-18 05:20:36 UTC

On FreeBSD amd64, other than loads of readline still-reachables, I don't see any problem.

I'll try on Fedora 42 amd64 and Ubuntu arm64 over the next couple of days.

Comment 8 Paul Floyd 2025-06-21 13:49:22 UTC

I don't have any problems with Fedora 42 amd64.

What exactly is the redirection command that you used?

Comment 9 Ismail 2025-06-22 19:53:14 UTC

(In reply to Paul Floyd from comment #8)
> I don't have any problems with Fedora 42 amd64.
> 
> What exactly is the redirection command that you used?

It happens when I try to redirect to the executable that I'm running under Valgrind.

Comment 10 Paul Floyd 2025-06-22 19:59:14 UTC

Can you provide (only) the exact commands that you type?

E.g.
valgrind ./minishell ??? some redirection ???
running minishell> some other command ??? some other redirection ???

Comment 11 Ismail 2025-06-22 20:19:44 UTC

(In reply to Paul Floyd from comment #10)
> Can you provide (only) the exact commands that you type?
> 
> E.g.
> valgrind ./minishell ??? some redirection ???
> running minishell> some other command ??? some other redirection ???

- valgrind ./minishell
- <anything> > minishell (this tries to open the minishell file with the O_TRUNC flag, which causes Valgrind to crash)

Comment 12 Paul Floyd 2025-06-23 08:26:56 UTC

This isn't specific to your minishell. I can reproduce a similar crash with the Korn shell, like this

> ./vg-in-place ./ksh
==3492562== Memcheck, a memory error detector
==3492562== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==3492562== Using Valgrind-3.26.0.GIT and LibVEX; rerun with -h for copyright info
==3492562== Command: ./ksh
==3492562== 
$ echo foo > ./ksh
--3492562-- VALGRIND INTERNAL ERROR: Valgrind received a signal 7 (SIGBUS) - exiting
--3492562-- si_code=2;  Faulting address: 0x4031DE3;  sp: 0x1002ece5d0

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==3492562==    at 0x581DD1C5: disInstr_AMD64_WRK (guest_amd64_toIR.c:32279)
==3492562==    by 0x581DE2BD: disInstr_AMD64 (guest_amd64_toIR.c:32687)
==3492562==    by 0x5816A0C1: disassemble_basic_block_till_stop.constprop.0 (guest_generic_bb_to_IR.c:956)
==3492562==    by 0x5816A827: bb_to_IR (guest_generic_bb_to_IR.c:1365)
==3492562==    by 0x5814ECE0: LibVEX_FrontEnd (main_main.c:611)
==3492562==    by 0x5814F62A: LibVEX_Translate (main_main.c:1287)
==3492562==    by 0x5805B2B5: vgPlain_translate (m_translate.c:1835)
==3492562==    by 0x58098EBB: handle_chain_me (scheduler.c:1172)
==3492562==    by 0x5809B3C3: vgPlain_scheduler (scheduler.c:1568)
==3492562==    by 0x58104B3A: thread_wrapper (syswrap-linux.c:102)
==3492562==    by 0x58104B3A: run_a_thread_NORETURN (syswrap-linux.c:155)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 3492562)
==3492562==    at 0x4031DE3: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x4034A2C: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x406D02C: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x40146C6: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x4014FBB: ??? (in /home/paulf/scratch/valgrind/ksh)
==3492562==    by 0x55787E4: (below main) (in /usr/lib64/libc-2.28.so)
client stack range: [0x1FFEFF7000 0x1FFF000FFF] client SP: 0x1FFEFFE890
valgrind stack range: [0x1002DCF000 0x1002ECEFFF] top usage: 14456 of 1048576

We should probably be rejecting the attempt to open the guest exe.

If I run just ksh and do the same thing

echo foo > ./ksh
./ksh: ./ksh: cannot create [Text file busy]

In strace the rejected syscall is

openat(AT_FDCWD, "./ksh", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ETXTBSY (Text file busy)
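
A minimal sketch of how a program could observe that native behaviour for itself. The helper name `open_for_write_errno` is hypothetical, and it deliberately omits O_CREAT and O_TRUNC so that an unexpected success does not modify the file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` write-only.
   Returns 0 if the open succeeded, otherwise the errno of the failure. */
int open_for_write_errno(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Calling `open_for_write_errno("/proc/self/exe")` from a running Linux binary asks for write access to that binary's own file, which is the situation the strace line above shows being rejected with ETXTBSY. Whether the kernel refuses it may depend on the kernel version and filesystem, so the result should be checked rather than assumed.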

Comment 13 Paul Floyd 2025-06-23 08:31:16 UTC

I've shortened the bug title so that it will fit better in NEWS when it gets fixed.

Comment 14 Ismail 2025-06-23 08:55:18 UTC

(In reply to Paul Floyd from comment #13)
> I've shortened the bug title so that it will fit better in NEWS when it gets
> fixed.

Thanks, Paul, for your time. I appreciate the detailed investigation and the follow-up.

Comment 15 Paul Floyd 2025-06-25 18:49:18 UTC

Created attachment 182663 [details]
Initial patch

Patch for FreeBSD openat, generic open, and a FreeBSD testcase.

Still need to check Linux and illumos openat, check that VG_(realpath) works, and add Linux and illumos testcases.

Comment 16 Paul Floyd 2025-06-26 07:50:09 UTC

The patch doesn't build on Linux because there's no VG_(lstat).

As I wrote back in Jan 2023

#if defined(VGO_freebsd)
/* extend this to other OSes as and when needed */
SysRes VG_(lstat) ( const HChar* file_name, struct vg_stat* vgbuf )

It looks like that time has come.

Need to make VG_(lstat) available on all platforms in pub_tool_libcfile.h.

This seems to compile at least:


SysRes VG_(lstat) ( const HChar* file_name, struct vg_stat* vgbuf )
{
   SysRes res;
   VG_(memset)(vgbuf, 0, sizeof(*vgbuf));

#if !defined(VGO_freebsd) || (__FreeBSD_version < 1200031)
#if defined(VGO_freebsd)
   struct vki_freebsd11_stat buf;
#else
   struct vki_stat buf;
#endif
   res = VG_(do_syscall2)(__NR_lstat, (UWord)file_name, (UWord)&buf);
#else
   struct vki_stat buf;
   res = VG_(do_syscall4)(__NR_fstatat, VKI_AT_FDCWD, (UWord)file_name, (UWord)&buf, VKI_AT_SYMLINK_NOFOLLOW);
#endif
   if (!sr_isError(res)) {
      TRANSLATE_TO_vg_stat(vgbuf, &buf);
   }
   return res;
}

(in m_libcfile.c)

There is still a lot to do
- Linux openat
- Linux openat2
- Darwin openat
- Darwin openat_cocancel
- Solaris openat
- handle /proc/self/exe and /proc/[pid]/exe
- adapt the testcase to other platforms
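
As a rough user-space illustration of the /proc item in the list above, the check can be sketched as a literal string comparison against the two spellings. The function name `is_proc_exe_path` is hypothetical. A literal comparison like this would miss equivalent spellings such as a path with extra `..` components, so it is only a starting point.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: true if `path` is literally "/proc/self/exe"
   or "/proc/<pid>/exe" for the given pid. */
bool is_proc_exe_path(const char *path, pid_t pid)
{
    assert(path != NULL);
    char by_pid[64];
    snprintf(by_pid, sizeof by_pid, "/proc/%ld/exe", (long)pid);
    return strcmp(path, "/proc/self/exe") == 0 || strcmp(path, by_pid) == 0;
}
```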

Comment 17 Paul Floyd 2025-06-26 08:23:08 UTC

Also need to setup VG_(resolved_exename) on platforms other than FreeBSD.

Comment 18 Paul Floyd 2025-06-26 08:45:05 UTC

A first stab for Linux openat:

   /* And for /proc/self/exe or /proc/<pid>/exe case. */

   VG_(sprintf)(name, "/proc/%d/exe", VG_(getpid)());
   vg_assert(VG_(resolved_exename) && VG_(resolved_exename)[0] == '/');
   const HChar* path = (const HChar*)ARG2;
   if (ML_(safe_to_deref)( path, 1 )) {
      HChar tmp[VKI_PATH_MAX];
      VG_(realpath)(path, tmp);
      if (VG_(strcmp)((HChar *)(Addr)ARG2, name) == 0
           || VG_(strcmp)((HChar *)(Addr)ARG2, "/proc/self/exe") == 0
           || !VG_(strcmp)(tmp, VG_(resolved_exename))) {
         if ((ARG3 & VKI_O_WRONLY) ||
             (ARG3 & VKI_O_RDWR)) {
             SET_STATUS_Failure( VKI_ETXTBSY );
             return;
         }

         sres = VG_(dup)( VG_(cl_exec_fd) );
         SET_STATUS_from_SysRes( sres );
         if (!sr_isError(sres)) {
            OffT off = VG_(lseek)( sr_Res(sres), 0, VKI_SEEK_SET );
            if (off < 0)
               SET_STATUS_Failure( VKI_EMFILE );
         }
         return;
      }
   }

I don't like this much. We are just doing a dup of VG_(cl_exec_fd). As far as I can see cl_exec_fd is already a dup (or 2) ultimately originating from VG_(pre_exec_check) which does

   res = VG_(open)(exe_name, VKI_O_RDONLY, 0);

Probably mostly harmless but it does mean that we will lose any O_ flags like O_NOATIME or O_CLOEXEC.


I think that I should make a common function for openat and openat2 for all the flag handling.

Comment 19 Paul Floyd 2025-07-08 20:12:32 UTC

(In reply to Paul Floyd from comment #18)
> A first stab for Linux openat:

I also need to handle dirfds other than AT_FDCWD, which means looking up the name of the directory.

Then for openat2 there are RESOLVE_* options. I'll start by just trying to get all of the name resolution to work.
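
One way to look up the directory name behind a descriptor from user space on Linux is to read the `/proc/self/fd/<n>` symlink. The sketch below assumes Linux with /proc mounted. It is not how Valgrind does it internally, and `dirfd_to_path` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): write the directory name behind
   `dirfd` into `out`. AT_FDCWD maps to the current working directory.
   Returns 0 on success, -1 on failure. Names longer than outsz - 1
   bytes are silently truncated by readlink(). */
int dirfd_to_path(int dirfd, char *out, size_t outsz)
{
    assert(out != NULL && outsz > 1);
    if (dirfd == AT_FDCWD)
        return getcwd(out, outsz) != NULL ? 0 : -1;

    char link[64];
    snprintf(link, sizeof link, "/proc/self/fd/%d", dirfd);
    ssize_t n = readlink(link, out, outsz - 1);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

The resulting directory name can then be joined with a relative pathname before canonicalising it.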

Comment 20 Paul Floyd 2025-07-19 15:11:45 UTC

commit 7fb17b67f40eb8197c45b5f575daf4fa77d16faa (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Floyd <pjfloyd@wanadoo.fr>
Date:   Sat Jul 19 15:10:31 2025 +0200

    Bug 505673 - Valgrind crashes with an internal error and SIGBUS when the guest tries to open its own file with O_WRONLY|O_CREAT|O_TRUNC
    
    This is all quite messy.
    
    It affects open(), openat() and openat2() (the last of which is Linux only).
    On Linux we also need to check for /proc/self/exe and /proc/PID/exe.
    On Linux there are also a couple of RESOLVE flags for openat2() that
    mean _don't_ check /proc magic links.
    In the general case we need to have some reference to check whether
    the filename matches the guest filename. So I've added that as
    VG_(resolved_exename) (which I was already using on FreeBSD).
    The pathname also needs to be canonicalised. It may be a
    relative path, symlink or use RESOLVE_IN_ROOT. That uses
    VG_(realpath) (again which was already present for FreeBSD).
    On illumos the man page says that opening running binaries for
    writing fails with errno set to ETXTBSY but that's not what
    the open functions do - they just open the file. So I've done nothing
    for illumos or Solaris. Maybe I'll open an illumos ticket.
    I haven't tried on Darwin.
    
    The Linux open functions with /proc/self/exe and /proc/PID/exe
    were just calling dup on the fd that we hold for the client exe.
    That means that we were ignoring any other flags. That has now changed.
    If the open doesn't fail because the WRONLY/RDWR flags are set then
    the syscall gets called from the PRE wrapper using VG_(resolved_exename)
    instead of the /proc pathname.
    
    I haven't tried to handle all of the Linux openat2 RESOLVE*
    flags. RESOLVE_NO_MAGICLINKS is handled and I see the LTP test
    openat202 now passing, so this should also fix Bug 506910.
    
    I'm not sure that VG_(realpath) handles all forms of weird path
    resolution on Linux (on FreeBSD it uses a syscall so that should
    work OK).
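
The canonicalisation step described in the commit message can be illustrated in user space with the libc `realpath()` function. This is only a sketch under that assumption: Valgrind uses its own VG_(realpath) and VG_(resolved_exename), and the helper name `same_canonical_path` is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if both names resolve to the same canonical
   absolute path. Relative paths, "." and ".." components, and symlinks
   are resolved by realpath(). A name that cannot be resolved (for
   example because it does not exist) compares as different. */
bool same_canonical_path(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    char ra[PATH_MAX], rb[PATH_MAX];
    if (realpath(a, ra) == NULL || realpath(b, rb) == NULL)
        return false;
    return strcmp(ra, rb) == 0;
}
```

With a helper like this, `./ksh` and an absolute path or symlink to the same binary compare as equal, which is the kind of match the open wrappers need before refusing a write.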