Bug 506928

Summary: ustat syscall not wrapped
Product: [Developer tools] valgrind Reporter: Mark Wielaard <mark>
Component: generalAssignee: mcermak
Status: RESOLVED FIXED    
Severity: normal CC: mcermak
Priority: NOR    
Version First Reported In: 3.25.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:
Bug Depends on:    
Bug Blocks: 506971    
Attachments: proposed patch
proposed patch

Description Mark Wielaard 2025-07-11 22:00:59 UTC
The ustat syscall is deprecated, but might still be used.
LTP contains two tests for it ustat01 and ustat02

https://www.man7.org/linux/man-pages/man2/ustat.2.html

coregrind/m_syswrap/syswrap-mips64-linux.c does contain a PRE(sys_ustat) but doesn't actually check the ustat buffer and doesn't have a POST handler.

Note that the struct ustat has two fields which differ on some architectures.
f_tfree is a long on sparc and mips, but an int on all others.
f_tinode is an unsigned int on alpha and s390, but unsigned long on all others.
char f_fname[6] and char f_fpack[6] are the same everywhere, but unused (filled with zeros).
Comment 1 mcermak 2025-07-16 06:50:56 UTC
Created attachment 183258 [details]
proposed patch

(In reply to Mark Wielaard from comment #0)
> Note that the struct ustat has two fields which differ on some architectures.
> f_tfree is a long on sparc and mips, but an int on all others.
> f_tinode is an unsigned int on alpha and s390, but unsigned long on all
> others.

After cross-compiling vmlinux and querying it with pahole, I can see that too \o/. 
Assuming sparc and alpha aren't actively supported by valgrind today, I've only
handled this for mips and s390x.

The first param of ustat is u32 in kernel.  Which in practice should be equivalent
to unsigned int.  But to guarantee 32bits, I've included stdint.h and used uint32_t.

I'm not sure if *flags |= SfMayBlock; should be used here.  It's a disk operation so
maybe it should (?)

> char f_fname[6] and char f_fpack[6] are the same everywhere, but unused
> (filled with zeros).

How can this note be used in the valgrind source?

---

Please, review the attached patch.
Comment 2 Mark Wielaard 2025-07-16 12:47:40 UTC
Hi Martin,

(In reply to mcermak from comment #1)
> (In reply to Mark Wielaard from comment #0)
> > Note that the struct ustat has two fields which differ on some architectures.
> > f_tfree is a long on sparc and mips, but an int on all others.
> > f_tinode is an unsigned int on alpha and s390, but unsigned long on all
> > others.
> 
> After cross-compiling vmlinux and querying it with pahole, I can see that
> too \o/. 

That is real dedication! Please don't feel like you have to cross-compile the kernel.
In most cases it can be found in the kernel sources. In this case include/linux/types.h has the kernel struct ustat.
A git grep ARCH_32BIT_USTAT_F_TINODE then finds the arches which use unsigned int vs unsigned long.

> Assuming sparc and alpha aren't actively supported by valgrind today, I've
> only
> handled this for mips and s390x.

Yes, thanks. There is an out of tree sparc port, but it is incomplete.
A different way to handle this is to define the struct vki_ustat in each
separate include/vki/vki-<arch>-linux.h file instead of vki-linux.h so
you don't need #ifdefs.
 
> The first param of ustat is u32 in kernel.  Which in practice should be
> equivalent
> to unsigned int.  But to guarantee 32bits, I've included stdint.h and used
> uint32_t.

There is __vki_u32 for that. Which gives you the kernel's type (if defining kernel structs).
For internal VEX/Valgrind types you can use Uint (Always 32 bits) as defined in VEX/pub/libvex_basictypes.h
Better not use stdint.h types which might not precisely match on the Host or kernel types.

> I'm not sure if *flags |= SfMayBlock; should be used here.  It's a disk
> operation so
> maybe it should (?)

It probably doesn't need it. It doesn't look it really blocks (indefinitely).
It might want a FUSE_COMPATIBLE_MAY_BLOCK(); like statfs.
FUSE_COMPATIBLE_MAY_BLOCK(); does a SfMayBlock only when running
with --sim-hints=fuse-compatible which indicates the program might
implement a fuse file system itself, so it might need a thread switch
to handle the kernel calling back into the program itself.

> > char f_fname[6] and char f_fpack[6] are the same everywhere, but unused
> > (filled with zeros).
> 
> How can this note be used in the valgrind source?

You are already doing that already with POST_MEM_WRITE( ARG1, sizeof(struct vki_ustat) ); which includes those two fields.
memcheck is interested in which bits are defined, it doesn't really care about the value.

> Please, review the attached patch.

OK, after lunch :)
Comment 3 Mark Wielaard 2025-07-16 22:16:43 UTC
(In reply to Mark Wielaard from comment #2)
> (In reply to mcermak from comment #1)
> > Please, review the attached patch.
> 
> OK, after lunch :)

OK, after dinner...

> Subject: [PATCH] Wrap linux specific syscall 22 (ustat)

> The ustat syscall comes from pre-git linux history.  It is
> deprecated in favor of statfs.  But in some cases it may
> still be used.

OK

> ustat() returns information about a mounted filesystem.  dev
> is a device number  identifying a device containing a mounted
> filesystem.  ubuf is a pointer to a ustat structure.

Maybe give the whole function signature:
int ustat(dev_t dev, struct ustat *ubuf);

> Declare a sys_ustat wrapper in priv_syswrap-linux.h and hook
> it for {amd64,arm,arm64,mips64,nanomips,ppc32,ppc64,riscv64,\
> s390x,x86}-linux using LINXY with PRE and POST handler in
> syswrap-linux.c

> https://bugs.kde.org/show_bug.cgi?id=506928

OK.

> diff --git a/NEWS b/NEWS
> index 796d9716e..e759daf2f 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -56,6 +56,7 @@ are not entered into bugzilla tend to get forgotten about or ignored.
>  506499  Unhandled syscall 592 (exterrctl - FreeBSD
>  506795  Better report which clone flags are problematic
>  506930  valgrind allows SIGKILL being reset to SIG_DFL
> +506928  Wrap (deprecated) linux specific ustat syscall

Please sort on number. Should be before 506930

> diff --git a/coregrind/m_syswrap/priv_syswrap-linux.h b/coregrind/m_syswrap/priv_syswrap-linux.h
> index 9e6cb8981..ce10a35f6 100644
> --- a/coregrind/m_syswrap/priv_syswrap-linux.h
> +++ b/coregrind/m_syswrap/priv_syswrap-linux.h
> @@ -40,6 +40,7 @@ extern void ML_(call_on_new_stack_0_1) ( Addr stack, Addr retaddr,
 
>  // Linux-specific (but non-arch-specific) syscalls
 
> +DECL_TEMPLATE(linux, sys_ustat);
>  DECL_TEMPLATE(linux, sys_clone)
>  DECL_TEMPLATE(linux, sys_mount);
>  DECL_TEMPLATE(linux, sys_oldumount);

OK.

> diff --git a/coregrind/m_syswrap/syswrap-amd64-linux.c b/coregrind/m_syswrap/syswrap-amd64-linux.c
> index 22989f9fa..c80286f00 100644
> --- a/coregrind/m_syswrap/syswrap-amd64-linux.c
> +++ b/coregrind/m_syswrap/syswrap-amd64-linux.c
> @@ -633,7 +633,7 @@ static SyscallTableEntry syscall_table[] = {
>     //   (__NR_uselib,            sys_uselib),         // 134 
 
>     LINX_(__NR_personality,       sys_personality),    // 135 
> -   //   (__NR_ustat,             sys_ustat),          // 136 
> +   LINXY(__NR_ustat,             sys_ustat),          // 136
>     GENXY(__NR_statfs,            sys_statfs),         // 137 
>     GENXY(__NR_fstatfs,           sys_fstatfs),        // 138 
>     //   (__NR_sysfs,             sys_sysfs),          // 139 

OK, as are the other arches.

> diff --git a/coregrind/m_syswrap/syswrap-linux.c b/coregrind/m_syswrap/syswrap-linux.c
> index 306c3a2f8..c9b2c500a 100644
> --- a/coregrind/m_syswrap/syswrap-linux.c
> +++ b/coregrind/m_syswrap/syswrap-linux.c
> @@ -63,6 +63,7 @@
>  #include "priv_syswrap-linux.h"
>  #include "priv_syswrap-main.h"
>  #include "priv_syswrap-xen.h"
> +#include <stdint.h>

Please don't use stdint.h types. See below.

> +PRE(sys_ustat)
> +{
> +   PRINT("sys_ustat ( %#" FMT_REGWORD "x, %#" FMT_REGWORD "x)", ARG1, ARG2);
> +   // unsigned int should in practice be good equivalent to u32, but...
> +   PRE_REG_READ2(long, "ustat", uint32_t, dev, struct vki_ustat *, ubuf);

You can use vki_uint32_t or __vki_u32

> +   if (ARG2 != 0) {
> +      PRE_MEM_WRITE( "ustat(ubuf)", ARG2, sizeof(struct vki_ustat) );
> +   }
> +}

Don't need the ARG2 != 0.
PRE_MEM_WRITE will warn if ARG2 is NULL, which is what we want.

> +POST(sys_ustat)
> +{
> +   if (ARG2 != 0) {
> +      POST_MEM_WRITE( ARG1, sizeof(struct vki_ustat) );
> +   }
> +}

You don't need the ARG2 != 0. POST_MEM_WRITE should work on ARG2 (not ARG1).

> diff --git a/coregrind/m_syswrap/syswrap-mips64-linux.c b/coregrind/m_syswrap/syswrap-mips64-linux.c
> index 67e5c0b37..8e2dcbe93 100644
> --- a/coregrind/m_syswrap/syswrap-mips64-linux.c
> +++ b/coregrind/m_syswrap/syswrap-mips64-linux.c
> @@ -216,7 +216,6 @@ SysRes sys_set_tls ( ThreadId tid, Addr tlsptr )
 
>  DECL_TEMPLATE (mips_linux, sys_set_thread_area);
>  DECL_TEMPLATE (mips_linux, sys_vmsplice);
> -DECL_TEMPLATE (mips_linux, sys_ustat);
>  DECL_TEMPLATE (mips_linux, sys_sysfs);
>  DECL_TEMPLATE (mips_linux, sys_swapon);
>  DECL_TEMPLATE (mips_linux, sys_swapoff);
> @@ -248,12 +247,6 @@ PRE(sys_sched_rr_get_interval)
>     *flags |= SfMayBlock;
>  }
 
> -PRE(sys_ustat)
> -{
> -   PRINT("sys_ustat ( %#" FMT_REGWORD "x, %#" FMT_REGWORD "x)", ARG1, ARG2);
> -   PRE_REG_READ2(long, "ustat", int, flags, const void *, path);
> -}
> -

OK, since this is now going into linux-generic.

> diff --git a/include/vki/vki-linux.h b/include/vki/vki-linux.h
> index 8c919845b..9072b73e5 100644
> --- a/include/vki/vki-linux.h
> +++ b/include/vki/vki-linux.h
> @@ -171,6 +171,25 @@ typedef struct {
>  typedef int __vki_kernel_key_t;
>  typedef int __vki_kernel_mqd_t;
 
> +//----------------------------------------------------------------------
> +// From pre-git history /include/linux/types.h
> +//----------------------------------------------------------------------
> +
> +struct vki_ustat {
> +#ifdef __mips__
> +	long			f_tfree;
> +#else
> +	int			f_tfree;
> +#endif

Please use #if defined(VGA_mips64), defined(VGA_mips32) and/or #elif defined(VGA_nanomips)
depending on which Valgrind Architecure this holds for.

> +#ifdef __s390x__
> +	unsigned int		f_tinode;
> +#else
> +	unsigned long		f_tinode;
> +#endif

Please use #if defined(VGA_s390x)

> +	char			f_fname[6];
> +	char			f_fpack[6];
> +};
> +
Comment 4 mcermak 2025-07-17 07:33:11 UTC
Created attachment 183295 [details]
proposed patch

Hi Mark,

Thank you for your review.

(In reply to Mark Wielaard from comment #2)
> > +   if (ARG2 != 0) {
> > +      PRE_MEM_WRITE( "ustat(ubuf)", ARG2, sizeof(struct vki_ustat) );
> > +   }
> > +}
> 
> Don't need the ARG2 != 0.
> PRE_MEM_WRITE will warn if ARG2 is NULL, which is what we want.
> 
> > +POST(sys_ustat)
> > +{
> > +   if (ARG2 != 0) {
> > +      POST_MEM_WRITE( ARG1, sizeof(struct vki_ustat) );
> > +   }
> > +}
> 
> You don't need the ARG2 != 0. POST_MEM_WRITE should work on ARG2 (not ARG1).

In that case, would it make sense to do something like the following?

$ git diff
diff --git a/README_MISSING_SYSCALL_OR_IOCTL b/README_MISSING_SYSCALL_OR_IOCTL
index 8ddced5c9..d3aee517e 100644
--- a/README_MISSING_SYSCALL_OR_IOCTL
+++ b/README_MISSING_SYSCALL_OR_IOCTL
@@ -51,16 +51,12 @@ The wrapper for the time system call looks like this:
      /* time_t time(time_t *t); */
      PRINT("sys_time ( %p )",ARG1);
      PRE_REG_READ1(long, "time", int *, t);
-     if (ARG1 != 0) {
-        PRE_MEM_WRITE( "time(t)", ARG1, sizeof(vki_time_t) );
-     }
+     PRE_MEM_WRITE( "time(t)", ARG1, sizeof(vki_time_t) );
   }
 
   POST(sys_time)
   {  
-     if (ARG1 != 0) {
-        POST_MEM_WRITE( ARG1, sizeof(vki_time_t) );
-     }
+     POST_MEM_WRITE( ARG1, sizeof(vki_time_t) );
   }
 
 The first thing we do happens before the syscall occurs, in the PRE() function.
$ 

> Please use #if defined(VGA_mips64), defined(VGA_mips32) and/or #elif
> defined(VGA_nanomips)
> depending on which Valgrind Architecure this holds for.
> 
> > +#ifdef __s390x__
> > +	unsigned int		f_tinode;
> > +#else
> > +	unsigned long		f_tinode;
> > +#endif
> 
> Please use #if defined(VGA_s390x)
> 
> > +	char			f_fname[6];
> > +	char			f_fpack[6];
> > +};
> > +

I did that.  For f_tfree, I know for sure it's correct for VGA_mips32 and VGA_mips64.  I will  *assume* that the same applies to VGA_nanomips too.

I'm attaching updated patch.  I believe it does address all the points.  Please, check.
Comment 5 Mark Wielaard 2025-07-17 11:18:18 UTC
Hi Martin,

(In reply to mcermak from comment #4)
> (In reply to Mark Wielaard from comment #2)
> > > +   if (ARG2 != 0) {
> > > +      PRE_MEM_WRITE( "ustat(ubuf)", ARG2, sizeof(struct vki_ustat) );
> > > +   }
> > > +}
> > 
> > Don't need the ARG2 != 0.
> > PRE_MEM_WRITE will warn if ARG2 is NULL, which is what we want.
> > 
> > > +POST(sys_ustat)
> > > +{
> > > +   if (ARG2 != 0) {
> > > +      POST_MEM_WRITE( ARG1, sizeof(struct vki_ustat) );
> > > +   }
> > > +}
> > 
> > You don't need the ARG2 != 0. POST_MEM_WRITE should work on ARG2 (not ARG1).
> 
> In that case, would it make sense to do something like the following?
> 
> $ git diff
> diff --git a/README_MISSING_SYSCALL_OR_IOCTL
> b/README_MISSING_SYSCALL_OR_IOCTL
> index 8ddced5c9..d3aee517e 100644
> --- a/README_MISSING_SYSCALL_OR_IOCTL
> +++ b/README_MISSING_SYSCALL_OR_IOCTL
> @@ -51,16 +51,12 @@ The wrapper for the time system call looks like this:
>       /* time_t time(time_t *t); */
>       PRINT("sys_time ( %p )",ARG1);
>       PRE_REG_READ1(long, "time", int *, t);
> -     if (ARG1 != 0) {
> -        PRE_MEM_WRITE( "time(t)", ARG1, sizeof(vki_time_t) );
> -     }
> +     PRE_MEM_WRITE( "time(t)", ARG1, sizeof(vki_time_t) );
>    }
>  
>    POST(sys_time)
>    {  
> -     if (ARG1 != 0) {
> -        POST_MEM_WRITE( ARG1, sizeof(vki_time_t) );
> -     }
> +     POST_MEM_WRITE( ARG1, sizeof(vki_time_t) );
>    }

Aha, it is the first example in README_MISSING_SYSCALL_OR_IOCTL.
But there it is indeed necessary. 

time_t time(time_t *_Nullable tloc);

If tloc == NULL then it just returns  the  time  as  the  number  of seconds since the Epoch.
But if tloc != NULL then it also stores the time as time_t in the tloc buffer.
 
Maybe we should make it more clear in the README that it depends on whether a pointer argument can be NULL that you should check for NULL before calling PRE_MEM_WRITE or POST_MEM_WRITE? If it isn't support to be NULL then PRE_MEM_WRITE will warn about it.

> > Please use #if defined(VGA_mips64), defined(VGA_mips32) and/or #elif
> > defined(VGA_nanomips)
> > depending on which Valgrind Architecure this holds for.
> > 
> > > +#ifdef __s390x__
> > > +	unsigned int		f_tinode;
> > > +#else
> > > +	unsigned long		f_tinode;
> > > +#endif
> > 
> > Please use #if defined(VGA_s390x)
> > 
> > > +	char			f_fname[6];
> > > +	char			f_fpack[6];
> > > +};
> > > +
> 
> I did that.  For f_tfree, I know for sure it's correct for VGA_mips32 and
> VGA_mips64.  I will  *assume* that the same applies to VGA_nanomips too.

Yes, normally nonamips acts like mips32

> I'm attaching updated patch.  I believe it does address all the points. 
> Please, check.

Yes, looks good. Thanks.
Comment 6 Mark Wielaard 2025-07-17 11:22:55 UTC
commit a4d893c6ef133fdf8255860aa8ca0a8460e105b1
Author: Martin Cermak <mcermak@redhat.com>
Date:   Thu Jul 17 09:16:53 2025 +0200

    Wrap linux specific syscall 22 (ustat)
    
    The ustat syscall comes from pre-git linux history.  It is
    deprecated in favor of statfs.  But in some cases it may
    still be used.
    
    int ustat(dev_t dev, struct ustat *ubuf); returns information
    about a mounted filesystem.  dev is a device number identifying
    a device containing a mounted filesystem.  ubuf is a pointer to
    a ustat structure.
    
    Declare a sys_ustat wrapper in priv_syswrap-linux.h and hook
    it for {amd64,arm,arm64,mips64,nanomips,ppc32,ppc64,riscv64,\
    s390x,x86}-linux using LINXY with PRE and POST handler in
    syswrap-linux.c
    
    https://bugs.kde.org/show_bug.cgi?id=506928