run empty program int main() { return 0;} valgrind --tool=none --sanity-level=3 FAIL Reproducible: Always Steps to Reproduce: 1. echo "int main() {return 0;}" > tst.c 2. gcc tst.c 3. valrind --tool=none --sanity-level=3 ./a.out Actual Results: $ valgrind --tool=none -v --sanity-level=3 ./a.out --17626:0:aspacem segment mismatch: V's seg 1st, kernel's 2nd: --17626:0:aspacem 1: file 0000400000-0000400fff 4096 r-x-- SmFixed d=0x024 i=7181509 o=0 (1) m=0 /home/dimhen/errs/V/a.out --17626:0:aspacem ...: .... 0000400000-0000400fff 4096 r-x.. ....... d=0x01c i=7181509 o=0 (.) m=. /home/dimhen/errs/V/a.out --17626:0:aspacem sync check at m_aspacemgr/aspacemgr-linux.c:1932 (vgPlain_am_get_advisory): FAILED --17626:0:aspacem --17626:0:aspacem Valgrind: FATAL: aspacem assertion failed: --17626:0:aspacem VG_(am_do_sync_check) (__PRETTY_FUNCTION__,__FILE__,__LINE__) --17626:0:aspacem at m_aspacemgr/aspacemgr-linux.c:1932 (vgPlain_am_get_advisory) --17626:0:aspacem Exiting now. Expected Results: no errors Fedora 18, debuginfo installed $ uname -a Linux localhost.localdomain 3.8.3-203.fc18.x86_64 #1 SMP Mon Mar 18 12:59:28 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux $ LANG=C gcc -v Using built-in specs. COLLECT_GCC=/usr/bin/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.7.2/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --disable-build-with-cxx --disable-build-poststage1-with-cxx --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC)
Does this only happen when you turn up the sanity level? and if it does then what made you want to do that? It isn't something I would normally expect an end user to change...
Anyway the problem seems to be that valgrind thinks that /home/dimhen/errs/V/a.out is on device 0x024 while /proc/xxx/maps says it is on device 0x01c. Is there anything unusual about the filesystem containing that file? What does "stat /home/dimhen/errs/V/a.out" say?
(In reply to comment #2) > Anyway the problem seems to be that valgrind thinks that > /home/dimhen/errs/V/a.out is on device 0x024 while /proc/xxx/maps says it is > on device 0x01c. > > Is there anything unusual about the filesystem containing that file? $ mount | grep home /dev/sda6 on /home type btrfs (rw,relatime,seclabel,space_cache) > > What does "stat /home/dimhen/errs/V/a.out" say? $ stat ./a.out File: './a.out' Size: 6984 Blocks: 16 IO Block: 4096 regular file Device: 24h/36d Inode: 7181509 Links: 1 Access: (0775/-rwxrwxr-x) Uid: ( 1000/ dimhen) Gid: ( 1000/ dimhen) Context: unconfined_u:object_r:user_home_t:s0 Access: 2013-03-21 15:02:19.649045507 +0400 Modify: 2013-03-21 15:02:16.277933130 +0400 Change: 2013-03-21 15:02:16.277933130 +0400 Birth: -
Aha.... btrfs... I wonder if that has anything to do with it. So stat says 0x24 for the device, which matches what valgrind has recorded, so why is /proc/maps saying something else I wonder. Is /home actually just one sub-volume and /dev/sda6 the device backing the whole volume? What does "ls -l /dev/sda6" show? Can you see any devices in /dev with "0, 25" or "0, 36" as their device number?
(In reply to comment #1) > Does this only happen when you turn up the sanity level? and if it does then > what made you want to do that? It isn't something I would normally expect an > end user to change... Yes. Only with --sanity-level=3. Question and PR arose from tests failures. For me valgrind-trunk 'make regtest' has 5 tests FAIL, 577 PASS none/tests/map_unmap none/tests/sigstackgrowth none/tests/stackgrowth has '--sanity-level=3' and FAIL exp-sgcheck/tests/preen_invars -- look very similiar to PR255603 memcheck/tests/origin5-bz2 -- PR316903
It's quite normal for a few tests to fail, so I wouldn't worry to much about that. If you can answer the questions I asked in comment #4 then we can try and get to the bottom of this specific issue.
(In reply to comment #4) > Aha.... btrfs... I wonder if that has anything to do with it. If i remember correctly another my box with ext4/lvm has the same errs. I'll re-check. > > So stat says 0x24 for the device, which matches what valgrind has recorded, > so why is /proc/maps saying something else I wonder. > > Is /home actually just one sub-volume and /dev/sda6 the device backing the > whole volume? 100Mb fWin hidden area 210 Gb NTFS 524 Mb ext4 /boot extended partition 4 /dev/sda5 4.2 Gb swap /dev/sda6 286Gb btrfs / # mount | grep sda6 /dev/sda6 on / type btrfs (rw,relatime,seclabel,space_cache) /dev/sda6 on /home type btrfs (rw,relatime,seclabel,space_cache) > > What does "ls -l /dev/sda6" show? # ls -l /dev/sda6 brw-rw----. 1 root disk 8, 6 Mar 21 10:57 /dev/sda6 > Can you see any devices in /dev with "0, > 25" or "0, 36" as their device number? No "0, 25", "0, 36" # ls -l /dev | egrep -w '24|25|36' crw-rw-rw-. 1 root tty 5, 2 Mar 21 17:25 ptmx crw--w----. 1 root tty 4, 24 Mar 21 10:57 tty24 crw--w----. 1 root tty 4, 25 Mar 21 10:57 tty25 crw--w----. 1 root tty 4, 36 Mar 21 10:57 tty36
Right, so it looks like you do have at least two btrfs subvolumes on that device, which is almost certainly the root cause of the problem. The device numbers being reported do seem very odd anyway, as they all have a major device number of zero. I rather suspect that btrfs has a stat that returns very dubious values in st_dev that don't reflect the underlying device numbers, probably because it can have multiple (sub)volumes on the save physical device and therefore multiple inode numbering spaces.
Yep - there is a bug in RHBZ describing exactly this problem: https://bugzilla.redhat.com/show_bug.cgi?id=711881
(In reply to comment #7) > (In reply to comment #4) > > Aha.... btrfs... I wonder if that has anything to do with it. > If i remember correctly another my box with ext4/lvm has the same errs. > I'll re-check. With ext4/lvm test PASS. I think that test checks basic functionality and it's bad to skip it So what to do? -- ignore problem segment -- print warning and not exit, add expected stderr.out decrease --sanity-level to 2 is not a variant. Test was added in r3265 for this check (as part of 2.4.0 merge)
(In reply to comment #6) > It's quite normal for a few tests to fail, so I wouldn't worry to much about > that. i hear sometimes somewhere "FAIL free testsuite is Right Thing To Do" :)
Yes obviously it's not ideal that the test suite is not more reliable, but it turns out to be very hard to construct tests for valgrind that reliably pass everywhere - small changes in the operating environment can causes backtraces to change in subtle ways for example. In this case the problem isn't the test at all, it's btrfs invalidating a basic unix assumption about the meaning of st_dev. It may be that we will have to stop comparing device numbers in the sanity check, but certainly the test is not the problem.
*** Bug 486662 has been marked as a duplicate of this bug. ***
I think that the answer is to add something like VG_(do_syscall3)(__NR_statfs, filename, &statfs); if (statfs.f_type == BTRFS_SUPER_MAGIC) { cmp_devino = False; } inside the existing VGO_linux block.
Created attachment 169247 [details] patch, amd64 only Initial version of a fix.
The patch I just added seems to work OK. There is a plan B. In the patch I've also updated the statx structure which (since kernel 5.8) has an stx_mnt_id which we could try to use. I don't know how many people are still using old kernels with btrfs. Not too many I imagine. The 5.8 kernel dates back to Fedora 33. And no guarantee that stx_mnt_id would work.
And with the patch I get a clean slate == 880 tests, 0 stderr failures, 0 stdout failures, 0 stderrB failures, 0 stdoutB failures, 0 post failures ==
I must admit I don't fully understand what we are trying to do. As far as I understand it, the idea is that we sanity check what the aspacemgr records against what /proc/self/maps shows? But that includes some device specific inode values which we don't have/know? Can't we just make cmp_devino = False the default? aka never check those? Or are we loosing some valuable sanity checks/info that way?
(In reply to Mark Wielaard from comment #18) > I must admit I don't fully understand what we are trying to do. As far as I > understand it, the idea is that we sanity check what the aspacemgr records > against what /proc/self/maps shows? But that includes some device specific > inode values which we don't have/know? Can't we just make cmp_devino = False > the default? aka never check those? Or are we loosing some valuable sanity > checks/info that way? I think the goal is to check as many things as possible. Is it valuable? Hard to imagine reading the wrong exe or library that is mmap'd but if ever that did happen it would be fairly catastrophic. The problem with btrfs is that /proc/self/maps uses the device id for the supernode whilst statx returns the device id for the btrfs subvolume. The two are different so the sanity check fails. This check is turned off for Darwin. I'm just testing turning it back on for FreeBSD - it seems to work with a couple of tweaks.
Patch works for me. It is a little unfortunate that vki_statfs is arch specific. uapi/asm-generic/statfs.h says: /* * Most 64-bit platforms use 'long', while most 32-bit platforms use '__u32'. * Yes, they differ in signedness as well as size. * Special cases can override it for themselves -- except for S390x, which * is just a little too special for us. And MIPS, which I'm not touching * with a 10' pole. */ But I think we can make it linux-generic even without updating vki_statfs everywhere. We don't seem to use the new f_flags anywhere. We just need to check f_type.
Could we make it linux generic by just using __NR_statfs and struct statfs (which seems defined for each linux arch already): diff --git a/coregrind/m_aspacemgr/aspacemgr-linux.c b/coregrind/m_aspacemgr/aspacemgr-linux.c index 9362f65af3c8..3da68db381e1 100644 --- a/coregrind/m_aspacemgr/aspacemgr-linux.c +++ b/coregrind/m_aspacemgr/aspacemgr-linux.c @@ -891,6 +891,14 @@ static void sync_check_mapping_callback ( Addr addr, SizeT len, UInt prot, /* hack apparently needed on MontaVista Linux */ if (filename && VG_(strstr)(filename, "/.lib-ro/")) cmp_devino = False; + +#if defined(VGO_linux) + struct vki_statfs statfs = {0}; + SysRes res = VG_(do_syscall2)(__NR_statfs, (UWord)filename, (UWord)&statfs); + if (!sr_isError(res) && statfs.f_type == VKI_BTRFS_SUPER_MAGIC) { + cmp_devino = False; + } +#endif #endif /* If we are doing sloppy execute permission checks then we diff --git a/include/vki/vki-linux.h b/include/vki/vki-linux.h index be3d76690cee..708a12b64d03 100644 --- a/include/vki/vki-linux.h +++ b/include/vki/vki-linux.h @@ -5455,6 +5460,12 @@ struct vki_open_how { #define VKI_CLOSE_RANGE_UNSHARE (1U << 1) #define VKI_CLOSE_RANGE_CLOEXEC (1U << 2) +//---------------------------------------------------------------------- +// From linux/magic.h +//---------------------------------------------------------------------- + +#define VKI_BTRFS_SUPER_MAGIC 0x9123683E + /*--------------------------------------------------------------------*/ /*--- end ---*/ /*--------------------------------------------------------------------*/
Created attachment 171372 [details] Avoid dev/inode check on btrfs with --sanity-level=3 With --sanity-level=3 or higher the aspacemgr sanity checks the device/inode numbers from /proc/self/maps to the file stat results. These don't match on btrfs. So detect when a file is on a btrfs volume and ignore the check in that case. On Linux we can use the statfs call for that, except on nanomips (which also doesn't have a sys_fstatfs syswrap). Tested on (linux) i686, x86_64, aarch64, ppc64le and s390x.
commit 3b06d458ffc5cc8de8d701926e5d86979185fa04 (HEAD -> master, origin/master, origin/HEAD) Author: Mark Wielaard <mark@klomp.org> Date: Thu Jul 4 15:21:39 2024 +0200 Avoid dev/inode check on btrfs with --sanity-level=3 With --sanity-level=3 or higher the aspacemgr sanity checks the device/inode numbers from /proc/self/maps to the file stat results. These don't match on btrfs. So detect when a file is on a btrfs volume and ignore the check in that case. https://bugs.kde.org/show_bug.cgi?id=317127