Bug 317127 - Fedora18/x86_64 --sanity-level=3 : aspacem segment mismatch
Summary: Fedora18/x86_64 --sanity-level=3 : aspacem segment mismatch
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general
Version: 3.9.0.SVN
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-21 11:39 UTC by Dmitry Djachenko
Modified: 2013-03-21 19:03 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description Dmitry Djachenko 2013-03-21 11:39:44 UTC
Run an empty program:
int main() { return 0; }
valgrind --tool=none --sanity-level=3 FAILs.

Reproducible: Always

Steps to Reproduce:
1. echo "int main() {return 0;}" > tst.c
2. gcc tst.c
3. valgrind --tool=none --sanity-level=3 ./a.out
Actual Results:  
$ valgrind --tool=none -v --sanity-level=3 ./a.out 
--17626:0:aspacem  segment mismatch: V's seg 1st, kernel's 2nd:
--17626:0:aspacem    1: file 0000400000-0000400fff    4096 r-x-- SmFixed d=0x024 i=7181509 o=0       (1) m=0 /home/dimhen/errs/V/a.out
--17626:0:aspacem  ...: .... 0000400000-0000400fff    4096 r-x.. ....... d=0x01c i=7181509 o=0       (.) m=. /home/dimhen/errs/V/a.out
--17626:0:aspacem  sync check at m_aspacemgr/aspacemgr-linux.c:1932 (vgPlain_am_get_advisory): FAILED
--17626:0:aspacem  
--17626:0:aspacem  Valgrind: FATAL: aspacem assertion failed:
--17626:0:aspacem    VG_(am_do_sync_check) (__PRETTY_FUNCTION__,__FILE__,__LINE__)
--17626:0:aspacem    at m_aspacemgr/aspacemgr-linux.c:1932 (vgPlain_am_get_advisory)
--17626:0:aspacem  Exiting now.



Expected Results:  
no errors

Fedora 18, debuginfo installed
$ uname -a
Linux localhost.localdomain 3.8.3-203.fc18.x86_64 #1 SMP Mon Mar 18 12:59:28 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ LANG=C gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.7.2/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --disable-build-with-cxx --disable-build-poststage1-with-cxx --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC)
Comment 1 Tom Hughes 2013-03-21 11:46:44 UTC
Does this only happen when you turn up the sanity level? And if it does, what made you want to do that? It isn't something I would normally expect an end user to change...
Comment 2 Tom Hughes 2013-03-21 11:53:21 UTC
Anyway, the problem seems to be that valgrind thinks that /home/dimhen/errs/V/a.out is on device 0x024, while /proc/xxx/maps says it is on device 0x01c.

Is there anything unusual about the filesystem containing that file?

What does "stat  /home/dimhen/errs/V/a.out" say?
Comment 3 Dmitry Djachenko 2013-03-21 12:28:45 UTC
(In reply to comment #2)
> Anyway the problem seems to be that valgrind thinks that
> /home/dimhen/errs/V/a.out is on device 0x024 while /proc/xxx/maps says it is
> on device 0x01c.
> 
> Is there anything unusual about the filesystem containing that file?
$ mount | grep home
/dev/sda6 on /home type btrfs (rw,relatime,seclabel,space_cache)

> 
> What does "stat  /home/dimhen/errs/V/a.out" say?

$ stat ./a.out 
  File: './a.out'
  Size: 6984      	Blocks: 16         IO Block: 4096   regular file
Device: 24h/36d	Inode: 7181509     Links: 1
Access: (0775/-rwxrwxr-x)  Uid: ( 1000/  dimhen)   Gid: ( 1000/  dimhen)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2013-03-21 15:02:19.649045507 +0400
Modify: 2013-03-21 15:02:16.277933130 +0400
Change: 2013-03-21 15:02:16.277933130 +0400
 Birth: -
Comment 4 Tom Hughes 2013-03-21 12:39:04 UTC
Aha.... btrfs... I wonder if that has anything to do with it.

So stat says 0x24 for the device, which matches what valgrind has recorded; so why, I wonder, is /proc/maps saying something else?

Is /home actually just one sub-volume and /dev/sda6 the device backing the whole volume?

What does "ls -l /dev/sda6" show? Can you see any devices in /dev with "0, 25" or "0, 36" as their device number?
Comment 5 Dmitry Djachenko 2013-03-21 13:11:48 UTC
(In reply to comment #1)
> Does this only happen when you turn up the sanity level? and if it does then
> what made you want to do that? It isn't something I would normally expect an
> end user to change...

Yes. Only with --sanity-level=3.
The question and this PR arose from test failures.

For me, valgrind-trunk 'make regtest' has 5 tests FAIL and 577 PASS.

none/tests/map_unmap
none/tests/sigstackgrowth
none/tests/stackgrowth
use '--sanity-level=3' and FAIL

exp-sgcheck/tests/preen_invars -- looks very similar to PR255603
memcheck/tests/origin5-bz2 -- PR316903
Comment 6 Tom Hughes 2013-03-21 13:24:32 UTC
It's quite normal for a few tests to fail, so I wouldn't worry too much about that. If you can answer the questions I asked in comment #4, then we can try to get to the bottom of this specific issue.
Comment 7 Dmitry Djachenko 2013-03-21 13:26:07 UTC
(In reply to comment #4)
> Aha.... btrfs... I wonder if that has anything to do with it.
If I remember correctly, my other box with ext4/lvm has the same errors.
I'll re-check.

> 
> So stat says 0x24 for the device, which matches what valgrind has recorded,
> so why is /proc/maps saying something else I wonder.
> 
> Is /home actually just one sub-volume and /dev/sda6 the device backing the
> whole volume?
100 MB Windows hidden area
210 GB NTFS
524 MB ext4 /boot
extended partition 4:
   /dev/sda5 4.2 GB swap
   /dev/sda6 286 GB btrfs /

# mount | grep sda6
/dev/sda6 on / type btrfs (rw,relatime,seclabel,space_cache)
/dev/sda6 on /home type btrfs (rw,relatime,seclabel,space_cache)


> 
> What does "ls -l /dev/sda6" show?
# ls -l /dev/sda6
brw-rw----. 1 root disk 8, 6 Mar 21 10:57 /dev/sda6

> Can you see any devices in /dev with "0,
> 25" or "0, 36" as their device number?
No "0, 25", "0, 36"
# ls -l /dev | egrep -w '24|25|36'
crw-rw-rw-. 1 root tty       5,   2 Mar 21 17:25 ptmx
crw--w----. 1 root tty       4,  24 Mar 21 10:57 tty24
crw--w----. 1 root tty       4,  25 Mar 21 10:57 tty25
crw--w----. 1 root tty       4,  36 Mar 21 10:57 tty36
Comment 8 Tom Hughes 2013-03-21 13:31:34 UTC
Right, so it looks like you do have at least two btrfs subvolumes on that device, which is almost certainly the root cause of the problem.

The device numbers being reported do seem very odd anyway, as they all have a major device number of zero. I rather suspect that btrfs has a stat that returns very dubious values in st_dev that don't reflect the underlying device numbers, probably because it can have multiple (sub)volumes on the same physical device and therefore multiple inode numbering spaces.
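
(A quick illustration of the subvolume effect: on the reporter's box "/" and "/home" are both /dev/sda6, yet each btrfs subvolume gets its own anonymous st_dev, so the two stat() results should differ. The paths are taken from the mount output in comment #7; on a non-btrfs box they will likely print the same device.)

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main(void)
{
    const char *paths[] = { "/", "/home" };
    for (int i = 0; i < 2; i++) {
        struct stat st;
        if (stat(paths[i], &st) == 0)
            printf("%-6s st_dev = %u:%u\n",
                   paths[i], major(st.st_dev), minor(st.st_dev));
    }
    return 0;
}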
Comment 9 Tom Hughes 2013-03-21 13:32:49 UTC
Yep - there is a bug in RHBZ describing exactly this problem: https://bugzilla.redhat.com/show_bug.cgi?id=711881
Comment 10 Dmitry Djachenko 2013-03-21 18:31:37 UTC
(In reply to comment #7)
> (In reply to comment #4)
> > Aha.... btrfs... I wonder if that has anything to do with it.
> If i remember correctly another my box with ext4/lvm has the same errs.
> I'll re-check.
With ext4/lvm the test PASSes.

I think that test checks basic functionality, and it would be bad to skip it.

So what should we do?
-- ignore the problem segment (see the sketch below)
-- print a warning instead of exiting, and add the expected stderr.out

Decreasing --sanity-level to 2 is not an option: the test was added in r3265 specifically for this check (as part of the 2.4.0 merge).
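
(Purely hypothetical sketch of the "relax the check" idea; SegCmp, seg_v, seg_k and segs_match are made-up names and this is NOT valgrind's aspacemgr code. The point is simply to compare everything except the device field, or to treat a device-only mismatch as non-fatal.)

#include <stdio.h>

typedef struct {
    unsigned long start, end;   /* address range */
    unsigned long dev, ino;     /* what stat() / /proc/pid/maps report */
} SegCmp;

/* Return 1 if the segments match; with trust_dev == 0 a differing device
   number alone is not treated as a mismatch (the btrfs case above). */
static int segs_match(const SegCmp *seg_v, const SegCmp *seg_k, int trust_dev)
{
    if (seg_v->start != seg_k->start || seg_v->end != seg_k->end
        || seg_v->ino != seg_k->ino)
        return 0;                         /* genuine mismatch */
    if (trust_dev && seg_v->dev != seg_k->dev)
        return 0;                         /* device differs and we care */
    return 1;
}

int main(void)
{
    /* Values taken from the log at the top of this report. */
    SegCmp v = { 0x400000, 0x400fff, 0x024, 7181509 };
    SegCmp k = { 0x400000, 0x400fff, 0x01c, 7181509 };
    printf("strict:  %s\n", segs_match(&v, &k, 1) ? "match" : "MISMATCH");
    printf("relaxed: %s\n", segs_match(&v, &k, 0) ? "match" : "MISMATCH");
    return 0;
}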
Comment 11 Dmitry Djachenko 2013-03-21 18:52:06 UTC
(In reply to comment #6)
> It's quite normal for a few tests to fail, so I wouldn't worry to much about
> that.
I hear sometimes, somewhere, that a FAIL-free testsuite is the Right Thing To Do :)
Comment 12 Tom Hughes 2013-03-21 19:03:11 UTC
Yes, obviously it's not ideal that the test suite is not more reliable, but it turns out to be very hard to construct tests for valgrind that reliably pass everywhere: small changes in the operating environment can cause backtraces to change in subtle ways, for example.

In this case the problem isn't the test at all; it's btrfs invalidating a basic Unix assumption about the meaning of st_dev.

It may be that we will have to stop comparing device numbers in the sanity check, but certainly the test is not the problem.