Program: module-init-tools (linux kernel module tools including "insmod" and "modprobe") has an automated test suite which is supposed to run under valgrind (when available). The current sources are broken - they don't actually use valgrind, so I'm fixing this as part of my work on it. Symptom: With my fixes applied, the testsuite passes using valgrind version 3.3.1. However it fails under SVN r8262. I tested this specific revision as it is just before the recent change which adds more support for module syscalls to valgrind. The testsuite is run twice to cover both building with and without zlib. The specific test failure is the second instance of "tests/test-modprobe/25locking.sh". Root cause: I tried describing the problem in mind-numbing detail to report it, which pushed me to investigate further and eventually isolate the cause. My problem is consistent with fcntl() allowing two processes to lock a file at once. SYSCALL[10760,1]( 2) sys_open ( 0x7FF000DD8 (tests/data/32/normal/export_nodep-32.ko), 2 ) --> [async] ... SYSCALL[10760,1]( 2) ... [async] --> Success(0x5) SYSCALL[10760,1]( 72) sys_fcntl[ARG3=='lock'] ( 5, 7, 0x7FEFFF9C0 ) --> [async] ... SYSCALL[10760,1]( 72) ... [async] --> Success(0x0) ... (the first modprobe then loops thinking it has the lock held, but a second modprobe is able to take the lock) ... SYSCALL[10762,1]( 2) sys_open ( 0x7FF000DD8 (tests/data/32/normal/export_nodep-32.ko), 2 ) --> [async] ... SYSCALL[10762,1]( 2) ... [async] --> Success(0x5) SYSCALL[10762,1]( 72) sys_fcntl[ARG3=='lock'] ( 5, 7, 0x7FEFFF9C0 ) --> [async] ... SYSCALL[10762,1]( 72) ... [async] --> Success(0x0) The cause is most likely due to the brokenness of POSIX advisory file lock semantics. If a process closes *any* file descriptor referring to a locked file, it loses the lock. Doesn't matter if you cloned the fd using dup(). Doesn't matter if you obtained it by calling open() a second time on the same pathname. Once you close a file, you lose all locks on it. That's what the manpage for fcntl says. I ran valgrind under "strace" and it did indeed look like this was problem. When modprobe is built *without* zlib support, it uses mmap() to open modules. Valgrind does something a bit funny in response to this. It reopens the file, reads it - and then closes it. Bam, we lose the file lock. mmap(0x4020000, 1132, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 5, 0) = 0x4020000 fstat(5, {st_mode=S_IFREG|0644, st_size=1132, ...}) = 0 readlink("/proc/self/fd/5", "/home/alan/src/bootup/module-init-tools.upstream/tests/data/32-le/normal/export_nodep-32.ko", 4096) = 91 stat("/home/alan/src/bootup/module-init-tools.upstream/tests/data/32-le/normal/export_nodep-32.ko", {st_mode=S_IFREG|0644, st_size=1132, ...}) = 0 open("/home/alan/src/bootup/module-init-tools.upstream/tests/data/32-le/normal/export_nodep-32.ko", O_RDONLY) = 12 read(12, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\1\0\3\0\1\0\0\0\0\0\0\0"..., 1024) = 1024 close(12) = 0 fstat(1, {st_mode=S_IFREG|0644, st_size=106111, ...}) = 0 mmap(0x4021000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED| MAP_ANONYMOUS, -1, 0) = 0x4021000 gettid() = 10774 write(1018, "W", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 write(1, "Looping on tests/tmp/continue\n", 30Looping on tests/tmp/continue ) = 30
It would be great if you could provide a small test program calling fcntl(..., SETLK, ...) that illustrates the difference in behavior between the 3.3 branch and the trunk.
Created attachment 25533 [details] Minimal test program to reproduce problem Sure thing. Here's a test program for minimal reproduction which does fcntl(fd, SETLK, ...), then mmap(), then loops. Run one in the background, then a second in the foreground on the same file. On 3.3.1, the second one fails to lock the file and exits. On SVN, the second one succeeds in acquiring the lock and moves on to the looping stage.
Your test program is buggy: if (! ... open(...)) should be if (... open(...) < 0).
Thanks! It does show the problem though, right? By the way, this doesn't reflect on the module-init-tools code. I hacked the test case up from scratch so any bugs in it are all my fault.
Sorry, but I'm still not convinced that the attached test program is fine. The attached test program calls open(..., ...|O_CREAT) from two different processes. How can this succeed ? By this time I have added a regression test to memcheck for file locking (memcheck/tests/file_locking.{c,vgtest,stderr.exp}). The output for this test program is the same when run natively or when run under memcheck. You are welcome to review the source code of the test program and the expected output.
> O_CREAT > If the file does not exist it will be created. ... and if the file already exists, it will be opened normally. You're thinking of O_CREAT|O_EXCL. Even if my test case is broken, it should be interesting that it displays two different behaviors on valgrind 3.3.1 and SVN trunk. If you try running it and observe this behavior it looks very suspicious. That might have helped you detect the following issues: You forgot to call mmap() in your test, which is what this bug is all about. There is a bug in both of our test cases, which stops yours from working properly. Look at the expected output file for your testcase; it does *not* print "Second locking attempt failed". Instead, it prints "second open call: permission denied". When I wrote my test I didn't realise that O_CREAT requires a third argument to open() to specify the file "mode" i.e. permissions. Without it you get garbage permissions which just happen to deny the program read and write access to the file on the second open. Finally, it seems that an individual process is allowed to acquire multiple exclusive locks on the same file region. Therefore if I fix the other issues and run the test case normally (not under valgrind) I get "ERROR: second lock attempt succeeded !". I think the test case should use fork() to spin off a child process which tries to make the second lock. If you don't mind waiting a few days then I can submit a patch to address these issues and give you a test case which fails when run under the current valgrind trunk.
Created attachment 25547 [details] Updated version of file locking test case I see you fixed the testcase open(... O_CREAT) issue. Here's a fix for the remaining two. It adds mmap() after the first lock (which is what triggers the bug) and a fork() before the second lock. Expected behaviour, as seen when run normally: $ ./file_locking About to lock file for writing. mapping file into memory for writing. First locking attempt succeeded. Second locking attempt failed. Test finished successfully. Behaviour under valgrind SVN trunk: $ valgrind -q ./file_locking About to lock file for writing. mapping file into memory for writing. First locking attempt succeeded. ERROR: second lock attempt succeeded ! Test finished successfully. I hope that makes this bug clear. Regards Alan
By this time I have updated the memcheck/tests/file_locking regression test. And as you mentioned before, Valgrind 3.3.1 runs the test correctly but 3.4.0.SVN (trunk) not.
Apparently this issue was introduced in r7540 (the merge of the DATASYMS branch to the trunk).
Alan, Bart, thanks for chasing this into a small repro case. This is a particularly nasty problem, and the fact that your app works on 3.3.x is more by accident than design. Let me explain .. Valgrind needs to decide when the debug info for a .so should be read. Suppose a .so is dlopened some time after the program has started (not your program, any program, in general). V needs to read the debug info for the .so, and it decides when to do this by watching the sequence of mmaps presented to the kernel. This is because ld.so will basically mmap any object twice in response to a dlopen, once r-x and once rw-, so that its text and data segment are mapped in. When Valgrind sees files being mapped in, as your test case does, it sticks its nose in. Under some circumstances it will read the first 1K of the file itself, to see if it has a plausible looking ELF header. If it does, it may go on to read debuginfo from the file. This is what you observed with your strace above, and it is also what is screwing up the advisory locks. Purely by chance, at least for this example, 3.3.x never gets around to doing that. That's because you map the file -w- afaics, and 3.3.x ignores that -- it's looking out for rw- and r-x mappings. But the order of tests changed in the trunk (this logic was completely rewritten). The trunk inspects the file's header first, and only checks the mapping perms (-w-, etc) second. Which is why it fails on the trunk. I can easily enough make the trunk succeed, by swapping the order of tests back to be as in 3.3.x. That makes your test case pass. But in general both 3.3.x and trunk will both still screw up if you map the file r-x, because then they will say, hmm, that's a plausible mapping permission, we better go on and inspect the header. Anyway, FWIW the patch against trunk is below. It looks big but it merely swaps the two tests around. Can you check if it works for the original problem case you had? FWIW, I had no idea about "POSIX advisory file locking" until I saw this bug. Sigh -- yet another way Valgrind's simulation is imperfect.
Possible fix, diff against r8729: Index: coregrind/m_debuginfo/debuginfo.c =================================================================== --- coregrind/m_debuginfo/debuginfo.c (revision 8729) +++ coregrind/m_debuginfo/debuginfo.c (working copy) @@ -664,38 +664,6 @@ /* no uses of statbuf below here. */ - /* Peer at the first few bytes of the file, to see if it is an ELF */ - /* object file. Ignore the file if we do not have read permission. */ - VG_(memset)(buf1k, 0, sizeof(buf1k)); - fd = VG_(open)( filename, VKI_O_RDONLY, 0 ); - if (fd.isError) { - if (fd.err != VKI_EACCES) - { - DebugInfo fake_di; - VG_(memset)(&fake_di, 0, sizeof(fake_di)); - fake_di.filename = filename; - ML_(symerr)(&fake_di, True, "can't open file to inspect ELF header"); - } - return 0; - } - nread = VG_(read)( fd.res, buf1k, sizeof(buf1k) ); - VG_(close)( fd.res ); - - if (nread == 0) - return 0; - if (nread < 0) { - DebugInfo fake_di; - VG_(memset)(&fake_di, 0, sizeof(fake_di)); - fake_di.filename = filename; - ML_(symerr)(&fake_di, True, "can't read file to inspect ELF header"); - return 0; - } - vg_assert(nread > 0 && nread <= sizeof(buf1k) ); - - /* We're only interested in mappings of ELF object files. */ - if (!ML_(is_elf_object_file)( buf1k, (SizeT)nread )) - return 0; - /* Now we have to guess if this is a text-like mapping, a data-like mapping, neither or both. The rules are: @@ -751,6 +719,38 @@ if (!(is_rx_map || is_rw_map)) return 0; + /* Peer at the first few bytes of the file, to see if it is an ELF */ + /* object file. Ignore the file if we do not have read permission. */ + VG_(memset)(buf1k, 0, sizeof(buf1k)); + fd = VG_(open)( filename, VKI_O_RDONLY, 0 ); + if (fd.isError) { + if (fd.err != VKI_EACCES) + { + DebugInfo fake_di; + VG_(memset)(&fake_di, 0, sizeof(fake_di)); + fake_di.filename = filename; + ML_(symerr)(&fake_di, True, "can't open file to inspect ELF header"); + } + return 0; + } + nread = VG_(read)( fd.res, buf1k, sizeof(buf1k) ); + VG_(close)( fd.res ); + + if (nread == 0) + return 0; + if (nread < 0) { + DebugInfo fake_di; + VG_(memset)(&fake_di, 0, sizeof(fake_di)); + fake_di.filename = filename; + ML_(symerr)(&fake_di, True, "can't read file to inspect ELF header"); + return 0; + } + vg_assert(nread > 0 && nread <= sizeof(buf1k) ); + + /* We're only interested in mappings of ELF object files. */ + if (!ML_(is_elf_object_file)( buf1k, (SizeT)nread )) + return 0; + /* See if we have a DebugInfo for this filename. If not, create one. */ di = find_or_create_DebugInfo_for( filename, NULL/*membername*/ );
Committed as r8747; closing; assumed fixed.