164669 – SVN regression: mmap() drops posix file locks

Bug 164669 - SVN regression: mmap() drops posix file locks

Summary: SVN regression: mmap() drops posix file locks

Status:	RESOLVED FIXED

Alias:	None

Product:	valgrind
Classification:	Developer tools
Component:	general (show other bugs)
Version:	3.4 SVN
Platform:	Compiled Sources Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	Julian Seward

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-06-22 15:46 UTC by Alan Jenkins
Modified:	2008-11-08 16:23 UTC (History)
CC List:	0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Attachments
Minimal test program to reproduce problem (837 bytes, text/plain) 2008-06-23 09:44 UTC, Alan Jenkins	Details
Updated version of file locking test case (2.04 KB, text/plain) 2008-06-23 21:18 UTC, Alan Jenkins	Details
Show Obsolete (1) View All Add an attachment

Note You need to log in before you can comment on or make changes to this bug.

Description Alan Jenkins 2008-06-22 15:46:08 UTC

Program:

module-init-tools (linux kernel module tools including "insmod" and "modprobe") 
has an automated test suite which is supposed to run under valgrind (when 
available).  The current sources are broken - they don't actually use valgrind, 
so I'm fixing this as part of my work on it.

Symptom:

With my fixes applied, the testsuite passes using valgrind version 3.3.1.  
However it fails under SVN r8262.  I tested this specific revision as it is 
just before the recent change which adds more support for module syscalls to 
valgrind.

The testsuite is run twice to cover both building with and without zlib.  The 
specific test failure is the second instance 
of "tests/test-modprobe/25locking.sh".

Root cause:

I tried describing the problem in mind-numbing detail to report it, which 
pushed me to investigate further and eventually isolate the cause.  My problem 
is consistent with fcntl() allowing two processes to lock a file at once.

SYSCALL[10760,1](  2) sys_open ( 0x7FF000DD8
(tests/data/32/normal/export_nodep-32.ko), 2 ) --> [async] ... 
SYSCALL[10760,1](  2) ... [async] --> Success(0x5)
SYSCALL[10760,1]( 72) sys_fcntl[ARG3=='lock'] ( 5, 7, 0x7FEFFF9C0 ) --> 
[async] ... 
SYSCALL[10760,1]( 72) ... [async] --> Success(0x0)

... (the first modprobe then loops thinking it has the lock held, but a second 
modprobe is able to take the lock) ...

SYSCALL[10762,1](  2) sys_open ( 0x7FF000DD8
(tests/data/32/normal/export_nodep-32.ko), 2 ) --> [async] ... 
SYSCALL[10762,1](  2) ... [async] --> Success(0x5)
SYSCALL[10762,1]( 72) sys_fcntl[ARG3=='lock'] ( 5, 7, 0x7FEFFF9C0 ) --> 
[async] ... 
SYSCALL[10762,1]( 72) ... [async] --> Success(0x0)

The cause is most likely due to the brokenness of POSIX advisory file lock 
semantics.  If a process closes *any* file descriptor referring to a locked 
file, it loses the lock.  Doesn't matter if you cloned the fd using dup().  
Doesn't matter if you obtained it by calling open() a second time on the same 
pathname.  Once you close a file, you lose all locks on it.  That's what the 
manpage for fcntl says.

I ran valgrind under "strace" and it did indeed look like this was problem.

When modprobe is built *without* zlib support, it uses mmap() to open modules.  
Valgrind does something a bit funny in response to this.  It reopens the file, 
reads it - and then closes it.  Bam, we lose the file lock.

mmap(0x4020000, 1132, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 5, 0) = 
0x4020000
fstat(5, {st_mode=S_IFREG|0644, st_size=1132, ...}) = 0
readlink("/proc/self/fd/5", "/home/alan/src/bootup/module-init-tools.upstream/tests/data/32-le/normal/export_nodep-32.ko", 
4096) = 91
stat("/home/alan/src/bootup/module-init-tools.upstream/tests/data/32-le/normal/export_nodep-32.ko", 
{st_mode=S_IFREG|0644, st_size=1132, ...}) = 0
open("/home/alan/src/bootup/module-init-tools.upstream/tests/data/32-le/normal/export_nodep-32.ko", 
O_RDONLY) = 12
read(12, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\1\0\3\0\1\0\0\0\0\0\0\0"..., 1024) = 
1024
close(12)                               = 0
fstat(1, {st_mode=S_IFREG|0644, st_size=106111, ...}) = 0
mmap(0x4021000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_ANONYMOUS, -1, 0) = 0x4021000
gettid()                                = 10774
write(1018, "W", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
write(1, "Looping on tests/tmp/continue\n", 30Looping on tests/tmp/continue
) = 30

Comment 1 Bart Van Assche 2008-06-23 09:09:36 UTC

It would be great if you could provide a small test program calling fcntl(..., SETLK, ...) that illustrates the difference in behavior between the 3.3 branch and the trunk.

Comment 2 Alan Jenkins 2008-06-23 09:44:01 UTC

Created attachment 25533 [details]
Minimal test program to reproduce problem

Sure thing.

Here's a test program for minimal reproduction which does fcntl(fd, SETLK,
...), then mmap(), then loops.	Run one in the background, then a second in the
foreground on the same file.  On 3.3.1, the second one fails to lock the file
and exits.  On SVN, the second one succeeds in acquiring the lock and moves on
to the looping stage.

Comment 3 Bart Van Assche 2008-06-23 10:30:29 UTC

Your test program is buggy: if (! ... open(...)) should be if (... open(...) < 0).

Comment 4 Alan Jenkins 2008-06-23 10:35:51 UTC

Thanks!  It does show the problem though, right?

By the way, this doesn't reflect on the module-init-tools code.  I hacked the test case up from scratch so any bugs in it are all my fault.

Comment 5 Bart Van Assche 2008-06-23 13:48:04 UTC

Sorry, but I'm still not convinced that the attached test program is fine. The attached test program calls open(..., ...|O_CREAT) from two different processes. How can this succeed ?

By this time I have added a regression test to memcheck for file locking (memcheck/tests/file_locking.{c,vgtest,stderr.exp}). The output for this test program is the same when run natively or when run under memcheck. You are welcome to review the source code of the test program and the expected output.

Comment 6 Alan Jenkins 2008-06-23 14:45:37 UTC

> O_CREAT 
>  If the file does not exist it will be created.

... and if the file already exists, it will be opened normally.  You're thinking of O_CREAT|O_EXCL.

Even if my test case is broken, it should be interesting that it displays two different behaviors on valgrind 3.3.1 and SVN trunk.  If you try running it and observe this behavior it looks very suspicious.  That might have helped you detect the following issues: 

You forgot to call mmap() in your test, which is what this bug is all about.

There is a bug in both of our test cases, which stops yours from working properly.  Look at the expected output file for your testcase; it does *not* print "Second locking attempt failed".  Instead, it prints "second open call: permission denied".

When I wrote my test I didn't realise that O_CREAT requires a third argument to open() to specify the file "mode" i.e. permissions.  Without it you get garbage permissions which just happen to deny the program read and write access to the file on the second open.

Finally, it seems that an individual process is allowed to acquire multiple exclusive locks on the same file region.  Therefore if I fix the other issues and run the test case normally (not under valgrind) I get "ERROR: second lock attempt succeeded !".  I think the test case should use fork() to spin off a child process which tries to make the second lock.

If you don't mind waiting a few days then I can submit a patch to address these issues and give you a test case which fails when run under the current valgrind trunk.

Comment 7 Alan Jenkins 2008-06-23 21:18:05 UTC

Created attachment 25547 [details]
Updated version of file locking test case

I see you fixed the testcase open(... O_CREAT) issue.  Here's a fix for the
remaining two.	It adds mmap() after the first lock (which is what triggers the
bug) and a fork() before the second lock.

Expected behaviour, as seen when run normally:

$ ./file_locking
About to lock file for writing.
mapping file into memory for writing.
First locking attempt succeeded.
Second locking attempt failed.
Test finished successfully.

Behaviour under valgrind SVN trunk:

$ valgrind -q ./file_locking
About to lock file for writing.
mapping file into memory for writing.
First locking attempt succeeded.
ERROR: second lock attempt succeeded !
Test finished successfully.

I hope that makes this bug clear.

Regards
Alan

Comment 8 Bart Van Assche 2008-06-24 10:31:23 UTC

By this time I have updated the memcheck/tests/file_locking regression test. And as you mentioned before, Valgrind 3.3.1 runs the test correctly but 3.4.0.SVN (trunk) not.

Comment 9 Bart Van Assche 2008-06-25 10:21:05 UTC

Apparently this issue was introduced in r7540 (the merge of the DATASYMS branch to the trunk).

Comment 10 Julian Seward 2008-11-05 01:32:53 UTC

Alan, Bart, thanks for chasing this into a small repro case.

This is a particularly nasty problem, and the fact that your
app works on 3.3.x is more by accident than design.  Let me
explain ..

Valgrind needs to decide when the debug info for a .so should
be read.  Suppose a .so is dlopened some time after the program
has started (not your program, any program, in general).  V needs
to read the debug info for the .so, and it decides when to do 
this by watching the sequence of mmaps presented to the kernel.
This is because ld.so will basically mmap any object twice in
response to a dlopen, once r-x and once rw-, so that its text
and data segment are mapped in.

When Valgrind sees files being mapped in, as your test case
does, it sticks its nose in.  Under some circumstances it will
read the first 1K of the file itself, to see if it has a plausible
looking ELF header.  If it does, it may go on to read debuginfo
from the file.  This is what you observed with your strace above,
and it is also what is screwing up the advisory locks.

Purely by chance, at least for this example, 3.3.x never gets
around to doing that.  That's because you map the file -w- afaics,
and 3.3.x ignores that -- it's looking out for rw- and r-x mappings.

But the order of tests changed in the trunk (this logic was completely
rewritten).  The trunk inspects the file's header first, and only
checks the mapping perms (-w-, etc) second.  Which is why it fails
on the trunk.

I can easily enough make the trunk succeed, by swapping the order
of tests back to be as in 3.3.x.  That makes your test case pass.

But in general both 3.3.x and trunk will both still screw up if you
map the file r-x, because then they will say, hmm, that's a plausible
mapping permission, we better go on and inspect the header.

Anyway, FWIW the patch against trunk is below.  It looks big but
it merely swaps the two tests around.  Can you check if it works
for the original problem case you had?

FWIW, I had no idea about "POSIX advisory file locking" until I saw
this bug.  Sigh -- yet another way Valgrind's simulation is imperfect.

Comment 11 Julian Seward 2008-11-05 01:36:05 UTC

Possible fix, diff against r8729:

Index: coregrind/m_debuginfo/debuginfo.c
===================================================================
--- coregrind/m_debuginfo/debuginfo.c   (revision 8729)
+++ coregrind/m_debuginfo/debuginfo.c   (working copy)
@@ -664,38 +664,6 @@

    /* no uses of statbuf below here. */

-   /* Peer at the first few bytes of the file, to see if it is an ELF */
-   /* object file. Ignore the file if we do not have read permission. */
-   VG_(memset)(buf1k, 0, sizeof(buf1k));
-   fd = VG_(open)( filename, VKI_O_RDONLY, 0 );
-   if (fd.isError) {
-      if (fd.err != VKI_EACCES)
-      {
-         DebugInfo fake_di;
-         VG_(memset)(&fake_di, 0, sizeof(fake_di));
-         fake_di.filename = filename;
-         ML_(symerr)(&fake_di, True, "can't open file to inspect ELF header");
-      }
-      return 0;
-   }
-   nread = VG_(read)( fd.res, buf1k, sizeof(buf1k) );
-   VG_(close)( fd.res );
-
-   if (nread == 0)
-      return 0;
-   if (nread < 0) {
-      DebugInfo fake_di;
-      VG_(memset)(&fake_di, 0, sizeof(fake_di));
-      fake_di.filename = filename;
-      ML_(symerr)(&fake_di, True, "can't read file to inspect ELF header");
-      return 0;
-   }
-   vg_assert(nread > 0 && nread <= sizeof(buf1k) );
-
-   /* We're only interested in mappings of ELF object files. */
-   if (!ML_(is_elf_object_file)( buf1k, (SizeT)nread ))
-      return 0;
-
    /* Now we have to guess if this is a text-like mapping, a data-like
       mapping, neither or both.  The rules are:

@@ -751,6 +719,38 @@
    if (!(is_rx_map || is_rw_map))
       return 0;

+   /* Peer at the first few bytes of the file, to see if it is an ELF */
+   /* object file. Ignore the file if we do not have read permission. */
+   VG_(memset)(buf1k, 0, sizeof(buf1k));
+   fd = VG_(open)( filename, VKI_O_RDONLY, 0 );
+   if (fd.isError) {
+      if (fd.err != VKI_EACCES)
+      {
+         DebugInfo fake_di;
+         VG_(memset)(&fake_di, 0, sizeof(fake_di));
+         fake_di.filename = filename;
+         ML_(symerr)(&fake_di, True, "can't open file to inspect ELF header");
+      }
+      return 0;
+   }
+   nread = VG_(read)( fd.res, buf1k, sizeof(buf1k) );
+   VG_(close)( fd.res );
+
+   if (nread == 0)
+      return 0;
+   if (nread < 0) {
+      DebugInfo fake_di;
+      VG_(memset)(&fake_di, 0, sizeof(fake_di));
+      fake_di.filename = filename;
+      ML_(symerr)(&fake_di, True, "can't read file to inspect ELF header");
+      return 0;
+   }
+   vg_assert(nread > 0 && nread <= sizeof(buf1k) );
+
+   /* We're only interested in mappings of ELF object files. */
+   if (!ML_(is_elf_object_file)( buf1k, (SizeT)nread ))
+      return 0;
+
    /* See if we have a DebugInfo for this filename.  If not,
       create one. */
    di = find_or_create_DebugInfo_for( filename, NULL/*membername*/ );

Comment 12 Julian Seward 2008-11-08 16:23:19 UTC

Committed as r8747; closing; assumed fixed.