Bug 361810 - valgrind duplicate stdin after fork
Summary: valgrind duplicate stdin after fork
Status: CONFIRMED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.11.0
Platform: Fedora RPMs Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-15 13:18 UTC by Richard Jones
Modified: 2017-05-05 14:41 UTC (History)
3 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
forktest.c (1.58 KB, text/plain)
2016-04-15 13:18 UTC, Richard Jones
Details
forktest.sh (165 bytes, text/plain)
2016-04-15 13:19 UTC, Richard Jones
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Jones 2016-04-15 13:18:49 UTC
Created attachment 98408 [details]
forktest.c

It's probably best to just look at the attached C program and attached C script (you have to use both).  The script is a minimal reproducer from a much larger program (libguestfs).

$ gcc -Wall -O2 forktest.c -o forktest
$ chmod +x forktest.sh
$ ./forktest.sh
mask 2 buf abc
mask 2 buf def
mask 2 buf ghi
mask 2 buf abc
mask 2 buf def
mask 2 buf ghi
mask 2 buf abc
mask 2 buf def
mask 2 buf ghi
[etc forever]

In other words, stdin of the program is duplicated somehow.  It's presumably something to do with the fork() operation.

valgrind-3.11.0-1.fc23.x86_64
glibc-2.22.90-36.fc24.x86_64
kernel 4.4.4-301.fc23.x86_64
Comment 1 Richard Jones 2016-04-15 13:19:25 UTC
Created attachment 98409 [details]
forktest.sh
Comment 2 Mark Wielaard 2016-04-15 14:26:07 UTC
I can replicate this, but only with redirected input, like in the shell script or when putting the input in a file and running with valgrind ./forktest < input_file. When stdin isn't redirected from some (here) file, but comes direct or through a pipe (cat input | valgrind ./forktest) it works as intended.
Comment 3 Richard Jones 2016-04-15 14:36:54 UTC
For reference here is the actual test that fails for us:

https://github.com/libguestfs/libguestfs/blob/master/fish/test-find0.sh#L25-L30
Comment 4 Tom Hughes 2016-04-15 14:46:56 UTC
Frankly it makes no sense at all. Here's an strace of that valgrind, from the first read on fd 0 to the second:

read(0, "abc\ndef\nghi\n", 4096)        = 12
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "J", 1)                      = 1
pipe2([4, 5], O_CLOEXEC)                = 0
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x60262190) = 3115
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
gettid()                                = 3114
write(1029, "K", 1)                     = 1
sched_yield()                           = 0
gettid()                                = 3114
read(1028, "K", 1)                      = 1
close(5)                                = 0
gettid()                                = 3114
write(1029, "L", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
read(4, "\2\0\0\0", 4)                  = 4
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "L", 1)                      = 1
close(4)                                = 0
gettid()                                = 3114
write(1029, "M", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
wait4(3115, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3115
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3115, si_uid=2067, si_status=0, si_utime=1, si_stime=0} ---
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "M", 1)                      = 1
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(0x60224000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x60224000
gettid()                                = 3114
write(1029, "N", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
write(1, "mask 2 buf abc\n", 15)        = 15
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "N", 1)                      = 1
pipe2([4, 5], O_CLOEXEC)                = 0
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x60262190) = 3116
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
gettid()                                = 3114
write(1029, "O", 1)                     = 1
sched_yield()                           = 0
gettid()                                = 3114
read(1028, "O", 1)                      = 1
close(5)                                = 0
gettid()                                = 3114
write(1029, "P", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
read(4, "\2\0\0\0", 4)                  = 4
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "P", 1)                      = 1
close(4)                                = 0
gettid()                                = 3114
write(1029, "Q", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
wait4(3116, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3116
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3116, si_uid=2067, si_status=0, si_utime=1, si_stime=0} ---
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "Q", 1)                      = 1
gettid()                                = 3114
write(1029, "R", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
write(1, "mask 2 buf def\n", 15)        = 15
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "R", 1)                      = 1
pipe2([4, 5], O_CLOEXEC)                = 0
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x60262190) = 3117
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
gettid()                                = 3114
write(1029, "S", 1)                     = 1
sched_yield()                           = 0
gettid()                                = 3114
read(1028, "S", 1)                      = 1
close(5)                                = 0
gettid()                                = 3114
write(1029, "T", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
read(4, "\2\0\0\0", 4)                  = 4
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "T", 1)                      = 1
close(4)                                = 0
gettid()                                = 3114
write(1029, "U", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
wait4(3117, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3117
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3117, si_uid=2067, si_status=0, si_utime=1, si_stime=0} ---
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "U", 1)                      = 1
gettid()                                = 3114
write(1029, "V", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
write(1, "mask 2 buf ghi\n", 15)        = 15
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
gettid()                                = 3114
read(1028, "V", 1)                      = 1
gettid()                                = 3114
write(1029, "W", 1)                     = 1
rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
read(0, "abc\ndef\nghi\n", 4096)        = 12

I see no system call there which should have caused the kernel to rewind that descriptor...
Comment 5 Tom Hughes 2016-04-15 14:49:30 UTC
Ah, if you add the childs you will see each one do:

3992  lseek(0, -8, SEEK_CUR)            = 4

for some reason...
Comment 6 Tom Hughes 2016-04-15 14:53:40 UTC
It's something in glibc doing it when __libc_freeres is run as the child exits.

If you add ----run-libc-freeres=no to the valgrind options then it doesn't happen.
Comment 7 Tom Hughes 2016-04-15 15:06:20 UTC
I think the problem is that __libc_freeres calls (among many other things) the _IO_unbuffer_write to free the buffers for stdio streams.

That "unbuffers" each file by calling _IO_SETBUF (fp, NULL, 0) to drop the buffer, but that (which is just the internal implementation of setbuf/setvbuf) rewinds the stream over any data which has been read into the buffer but not yet consumed.

Of course when the buffer is shared between multiple processes the question of what has been consumed is something of an open one...
Comment 8 Tom Hughes 2016-04-15 15:07:24 UTC
Of course the buffer isn't really shared (this is fork, not threads...) but the file position is.
Comment 9 Richard Jones 2016-04-15 15:38:51 UTC
(In reply to Tom Hughes from comment #6)
> It's something in glibc doing it when __libc_freeres is run as the child
> exits.
> 
> If you add ----run-libc-freeres=no to the valgrind options then it doesn't
> happen.

The child calls _exit on every return path, so surely no libc functions should be called?
Comment 10 Tom Hughes 2016-04-15 15:46:29 UTC
The call to __libc_freeres is not normally made at all - it is forced by valgrind (unless you use that switch) so that people won't moan about "still reachable" blocks of memory in the leak report due to glibc still having stuff allocated.
Comment 11 Tom Hughes 2016-04-15 15:47:26 UTC
See http://valgrind.org/docs/manual/faq.html#faq.exit_errors for more information.
Comment 12 Mark Wielaard 2016-04-15 15:47:58 UTC
Maybe --child-silent-after-fork=yes should imply ----run-libc-freeres=no for the fork? As I understand correctly the output sink is set to -1 for the fork already, so there cannot be any output. So running __libc_freeres is not very useful anyway (and we wouldn't do that if the forked child would call exec in the first place).
Comment 13 Tom Hughes 2016-04-15 16:10:33 UTC
I'm pretty sure that it's a bug in glibc that should be fixed anyway - you're only supposed to call __libc_freeres when the process is exiting and in that case there shouldn't be any need to rewind any descriptor if you discard the buffer.

Of course there's an interesting issue here in that I suspect forking and calling setvbuf() on a channel in the fork will have much the same effect on the parent that you are sharing the descriptor with...
Comment 14 Ivo Raisr 2017-05-05 14:41:06 UTC
Tom, could you please describe that bug in glibc in more detail?
We can go to glibc folks and eventually have this fixed.