Created attachment 98408 [details] forktest.c It's probably best to just look at the attached C program and attached C script (you have to use both). The script is a minimal reproducer from a much larger program (libguestfs). $ gcc -Wall -O2 forktest.c -o forktest $ chmod +x forktest.sh $ ./forktest.sh mask 2 buf abc mask 2 buf def mask 2 buf ghi mask 2 buf abc mask 2 buf def mask 2 buf ghi mask 2 buf abc mask 2 buf def mask 2 buf ghi [etc forever] In other words, stdin of the program is duplicated somehow. It's presumably something to do with the fork() operation. valgrind-3.11.0-1.fc23.x86_64 glibc-2.22.90-36.fc24.x86_64 kernel 4.4.4-301.fc23.x86_64
Created attachment 98409 [details] forktest.sh
I can replicate this, but only with redirected input, like in the shell script or when putting the input in a file and running with valgrind ./forktest < input_file. When stdin isn't redirected from some (here) file, but comes direct or through a pipe (cat input | valgrind ./forktest) it works as intended.
For reference here is the actual test that fails for us: https://github.com/libguestfs/libguestfs/blob/master/fish/test-find0.sh#L25-L30
Frankly it makes no sense at all. Here's an strace of that valgrind, from the first read on fd 0 to the second: read(0, "abc\ndef\nghi\n", 4096) = 12 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "J", 1) = 1 pipe2([4, 5], O_CLOEXEC) = 0 rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x60262190) = 3115 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 gettid() = 3114 write(1029, "K", 1) = 1 sched_yield() = 0 gettid() = 3114 read(1028, "K", 1) = 1 close(5) = 0 gettid() = 3114 write(1029, "L", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 read(4, "\2\0\0\0", 4) = 4 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "L", 1) = 1 close(4) = 0 gettid() = 3114 write(1029, "M", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 wait4(3115, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3115 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3115, si_uid=2067, si_status=0, si_utime=1, si_stime=0} --- rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "M", 1) = 1 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0 mmap(0x60224000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x60224000 gettid() = 3114 write(1029, "N", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 write(1, "mask 2 buf abc\n", 15) = 15 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "N", 1) = 1 pipe2([4, 5], O_CLOEXEC) = 0 rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x60262190) = 3116 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 gettid() = 3114 write(1029, "O", 1) = 1 sched_yield() = 0 gettid() = 3114 read(1028, "O", 1) = 1 close(5) = 0 gettid() = 3114 write(1029, "P", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 read(4, "\2\0\0\0", 4) = 4 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "P", 1) = 1 close(4) = 0 gettid() = 3114 write(1029, "Q", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 wait4(3116, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3116 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3116, si_uid=2067, si_status=0, si_utime=1, si_stime=0} --- rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "Q", 1) = 1 gettid() = 3114 write(1029, "R", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 write(1, "mask 2 buf def\n", 15) = 15 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "R", 1) = 1 pipe2([4, 5], O_CLOEXEC) = 0 rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x60262190) = 3117 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 gettid() = 3114 write(1029, "S", 1) = 1 sched_yield() = 0 gettid() = 3114 read(1028, "S", 1) = 1 close(5) = 0 gettid() = 3114 write(1029, "T", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 read(4, "\2\0\0\0", 4) = 4 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "T", 1) = 1 close(4) = 0 gettid() = 3114 write(1029, "U", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 wait4(3117, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3117 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3117, si_uid=2067, si_status=0, si_utime=1, si_stime=0} --- rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "U", 1) = 1 gettid() = 3114 write(1029, "V", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 write(1, "mask 2 buf ghi\n", 15) = 15 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0 gettid() = 3114 read(1028, "V", 1) = 1 gettid() = 3114 write(1029, "W", 1) = 1 rt_sigprocmask(SIG_SETMASK, [], ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0 read(0, "abc\ndef\nghi\n", 4096) = 12 I see no system call there which should have caused the kernel to rewind that descriptor...
Ah, if you add the childs you will see each one do: 3992 lseek(0, -8, SEEK_CUR) = 4 for some reason...
It's something in glibc doing it when __libc_freeres is run as the child exits. If you add ----run-libc-freeres=no to the valgrind options then it doesn't happen.
I think the problem is that __libc_freeres calls (among many other things) the _IO_unbuffer_write to free the buffers for stdio streams. That "unbuffers" each file by calling _IO_SETBUF (fp, NULL, 0) to drop the buffer, but that (which is just the internal implementation of setbuf/setvbuf) rewinds the stream over any data which has been read into the buffer but not yet consumed. Of course when the buffer is shared between multiple processes the question of what has been consumed is something of an open one...
Of course the buffer isn't really shared (this is fork, not threads...) but the file position is.
(In reply to Tom Hughes from comment #6) > It's something in glibc doing it when __libc_freeres is run as the child > exits. > > If you add ----run-libc-freeres=no to the valgrind options then it doesn't > happen. The child calls _exit on every return path, so surely no libc functions should be called?
The call to __libc_freeres is not normally made at all - it is forced by valgrind (unless you use that switch) so that people won't moan about "still reachable" blocks of memory in the leak report due to glibc still having stuff allocated.
See http://valgrind.org/docs/manual/faq.html#faq.exit_errors for more information.
Maybe --child-silent-after-fork=yes should imply ----run-libc-freeres=no for the fork? As I understand correctly the output sink is set to -1 for the fork already, so there cannot be any output. So running __libc_freeres is not very useful anyway (and we wouldn't do that if the forked child would call exec in the first place).
I'm pretty sure that it's a bug in glibc that should be fixed anyway - you're only supposed to call __libc_freeres when the process is exiting and in that case there shouldn't be any need to rewind any descriptor if you discard the buffer. Of course there's an interesting issue here in that I suspect forking and calling setvbuf() on a channel in the fork will have much the same effect on the parent that you are sharing the descriptor with...
Tom, could you please describe that bug in glibc in more detail? We can go to glibc folks and eventually have this fixed.