Executing the program below with valgrind 3.12 built from sources on Fedora 25 gives the following output: pid 21786 child 21787 r 0 pid 21787 child 83036382 r 0 Without valgrind: pid 21805 child 21806 r 0 <output from ls> Valgrind completely breaks the posix_spawn call by causing some kind of fork and also not executing the ls. #include <unistd.h> #include <spawn.h> #include <stdio.h> int main() { pid_t pid; char cmd[] = "ls"; char* arg[] = { cmd, NULL }; int r = posix_spawnp(&pid, cmd, NULL, NULL, arg, NULL); fprintf(stderr, "pid %d child %d r %d\n", getpid(), pid, r); }
The same example worked on fedora 24 (glibc 2.23, linux 4.7.9), fedora 25 uses glibc 2.24 and linux 4.8.8.
Looking at the differences between 2.23 and 2.24, I suspect the problem is created because 2.24 is implementing posix_spawn(p) using clone (CLONE_VM| CLONE_VFORK), while 2.23 uses fork or vfork. Valgrind does not implement vfork, it just considers vfork is a fork, hoping for the best. It looks like the new glibc posix_spawn implementation depends more on the precise semantic of vfork (or more exactly, of clone(CLONE_VM| CLONE_VFORK), such as sharing address space and/or having the calling thread suspended waiting for the child to exec or exit. If the dependency is only on the 'being suspended" aspect, that might be relatively easy to fix. Hwoever, from the trace, it looks like both the parent and the child have executed the fprintf. Not very clear how that can happen. Can you run with --trace-syscalls=yes --trace-children=yes --time-stamp=yes and attach the resulting trace ? Thanks
Created attachment 102643 [details] Trace with fc25
Created attachment 102644 [details] Trace with fc24
Also, as has been noted in bug 342040, it seems the clone call is not handled correctly when looking with strace: strace valgrind ~/spawn|& grep clone => clone(child_stack=NULL, flags=SIGCHLD) strace ~/spawn|& grep clone => clone(child_stack=0x7f13dd5d6ff0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD)
(In reply to megahallon from comment #5) > Also, as has been noted in bug 342040, it seems the clone call is not > handled correctly when looking with strace: > > strace valgrind ~/spawn|& grep clone > => > clone(child_stack=NULL, flags=SIGCHLD) > > strace ~/spawn|& grep clone > => > clone(child_stack=0x7f13dd5d6ff0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD) This is normal. Valgrind transforms a vfork into a fork, as it does not support vfork semantic. See the following code in e.g. syswrap-amd64-linux.c: case VKI_CLONE_VFORK | VKI_CLONE_VM: /* vfork */ /* FALLTHROUGH - assume vfork == fork */ cloneflags &= ~(VKI_CLONE_VFORK | VKI_CLONE_VM); case 0: /* plain fork */ ...
(In reply to megahallon from comment #0) > Executing the program below with valgrind 3.12 built from sources on Fedora > 25 gives the following output: > > pid 21786 child 21787 r 0 > pid 21787 child 83036382 r 0 As far as I can see, the above happens because Valgrind transforms a clone (SIGCHLD | CLONE_VFORK | CLONE_VM) into a simple clone(SIGCHLD) but wrongly assumes that both parent and child will check for clone return code, and executes 'their' part of the code. I have not (yet) understood where/how Valgrind gives the guest thread function pointer for a 'thread clone'. If/when I understand that, it might be possible to ensure that the cloned child 'jumps' to the correct function for a vfork clone transformed into a 'normal fork'.
Created attachment 102691 [details] tentative patch (ugly, only for amd64) patch to better support clone(vfork) and stack Can you please try the attached patch ? (patch on latest SVN version) It seems to improve the clone test case provided in bug 342040, and I suspect fc25 posix_spawn might fail for similar reason, i.e. that the parent has prepared a stack for the child, but the child was just using the copy of the parent stack.
(In reply to Philippe Waroquiers from comment #8) > Created attachment 102691 [details] > tentative patch (ugly, only for amd64) patch to better support clone(vfork) > and stack > > Can you please try the attached patch ? (patch on latest SVN version) > > It seems to improve the clone test case provided in bug 342040, > and I suspect fc25 posix_spawn might fail for similar reason, i.e. > that the parent has prepared a stack for the child, but the child was just > using the copy of the parent stack. Yes as far as I can see it works fine with this patch.
Fixed in revision 16186. Retesting welcome.
(In reply to Philippe Waroquiers from comment #10) > Fixed in revision 16186. > Retesting welcome. I have tested with the svn version and it is still fine, thanks for the fix.