Bug 384732 - posix_spawn with glibc 2.25 causes an assertion
Summary: posix_spawn with glibc 2.25 causes an assertion
Status: REPORTED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck (show other bugs)
Version: 3.13.0
Platform: Other Linux
: NOR crash
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-15 06:54 UTC by Daniel
Modified: 2017-12-11 00:19 UTC (History)
5 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
c code to reproduce the issue. (614 bytes, text/plain)
2017-09-15 06:54 UTC, Daniel
Details
Try to add a replacement function of posix_spawnp to vg_preloaded.c (8.93 KB, patch)
2017-11-27 07:06 UTC, Daniel
Details
Patch with own replacement function for the provided test code (9.26 KB, patch)
2017-11-27 07:07 UTC, Daniel
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel 2017-09-15 06:54:32 UTC
Created attachment 107867 [details]
c code to reproduce the issue.

I'm encounting an ASSERT inside the glibc 2.25, if I run the test app where posix_spawnp (or posix_spawn, doesn't matter), is called:

../sysdeps/unix/sysv/linux/spawni.c:360: __spawnix: Assertion `ec >= 0' failed.

Using glibc 2.24 works fine. valgrind 3.12/3.13 doesn't matter, although I didn't test an SVN build, just official builds.

I get this assertion on i386 and arm builds (buildroot environment).

I call the binary like this, without any more options:
valgrind ./test

Without valgrind, the code is executed without problems.

Greetings,
Daniel
Comment 1 Daniel 2017-09-15 07:58:07 UTC
Edit: we're compiling with GCC 7.1.0, no idea if this makes any difference.
Comment 2 Tom Hughes 2017-09-15 08:03:02 UTC
This is because glibc is using clone with CLONE_VFORK and CLONE_VM and is relying on the fact the CLONE_VM means the two processes will be sharing the same address space until the vfork completes by execing the child or exiting.

But valgrind drops the CLONE_VM flag which means they're not sharing the same address space and although the parent does wait for the child to exec (because of CLONE_VFORK) it never sees the error status the child has written because the child wrote to a separate version of the memory.

Basically the parent sets arg.err to -1 then does the clone and expects arg.err to be >= 0 afterwards because the child will have written to it but the fact that valgrind can't support VM cloning means that doesn't happen.
Comment 3 Mark Wielaard 2017-09-15 08:15:24 UTC
This glibc commit seems to have broken how valgrind handles posix_spawn.
It now relies on CLONE_VM semantics.

commit 4b4d4056bb154603f36c6f8845757c1012758158
Author: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Date:   Tue Sep 27 17:52:17 2016 -0700

    linux: spawni.c: simplify error reporting to parent
    
    Using CLONE_VFORK already ensures that the parent does not run until the
    child has either exec'ed succesfully or called _exit. Hence we don't
    need to read from a CLOEXEC pipe to ensure proper synchronization - we
    just make explicit use of the fact the the child and parent run in the
    same VM, so the child can write an error code to a field of the
    posix_spawn_args struct instead of sending it through a pipe.
    
    To ensure that this mechanism really works, the parent initializes the
    field to -1 and the child writes 0 before execing.
    
    This eliminates some annoying bookkeeping that is necessary to avoid
    the file actions from clobbering the write end of the pipe, and
    getting rid of the pipe creation in the first place means fewer system
    calls (four in the parent, usually one in the child) and fewer
    chanches for the spawn to fail (e.g. if we're close to EMFILE).
    
    Checked on x86_64 and i686.
    
        * sysdeps/unix/sysv/linux/spawni.c (posix_spawn_args): Remove pipe
        field, add err field.
        (__spawni_child): Report error through err member instead of pipe.
        (__spawnix): Likewise.
Comment 4 Mark Wielaard 2017-09-15 08:24:11 UTC
I wonder if we can also rely on the CLONE_VFORK mechanism.

So if we see a clone with CLONE_VFORK | CLONE_VM we don't drop the CLONE_VM, but do freeze all other threads till we see the exec/exit syscall.

Since till the exit/exec the host and child memory are the same and only that child thread is running it should keep the valgrind datastructures sane.
Comment 5 Tom Hughes 2017-09-15 09:04:25 UTC
We already detect CLONE_VFORK|CLONE_VM as a special case and deliberately drop CLONE_VM because I imagine anything else would be a disaster as valgrind's shadow memory etc would be shared.

The comments in the source explicitly mention preserving CLONE_VFORK to preserve the wait-for-exec semantics and that does indeed seem to work.
Comment 6 Mark Wielaard 2017-09-15 09:49:04 UTC
(In reply to Tom Hughes from comment #5)
> We already detect CLONE_VFORK|CLONE_VM as a special case and deliberately
> drop CLONE_VM because I imagine anything else would be a disaster as
> valgrind's shadow memory etc would be shared.

Sharing shadow memory should be fine since we do share the actual memory.
But sharing the thread state is probably a problem.

And indeed just not dropping VM_CLONE when we see VM_VFORK doesn't work.
It does successfully create the child and execs the new process, but returning to the parent seg faults fairly soon.

> The comments in the source explicitly mention preserving CLONE_VFORK to
> preserve the wait-for-exec semantics and that does indeed seem to work.

Earlier we did drop both CLONE_VFORK and CLONE_VM.
And I can definitely see why CLONE_VM is bad without CLONE_VFORK.

I am probable missing some bad sharing which messes things up.
But I still wonder if we could handle CLONE_VFORK | CLONE_VM specially as if it was a simple thread creation (do_clone vs do_fork_clone) up till the actual exec/exit call (at which point we might have to do some magic to do the actual fork). Maybe too much bad voodoo though...
Comment 7 Philippe Waroquiers 2017-09-17 16:34:11 UTC
(In reply to Mark Wielaard from comment #6)
> (In reply to Tom Hughes from comment #5)
> > We already detect CLONE_VFORK|CLONE_VM as a special case and deliberately
> > drop CLONE_VM because I imagine anything else would be a disaster as
> > valgrind's shadow memory etc would be shared.
> 
> Sharing shadow memory should be fine since we do share the actual memory.
> But sharing the thread state is probably a problem.
> 
> And indeed just not dropping VM_CLONE when we see VM_VFORK doesn't work.
> It does successfully create the child and execs the new process, but
> returning to the parent seg faults fairly soon.
Yes, that looks expected. For example, if the scheduler in the child process
has to regain control e.g. to translate a block in what follows the clone syscall, then the parent host stack will be thrashed by the code executed by
the host child.
In other words, i highly suspect that the constraints of vfork syscall
apply for VM_CLONE+VM_VFORK (i.e. no function call, no return, ...).
What is tricky is that such constraints have to be respected both for
the guest and the host parts of the child;

> 
> > The comments in the source explicitly mention preserving CLONE_VFORK to
> > preserve the wait-for-exec semantics and that does indeed seem to work.
> 
> Earlier we did drop both CLONE_VFORK and CLONE_VM.
> And I can definitely see why CLONE_VM is bad without CLONE_VFORK.
> 
> I am probable missing some bad sharing which messes things up.
> But I still wonder if we could handle CLONE_VFORK | CLONE_VM specially as if
> it was a simple thread creation (do_clone vs do_fork_clone) up till the
> actual exec/exit call (at which point we might have to do some magic to do
> the actual fork). Maybe too much bad voodoo though...
To fully respect the vfork semantic, all the threads have to be suspended
while the new thread has not yet done the exec, and nothing can happen
in these threds (e.g. signal handling or whatever).
It does look like a nice challenge to achieve.
Comment 8 Ivo Raisr 2017-09-19 22:40:21 UTC
Solaris port had to support "proper" vfork semantics right from the beginning because Solaris libc relies on its proper functioning.

So there are some clever workarounds how to "preserve" vfork semantics but translate it to fork() at the same time.
Have a look at coregrind/m_syswrap/syswrap-solaris.c and coregrind/vg_preloaded.c, search there for "vfork".
Apart from other stuff which goes on there, a pipe is created to communicate error code from child to parent. This in fact emulates shared address space.

If I understand it correctly, pipe was used previously for the very same reason in glibc <= 2.24. Perhaps this mechanism can be leveraged in Valgrind for glibc >= 2.25?
Comment 9 Daniel 2017-11-21 14:59:58 UTC
Do you have any idea if this will/can be fixed?
Or do I have any possibility to work around this issue in my code?

Greetings,
Daniel
Comment 10 Philippe Waroquiers 2017-11-21 21:13:37 UTC
(In reply to Daniel from comment #9)
> Do you have any idea if this will/can be fixed?
Nobody seems to work on this for the moment, so time to fix is unknown.
It looks like the solaris specific code should/could be made more
general, so as to work on linux also. As I understand, the solaris code
is intercepting a subset of the (solaris) libc functions in the child,
so as to replace the direct assignment of the child status code in the parent
address space by a procedure that gives back the child code to the parent
via a pipe. I am wondering if it would not be simpler to just replace fully
the posix_spawn in vg_preload, as the solaris solution seems quite dependent
on internal implementation of libc (e.g. the replacement of get_error and set_error).


> Or do I have any possibility to work around this issue in my code?
What you might do is to write a replacement procedure for
posix_spawn/posis_spawnp, using the technique/code of glibc 2.24.
See http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.wrapping

(of course, do not call the original posix_spawn, just implement it fully
in the wrapping functionà.

Basically, this suggestion consists in implementing the alternative solution
discussed above, but outside of valgrind.
So, you might maybe envisage to just write it inside vg_prelaoded.c 
and provide a patch, if ever what I suggest makes sense :)
Comment 11 Daniel 2017-11-24 10:46:35 UTC
(In reply to Philippe Waroquiers from comment #10)
> What you might do is to write a replacement procedure for
> posix_spawn/posis_spawnp, using the technique/code of glibc 2.24.

Good suggestion! I'm not familiar with the valgrind code, but I tried adding a replacement function to vg_preloaded.c
It compiled but as I ran my binary, the replacement function was not used.

I then added the glibc 2.24 code to my own program and replaced my posix_spawnp call with my own glibc implementation and this works.
I just run this implementation if valgrind is running, otherwise the standard posix_spawnp function.

So I can live with the issue now, but of course a real fix in valgrind would be perferable.

Thanks for all your work on this great tool!
Comment 12 Philippe Waroquiers 2017-11-25 10:01:49 UTC
(In reply to Daniel from comment #11)
> (In reply to Philippe Waroquiers from comment #10)
> > What you might do is to write a replacement procedure for
> > posix_spawn/posis_spawnp, using the technique/code of glibc 2.24.
> 
> Good suggestion! I'm not familiar with the valgrind code, but I tried adding
> a replacement function to vg_preloaded.c
> It compiled but as I ran my binary, the replacement function was not used.
> 
> I then added the glibc 2.24 code to my own program and replaced my
> posix_spawnp call with my own glibc implementation and this works.
> I just run this implementation if valgrind is running, otherwise the
> standard posix_spawnp function.
> 
> So I can live with the issue now, but of course a real fix in valgrind would
> be perferable.
> 
> Thanks for all your work on this great tool!

Thanks for the feedback.

Can you attach to this bug the 2 different patches (the 'in application' patch,
and the 'in valgrind' patch) ?

Others might use the first as a temporary bypass, and we could see why the
second one is not working, fix and integrate it;

Thanks
Comment 13 Daniel 2017-11-27 07:06:27 UTC
Created attachment 109074 [details]
Try to add a replacement function of posix_spawnp to vg_preloaded.c

The patch is a diff of the current vg_preloaded.c git code.
It was a quick&dirty try, not very nice.

It compiled but didn't work, although I didn't try to find out why the replacement function wasn't called.
Comment 14 Daniel 2017-11-27 07:07:36 UTC
Created attachment 109075 [details]
Patch with own replacement function for the provided test code

This patch is a diff of the provided test code and a new test code which includes an own posix_spawnp replacement function of glibc 2.24