Bug 417906 - clone with CLONE_VFORK and no CLONE_VM fails
Summary: clone with CLONE_VFORK and no CLONE_VM fails
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: general (show other bugs)
Version: 3.15 SVN
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-20 07:31 UTC by Thiago Macieira
Modified: 2020-02-20 16:22 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thiago Macieira 2020-02-20 07:31:17 UTC
SUMMARY
Linux 5.2 added CLONE_PIDFD flag to the clone() system call, which returns a file descriptor representing the child process. Qt 5.15 will use this flag to have a better representation of child processes, without the need to use a SIGCHLD handler. This is also, finally, thread-safe.

Running Valgrind on an application using Qt 5.15's QProcess class produces:

==180074== Unsupported clone() flags: 0x5000
==180074== 
==180074== The only supported clone() uses are:
==180074==  - via a threads library (LinuxThreads or NPTL)
==180074==  - via the implementation of fork or vfork
==180074== 
==180074== Valgrind detected that your program requires
==180074== the following unimplemented functionality:
==180074==    Valgrind does not support general clone().
==180074== This may be because the functionality is hard to implement,
==180074== or because no reasonable program would behave this way,
==180074== or because nobody has yet needed it.  In any case, let us know at
==180074== www.valgrind.org and/or try to work around the problem, if you can.
==180074== 
==180074== Valgrind has to exit now.  Sorry.  Bye!

CLONE_PIDFD is 0x1000. CLONE_VFORK is 0x4000.

Without CLONE_VFORK, execution seems to continue but produce nonsensical results.

OBSERVED RESULT
Valgrind either terminates or produce nonsensical reports when CLONE_PIDFD is used.

EXPECTED RESULT
CLONE_PIDFD is understood properly

SOFTWARE/OS VERSIONS
Linux 5.5.2

ADDITIONAL INFORMATION
Please also implement the P_PIDFD argument to waitid(), available since Linux 5.4. Linux 5.3 added clone3 system call, but we're not using it yet.
Comment 1 Thiago Macieira 2020-02-20 07:32:25 UTC
First reported at https://bugreports.qt.io/browse/QTBUG-82351
Comment 2 Tom Hughes 2020-02-20 08:35:10 UTC
Do you have a small test case for this?
Comment 3 Tom Hughes 2020-02-20 09:00:08 UTC
This seems to do the job:

#define _GNU_SOURCE

#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  int pid;
  int fd;

  if ((pid = syscall(SYS_clone, CLONE_VFORK | CLONE_PIDFD, NULL, &fd, NULL, 0)) < 0)
    {
      perror("clone");
    }
  else if (pid > 0)
    {
      printf("pidfd = %d\n", fd);

      waitpid(pid, NULL, 0);
    }

  exit(0);
}
Comment 4 Tom Hughes 2020-02-20 09:09:00 UTC
The problem is not CLONE_PIDFD at all, it's the use of CLONE_VFORK without CLONE_VM. It fails in the same way if you remove CLONE_PIDFD:

==171682== Memcheck, a memory error detector
==171682== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==171682== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==171682== Command: ./clone
==171682== 
==171682== Unsupported clone() flags: 0x4000
==171682== 
==171682== The only supported clone() uses are:
==171682==  - via a threads library (LinuxThreads or NPTL)
==171682==  - via the implementation of fork or vfork
==171682== 
==171682== Valgrind detected that your program requires
==171682== the following unimplemented functionality:
==171682==    Valgrind does not support general clone().
==171682== This may be because the functionality is hard to implement,
==171682== or because no reasonable program would behave this way,
==171682== or because nobody has yet needed it.  In any case, let us know at
==171682== www.valgrind.org and/or try to work around the problem, if you can.
==171682== 
==171682== Valgrind has to exit now.  Sorry.  Bye!
==171682== 

We only support three basic modes of operation, defined by the values of the following flags:

* CLONE_VM
* CLONE_FS
* CLONE_FILES
* CLONE_VFORK

Those modes are:

* thread creation (CLONE_VM, CLONE_FS and CLONE_FILES set)
* traditional vfork (CLONE_VFORK and CLONE_VM set)
* traditional fork (none of the above set)

We generally allow any combination of other flags however, it's just those four that are required to choose which general mode of operation we are dealing with.

Setting CLONE_VFORK on it's own has the effect of causing the parent to be suspended until the child exits or execs but without the shared address space that vfork would normally have. That makes it closer to fork than vfork really and in any case valgrind transforms vfork to fork by removing the CLONE_VM flag so I see no reason not to allow this combination.
Comment 5 Tom Hughes 2020-02-20 09:26:21 UTC
Fixed in 3e11902ce28ff43cff679bf2453c597d568fde8f with a followup in 22aa8640e6c44c78c228ffa726cfacf918455343 to mark the returned descriptor as defined with CLONE_PIDFD is used.
Comment 6 Thiago Macieira 2020-02-20 16:22:43 UTC
Want me to open a separate report for P_PIDFD? Crude adapting your testcase:

#define _GNU_SOURCE

#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  int pid;
  int fd;

  if ((pid = syscall(SYS_clone, CLONE_VFORK | CLONE_PIDFD, NULL, &fd, NULL, 0)) < 0)
    {
      perror("clone");
    }
  else if (pid > 0)
    {
      struct siginfo si;
      struct rusage usage;

      printf("pidfd = %d\n", fd);
      syscall(SYS_waitid, fd, P_PIDFD, &si, WEXITED | __WALL, &usage);
    }

  exit(0);
}