Bug 423963 - Error in child thread when CLONE_PIDFD is used
Summary: Error in child thread when CLONE_PIDFD is used
Status: RESOLVED FIXED
Alias: None
Product: valgrind
Classification: Developer tools
Component: memcheck
Version: 3.15 SVN
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Julian Seward
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-07-07 10:05 UTC by aklitzing
Modified: 2021-04-06 23:54 UTC
CC List: 5 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Minimal Qt 6 example (30.00 KB, application/x-tar)
2021-04-06 13:35 UTC, Christoph Cullmann
valgrind 3.17.0 run with Qt 6.0 (44.62 KB, text/plain)
2021-04-06 14:06 UTC, Christoph Cullmann
valgrind 3.17.0 run with Qt 6.1 (31.30 KB, text/plain)
2021-04-06 14:06 UTC, Christoph Cullmann
strace for "successful" 6.0 run (130.14 KB, application/gzip)
2021-04-06 16:33 UTC, Christoph Cullmann
strace for "failing" 6.1 run (121.68 KB, application/gzip)
2021-04-06 16:33 UTC, Christoph Cullmann

Description aklitzing 2020-07-07 10:05:16 UTC
SUMMARY
An integration test spawns a subprocess using Qt's QProcess. Under valgrind the run hangs and the child process is never spawned.

STEPS TO REPRODUCE
1. Create a QTest that spawns a child process with QProcess
2. Run it under valgrind; valgrind cannot spawn the child and throws the error below

OBSERVED RESULT


EXPECTED RESULT


SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Linux 5.7.7
(available in About System)
KDE Plasma Version: 5.19.2
KDE Frameworks Version: 5.71.0
Qt Version: 5.15.0

ADDITIONAL INFORMATION
valgrind -v --tool=memcheck --leak-check=full --show-leak-kinds=definite --errors-for-leak-kinds=definite --error-exitcode=1 --gen-suppressions=all --suppressions=/home/andre/hg/AusweisApp2/libs/test/valgrind.supp --trace-children=yes test/qt/Test_ui_qml_UIPlugInQml





********* Start testing of test_UIPlugInQml *********
Config: Using QtTest library 5.15.0, Qt 5.15.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 10.1.0)
--43990-- REDIR: 0x6bbf760 (libstdc++.so.6:operator delete[](void*)) redirected to 0x483b520 (operator delete[](void*))
--43990-- REDIR: 0x6f045f0 (libc.so.6:__strstr_sse2_unaligned) redirected to 0x4841660 (strstr)
PASS   : test_UIPlugInQml::initTestCase()
==43997== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-43997-by-andre-on-???
==43997== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-43997-by-andre-on-???
==43997== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-43997-by-andre-on-???
==43997== 
==43997== TO CONTROL THIS PROCESS USING vgdb (which you probably
==43997== don't want to do, unless you know exactly what you're doing,
==43997== or are doing some strange experiment):
==43997==   /usr/lib/valgrind/../../bin/vgdb --pid=43997 ...command...
==43997== 
==43997== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==43997==   /path/to/gdb test/qt/Test_ui_qml_UIPlugInQml
==43997== and then give GDB the following command
==43997==   target remote | /usr/lib/valgrind/../../bin/vgdb --pid=43997
==43997== --pid is optional if only one valgrind process is running
==43997== 
==43997== Warning: invalid file descriptor -16781608 in syscall clone()
==43997==    at 0x6F5671D: syscall (in /usr/lib/libc-2.31.so)
==43997==    by 0x6812C52: ??? (in /usr/lib/libQt5Core.so.5.15.0)
==43997==    by 0x67F6C8B: ??? (in /usr/lib/libQt5Core.so.5.15.0)
==43997==    by 0x154717: test_UIPlugInQml::test_qmlEngineInit() (test_UIPlugInQml.cpp:147)
==43997==    by 0x150F5D: test_UIPlugInQml::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (test_UIPlugInQml.moc:84)
==43997==    by 0x688DE35: QMetaMethod::invoke(QObject*, Qt::ConnectionType, QGenericReturnArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument) const (in /usr/lib/libQt5Core.so.5.15.0)
==43997==    by 0x48A7CF6: ??? (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x48A85B0: ??? (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x48A8B63: ??? (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x48A902D: QTest::qRun() (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x48A93DD: QTest::qExec(QObject*, int, char**) (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x150EB4: main (test_UIPlugInQml.cpp:230)

valgrind: m_syswrap/syswrap-main.c:1957 (vgPlain_client_syscall): Assertion '0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))' failed.

host stacktrace:
==43997==    at 0x58041B2A: show_sched_status_wrk (m_libcassert.c:406)
==43997==    by 0x58041C47: report_and_quit (m_libcassert.c:477)
==43997==    by 0x58041DD7: vgPlain_assert_fail (m_libcassert.c:543)
==43997==    by 0x5809AB4F: vgPlain_client_syscall (syswrap-main.c:1980)
==43997==    by 0x5809617A: handle_syscall (scheduler.c:1208)
==43997==    by 0x58098177: vgPlain_scheduler (scheduler.c:1526)
==43997==    by 0x580E38D0: thread_wrapper (syswrap-linux.c:101)
==43997==    by 0x580E38D0: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable syscall 56 (lwpid 43997)
==43997==    at 0x6F5671D: syscall (in /usr/lib/libc-2.31.so)
==43997==    by 0x6812C52: ??? (in /usr/lib/libQt5Core.so.5.15.0)
==43997==    by 0x67F6C8B: ??? (in /usr/lib/libQt5Core.so.5.15.0)
==43997==    by 0x154717: test_UIPlugInQml::test_qmlEngineInit() (test_UIPlugInQml.cpp:147)
==43997==    by 0x150F5D: test_UIPlugInQml::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (test_UIPlugInQml.moc:84)
==43997==    by 0x688DE35: QMetaMethod::invoke(QObject*, Qt::ConnectionType, QGenericReturnArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument, QGenericArgument) const (in /usr/lib/libQt5Core.so.5.15.0)
==43997==    by 0x48A7CF6: ??? (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x48A85B0: ??? (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x48A8B63: ??? (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x48A902D: QTest::qRun() (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x48A93DD: QTest::qExec(QObject*, int, char**) (in /usr/lib/libQt5Test.so.5.15.0)
==43997==    by 0x150EB4: main (test_UIPlugInQml.cpp:230)
client stack range: [0x1FFEFFD000 0x1FFF000FFF] client SP: 0x1FFEFFEE28
valgrind stack range: [0x100307E000 0x100317DFFF] top usage: 13424 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

--43990-- REDIR: 0x6fbfd70 (libc.so.6:__strncpy_avx2) redirected to 0x483cf40 (strncpy)
QDEBUG : test_UIPlugInQml::test_qmlEngineInit(Android) ############ QFileInfo(/tmp/AusweisApp2.43997.port)
--43990-- REDIR: 0x6bbf730 (libstdc++.so.6:operator delete(void*)) redirected to 0x483ae40 (operator delete(void*))
Comment 1 Thiago Macieira 2020-12-01 19:43:21 UTC
See also https://bugs.kde.org/show_bug.cgi?id=427433

Qt 5.15.0 is not acceptable. Please upgrade to 5.15.2, which contains https://codereview.qt-project.org/c/qt/qtbase/+/314049
Comment 2 Christoph Cullmann 2021-04-01 09:18:54 UTC
Hi,

I have the same issue with Qt 6.1 (current branch state of today).
(and I have this since at least 6.0)

Interestingly enough, it doesn't happen for all our QProcess::start calls :/

valgrind: m_syswrap/syswrap-main.c:1957 (vgPlain_client_syscall): Assertion '0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))' failed.

host stacktrace:
==1115104==    at 0x58041B2A: show_sched_status_wrk (m_libcassert.c:406)
==1115104==    by 0x58041C47: report_and_quit (m_libcassert.c:477)
==1115104==    by 0x58041DD7: vgPlain_assert_fail (m_libcassert.c:543)
==1115104==    by 0x5809AB4F: vgPlain_client_syscall (syswrap-main.c:1980)
==1115104==    by 0x5809617A: handle_syscall (scheduler.c:1208)
==1115104==    by 0x58098177: vgPlain_scheduler (scheduler.c:1526)
==1115104==    by 0x580E38D0: thread_wrapper (syswrap-linux.c:101)
==1115104==    by 0x580E38D0: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable syscall 56 (lwpid 1115104)
==1115104==    at 0x83E2A9D: syscall (in /usr/lib/libc-2.33.so)
==1115104==    by 0x7E48636: forkfd (in /makefactory/products/master/release/linux64/a3_c_8918188/bin/libQt6Core.so.6)
==1115104==    by 0x7E2F881: QProcessPrivate::startProcess() (in /makefactory/products/master/release/linux64/a3_c_8918188/bin/libQt6Core.so.6)
==1115104==    by 0x7E2A4B9: QProcessPrivate::start(QFlags<QIODeviceBase::OpenModeFlag>) (in /makefactory/products/master/release/linux64/a3_c_8918188/bin/libQt6Core.so.6)
==1115104==    by 0x15E54F9: main (in /makefactory/products/master/release/linux64/a3_c_8918188/bin/a3c)
client stack range: [0x1FFEFE9000 0x1FFF000FFF] client SP: 0x1FFEFFE028
valgrind stack range: [0x10033B8000 0x10034B7FFF] top usage: 8328 of 1048576


valgrind version is valgrind-3.16.1, current Manjaro valgrind package.

The QProcess start code itself looks "normal" (m_serverProcess is a QProcess):


        // try to start up the server and forward its output to stdout/stderr
        m_serverProcess.setProcessChannelMode(QProcess::ForwardedChannels);
        const QString program = arguments.takeFirst();
        m_serverProcess.start(program, arguments, QProcess::ReadOnly);
        if (!m_serverProcess.waitForStarted()) {
            Ur::printf(Ur::Fatal, "Unable to start temporary server.");
        }

Interestingly enough, if I skip this process startup call (which happens in our main shortly after the QApplication is created),
later calls that start more processes work :/

I tried to avoid any special stuff we only do there, like the 

        m_serverProcess.setProcessChannelMode(QProcess::ForwardedChannels);

but the failure still happens.
Comment 3 Paul Floyd 2021-04-06 08:23:22 UTC
Can you reproduce the problem with Valgrind built from source with the default configure options? (Get the source from git or a tarball, then run "./autogen.sh ; ./configure ; gmake". Once built, you can run Valgrind from the source/build directory using the ./vg-in-place script.)
Comment 4 Christoph Cullmann 2021-04-06 08:31:30 UTC
Ok, just tried it, system is up-to-date Manjaro.

I have now downloaded

https://sourceware.org/pub/valgrind/valgrind-3.17.0.tar.bz2

and built it as you described above ;)

I think 3.17 fixes the issue for me, at least, my new "simple" test case:

1) open default kwrite of Manjaro
2) open file dialog

is now fine, before I got:

valgrind: m_syswrap/syswrap-main.c:1957 (vgPlain_client_syscall): Assertion '0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))' failed.

host stacktrace:
==3711011==    at 0x58041B2A: show_sched_status_wrk (m_libcassert.c:406)
==3711011==    by 0x58041C47: report_and_quit (m_libcassert.c:477)
==3711011==    by 0x58041DD7: vgPlain_assert_fail (m_libcassert.c:543)
==3711011==    by 0x5809AB4F: vgPlain_client_syscall (syswrap-main.c:1980)
==3711011==    by 0x5809617A: handle_syscall (scheduler.c:1208)
==3711011==    by 0x58098177: vgPlain_scheduler (scheduler.c:1526)
==3711011==    by 0x580E38D0: thread_wrapper (syswrap-linux.c:101)
==3711011==    by 0x580E38D0: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1




Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

==3711014== Warning: invalid file descriptor 4354 in syscall clone()

valgrind: m_syswrap/syswrap-main.c:1957 (vgPlain_client_syscall): Assertion '0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))' failed.

host stacktrace:
==3711014==    at 0x58041B2A: show_sched_status_wrk (m_libcassert.c:406)
==3711014==    by 0x58041C47: report_and_quit (m_libcassert.c:477)
==3711014==    by 0x58041DD7: vgPlain_assert_fail (m_libcassert.c:543)
==3711014==    by 0x5809AB4F: vgPlain_client_syscall (syswrap-main.c:1980)
==3711014==    by 0x5809617A: handle_syscall (scheduler.c:1208)
==3711014==    by 0x58098177: vgPlain_scheduler (scheduler.c:1526)
==3711014==    by 0x580E38D0: thread_wrapper (syswrap-linux.c:101)
==3711014==    by 0x580E38D0: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1


now I see no such errors!


I only get:

==3711230== Syscall param waitid(infop) points to unaddressable byte(s)
==3711230==    at 0x65FAA9D: syscall (in /usr/lib/libc-2.33.so)
==3711230==    by 0x60172D7: ??? (in /usr/lib/libQt5Core.so.5.15.2)
==3711230==    by 0x5FFB2DB: ??? (in /usr/lib/libQt5Core.so.5.15.2)
==3711230==    by 0x6E9C4BB: ??? (in /usr/lib/libKF5KIOCore.so.5.80.0)
==3711230==    by 0x6E9C698: ??? (in /usr/lib/libKF5KIOCore.so.5.80.0)
==3711230==    by 0x6E9CE49: ??? (in /usr/lib/libKF5KIOCore.so.5.80.0)
==3711230==    by 0x6E9E8A9: ??? (in /usr/lib/libKF5KIOCore.so.5.80.0)
==3711230==    by 0x6E9EA5B: KSambaShare::KSambaShare() (in /usr/lib/libKF5KIOCore.so.5.80.0)
==3711230==    by 0x6E9EBB5: KSambaShare::instance() (in /usr/lib/libKF5KIOCore.so.5.80.0)
==3711230==    by 0x6EA65A6: KFileItem::overlays() const (in /usr/lib/libKF5KIOCore.so.5.80.0)
==3711230==    by 0x6DBAD6D: KDirModel::data(QModelIndex const&, int) const (in /usr/lib/libKF5KIOWidgets.so.5.80.0)
==3711230==    by 0x604F228: QSortFilterProxyModel::data(QModelIndex const&, int) const (in /usr/lib/libQt5Core.so.5.15.2)
==3711230==  Address 0x0 is not stack'd, malloc'd or (recently) free'd


but if I remember correctly, this is "intentional" in the code.
Comment 5 Christoph Cullmann 2021-04-06 08:31:59 UTC
I would wait for official 3.17 distro packages to try again, then I would close this, if it works there, too.

Is that ok for you?
Comment 6 Paul Floyd 2021-04-06 09:19:08 UTC
I don't know exactly what the root cause of this issue is. My suspicion is that it is related to the LTO used in producing the official Valgrind package. If that is the case then waiting for the next official release may not help. I'm not aware of any fix that went into 3.17.0 that affects this problem.

It would be helpful to try building Valgrind with the Manjaro package options and then retest. If this is the right site https://snapcraft.io/install/valgrind/manjaro then it looks like Manjaro is adding a patch for exp-failgrind and also configuring with --enable-lto.
Comment 7 Christoph Cullmann 2021-04-06 09:27:49 UTC
Hmm, I think this is the current pkgbuild file:

https://github.com/archlinux/svntogit-packages/blob/760e4b1e68f1880e01a6f4d425c988134dafe84b/trunk/PKGBUILD
Comment 8 Christoph Cullmann 2021-04-06 10:36:05 UTC
Hmm, you are right, if I use 3.16.1 + apply the one patch here to fix the compile https://raw.githubusercontent.com/archlinux/svntogit-packages/packages/valgrind/trunk/valgrind-3.16-openmpi-4.0.patch it still works :/

Guess the archlinux packages are just broken...
Comment 9 Christoph Cullmann 2021-04-06 12:14:31 UTC
I have now tried my Qt 6.1 application again.

There, with 3.16.1, I still get this (even when self-compiled):

valgrind: m_syswrap/syswrap-main.c:1957 (vgPlain_client_syscall): Assertion '0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))' failed.

host stacktrace:
==301193==    at 0x58041B2A: show_sched_status_wrk (m_libcassert.c:406)
==301193==    by 0x58041C47: report_and_quit (m_libcassert.c:477)
==301193==    by 0x58041DD7: vgPlain_assert_fail (m_libcassert.c:543)
==301193==    by 0x5809AB4F: vgPlain_client_syscall (syswrap-main.c:1980)
==301193==    by 0x5809617A: handle_syscall (scheduler.c:1208)
==301193==    by 0x58098177: vgPlain_scheduler (scheduler.c:1526)
==301193==    by 0x580E38D0: thread_wrapper (syswrap-linux.c:101)
==301193==    by 0x580E38D0: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable syscall 56 (lwpid 301193)
==301193==    at 0x83E9A9D: syscall (in /usr/lib/libc-2.33.so)
==301193==    by 0x7E5EF86: forkfd (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==301193==    by 0x7E461D1: QProcessPrivate::startProcess() (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==301193==    by 0x7E40E09: QProcessPrivate::start(QFlags<QIODeviceBase::OpenModeFlag>) (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==301193==    by 0x15E5E09: main (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/a3c)
client stack range: [0x1FFEFE9000 0x1FFF000FFF] client SP: 0x1FFEFFE058
valgrind stack range: [0x10034B8000 0x10035B7FFF] top usage: 8328 of 1048576


And I still get it with 3.17, too, for the Qt 6.x version.

valgrind: m_syswrap/syswrap-main.c:1957 (vgPlain_client_syscall): Assertion '0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))' failed.

host stacktrace:
==304423==    at 0x58041F1A: show_sched_status_wrk (m_libcassert.c:406)
==304423==    by 0x58042037: report_and_quit (m_libcassert.c:477)
==304423==    by 0x580421C7: vgPlain_assert_fail (m_libcassert.c:543)
==304423==    by 0x5809C76F: vgPlain_client_syscall (syswrap-main.c:1980)
==304423==    by 0x58097D1A: handle_syscall (scheduler.c:1208)
==304423==    by 0x58099D17: vgPlain_scheduler (scheduler.c:1526)
==304423==    by 0x580E8110: thread_wrapper (syswrap-linux.c:101)
==304423==    by 0x580E8110: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable syscall 56 (lwpid 304423)
==304423==    at 0x83EEA9D: syscall (in /usr/lib/libc-2.33.so)
==304423==    by 0x7E63F86: forkfd (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==304423==    by 0x7E4B1D1: QProcessPrivate::startProcess() (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==304423==    by 0x7E45E09: QProcessPrivate::start(QFlags<QIODeviceBase::OpenModeFlag>) (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==304423==    by 0x15E5E09: main (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/a3c)
client stack range: [0x1FFEFE9000 0x1FFF000FFF] client SP: 0x1FFEFFDFC8
valgrind stack range: [0x10034B8000 0x10035B7FFF] top usage: 8688 of 1048576


With 3.16.1, the KWrite test with Qt 5.15.2 triggers this less obvious output:

--314055-- WARNING: unhandled amd64-linux syscall: 439
--314055-- You may be able to write your own handler.
--314055-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--314055-- Nevertheless we consider this a bug.  Please report
--314055-- it at http://valgrind.org/support/bug_reports.html.
Comment 10 Christoph Cullmann 2021-04-06 12:28:36 UTC
Here is the full log of one 3.17.0 run, compiled without any extra flags/patches from the source tarball with the system gcc (GCC) 10.2.0 on Manjaro:

# disable PCRE jit
export QT_ENABLE_REGEXP_JIT=0

# run the stuff, dies on first QProcess::start call:

./valgrind /makefactory/products/master/release/linux64/a3_c_8939174/bin/a3c --temp-server                                                                                       
==381756== Memcheck, a memory error detector
==381756== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==381756== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==381756== Command: /makefactory/products/master/release/linux64/a3_c_8939174/bin/a3c --temp-server
==381756== 
==381756== Syscall param waitid(infop) points to unaddressable byte(s)
==381756==    at 0x83EEA9D: syscall (in /usr/lib/libc-2.33.so)
==381756==    by 0x7E64010: forkfd (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==381756==    by 0x7E4B1D1: QProcessPrivate::startProcess() (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==381756==    by 0x7E45E09: QProcessPrivate::start(QFlags<QIODeviceBase::OpenModeFlag>) (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==381756==    by 0x15E5E09: main (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/a3c)
==381756==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==381756== 
==381796== Warning: invalid file descriptor 183061184 in syscall clone()

valgrind: m_syswrap/syswrap-main.c:1957 (vgPlain_client_syscall): Assertion '0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))' failed.

host stacktrace:
==381796==    at 0x58041F1A: show_sched_status_wrk (m_libcassert.c:406)
==381796==    by 0x58042037: report_and_quit (m_libcassert.c:477)
==381796==    by 0x580421C7: vgPlain_assert_fail (m_libcassert.c:543)
==381796==    by 0x5809C76F: vgPlain_client_syscall (syswrap-main.c:1980)
==381796==    by 0x58097D1A: handle_syscall (scheduler.c:1208)
==381796==    by 0x58099D17: vgPlain_scheduler (scheduler.c:1526)
==381796==    by 0x580E8110: thread_wrapper (syswrap-linux.c:101)
==381796==    by 0x580E8110: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable syscall 56 (lwpid 381796)
==381796==    at 0x83EEA9D: syscall (in /usr/lib/libc-2.33.so)
==381796==    by 0x7E63F86: forkfd (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==381796==    by 0x7E4B1D1: QProcessPrivate::startProcess() (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==381796==    by 0x7E45E09: QProcessPrivate::start(QFlags<QIODeviceBase::OpenModeFlag>) (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/libQt6Core.so.6)
==381796==    by 0x15E5E09: main (in /makefactory/products/master/release/linux64/a3_c_8939174/bin/a3c)
client stack range: [0x1FFEFE9000 0x1FFF000FFF] client SP: 0x1FFEFFDDD8
valgrind stack range: [0x10034B8000 0x10035B7FFF] top usage: 8688 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

Fatal: Unable to get port of temporary server before 600 seconds startup timeout.
==381756== 
==381756== HEAP SUMMARY:
==381756==     in use at exit: 4,185,731 bytes in 30,217 blocks
==381756==   total heap usage: 82,652 allocs, 52,435 frees, 172,765,610 bytes allocated
==381756== 
==381756== LEAK SUMMARY:
==381756==    definitely lost: 256 bytes in 1 blocks
==381756==    indirectly lost: 46 bytes in 2 blocks
==381756==      possibly lost: 336 bytes in 1 blocks
==381756==    still reachable: 4,185,093 bytes in 30,213 blocks
==381756==                       of which reachable via heuristic:
==381756==                         newarray           : 608 bytes in 4 blocks
==381756==         suppressed: 0 bytes in 0 blocks
==381756== Rerun with --leak-check=full to see details of leaked memory
==381756== 
==381756== For lists of detected and suppressed errors, rerun with: -s
==381756== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
Comment 11 Christoph Cullmann 2021-04-06 13:35:38 UTC
Created attachment 137380 [details]
Minimal Qt 6 example
Comment 12 Christoph Cullmann 2021-04-06 13:40:08 UTC
For better testing, I added a minimal Qt 6 example.

The tar contains the CMakeLists.txt + test.cpp.

(and an example compile against the system Qt 6.0.2 of Manjaro)

If I use the system 3.16.1 valgrind of Manjaro I get:

valgrind ./test
==490703== Memcheck, a memory error detector
==490703== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==490703== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==490703== Command: ./test
==490703== 
==490703== Syscall param waitid(infop) points to unaddressable byte(s)
==490703==    at 0x5274A9D: syscall (in /usr/lib/libc-2.33.so)
==490703==    by 0x4BDF99F: ??? (in /usr/lib/libQt6Core.so.6.0.2)
==490703==    by 0x4BC3BB3: ??? (in /usr/lib/libQt6Core.so.6.0.2)
==490703==    by 0x1092A5: main (in /home/cullmann/test/test)
==490703==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==490703== 
==490711== Warning: invalid file descriptor 4354 in syscall clone()

valgrind: m_syswrap/syswrap-main.c:1957 (vgPlain_client_syscall): Assertion '0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))' failed.

host stacktrace:
==490711==    at 0x58041B2A: show_sched_status_wrk (m_libcassert.c:406)
==490711==    by 0x58041C47: report_and_quit (m_libcassert.c:477)
==490711==    by 0x58041DD7: vgPlain_assert_fail (m_libcassert.c:543)
==490711==    by 0x5809AB4F: vgPlain_client_syscall (syswrap-main.c:1980)
==490711==    by 0x5809617A: handle_syscall (scheduler.c:1208)
==490711==    by 0x58098177: vgPlain_scheduler (scheduler.c:1526)
==490711==    by 0x580E38D0: thread_wrapper (syswrap-linux.c:101)
==490711==    by 0x580E38D0: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable syscall 56 (lwpid 490711)
==490711==    at 0x5274A9D: syscall (in /usr/lib/libc-2.33.so)
==490711==    by 0x4BDF91D: ??? (in /usr/lib/libQt6Core.so.6.0.2)
==490711==    by 0x4BC3BB3: ??? (in /usr/lib/libQt6Core.so.6.0.2)
==490711==    by 0x1092A5: main (in /home/cullmann/test/test)
client stack range: [0x1FFEFF6000 0x1FFF000FFF] client SP: 0x1FFEFFF988
valgrind stack range: [0x1002BAA000 0x1002CA9FFF] top usage: 13424 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.









If I use a self-compiled 3.16.1 or 3.17.0 valgrind without any patches:

/tmp/testet/bin/valgrind ./test
==492054== Memcheck, a memory error detector
==492054== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==492054== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==492054== Command: ./test
==492054== 
==492054== Syscall param waitid(infop) points to unaddressable byte(s)
==492054==    at 0x5274A9D: syscall (in /usr/lib/libc-2.33.so)
==492054==    by 0x4BDF99F: ??? (in /usr/lib/libQt6Core.so.6.0.2)
==492054==    by 0x4BC3BB3: ??? (in /usr/lib/libQt6Core.so.6.0.2)
==492054==    by 0x1092A5: main (in /home/cullmann/test/test)
==492054==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==492054== 
==492054== 
==492054== HEAP SUMMARY:
==492054==     in use at exit: 19,244 bytes in 21 blocks
==492054==   total heap usage: 213 allocs, 192 frees, 152,055 bytes allocated

and

/makefactory/usr/heute/144702/release/linux64/bin/valgrind ./test
==491169== Memcheck, a memory error detector
==491169== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==491169== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==491169== Command: ./test
==491169== 
==491169== Syscall param waitid(infop) points to unaddressable byte(s)
==491169==    at 0x5279A9D: syscall (in /usr/lib/libc-2.33.so)
==491169==    by 0x4BE499F: ??? (in /usr/lib/libQt6Core.so.6.0.2)
==491169==    by 0x4BC8BB3: ??? (in /usr/lib/libQt6Core.so.6.0.2)
==491169==    by 0x1092A5: main (in /home/cullmann/test/test)
==491169==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==491169== 
==491169== 
==491169== HEAP SUMMARY:
==491169==     in use at exit: 19,244 bytes in 21 blocks
==491169==   total heap usage: 213 allocs, 192 frees, 152,055 bytes allocated
==491169== 




Strangely enough, I don't run into the error I saw above with the Qt 6.1 build we have.

Perhaps our 6.1 build uses a different code path in the forkfd stuff?
Comment 13 Christoph Cullmann 2021-04-06 13:42:52 UTC
Just tried the test example with our Qt 6.1 build:

/makefactory/usr/heute/144702/release/linux64/bin/valgrind ./test         
==494853== Memcheck, a memory error detector
==494853== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==494853== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==494853== Command: ./test
==494853== 
==494853== Syscall param waitid(infop) points to unaddressable byte(s)
==494853==    at 0x70FBA9D: syscall (in /usr/lib/libc-2.33.so)
==494853==    by 0x6953120: forkfd (in /local/ssd/cullmann/build/astreegui.default/usr/lib/libQt6Core.so.6.1.0)
==494853==    by 0x69383BE: QProcessPrivate::startProcess() (in /local/ssd/cullmann/build/astreegui.default/usr/lib/libQt6Core.so.6.1.0)
==494853==    by 0x69329C9: QProcessPrivate::start(QFlags<QIODeviceBase::OpenModeFlag>) (in /local/ssd/cullmann/build/astreegui.default/usr/lib/libQt6Core.so.6.1.0)
==494853==    by 0x10933E: main (in /home/cullmann/test/test)
==494853==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==494853== 
==494864== Warning: invalid file descriptor 119528384 in syscall clone()

valgrind: m_syswrap/syswrap-main.c:1957 (vgPlain_client_syscall): Assertion '0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))' failed.

host stacktrace:
==494864==    at 0x580452D0: show_sched_status_wrk (m_libcassert.c:406)
==494864==    by 0x580453D7: report_and_quit (m_libcassert.c:477)
==494864==    by 0x5804555E: vgPlain_assert_fail (m_libcassert.c:543)
==494864==    by 0x580A409E: vgPlain_client_syscall (syswrap-main.c:1980)
==494864==    by 0x580A01BA: handle_syscall (scheduler.c:1208)
==494864==    by 0x580A1B10: vgPlain_scheduler (scheduler.c:1526)
==494864==    by 0x580EEE06: thread_wrapper (syswrap-linux.c:101)
==494864==    by 0x580EEE06: run_a_thread_NORETURN (syswrap-linux.c:154)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable syscall 56 (lwpid 494864)
==494864==    at 0x70FBA9D: syscall (in /usr/lib/libc-2.33.so)
==494864==    by 0x6953096: forkfd (in /local/ssd/cullmann/build/astreegui.default/usr/lib/libQt6Core.so.6.1.0)
==494864==    by 0x69383BE: QProcessPrivate::startProcess() (in /local/ssd/cullmann/build/astreegui.default/usr/lib/libQt6Core.so.6.1.0)
==494864==    by 0x69329C9: QProcessPrivate::start(QFlags<QIODeviceBase::OpenModeFlag>) (in /local/ssd/cullmann/build/astreegui.default/usr/lib/libQt6Core.so.6.1.0)
==494864==    by 0x10933E: main (in /home/cullmann/test/test)
client stack range: [0x1FFEFF6000 0x1FFF000FFF] client SP: 0x1FFEFFF878
valgrind stack range: [0x1002D8E000 0x1002E8DFFF] top usage: 13584 of 1048576






Either Qt 6.1 itself behaves differently in QProcess here, or we compile Qt 6.1 differently than Manjaro in this respect :/
Comment 14 Tom Hughes 2021-04-06 13:57:35 UTC
Posting the same output over and over again doesn't really help anyone...

I'm not sure why this bug has the title it does - where is the evidence that this has anything to do with clone flags? Do we even know what clone flags are being used?

The initial issue appears to be this:

==494864== Warning: invalid file descriptor 119528384 in syscall clone()

which is then followed by a pre-fail in execve that has the flags in a bad state, causing an assertion. That is probably a bug in valgrind, but it doesn't have anything to do with clone other than possibly being triggered by a previous failing clone.

The only place I can see where clone deals in FDs is when CLONE_PIDFD is used in which case we expect it to return a valid file descriptor if I'm reading things right.

Do we know if Qt uses CLONE_PIDFD? Or can we get a run with --trace-syscalls=yes so we can see what flags it passes?
Comment 15 Christoph Cullmann 2021-04-06 14:06:13 UTC
Sorry, didn't want to spam you.

I will attach a --trace-syscalls=yes ./test log for both the working 6.0 and non-working 6.1 run.
Comment 16 Christoph Cullmann 2021-04-06 14:06:32 UTC
Created attachment 137383 [details]
valgrind 3.17.0 run with Qt 6.0
Comment 17 Christoph Cullmann 2021-04-06 14:06:54 UTC
Created attachment 137384 [details]
valgrind 3.17.0 run with Qt 6.1
Comment 18 Thiago Macieira 2021-04-06 14:14:40 UTC
Yes, Qt uses CLONE_PIDFD. It was the first user of this feature (the feature was written from our prompting and design). The flag has changed slightly since the first time this bug was reported. It is now CLONE_PIDFD | SIGCHLD (the SIGCHLD was missing in the original version).
Comment 19 Christoph Cullmann 2021-04-06 14:50:41 UTC
I have now compiled the 6.1 branch locally with

-DFEATURE_forkfd_pidfd=OFF

That triggers the following code path (unless I am misreading the Qt sources):

    int ffdflags = FFD_CLOEXEC;

    // QTBUG-86285
#if !QT_CONFIG(forkfd_pidfd)
    ffdflags |= FFD_USE_FORK;
#endif

    pid_t childPid;
    forkfd = ::forkfd(ffdflags , &childPid);

For me, this "solves" the issues for all valgrind versions I have (Manjaro 3.16.1, vanilla 3.16.1 and 3.17.0)

Naturally I assume that is not the "proper" fix.
Comment 20 Thiago Macieira 2021-04-06 15:31:33 UTC
(In reply to Christoph Cullmann from comment #19)
> I have now compiled the 6.1 branch locally with
> 
> -DFEATURE_forkfd_pidfd=OFF
> 
> That triggers the following code path (unless I am misreading the Qt sources):
> 
>     int ffdflags = FFD_CLOEXEC;
> 
>     // QTBUG-86285
> #if !QT_CONFIG(forkfd_pidfd)
>     ffdflags |= FFD_USE_FORK;
> #endif

> Naturally I assume that is not the "proper" fix.

No. That forces Qt onto the old, pre-Linux 5.4 implementation that uses fork() directly (which is a clone() call with only the flag SIGCHLD). Since it's not using CLONE_PIDFD, it doesn't trigger whatever the issue in Valgrind is.
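
The difference between the two paths can be sketched as a flag choice. This is only an illustration, not forkfd's code: `sketch_clone_flags` is a hypothetical helper, `FFD_USE_FORK` takes the value 4 from the forkfd.h excerpt quoted later in this report, and the fork() path is modelled as "SIGCHLD only", following the description above.

```c
#include <signal.h>   /* SIGCHLD */

/* CLONE_PIDFD as defined in <linux/sched.h>; provided here in case the
 * installed headers do not expose it. */
#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000
#endif

/* Value taken from the forkfd.h excerpt quoted in this report. */
#define FFD_USE_FORK 4

/* Hypothetical helper (not part of forkfd): which clone flags each path
 * ends up with, according to the description in this thread. */
unsigned long sketch_clone_flags(int ffdflags)
{
    if (ffdflags & FFD_USE_FORK)
        return SIGCHLD;               /* plain fork(): no pidfd involved */
    return CLONE_PIDFD | SIGCHLD;     /* pidfd path used by Qt 6.1 */
}
```

With `-DFEATURE_forkfd_pidfd=OFF`, `FFD_USE_FORK` is always set, so the pidfd path (and with it the Valgrind code handling CLONE_PIDFD) is presumably never reached.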
Comment 21 Tom Hughes 2021-04-06 16:11:32 UTC
Hmm.. It doesn't seem to make a huge amount of sense. Both versions of Qt are using PIDFD but with 6.1 the FD we get from the kernel seems to be a silly number that is not a valid file descriptor triggering everything which follows.

The final failure is because we set SfYieldAfter in the flags even though we have synthesised an EMFILE error for the invalid descriptor. The actual kernel clone has already happened, though, so it's not clear that we can synthesise an error in POST like that - it's all a bit tricky.

None of that explains the FD we are getting from the kernel though, or the difference between Qt versions. I wonder if the kernel is not writing it at all and the value we are seeing is what happened to be in that location before the call.

Can you get an strace of both versions running under valgrind so we can see how the kernel level clones compare?
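
For readers unfamiliar with the assertion in the log, the check can be sketched as a simple bit-mask test. The bit values below are illustrative assumptions (the real definitions live in Valgrind's syswrap headers and may differ), and `flags_pass_assertion` is a hypothetical helper that only mirrors the asserted expression.

```c
/* Illustrative bit values only; assumed for this sketch. */
enum {
    SfMayBlock   = 1 << 1,
    SfPostOnFail = 1 << 2,
    SfPollAfter  = 1 << 3,
    SfYieldAfter = 1 << 4
};

/* Hypothetical helper mirroring the expression from the log:
 *   0 == (sci->flags & ~(SfMayBlock | SfPostOnFail | SfPollAfter))
 * It returns 1 when only the three permitted flags are set. */
int flags_pass_assertion(unsigned int flags)
{
    return 0 == (flags & ~(unsigned int)(SfMayBlock | SfPostOnFail | SfPollAfter));
}
```

Under these assumptions, a flags word that includes SfYieldAfter fails the test, which matches the explanation given above for the final failure.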
Comment 22 Christoph Cullmann 2021-04-06 16:28:34 UTC
(In reply to Tom Hughes from comment #21)
> Hmm.. It doesn't seem to make a huge amount of sense. Both versions of Qt
> are using PIDFD but with 6.1 the FD we get from the kernel seems to be a
> silly number that is not a valid file descriptor triggering everything which
> follows.
> 
> The final failure is because we set SfYieldAfter in the flags even though we
> have synthesised an EMFILE error for the invalid descriptor. The actual
> kernel clone has already happened, though, so it's not clear that we can
> synthesise an error in POST like that - it's all a bit tricky.
> 
> None of that explains the FD we are getting from the kernel though, or the
> difference between Qt versions. I wonder if the kernel is not writing it at
> all and the value we are seeing is what happened to be in that location
> before the call.
> 
> Can you get an strace of both versions running under valgrind so we can see
> how the kernel level clones compare?

I will provide that.

Thanks for taking the time to look into this at all.
I really hope I didn't just screw up the Qt compile we have, though without valgrind
it seems to work properly with all our regression tests.

Will attach the 2 logs for 6.0 and 6.1 with valgrind 3.17 vanilla.
Comment 23 Christoph Cullmann 2021-04-06 16:33:44 UTC
Created attachment 137386 [details]
strace for "successful" 6.0 run
Comment 24 Christoph Cullmann 2021-04-06 16:33:56 UTC
Created attachment 137387 [details]
strace for "failing" 6.1 run
Comment 25 Christoph Cullmann 2021-04-06 16:42:59 UTC
As additional info that might be useful:

Before we ported our stuff to Qt 6.x, we did use Qt 5.15 (not with all patches).

There we never had such issues. Below is the diff of the 3rdparty/forkfd directories between the Qt 5.15 and 6.x versions now in use here:

diff -u -r -w ../../qt5/src/qtbase/src/3rdparty/forkfd/forkfd.c src/qtbase/src/3rdparty/forkfd/forkfd.c
--- ../../qt5/src/qtbase/src/3rdparty/forkfd/forkfd.c   2020-10-06 17:17:50.510596533 +0200
+++ src/qtbase/src/3rdparty/forkfd/forkfd.c     2021-04-01 10:53:12.909416546 +0200
@@ -240,6 +240,9 @@
     }
 }
 
+#ifdef __GNUC__
+__attribute__((unused))
+#endif
 static int convertForkfdWaitFlagsToWaitFlags(int ffdoptions)
 {
     int woptions = WEXITED;
@@ -617,12 +620,6 @@
  * fork(), such as not calling the functions registered with pthread_atfork().
  * If that's necessary, pass this flag.
  *
- * @li @c FFD_VFORK_SEMANTICS Tell forkfd() to use semantics similar to
- * vfork(), if that's available. For example, on Linux with pidfd support
- * available, this will add the CLONE_VFORK option. On most other systems,
- * including Linux without pidfd support, this option does nothing, as using
- * the actual vfork() system call would cause a race condition.
- *
  * The file descriptor returned by forkfd() supports the following operations:
  *
  * @li read(2) When the child process exits, then the buffer supplied to
diff -u -r -w ../../qt5/src/qtbase/src/3rdparty/forkfd/forkfd.h src/qtbase/src/3rdparty/forkfd/forkfd.h
--- ../../qt5/src/qtbase/src/3rdparty/forkfd/forkfd.h   2020-10-06 17:17:50.510596533 +0200
+++ src/qtbase/src/3rdparty/forkfd/forkfd.h     2021-04-01 10:53:12.909416546 +0200
@@ -41,7 +41,6 @@
 #define FFD_CLOEXEC             1
 #define FFD_NONBLOCK            2
 #define FFD_USE_FORK            4
-#define FFD_VFORK_SEMANTICS     8
 
 #define FFD_CHILD_PROCESS (-2)
 
diff -u -r -w ../../qt5/src/qtbase/src/3rdparty/forkfd/forkfd_linux.c src/qtbase/src/3rdparty/forkfd/forkfd_linux.c
--- ../../qt5/src/qtbase/src/3rdparty/forkfd/forkfd_linux.c     2020-10-06 17:17:50.510596533 +0200
+++ src/qtbase/src/3rdparty/forkfd/forkfd_linux.c       2021-04-01 10:53:12.909416546 +0200
@@ -82,7 +82,7 @@
     return syscall(__NR_clone, cloneflags, child_stack, stack_size, ptid, newtls, ctid);
 #elif defined(__arc__) || defined(__arm__) || defined(__aarch64__) || defined(__mips__) || \
     defined(__nds32__) || defined(__hppa__) || defined(__powerpc__) || defined(__i386__) || \
-    defined(__x86_64__) || defined(__xtensa__) || defined(__alpha__)
+    defined(__x86_64__) || defined(__xtensa__) || defined(__alpha__) || defined(__riscv)
     /* ctid and newtls are inverted on CONFIG_CLONE_BACKWARDS architectures,
      * but since both values are 0, there's no harm. */
     return syscall(__NR_clone, cloneflags, child_stack, ptid, ctid, newtls);
@@ -147,10 +147,10 @@
     }
 
     *system = 1;
-    unsigned long cloneflags = CLONE_PIDFD;
-    if (flags & FFD_VFORK_SEMANTICS)
-        cloneflags |= CLONE_VFORK;
+    unsigned long cloneflags = CLONE_PIDFD | SIGCHLD;
     pid = sys_clone(cloneflags, &pidfd);
+    if (pid < 0)
+        return pid;
     if (ppid)
         *ppid = pid;
 
@@ -173,7 +173,7 @@
 {
     siginfo_t si;
     int ret;
-    int options = __WALL | convertForkfdWaitFlagsToWaitFlags(ffdoptions);
+    int options = convertForkfdWaitFlagsToWaitFlags(ffdoptions);
 
     if ((options & WNOHANG) == 0) {
         /* check if the file descriptor is non-blocking */
Comment 26 Thiago Macieira 2021-04-06 16:45:17 UTC
That diff shows the original mistake in calling clone() without SIGCHLD.
Comment 27 Tom Hughes 2021-04-06 17:00:18 UTC
Those both look correct in the strace - we have:

clone(child_stack=NULL, flags=CLONE_PIDFD|SIGCHLD, parent_tid=[12]) = 8063

and:

clone(child_stack=NULL, flags=CLONE_PIDFD|SIGCHLD, parent_tid=[12]) = 8050

but when valgrind tries to get the FD from parent_tid it is not getting that value so something is wrong there.
Comment 28 Tom Hughes 2021-04-06 17:37:18 UTC
I think I know what's happening - the POST handler for clone is actually running in both threads.

It works in the parent thread, but when it tries to read the PIDFD in the child thread it gets a nonsense value. Possibly Qt 6.1 changed something that affects the value of that memory going in, but my reading of the manual page says that the value is only guaranteed to be available in the parent's memory on return, so it may just be chance that this appeared to work before.

Try building valgrind with this patch and see if it helps:


diff --git a/coregrind/m_syswrap/syswrap-linux.c b/coregrind/m_syswrap/syswrap-linux.c
index 5ae4e6613..c59d8ee26 100644
--- a/coregrind/m_syswrap/syswrap-linux.c
+++ b/coregrind/m_syswrap/syswrap-linux.c
@@ -940,7 +940,7 @@ PRE(sys_clone)
          ("Valgrind does not support general clone().");
    }
 
-   if (SUCCESS) {
+   if (SUCCESS && RES != 0) {
       if (ARG_FLAGS & (VKI_CLONE_PARENT_SETTID | VKI_CLONE_PIDFD))
          POST_MEM_WRITE(ARG3, sizeof(Int));
       if (ARG_FLAGS & (VKI_CLONE_CHILD_SETTID | VKI_CLONE_CHILD_CLEARTID))
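
The idea behind the `RES != 0` guard can be modelled in isolation. The following is a simplified sketch and not Valgrind code: `sketch_fd_to_validate` and `SKETCH_NO_FD` are hypothetical names, and the model assumes only that clone() returns 0 in the child and the child's PID in the parent.

```c
#include <stddef.h>

/* Hypothetical marker meaning "there is no descriptor to look at". */
enum { SKETCH_NO_FD = -1 };

/* Hypothetical model of the guarded check: the pidfd slot is read only
 * when the call succeeded AND we are on the parent side (res != 0).
 * On the child side (res == 0) the slot is not guaranteed to hold the
 * pidfd, so it is left alone. */
int sketch_fd_to_validate(int success, long res, const int *pidfd_slot)
{
    if (!success)
        return SKETCH_NO_FD;      /* failed call: nothing was written */
    if (res == 0)
        return SKETCH_NO_FD;      /* child side: slot contents unreliable */
    if (pidfd_slot == NULL)
        return SKETCH_NO_FD;
    return *pidfd_slot;           /* parent side: value written by the kernel */
}
```

Using the numbers seen in this report: on the parent side (return value 8063, slot holding 12) the model returns 12, while on the child side a stale slot value such as 119528384 is never inspected.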
Comment 29 Christoph Cullmann 2021-04-06 17:46:14 UTC
Yeah ;=)

./vg-in-place /home/cullmann/test/test
==39129== Memcheck, a memory error detector
==39129== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==39129== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==39129== Command: /home/cullmann/test/test
==39129== 
==39129== Syscall param waitid(infop) points to unaddressable byte(s)
==39129==    at 0x70FBA9D: syscall (in /usr/lib/libc-2.33.so)
==39129==    by 0x6953120: forkfd (in /local/ssd/cullmann/build/astreegui.default/usr/lib/libQt6Core.so.6.1.0)
==39129==    by 0x69383BE: QProcessPrivate::startProcess() (in /local/ssd/cullmann/build/astreegui.default/usr/lib/libQt6Core.so.6.1.0)
==39129==    by 0x69329C9: QProcessPrivate::start(QFlags<QIODeviceBase::OpenModeFlag>) (in /local/ssd/cullmann/build/astreegui.default/usr/lib/libQt6Core.so.6.1.0)
==39129==    by 0x10933E: main (in /home/cullmann/test/test)
==39129==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==39129== 
==39129== 
==39129== HEAP SUMMARY:
==39129==     in use at exit: 0 bytes in 0 blocks
==39129==   total heap usage: 174 allocs, 174 frees, 105,464 bytes allocated
==39129== 
==39129== All heap blocks were freed -- no leaks are possible
==39129== 
==39129== For lists of detected and suppressed errors, rerun with: -s
==39129== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Besides this message, everything now works nicely!
Thanks a lot.

I tried the larger AbsInt tools, too, seems fine here!

I hope this can be included in a patch release of 3.17.x. This will plague a lot of people, as Qt 6.x will be used more often in the near future.
(and I think it should fix the 5.15.2 issues, too, given that has the same code more or less)

Thanks all for the help here!
Comment 30 Paul Floyd 2021-04-06 17:58:43 UTC
(In reply to Tom Hughes from comment #28)
> I think I know what's happening - the POST handler for clone is actually
> running in both threads.
> 
> It works in the parent thread but when it tries to read the PIDFD in the
> child thread it gets a nonsense value.

And I guess that the nonsense value could depend on the build options of Valgrind, like LTO.
Comment 31 Christoph Cullmann 2021-04-06 18:15:00 UTC
(In reply to Paul Floyd from comment #30)
> (In reply to Tom Hughes from comment #28)
> > I think I know what's happening - the POST handler for clone is actually
> > running in both threads.
> > 
> > It works in the parent thread but when it tries to read the PIDFD in the
> > child thread it gets a nonsense value.
> 
> And I guess that the nonsense value could depend on the build options of
> Valgrind, like LTO.

I think it depends more on the state of the memory of the process that does the clones. In our AbsInt tools, out of all QProcess calls (and we have a lot), just one in one of our tools always had this issue, all others just passed nicely (even in that tool, if one skipped the first one).

Nice that this is now a thing of the past. It will make my debugging experience a lot nicer again, including for the KDE stuff I work on.
Comment 32 Tom Hughes 2021-04-06 21:50:04 UTC
I've pushed that fix now as e08a82991a9b9dc87c13f2b89273f25f97d14baf.