Bug 364779 - libutempter as used in KPty makes applications hang if child process exits immediately
Summary: libutempter as used in KPty makes applications hang if child process exits im...
Status: RESOLVED FIXED
Alias: None
Product: frameworks-kpty
Classification: Frameworks and Libraries
Component: general (show other bugs)
Version: 5.24.0
Platform: Other Linux
: NOR normal
Target Milestone: ---
Assignee: kdelibs bugs
URL:
Keywords:
: 366712 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-06-26 13:00 UTC by Rex Dieter
Modified: 2022-09-04 10:24 UTC (History)
6 users (show)

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Rex Dieter 2016-06-26 13:00:09 UTC
Konsole becomes unresponsive when a new tab is opened and current directory has been deleted.

Reproducible: Always

Steps to Reproduce:
1. Open a new konsole instance.
2. In the terminal session, do:
   mkdir test_directory
   cd test_directory
   rmdir ../test_directory
3. Create a new tab (e.g. via Ctrl-Shift T)

Actual Results:  
konsole becomes completely unresponsive and has to be killed to recover.


Expected Results:  
konsole should continue to operate normally.


See also downstream bug,
https://bugzilla.redhat.com/show_bug.cgi?id=1350105
Comment 1 Rex Dieter 2016-06-26 13:02:15 UTC
Marking confirmed, I can reproduce as well as original downstream reporter.
Comment 2 Martin Sandsmark 2016-08-13 11:32:52 UTC
*** Bug 366712 has been marked as a duplicate of this bug. ***
Comment 3 Martin Sandsmark 2016-08-13 11:36:43 UTC
The bug report I just marked as a duplicate of this one has a good backtrace: https://bugsfiles.kde.org/attachment.cgi?id=100570

From my comments on the other bug; this seems to be a general problem with QProcess, possibly with forkfd. Even running waitForStarted(1) doesn't help.

The state() of the process just after calling start() is QProcess::Starting, and no errors are set.
Comment 4 Martin Sandsmark 2016-08-13 11:37:21 UTC
Adding thiago here as well, as he knows forkfd best.
Comment 5 Thiago Macieira 2016-08-13 17:07:53 UTC
It's not forkfd. That has nothing to do with directories. The backtrace doesn't even show it.

Can anyone show me what the child process is doing? A backtrace would help, but an strace even more.
Comment 6 Thiago Macieira 2016-08-13 17:15:36 UTC
I see the forkfd_wait now. Since I can't see anything obviously wrong with the QProcess code either, I'm going to guess this is caused by the process failing to start too quickly.

But there's a unit for QProcess for this exact condition and it's passing.
Comment 7 Thiago Macieira 2016-08-13 17:52:36 UTC
Ok, I could reproduce this locally and I've got an strace:

This is forkfd creating the child process:
115218 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd10015bc10) = 115247
115247 set_robust_list(0x7fd10015bc20, 24 <unfinished ...>
115218 write(28, "*\0\0\0\0\0\0\0", 8 <unfinished ...>
115247 <... set_robust_list resumed> )  = 0
115218 <... write resumed> )            = 8
115218 close(28)                        = 0
115247 read(28, "*\0\0\0\0\0\0\0", 8)   = 8

The sequence of reads and writes synchronise parent and child, so we're sure that the entries in the forkfd global is fine.

The child process exits and the parent reads the error message from chdir:
115247 exit_group(-1)                   = ?
115218 write(6, "\1\0\0\0\0\0\0\0", 8)  = 8
115218 pselect6(25, [24], NULL, NULL, {30, 0}, {NULL, 8}) = 1 (in [24], left {29, 999993180})
115218 write(6, "\1\0\0\0\0\0\0\0", 8)  = 8
115218 read(24, "N\0o\0 \0s\0u\0c\0h\0 \0f\0i\0l\0e\0 \0o\0r\0 \0"..., 1024) = 50
115218 write(6, "\1\0\0\0\0\0\0\0", 8)  = 8
115218 close(24)                        = 0

Note how you can tell which version of Qt this is from the strace :-)

The code above is in QProcess (QProcessPrivate::_q_startupNotification and QProcessPrivate::processStarted). As a result, the QProcess change state to QProcess::NotRunning and emits error() with QProcess::FailedToStart.

Immediately after this, something weird happens:
115218 rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7fd0ffa019f0}, {0x7fd0fbfac190, [], SA_RESTORER|SA_NOCLDSTOP, 0x7fd0ffa019f0}, 8) = 0
115218 clone( <unfinished ...>
115247 +++ exited with 255 +++
115218 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd10015bc10) = ? ERESTARTNOINTR (To be restarted)
115218 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=115247, si_uid=1000, si_status=255, si_utime=0, si_stime=0} ---
115218 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd10015bc10) = 115249
115249 set_robust_list(0x7fd10015bc20, 24) = 0
115218 wait4(115249,  <unfinished ...>
115249 dup2(13, 0)                      = 0
115249 dup2(13, 1)                      = 1
115249 execve("/usr/lib/utempter/utempter", ["/usr/lib/utempter/utempter", "del"], [/* 119 vars */]) = 0

The parent process replaces the SIGCHLD handler: I can tell it's not forkfd because it's missing the SA_NOCLDSTOP flag. Immediately thereafter, the child process that failed to chdir() exits and its SIGCHLD is notified. But forkfd's signal handler was never called, so it never writes to the pipe.

The bug is not in forkfd. It's in konsole for replacing the signal handler at the most inopportune moment. sigaction() is NOT async-signal-safe. DO NOT call it if there's a chance that the signal you're looking for will be delivered at that exact time.

After all of this happens, utempter exits and the signal handler is restored:
115249 exit_group(1)                    = ?
115249 +++ exited with 1 +++
115218 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 115249
115218 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=115249, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
115218 rt_sigaction(SIGCHLD, {0x7fd0fbfac190, [], SA_RESTORER|SA_NOCLDSTOP, 0x7fd0ffa019f0}, NULL, 8) = 0
115218 read(26,  <unfinished ...>

Note the presence of SA_NOCLDSTOP. That read() never returns, so it's the same as the backtrace, stuck inside QProcessPrivate::waitForDeadChild. Conclusion: starting utempter was done as a response to QProcess changing state or emitting the error() signal. It's not a thread issue, it's just Konsole doing the one thing that it shouldn't have done in response to a failure to start: replace the SIGCHLD handler.
Comment 8 Martin Sandsmark 2016-08-13 19:10:40 UTC
From discussion on IRC: The problem is apparently KPtyProcess calling removeLineFromUtmp() in utempter, which calls sigaction() for forking its child.
Comment 9 Martin Sandsmark 2016-08-21 10:49:30 UTC
One way to "solve" it is to change kptyprocess to not call utmp-stuff during the onstatechange-stuff and instead do it only on finished().

But in this case it still breaks if another process exits at the wrong time. So the only place to properly fix it is in utempter, if I understand correctly, or do the utempter stuff ourselves with QProcess instead of using the utempter library.
Comment 10 Martin Sandsmark 2016-08-28 13:13:52 UTC
https://git.reviewboard.kde.org/r/128790/
Comment 11 Martin Sandsmark 2016-10-01 12:59:37 UTC
Git commit 65a688be0b2deaf58414e1f19601a15554beb143 by Martin T. H. Sandsmark.
Committed on 01/10/2016 at 10:42.
Pushed by sandsmark into branch 'master'.

Call the utempter helper executable manually

According to the investigation in
https://bugs.kde.org/show_bug.cgi?id=364779 utempter does stuff in a way
that isn't compatible with QProcess/KProcess/KPtyProcess (calling
sigaction() before launching its child process).

So instead we invoke it manually with QProcess.
REVIEW: 128790

M  +12   -5    cmake/FindUTEMPTER.cmake
M  +0    -5    src/CMakeLists.txt
M  +43   -8    src/kpty.cpp
M  +1    -0    src/kpty_p.h

http://commits.kde.org/kpty/65a688be0b2deaf58414e1f19601a15554beb143