Bug 149932 - terminal apps inside konsole/openbsd sometimes encounter unexpected EAGAIN errors
Status: RESOLVED REMIND
Alias: None
Product: konsole
Classification: Applications
Component: general
Version: 1.6.6
Platform: OpenBSD OpenBSD
Importance: NOR normal
Target Milestone: ---
Assignee: Konsole Developer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-17 04:40 UTC by xkernigh
Modified: 2009-05-07 17:23 UTC (History)
0 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments

Description xkernigh 2007-09-17 04:40:32 UTC
Version:           Konsole 1.6.6 (using KDE 3.5.6)
Installed from:    OpenBSD Packages
Compiler:          gcc (GCC) 3.3.5 (propolice) system compiler /usr/bin/gcc
OS:                OpenBSD

Sometimes, when a terminal app running in Konsole accesses the tty device, the read(2) or write(2) system call unexpectedly returns -1 with errno EAGAIN. (OpenBSD has EAGAIN == EWOULDBLOCK.) The tty device acts as non-blocking when it should be blocking. This problem only seems to happen when the operating system is OpenBSD; read this bug report to the end.

It seems that Konsole sometimes sets the slave file descriptor to non-blocking. The terminal app (which inherits its standard in/out/err file descriptors from the shell, which inherits them from Konsole) does not expect this change, which causes read(2)/write(2) to return EAGAIN instead of blocking.
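
The inheritance effect described above can be sketched in C. The O_NONBLOCK flag belongs to the open file description, not to the individual descriptor, so a flag set through one inherited copy is visible to every process that shares the description. This is only an illustration under stated assumptions: the helper names is_nonblocking and child_sets_nonblock are hypothetical, and any inherited fd (a pipe, for instance) can stand in for the pty slave.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: 1 if O_NONBLOCK is set on fd, 0 if clear, -1 on error. */
int is_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}

/*
 * Hypothetical helper: fork a child that sets O_NONBLOCK on its inherited
 * copy of fd, wait for it, then report the flag as seen by the parent.
 * Returns 1 (set), 0 (clear) or -1 (error).
 */
int child_sets_nonblock(int fd)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: only this process calls fcntl(F_SETFL). */
        int flags = fcntl(fd, F_GETFL);
        if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
            _exit(1);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Parent: the flag changed here too, because the description is shared. */
    return is_nonblocking(fd);
}
```

Calling child_sets_nonblock() on the read end of a fresh pipe should report the flag as set in the parent, even though only the child called fcntl(2). The same sharing would apply between Konsole, the shell, and the terminal app if they all hold copies of one slave fd.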

Typically, the app confuses the read error with end-of-file and then unexpectedly quits; or the app ignores the write error and draws itself incorrectly (and the app's redraw command does not fix the problem). Apps that may encounter this problem include at least less(1), script(1), ttyrec, and irssi.

The problem does not happen immediately. I can open a new shell in Konsole, and run (for example) irssi from the shell, and irssi starts normally. Eventually, irssi will encounter drawing problems and I will have to quit and restart irssi.

EVIDENCE OF THIS BUG

This bug was especially difficult to encounter and identify. I am not an OpenBSD developer or a KDE developer; I am only a source-diver.

When I encountered the mystery of my malfunctioning terminal apps, I had to create a test program to identify the cause.

The script(1) command interposes itself between Konsole and the app recorded by script(1). I modified script(1) to log any EAGAIN errors to /tmp/wouldscript-PID.log, then to poll(2) the fd with a timeout of one second before retrying the read(2)/write(2) call. I refer to my modified program as "wouldscript". A "svn co http://opensvn.csie.org/kernigh/trunk/wouldscript" will give you a copy, or you may use the source browser at https://opensvn.csie.org/traccgi/kernigh/browser/trunk/wouldscript
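
A minimal sketch of the retry technique just described (read, and on EAGAIN poll for up to one second before trying again) might look like the following. This is not the wouldscript source: the function name read_retry and the eagain_count parameter are hypothetical, and a simple counter stands in for the log file.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: behaves like a blocking read(2) even if fd has been
 * switched to non-blocking behind our back. Each EAGAIN is counted in
 * *eagain_count (if non-NULL) instead of being written to a log file.
 * Returns the byte count, 0 at end-of-file, or -1 on a real error.
 */
ssize_t read_retry(int fd, void *buf, size_t len, unsigned *eagain_count)
{
    assert(buf != NULL);

    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;               /* data, or genuine end-of-file */
        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* real error: report to caller */

        if (eagain_count != NULL)
            (*eagain_count)++;

        /* Wait up to one second for input, then retry the read. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, 1000) == -1 && errno != EINTR)
            return -1;
    }
}
```

The point of the design is that EAGAIN never reaches the caller, so it cannot be mistaken for end-of-file. A matching write wrapper would poll for POLLOUT instead of POLLIN.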

Now I can open a new shell in Konsole, start wouldscript (which starts another shell), then start irssi. Konsole sends my input to wouldscript, which passes it to irssi; and irssi writes its output to wouldscript, which passes it to Konsole; but wouldscript acts as insulation, and handles all the EAGAIN errors so that irssi never sees them.

At first, wouldscript logs no EAGAIN errors. Eventually, it will start logging an EAGAIN error, once every second, as it waits for my input. It logs more EAGAIN errors if I type any input or if irssi draws output.

LIST OF WORKAROUNDS

Any one of these four list items is a workaround for this bug.

(1) Use wouldscript to insulate the app from Konsole.

(2) Use xterm instead of Konsole. Apps running in xterm do not encounter problems; wouldscript running in xterm has not yet logged any EAGAIN errors. This is how I know that this bug is only in Konsole. (The problem with this workaround is that xterm misses some features that I like in Konsole!)

(3) Whenever an app quits unexpectedly or draws incorrectly, quit and restart the app. When one arrives at the shell prompt to restart the app, the bug goes away (see the REPRODUCING THE BUG section below). (If the app is irssi, then I leave and rejoin each IRC channel. If the app is ttyrec, then ttyrec will fall off Konsole but continue running, and I will have to send a SIGHUP to whatever ttyrec is recording. If ttyrec is recording NetHack or ToME, then sending SIGHUP will trigger a panic save, allowing me to continue my roguelike game by starting NetHack or ToME again.)

(4) Open /dev/tty to create a new fd, and use this fd instead of any fd that we inherit or dup from Konsole. I have created a simple C program called "wouldtty" (in the same directory as "wouldscript") that does this. As the OpenBSD dup(2) manual page explains, an open(2) call produces a "different object reference".

Workaround (4) works perfectly; I can install wouldtty into my PATH, then configure Konsole's Shell sessions to run the wouldtty command.
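
The idea behind workaround (4) can be sketched as follows. This is an assumption about the general approach, not the actual wouldtty source: the function names reopen_onto and reopen_stdio are hypothetical. Each open(2) creates a new open file description, so its O_NONBLOCK flag is independent of the one on the fd inherited from Konsole.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace descriptor `target` with a fresh open of
 * `path`, so that target no longer shares status flags with anyone else.
 * Returns 0 on success, -1 on error (errno set).
 */
int reopen_onto(const char *path, int target)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    if (fd == target)
        return 0;                   /* target was closed; open reused it */

    if (dup2(fd, target) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);                      /* keep only the copy at `target` */
    return 0;
}

/* Hypothetical helper: give stdin, stdout and stderr fresh descriptions. */
int reopen_stdio(const char *path)
{
    for (int target = 0; target <= 2; target++) {
        if (reopen_onto(path, target) == -1)
            return -1;
    }
    return 0;
}
```

A wrapper in this style would presumably call reopen_stdio("/dev/tty") and then exec the user's shell, so that everything started from that shell inherits the fresh descriptors rather than the ones shared with Konsole.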

REPRODUCING THE BUG

Unfortunately this bug is not easy to reproduce. Sometimes I have to wait for EAGAIN errors to begin, because I do not know how to cause them to come sooner.

I experience this bug while running OpenBSD 4.1, and with KDE 3.5.6 installed from OpenBSD packages. I have experienced this bug with older versions of OpenBSD and KDE. Note that the kdelibs from OpenBSD packages includes a patch that uses openpty(3) to access the pty, so that permissions are correct. That good patch seems unrelated to this bug. That patch is at http://xrl.us/goodptypatch

The easiest way to reproduce the bug is to use irssi. Open a new Shell session with Konsole. From the shell, start irssi. Join a busy IRC channel (I use #nethack in Freenode). Start scrolling or switching windows, to generate a lot of input and output for irssi. (To scroll in irssi, press page-up and page-down. To switch windows, press Escape 1, Escape 2, ...) If the bug does not happen, then leave irssi running for some time, and try some more scrolling or window-switching later. The symptom of the bug will be a messed display in irssi. (Or when running irssi inside wouldscript, the symptom is wouldscript logging EAGAIN errors.)

Another way to reproduce the bug is to open a large text file in less(1). Try "man termios". Perform scrolls and jumps. The symptom of the bug will be a messed display, and pressing 'r' for redraw will not fix the problem. Sometimes, one can make the bug go away temporarily by scrolling up a lot.

While trying to reproduce the bug, avoid shell prompts. After starting irssi or less(1), do not suspend the app and go back to the shell. The reason for this is that shells like ksh and mksh will reset the standard input (the tty device) to blocking io at every shell prompt. In fact, one can use Konsole as the day-to-day terminal emulator for running shell commands, and never encounter this bug. This may explain why others failed to notice this bug before I did.
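
The reset that the shells perform at each prompt amounts to clearing O_NONBLOCK on the tty. A minimal sketch of an equivalent operation is shown below; it is not taken from the ksh or mksh sources, and the function name set_blocking is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: clear O_NONBLOCK on fd, leaving the other status
 * flags untouched. Returns 0 on success (including when the flag was
 * already clear), -1 on error.
 */
int set_blocking(int fd)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if ((flags & O_NONBLOCK) == 0)
        return 0;                   /* already blocking: nothing to do */

    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ? -1 : 0;
}
```

Because the flag lives on the shared open file description, calling set_blocking(STDIN_FILENO) from the shell would also restore blocking behavior for any other process holding the same description, which is consistent with the symptom disappearing at a shell prompt.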

ISOLATING THE BUG

I believe that Konsole sometimes calls fcntl(2) with the slave fd and sets the O_NONBLOCK flag. This affects all apps that have inherited or duplicated that fd from Konsole, and causes the unexpected EAGAIN errors. I have not isolated where is the fcntl call in Konsole.

However, I do know that Konsole (like all multithreaded Qt apps) links against the OpenBSD pthreads library. The pthreads library implements threads in user space. In particular, pthreads sets all file descriptors to non-blocking, then hides this fact by overriding several C library functions including open(2), fcntl(2), read(2), write(2). When a pthreads app like Konsole should be blocking, pthreads tries to switch to other threads.

It is possible that pthreads is preempting Konsole and setting the slave fd to nonblocking. However, I have no way to test this, nor have I identified a particular fcntl call in the pthreads source code that would cause this. This theory would explain why I never seem to encounter this bug with other operating systems. The FreeBSD and NetBSD pthreads libraries use a kernel feature called scheduler activations, instead of using non-blocking io with pretend blocking io like OpenBSD. The Linux pthreads library uses one kernel process per thread.

SUGGESTED FIX

I do not know how to fix this bug. Maybe document workaround (4) above for OpenBSD users who need to run curses apps in Konsole?

--Kernigh

Postscript: My computer is OpenBSD/macppc 4.1, my TERM environment variable is "xterm-xfree86".
Comment 1 Robert Knight 2008-03-18 03:24:08 UTC
I do not have the means to investigate this myself, but the communication with the terminal is done by a completely new stack of code in KDE 4.1 (not KDE 4.0), so it is quite possible that the problem no longer occurs.

> I have not isolated where is the fcntl call in Konsole. 

Konsole itself does very little low-level work, except setting the odd pty property.  If you are looking for system calls, they are most likely to be found in kdelibs, specifically the KPty / KProcess code.
Comment 2 Kurt Hindenburg 2009-05-07 17:23:14 UTC
KDE3 is not supported.  Reopen if you can verify this issue with a late KDE4 version.