Bug 317267 - Kwin stops when GPU hangs (intel_do_flush_locked failed).
Status: RESOLVED DUPLICATE of bug 307348
Alias: None
Product: kwin
Classification: Plasma
Component: general
Version: 4.10.1
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
Reported: 2013-03-24 11:46 UTC by Detlev Casanova
Modified: 2013-03-24 20:08 UTC
Description Detlev Casanova 2013-03-24 11:46:07 UTC
Hello,

There is a problem somewhere in this chain:
I play a video file in VLC (format mt2: MPEG2). The file is partly corrupted because it was captured with a cheap antenna.
After a moment of playing the corrupted part, VLC hangs.

Dmesg tells me this:
[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:init_ring_common] *ERROR* failed to set render ring head to zero ctl 00000000 head 66e07b7c tail 00000000 start 00003000
[drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 66e07b7c tail 00000000 start 00003000
[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[drm:i915_reset] *ERROR* Failed to reset chip.

At that moment, kwin stops (it does not crash, it stops and returns 1) with this message:
intel_do_flush_locked failed: I/O Error

KWin also printed lots of messages like this:
KWin::TopLevel::createWindowPixmap: Creating window pixmap failed id=<VLC window id> (deleted).

After that, it is impossible to restart kwin; the machine has to be rebooted.

I attach the (quite large) /sys/kernel/debug/dri/0/i915_error_state file.
I also have the video file and the timestamp in it where things fail, and it fails every time. The file is 5.5 GiB, though, so I'll have to cut out a part of it if you need it.

I also realize that it might not be kwin's fault at all but it should not close anyway.

Reproducible: Always

Steps to Reproduce:
1. Play the corrupted file with VLC
Actual Results:  
kwin closes

Expected Results:  
kwin should not close
Comment 1 Martin Flöser 2013-03-24 12:03:15 UTC
> I also realize that it might not be kwin's fault at all but it should not close anyway.
Yes, and the driver should not hang. Sorry, there is nothing we can do about that. It's a problem in a different layer which is completely abstracted away from us. We don't see it and we cannot detect it.
Comment 2 Thomas Lübking 2013-03-24 12:05:00 UTC
a) fyi, SIGSTOP cannot be intercepted - it's likely X11 or the kernel which stops kwin and kwin cannot prevent that in any way.

b) sounds like https://bugs.freedesktop.org/show_bug.cgi?id=57805

c) SNA or UXA?

d) Is it even impossible to restart kwin with compositing disabled?
d.1) Is there a kwin zombie process hanging around?
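
To illustrate point a): a minimal C++ sketch, assuming a POSIX system, of why a process cannot defend itself against SIGSTOP. canInstallHandler is a hypothetical helper, not KWin code; it only relies on sigaction() refusing to install handlers for SIGSTOP and SIGKILL.

```cpp
#include <signal.h>

// Hypothetical helper, not KWin code: reports whether a handler can be
// installed for signo. The previous disposition is restored afterwards.
bool canInstallHandler(int signo) {
    struct sigaction probe = {};
    struct sigaction previous = {};
    probe.sa_handler = [](int) {};   // no-op handler
    sigemptyset(&probe.sa_mask);
    if (sigaction(signo, &probe, &previous) != 0)
        return false;                // fails with EINVAL for SIGSTOP and SIGKILL
    sigaction(signo, &previous, nullptr);
    return true;
}
```

Calling canInstallHandler(SIGTERM) should succeed, while SIGSTOP and SIGKILL are rejected, so any "stop" of that kind would have to come from outside the process.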
Comment 3 Detlev Casanova 2013-03-24 13:50:01 UTC
a) I see
b) Looks like it, yes, but that should be fixed in the driver version I'm using.
c) SNA, could try UXA though
d) I'll try that later, I can't do that now.
Comment 4 Detlev Casanova 2013-03-24 17:35:49 UTC
It starts correctly with compositing disabled, as expected but there is no zombie process...
Comment 5 Thomas Lübking 2013-03-24 17:45:01 UTC
"s/but/and/" - the kernel module just got itself jammed.

Two things i forgot:

1. only with vlc or also with e.g. mplayer? (It could just be due to the video sink; mplayer lets you select from a whole bunch of them with the -vo switch. That could be relevant to the kernel bug, e.g. if xv works and just gl does not.)

2. please don't get this wrong (one never knows the background of a bug reporter ;-), but is the process actually stopped as in the POSIX signal (and possibly SIGKILL'd later on), or is that just what you assumed to be an appropriate description of what happened?
Comment 6 Detlev Casanova 2013-03-24 18:22:16 UTC
I'll run several tests with mplayer once it has been emerged.

I think I wasn't clear: the kwin process terminates and returns the value 1; that's what I can see from GDB. Also, the KDE bug reporter does not show up.

GDB says that 2 threads exited, then says:
[Inferior 1 (process 2923) exited with code 01]

And unless you tell me that I randomly hijacked this bug report, I won't get anything wrong; we're all here to try and help, right? :)
Comment 7 Thomas Lübking 2013-03-24 20:08:55 UTC
Ok, that one is pretty common :-(

The intel batchbuffer exit(1)s for EIO ("the gpu hangs") - it would be nicer to have an abort() instead.
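
A small sketch of why the difference matters, assuming a POSIX system: exit(1) is a normal termination with status 1 (which matches the GDB output above and gives a crash handler nothing to react to), whereas abort() ends the process with SIGABRT. describeTermination is a hypothetical helper used only for illustration.

```cpp
#include <cstdlib>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `action` in a forked child and describe how
// the child ended, the way a parent (or a debugger) would see it.
template <typename Action>
std::string describeTermination(Action action) {
    pid_t pid = fork();
    if (pid < 0)
        return "fork failed";
    if (pid == 0) {
        action();     // expected to terminate the child
        _exit(0);     // fallback if the action returns
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return "waitpid failed";
    if (WIFEXITED(status))
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    if (WIFSIGNALED(status))
        return "killed by signal " + std::to_string(WTERMSIG(status));
    return "unknown";
}
```

For example, describeTermination([] { std::exit(1); }) reports a plain exit code, while describeTermination([] { std::abort(); }) reports a signal, which is what signal-based crash handling can hook into.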

Random pondering:
----------------------
We could register an exit handler with atexit() and have it skip its work when a global variable is set, which we would do whenever we exit() ourselves (e.g. because another WM is present) or in QCoreApplication::aboutToQuit().

The exit handler would then deactivate compositing and restart (or restart with a crash counter).
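
Roughly, the idea could look like the sketch below. This is not KWin code: the exitguard namespace, the flag, and kRecoveryStatus are all hypothetical, and the "recovery" is reduced to a distinctive exit status where a real window manager would disable compositing and re-exec itself.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace exitguard {

// Hypothetical status, used only to make the recovery path observable.
constexpr int kRecoveryStatus = 42;

// Set on every exit path the program initiates itself.
bool plannedExit = false;

// Runs on any exit(); reacts only when nobody announced the exit,
// e.g. when a library calls exit(1) behind our back.
void onExit() {
    if (plannedExit)
        return;
    // A real WM would deactivate compositing and restart here.
    _exit(kRecoveryStatus);
}

void install() { std::atexit(onExit); }

// Use this instead of a bare exit() for intentional shutdowns.
[[noreturn]] void quit(int code) {
    plannedExit = true;
    std::exit(code);
}

// Demo helper: run fn in a child with the guard installed and
// return the child's exit status (-1 on error).
int statusOf(void (*fn)()) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        install();
        fn();
        quit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

} // namespace exitguard
```

With this, a child that calls std::exit(1) directly ends up on the recovery path, while one that goes through exitguard::quit() keeps its own exit code. Note that the handler leaves via _exit(), since calling exit() again from inside an atexit handler is undefined.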

--
The SIGCHLD assumption in the dupe is WRONG. I found a compiz bug where this is a SIGCHLD (potentially from the decorator or whatever), but the mesa commit does not support that assumption in the least:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=6862b54f4d4e88ef6ebf709ea7798093ec337e2a

*** This bug has been marked as a duplicate of bug 307348 ***