358646 – Rendering breaks if Triple Buffer detection gives NOT available

Summary: Rendering breaks if Triple Buffer detection gives NOT available

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	kwin
Classification:	Plasma
Component:	glx
Version:	git master
Platform:	Other Linux

Importance:	NOR normal
Target Milestone:	---
Assignee:	KWin default assignee

URL:
Keywords:

Duplicates (2):	358963 359262
Depends on:
Blocks:

Reported:	2016-01-27 16:39 UTC by Martin Flöser
Modified:	2024-06-06 17:00 UTC
CC List:	6 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:

Flags:	mgraesslin: Intel+

Description Martin Flöser 2016-01-27 16:39:06 UTC

From time to time, my KWin (Intel Ivybridge) reports:
Triple buffering detection: "NOT available"  - Mean block time: 1.25074 ms

From then on, rendering is completely broken; restarting the compositor fixes it, and the next triple buffering detection may succeed again. E.g. the current session has:

KWin::SwapProfiler::end: Triple buffering detection: "Available"  - Mean block time: 0.986466 ms

From what I understand, "NOT available" calls setBlocksForRetrace(true), which changes the rendering behavior.
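
To make the heuristic concrete, here is a minimal C++ sketch of such a detector. It assumes the profiler keeps an exponential moving average of the time each buffer swap blocks (the old mean weighted 10/11), reports one verdict per 500-frame window, and compares the mean against 1 ms. The class, function and constant names are hypothetical and this is not KWin's SwapProfiler; only the 1 ms threshold is suggested by the two log lines above (1.25 ms gave "NOT available", 0.99 ms gave "Available").

```cpp
#include <cassert>
#include <optional>

// HYPOTHETICAL triple-buffering detector, not KWin's code. Assumed design:
// exponential moving average (old mean weighted 10/11), one verdict per
// 500-frame window, 1 ms threshold.
enum class SwapVerdict { TripleBuffered, BlocksForRetrace };

class HypotheticalSwapProfiler {
public:
    // blockMs: how long a single buffer swap blocked, in milliseconds.
    // Returns a verdict only on the last frame of each window.
    std::optional<SwapVerdict> addSample(double blockMs) {
        assert(blockMs >= 0.0);
        m_meanMs = (10.0 * m_meanMs + blockMs) / 11.0;
        if (++m_frames < kWindowFrames) {
            return std::nullopt;
        }
        m_frames = 0;
        // A mean above the threshold is read as "the swap waits for the
        // vertical retrace", i.e. triple buffering is NOT available.
        return m_meanMs > kThresholdMs ? SwapVerdict::BlocksForRetrace
                                       : SwapVerdict::TripleBuffered;
    }

    double meanMs() const { return m_meanMs; }

private:
    static constexpr int kWindowFrames = 500;
    static constexpr double kThresholdMs = 1.0;
    double m_meanMs = 0.0;
    int m_frames = 0;
};

// Runs one full window: 499 swaps of regularMs, then one swap of lastMs.
std::optional<SwapVerdict> runWindow(double regularMs, double lastMs) {
    HypotheticalSwapProfiler profiler;
    for (int i = 0; i < 499; ++i) {
        profiler.addSample(regularMs);
    }
    return profiler.addSample(lastMs);
}
```

For example, runWindow(0.9, 0.9) ends as TripleBuffered, while runWindow(0.9, 18.0) ends as BlocksForRetrace: one 18 ms stall on the last frame lifts the mean from about 0.9 ms to about 2.45 ms. In this 0.9 ms / 18 ms example the stall needs 29 ordinary frames to decay back below 1 ms, so a stall within roughly the last 30 frames of a window decides the verdict on its own.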

I do not experience this behavior on Intel Sandybridge. It is more easily triggered when X has problems (plugging in my headset makes X input handling freeze) or when KWin loads QML (a long block?).

I'm interested in fixing this, but I'm not familiar enough with the swapping code. Where should I look, and how can I start KWin so that it goes directly into the "broken" code path?

Comment 1 Thomas Lübking 2016-01-27 16:49:32 UTC

There's no fix: the detection is heuristic and can fail if swapping takes "too long" for external reasons.

If the code falsely believes you're on blocking swaps, the timing calculation is flawed (though please elaborate on "completely broken"; the triple-buffer detection may be a symptom rather than the cause!)

As a workaround, export KWIN_TRIPLE_BUFFER=1, e.g. from this script:

~/.config/plasma-workspace/env/kwin_env.sh
-------- snip --------
#!/bin/sh
export KWIN_TRIPLE_BUFFER=1
-------- /snip --------

chmod +x ~/.config/plasma-workspace/env/kwin_env.sh

This bypasses the detection and puts you on the non-blocking code path unconditionally.
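
The variable effectively turns detection into a three-way switch: per this comment, a value of 1 skips detection and forces the non-blocking path, and per comment 3, a value of 0 forces the blocking path. A minimal C++ sketch of that decision follows; the names are hypothetical, and the handling of empty or other values (empty runs the heuristic, anything other than "0" acts like "1") is an assumption, not confirmed KWin behavior.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// HYPOTHETICAL names; a sketch of an environment override, not KWin's code.
enum class SwapMode { RunDetection, ForceTripleBuffer, ForceBlocking };

// Interprets a KWIN_TRIPLE_BUFFER value (nullptr when the variable is unset).
// Assumption: unset or empty runs the heuristic, "0" forces the blocking
// path, and any other value is treated like "1".
SwapMode swapModeFromValue(const char *value) {
    if (value == nullptr || value[0] == '\0') {
        return SwapMode::RunDetection;
    }
    if (std::strcmp(value, "0") == 0) {
        return SwapMode::ForceBlocking;      // blocking path, detection skipped
    }
    return SwapMode::ForceTripleBuffer;      // non-blocking path, detection skipped
}

// Reads the variable from the process environment.
SwapMode swapModeFromEnvironment() {
    return swapModeFromValue(std::getenv("KWIN_TRIPLE_BUFFER"));
}
```

Separating the parsing (swapModeFromValue) from the environment lookup keeps the decision testable without touching the real environment.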

Comment 2 Thomas Lübking 2016-01-27 16:55:54 UTC

Sh*** I didn't look at the reporter ;-)

IIRC you mentioned this before (mailing list?), but I don't recall the details.
Either way, it would be good to know whether this also occurs if you enforce the non-blocking path (and what "completely broken" means in this regard).

Could it rather be related to
export KWIN_USE_BUFFER_AGE=0
?

Comment 3 Thomas Lübking 2016-01-27 17:11:23 UTC

In addition, export KWIN_TRIPLE_BUFFER=0 should kick you into the broken path.
If that triggers it, I assume the cause is either
a) the reversed paint/sync/swap cycle, which is required to measure the time we spend creating the frame (::endRenderingFrame), or
b) the recently added glXWaitGL() call in ::present() ("at least the nvidia blob manages to swap async")

The latter exists only in glxbackend.cpp; the former also in eglonxbackend.cpp.

Comment 4 Martin Flöser 2016-01-28 08:20:13 UTC

> Sh*** I didn't look at the reporter ;-)

lol

> export KWIN_USE_BUFFER_AGE=0

I had tried that a few times in the past; it didn't help.

> export KWIN_TRIPLE_BUFFER=0 should kick you into the broken path.

yep

> though please elaborate on "completely broken" 

If you're interested, I can try to make a video.

My current idea is that something messes up the heuristic, e.g. frames taking too long, though the code should protect against that...

Comment 5 Martin Flöser 2016-01-28 09:19:23 UTC

I collected the data for 500 frames. It shows huge variance, with a minimum of 0.25 ms and a maximum of 18 ms. About half of the frames are below 1 ms and about half are above (slightly more above than below). The high outliers seem to kill the statistics: 7 frames took longer than 10 ms, and as soon as we hit such an extreme, m_time switches to blocking for quite some time.

Overall, it looks to me like the swap profiler doesn't generate useful data on my system, so I'd rather look into fixing the rendering than trying to get a better m_time value.
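
The effect of a handful of outliers on a plain average can be shown with a small C++ sketch comparing the arithmetic mean with the median, a robust statistic swapped in here purely for illustration (nothing in this thread says KWin uses it). The sample data is synthetic, loosely shaped after the numbers above (7 of 500 frames above 10 ms), and is not the measured data; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of swap block times (ms): one outlier moves it a lot.
double meanMs(const std::vector<double> &samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Median of swap block times (ms): insensitive to a few extreme values.
// Takes a copy because std::nth_element reorders the elements.
double medianMs(std::vector<double> samples) {
    assert(!samples.empty());
    const std::size_t mid = samples.size() / 2;
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    const double upper = samples[mid];
    if (samples.size() % 2 == 1) {
        return upper;
    }
    // Even count: the lower middle is the largest element left of `mid`.
    const double lower = *std::max_element(samples.begin(), samples.begin() + mid);
    return (lower + upper) / 2.0;
}

// SYNTHETIC 500-frame window (not measured data): 493 swaps of 0.9 ms
// plus 7 stalls of 15 ms.
std::vector<double> syntheticWindow() {
    std::vector<double> samples(493, 0.9);
    samples.insert(samples.end(), 7, 15.0);
    return samples;
}
```

Here meanMs(syntheticWindow()) is about 1.10 ms, above the 1 ms threshold suggested by the log lines in the description, while the median stays at 0.9 ms. On the distribution described in this comment, with about half of the frames above 1 ms, even a median would sit close to the threshold, which fits the conclusion that the profiler cannot give a useful answer on this system.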

Comment 6 Martin Flöser 2016-01-28 09:37:34 UTC

On EGL I also get "NOT available" but rendering doesn't break.

Comment 7 Martin Flöser 2016-01-28 10:22:35 UTC

Now I can narrow down the problematic area. With triple buffering disabled, I get rendering issues when GlxBackend::present() takes the full-screen path, and it renders correctly when the m_haveMESACopySubBuffer path is taken.
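
For context on the two paths: the full-screen path swaps the whole back buffer, while the MESA path copies only the damaged rectangles to the front buffer with glXCopySubBufferMESA, whose x/y arguments give the lower-left corner of the region in OpenGL window coordinates. The C++ sketch below uses hypothetical names and plain structs instead of Qt types and real GLX calls; the rule that a full-screen repaint or buffer-age support selects the swap is an assumption about GlxBackend::present(), not verified against its source.

```cpp
#include <cassert>

// HYPOTHETICAL plain type standing in for Qt's QRect; X11 convention,
// i.e. the origin is the top-left corner of the screen.
struct Rect {
    int x, y, width, height;
};

enum class PresentPath { SwapBuffers, CopySubBuffer, Fallback };

// Assumed decision rule (not verified against GlxBackend::present()):
// swap the whole back buffer when the damage covers the full screen or
// buffer age is available; otherwise copy only the damaged rectangles
// if GLX_MESA_copy_sub_buffer exists; otherwise use a slower fallback.
PresentPath choosePath(bool fullRepaint, bool haveBufferAge, bool haveMesaCopySubBuffer) {
    if (fullRepaint || haveBufferAge) {
        return PresentPath::SwapBuffers;
    }
    if (haveMesaCopySubBuffer) {
        return PresentPath::CopySubBuffer;
    }
    return PresentPath::Fallback;
}

// glXCopySubBufferMESA takes the lower-left corner of the region in
// OpenGL window coordinates (origin bottom-left), so flip the y axis.
Rect toGlWindowCoords(const Rect &r, int screenHeight) {
    assert(r.y >= 0 && r.height >= 0 && r.y + r.height <= screenHeight);
    return Rect{r.x, screenHeight - r.y - r.height, r.width, r.height};
}
```

Under the assumed rule, the glitches appear only when choosePath() would return SwapBuffers; since this machine lacks buffer age, that branch would run only for full-screen repaints.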

Comment 8 Thomas Lübking 2016-01-28 11:34:26 UTC

You didn't happen to accidentally enable m_haveINTELSwapEvent?

Otherwise (just a wild guess)

diff --git a/glxbackend.cpp b/glxbackend.cpp
index c59c647..da6905c 100644
--- a/glxbackend.cpp
+++ b/glxbackend.cpp
@@ -735,10 +735,10 @@ void GlxBackend::endRenderingFrame(const QRegion &renderedRegion, const QRegion
         // In this case we won't post the back buffer. Instead we'll just
         // set the buffer age to 1, so the repaired regions won't be
         // rendered again in the next frame.
-        if (!renderedRegion.isEmpty())
+//         if (!renderedRegion.isEmpty())
             glFlush();
 
-        m_bufferAge = 1;
+//         m_bufferAge = 1;
         return;
     }
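
Background on the m_bufferAge value this diff touches: with GLX_EXT_buffer_age (or EGL_EXT_buffer_age), the driver reports how many frames old the contents of the current back buffer are, with 0 meaning undefined contents, so a compositor only needs to repaint the current damage plus the damage of the frames that buffer missed. A minimal C++ sketch with hypothetical names, reducing regions to sets of tile ids; it is not KWin's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <set>

// HYPOTHETICAL damage tracker, not KWin's code. A "region" is simplified
// to a set of damaged tile ids instead of a QRegion.
using Region = std::set<int>;

class HypotheticalDamageHistory {
public:
    // Region to repaint into a back buffer that is `bufferAge` frames old.
    // std::nullopt means "repaint everything": age 0 (undefined contents)
    // or a buffer older than the recorded history.
    std::optional<Region> repaintRegion(int bufferAge, const Region &currentDamage) const {
        if (bufferAge <= 0 ||
            static_cast<std::size_t>(bufferAge - 1) > m_history.size()) {
            return std::nullopt;
        }
        Region result = currentDamage;
        // The buffer missed the damage of the last (bufferAge - 1) frames.
        for (int i = 0; i < bufferAge - 1; ++i) {
            const Region &missed = m_history[static_cast<std::size_t>(i)];
            result.insert(missed.begin(), missed.end());
        }
        return result;
    }

    // Records the damage of a frame once it has been presented.
    void push(const Region &damage) {
        m_history.push_front(damage);  // newest first
        if (m_history.size() > kMaxHistory) {
            m_history.pop_back();
        }
    }

private:
    static constexpr std::size_t kMaxHistory = 10;
    std::deque<Region> m_history;
};
```

An age of 1 repaints only the current damage, which is what the comment in the diff relies on when it sets m_bufferAge = 1 after flushing without posting the back buffer.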

Comment 9 Martin Flöser 2016-01-28 12:19:26 UTC

> You did not happen to accidentally enable m_haveINTELSwapEvent ?

nope

> Otherwise (just a wild guess)

no change

Btw, I don't have buffer age (it's only listed as a client extension), and a change to query the client extensions didn't change anything.

Comment 10 Thomas Lübking 2016-01-28 17:47:52 UTC

So the swaps are caused by actual full-scene repaints. A video of the problem would help to get an idea of what could be going wrong.

Comment 11 Martin Flöser 2016-01-29 09:45:50 UTC

video at https://share.kde.org/index.php/s/Ili1SSWhCftIK3S

I used the Cube effect to show it. I also have a video with Present Windows, but there the problem is difficult to recognize on video.

Comment 12 Thomas Lübking 2016-01-29 10:52:50 UTC

diff --git a/glxbackend.cpp b/glxbackend.cpp
index c59c647..9a8e0e1 100644
--- a/glxbackend.cpp
+++ b/glxbackend.cpp
@@ -751,7 +751,7 @@ void GlxBackend::endRenderingFrame(const QRegion &renderedRegion, const QRegion
     } else {
         // Make sure that the GPU begins processing the command stream
         // now and not the next time prepareRenderingFrame() is called.
-        glFlush();
+        glXWaitGL();
     }
 
     if (overlayWindow()->window())  // show the window only after the first pass,


----
Just to be sure: it's a phone video because that was the convenient thing to do, not because it doesn't show up with recordmydesktop et al.?

Comment 13 Martin Flöser 2016-01-29 11:53:05 UTC

> Just to be sure: it's a phone video because that was the convenient thing to do, not because it doesn't show up with recordmydesktop et al.?

For convenience.

Comment 14 Martin Flöser 2016-01-29 11:55:36 UTC

And I'm sorry to say the patch doesn't help.

Comment 15 Martin Klapetek 2016-02-02 14:48:40 UTC

KWIN_TRIPLE_BUFFER=1 makes the scary things go away on my i5-4308U/Haswell/Intel Iris graphics.

Comment 16 Thomas Lübking 2016-02-05 12:31:55 UTC

*** Bug 358963 has been marked as a duplicate of this bug. ***

Comment 17 Thomas Lübking 2016-02-06 09:59:20 UTC

Do you have GL_ARB_sync and GL_ARB_buffer_storage extensions?

This forces the broken path, but disables the ringbuffer:
KWIN_TRIPLE_BUFFER=0 KWIN_PERSISTENT_VBO=0 kwin_x11 --replace &

Comment 18 Martin Flöser 2016-02-08 08:42:49 UTC

(In reply to Thomas Lübking from comment #17)
> Do you have GL_ARB_sync and GL_ARB_buffer_storage extensions?

According to glxinfo: yes

> 
> This forces the broken path, but disables the ringbuffer:
> KWIN_TRIPLE_BUFFER=0 KWIN_PERSISTENT_VBO=0 kwin_x11 --replace &

nope, still broken

Comment 19 Martin Klapetek 2016-02-08 18:02:05 UTC

(In reply to Thomas Lübking from comment #17)
> Do you have GL_ARB_sync and GL_ARB_buffer_storage extensions?

Yes.
 
> This forces the broken path, but disables the ringbuffer:
> KWIN_TRIPLE_BUFFER=0 KWIN_PERSISTENT_VBO=0 kwin_x11 --replace &

This brings back the visual glitches and rendering errors for me.

Comment 20 Thomas Lübking 2016-02-11 21:07:44 UTC

*** Bug 359262 has been marked as a duplicate of this bug. ***

Comment 21 Martin Flöser 2016-02-26 08:07:19 UTC

Just for the record: after rebooting, now using Intel driver version 2.99.917+git20160218-1, still the same problem.

Comment 22 Martin Flöser 2016-02-26 08:14:37 UTC

Comment #21 was on DRI2/UXA.

Just changed to DRI3/SNA and the problem is completely gone \o/

Will do further tests for that.

Comment 23 Martin Flöser 2016-02-26 08:19:33 UTC

DRI2/SNA: problem visible

Comment 24 Martin Flöser 2016-02-26 08:33:15 UTC

Further tests:
* enabling Option "TearFree" did not improve the situation
* glamor was ignored; according to the Xorg log, it switched to SNA

Comment 25 Justin Zobel 2021-03-10 00:32:24 UTC

Thank you for the bug report.

As this report hasn't seen any changes in 5 years or more, we ask if you can please confirm that the issue still persists.

If this bug is no longer persisting or relevant please change the status to resolved.