| Summary: | KWIN broken vsync and wrong fps | | |
|---|---|---|---|
| Product: | [Plasma] kwin | Reporter: | Alexey Dyachenko <adotfive> |
| Component: | general | Assignee: | KWin default assignee <kwin-bugs-null> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | adotfive, jan, vmlinuz386, walmartshopper |
| Priority: | NOR | Flags: | thomas.luebking: NVIDIA+, thomas.luebking: ReviewRequest+ |
| Version: | 5.2.95 | | |
| Target Milestone: | --- | | |
| Platform: | Arch Linux | | |
| OS: | Linux | | |
| URL: | https://git.reviewboard.kde.org/r/125659/ | | |
| See Also: | https://bugs.kde.org/show_bug.cgi?id=344433 | | |
| Latest Commit: | http://commits.kde.org/kwin/8bea96d7018d02dff9462326ca9456f48e9fe9fb | Version Fixed In: | 5.5 |

Attachments:
- kwin 5.2.95 support information dump
- Triple buffered vs double buffered
- Workaround for double buffered case
Description
Alexey Dyachenko
2015-04-16 19:34:31 UTC
a) Do not enforce a false "KWIN_TRIPLE_BUFFER=1". This is to skip heuristic detection.
b) If it tears, vsync is likely not enabled.
c) If you get 90 fps, vsync is _certainly_ not enabled.
d) Why is NoFlip enabled? -> That explains the tearing (whether waiting for the retrace or not, copying the buffer is not fast enough to happen during the retrace). There is also a runtime setting in nvidia-settings; don't turn flipping off unless you have a *REALLY* good reason (performance killer).
e) __GL_YIELD=usleep is only required if triple buffering is NOT enabled.
f) There is a reported bug that triple-buffering detection fails during login with KDE 5. Suspending/resuming the compositor (Shift+Alt+F12) or restarting with "kwin_x11 --replace &" usually "fixes" it.

a,d,e -- I tried a lot of options in various combinations just to diagnose things and to see if it gets any better.
b -- Indeed.
c -- The point I was trying to make is that the sleeping interval somehow gets modified: with vsync enabled and a misdetected 50Hz the rate is a steady 47-50 or 70 fps with no fluctuations, while with vsync enabled and 60Hz it is a steady 90-91 fps. 60Hz is the correct rate (my display and xrandr both say 60Hz), but that is probably another bug. The main clue here is that with vsync disabled my fps is capped at 60-61. Another clue is that setting VBlankTime to 2 gets me a correct 60 fps with vsync enabled in kwin, but tearing is still present.
e,f -- I am aware of the longstanding bug and the kwin code involving __GL_YIELD. My previous 5.2 configuration was the default one, with an empty xorg.conf and flipping enabled (by default). Note that I also did not set any environment variables in 5.2 (and before): I waited until vsync got disabled by the faulty triple-buffering detection, then simply flipped GL3->GL2 or vice versa to get working vsync and a smooth 60 fps.

(In reply to Alexey Dyachenko from comment #2)
> Main clue here is that with disabled vsync my fps is capped at 60-61fps.
That's no clue; there's simply a timer hitting every 16ms to repaint.

> a,d,e -- I tried a lot of options in various combinations just to diagnose
> things and to see if it gets any better.
Ok, please
a) return to a vanilla state w/o random silly config settings (that will expectably fail)
b) choose and record the choice for triple buffering
c) login, wait a few minutes, and then
d) dump and attach the output of "qdbus org.kde.KWin /KWin supportInformation"

> c -- the point I was trying to make is that somehow sleeping interval gets
> modified
You don't understand: if vsync is enabled, the driver waits for the next retrace signal with every swap, so you cannot possibly *increase* the framerate beyond the refresh rate of the screen (at 60Hz a retrace occurs every 1/60 s ≈ 16.7 ms, capping a synced compositor at 60 fps). If you could, that would indicate a driver bug for sure.

> Another clue is that modifying VBlankTime to 2
Literally "2"? That's as good as "0" and will mean constantly missing retraces (lagging a frame). The default is 6144.

> My previous 5.2 configuration was default one, with empty xorg.conf and
> flipping enabled (by default). Note that I also did not set any environment
> variables in 5.2 (and before).
Errr... FYI: no xorg settings means no triple buffering, and no environment (i.e. no __GL_YIELD=usleep) would result in vsync being forcefully disabled because of the correctly detected absence of triple buffering (and a CPU-hungry driver performing busy waits).

(In reply to Thomas Lübking from comment #3)
> Ok, please
> a) return to a vanilla state w/o random silly config settings [...]
> d) dump and attach the output of "qdbus org.kde.KWin /KWin supportInformation"
I will. If there is a reasonable number of dependencies I will also try to bisect; that will probably be much faster for finding the root of the evil.

> You don't understand: if vsync is enabled, the driver waits for the next
> retrace signal with every swap [...]
But if there is no vsync actually being done, then I can draw at any fps up to the hardware limit, right?

> literally "2"? - that's as good as "0" [...] The default is 6144.
The value in the config is multiplied by 1000, so that's 2000.

> No environment (ie. no __GL_YIELD=usleep) would result in vsync being
> forcefully disabled [...]
Plain simple double-buffered vsync should work, no? Well, there is a piece of code inside kwin that forcefully disables vsync if it sees no __GL_YIELD set, and it actually was triggered every time I logged in. So it was either setting __GL_YIELD or KWIN_TRIPPLE_BUFFER to live peacefully, or getting vsync disabled after 10 seconds and re-enabling it manually. Yes, the situation was quite ridiculous, living through 4.x like that and seeing the same behaviour in 5.x, but what can I do? I could simply rip out that piece of code, but that still costs some effort on every version change. Apart from that I had no reported increase in CPU usage, and vsync worked after the workarounds mentioned above. But none of that applies since 5.2.95.

(In reply to Alexey Dyachenko from comment #4)
> I will. If there is reasonable number of dependencies I will also try to
> bisect
Later; first let's get an idea of the state. The problem may have been caused by a libGL or nvidia driver update.

> But if there is no vsync being actually done than I can draw with any fps
> till hardware limit, right?
Yes. No. There's still a timer, but it will run too fast if it assumes a locking buffer swap.

> Value in config is multiplied by 1000 so thats 2000.
Yes. As the comment suggests: 2000 NANOseconds, i.e. "nothing".

> Plain simple double buffered vsync should work, no?
Yes, but it is vastly expensive due to the busy-waiting driver, unless vsync is disabled by your changes OR GL yields by a usleep (that's why it's required).

> So it was setting either __GL_YIELD
Good.

> or KWIN_TRIPPLE_BUFFER to live
BAD! However, "KWIN_TRIPPLE_BUFFER" [sic!] would do nothing anyway.

The problem w/ __GL_YIELD is that the workload of kwin is linked in from libkdeinit5_* and that broke linking in "libnvidiahack" to set environments before the GL lib is linked. If you manage to set the environment from the process (and in time), any patch would be more than welcome!

Created attachment 92100 [details]
kwin 5.2.95 support information dump
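A minimal sketch of what that setting-the-environment-from-the-process suggestion could look like, assuming the driver only reads __GL_YIELD at GL initialization time rather than at library load (if it is read earlier, e.g. when libGL is pulled in by a libkdeinit process, a setenv() like this would come too late):

```cpp
// Hypothetical sketch, not kwin code: export __GL_YIELD before any GL
// initialization runs in this process. setenv() is POSIX; the final 0 means
// a value the user already exported is not overwritten.
#include <cstdlib>

int main() {
    setenv("__GL_YIELD", "USLEEP", 0); // must happen before libGL reads it
    // ... continue with normal startup, create the GLX context, etc.
    return 0;
}
```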
(In reply to Thomas Lübking from comment #5)
> (In reply to Alexey Dyachenko from comment #4)
> > or KWIN_TRIPPLE_BUFFER to live
> BAD!
> However "KWIN_TRIPPLE_BUFFER" [sic!] would do nothing anyway.
I can assure you that the actual variables were set without typos (=

https://bugs.kde.org/attachment.cgi?id=92100
I added the requested support information.
xorg.conf.d/20-nvidia.conf has only NoLogo.
The .nvidia-settings-rc file was removed ("Sync to VBlank" and "Allow Flipping" are both enabled, I checked this).
The dump was taken immediately after login; it shows about 55 fps (tearing is present). Then the fps jumps to 99 and the following changes can be seen:

glPreferBufferSwap: 99
Painting blocks for vertical retrace: no

changes to

glPreferBufferSwap: 0
Painting blocks for vertical retrace: yes

Then I restart compositing or kwin, observe 60 fps (tearing is present), and the variable changes to

Painting blocks for vertical retrace: no

The following step was putting "export __GL_YIELD=usleep" into .xprofile and repeating the procedure. The results were the same.

I actually only just noticed that kwin_x11 sits at 10% of one core, something that was never observed before. When I move windows it jumps to 25%. __GL_YIELD is set. Restarting kwin or compositing changes nothing. The situation is exactly the same with 346.59 from the Arch repos. I also tried simply going back to 5.2.2 to obtain some info, but it crashes without a full KDE downgrade.

(In reply to Alexey Dyachenko from comment #7)
> Following step was putting "export __GL_YIELD=usleep" into .xprofile and
> repeating the procedure. Results were same.
Sure .xprofile is interpreted (in time)?

What's the output of
tr '\0' '\n' < /proc/`pidof kwin_x11`/environ

(In reply to Thomas Lübking from comment #10)
> Sure .xprofile is interpreted (in time)?
> what's the output of
> tr '\0' '\n' < /proc/`pidof kwin_x11`/environ
Yep, __GL_YIELD is there. And I found out that vsync fails only in the double-buffered case. If I run without the variable (I don't like it being set globally) and use Option "TripleBuffer" "True", then vsync works if I re-enable it after the faulty triple-buffering detection. I don't need triple buffering forced on all GL programs, so I need to find out why the double-buffered case, which worked up to 5.2.2, stopped working in 5.2.95. And btw, I experience no weird CPU usage; my previous message was caused by the kwin fps plugin.

If you run "kwin_x11 --replace &" from konsole (w/ double buffering), do you receive a warning about no triple buffering nor usleep yielding (and thus vsync deactivation)?

(In reply to Thomas Lübking from comment #12)
> If you "kwin_x11 --replace &" from konsole (w/ double buffering) do receive
> a warning about no triple buffering nor usleep yielding (and thus vsync
> deactivation)?
Yes, I get:
kwin_core: Triple buffering detection: "NOT available" - Mean block time: 8.01076 ms
as expected, but then I do "__GL_YIELD=usleep kwin_x11 --replace" and still get the same message.

(In reply to Alexey Dyachenko from comment #13)
> Yes I get:
> kwin_core: Triple buffering detection: "NOT available" - Mean block time:
> 8.01076 ms
> as expected, but then I do " __GL_YIELD=usleep kwin_x11 --replace" and still
> get the same message.
Yes, because you don't have it enabled. The crucial message would be:

It seems you are using the nvidia driver without triple buffering
You must export __GL_YIELD="USLEEP" to prevent large CPU overhead on synced swaps
Preferably, enable the TripleBuffer Option in the xorg.conf Device
For this reason, the tearing prevention has been disabled.
See https://bugs.kde.org/show_bug.cgi?id=322060

__GL_YIELD=usleep vs. __GL_YIELD=USLEEP - "Size does matter!"™

Created attachment 92104 [details]
Triple buffered vs double buffered
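Worth spelling out why the case matters: the check kwin performs (the qstrcmp(qgetenv("__GL_YIELD"), "USLEEP") visible in the patches later in this report) is a byte-exact comparison, so a lowercase value silently fails it. A minimal sketch:

```cpp
// Why "usleep" is not enough: the comparison kwin uses (quoted later in
// this thread) is case-sensitive.
#include <QByteArray>
#include <QDebug>

int main() {
    qputenv("__GL_YIELD", "usleep");                       // lowercase, as tried above
    bool accepted = qstrcmp(qgetenv("__GL_YIELD"), "USLEEP") == 0;
    qDebug() << "accepted:" << accepted;                   // false: vsync gets disabled
    return 0;
}
```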
Alright, it does have some effect! I guess there is no tearing, but it's very laggy and feels like 20 fps.
https://bugs.kde.org/attachment.cgi?id=92104
Please check this attachment with the kwin fps plugin for both the triple-buffered and the double-buffered (with USLEEP) case. In 5.2.2 and prior I did not use __GL_YIELD, was reactivating vsync every login, and it was very smooth, like now with triple buffering. Now triple buffered is the only usable scenario.

I started having this problem when I upgraded from 5.2 to 5.3 on Arch. Here's what I've seen: I have four monitors, 3 at 60Hz and 1 at 110Hz. With kwin 5.2, after logging in, I would switch vsync to Never and then back to Automatic. After that, it was buttery smooth on the 110Hz monitor. After upgrading to 5.3, I noticed it looked choppy (60 fps looks choppy after getting used to 110). When kwin first starts with vsync enabled, it's running at 48 fps (half of 96Hz). Then after a few seconds, it jumps to 72 fps (half of 144Hz). If I disable vsync, it runs at 60 fps. I have tried playing with triple buffering and __GL_YIELD, but I can't get it to run above 72 fps. It's almost like kwin has some standard refresh rates hard-coded. But there are overclockable monitors that could be running at any refresh rate via a custom EDID file (that's why mine is 110Hz). So I don't know what changed, but in 5.2 I got 110 fps, and with 5.3 I can't get more than 72 fps, and it's a very noticeable difference.

(In reply to walmartshopper from comment #17)
> I started having this problem when I upgraded from 5.2 to 5.3 on Arch. [...]
Can you post KWin support information (see posts above)? Also check the kwin output; I have two 60Hz monitors but kwin detects my refresh rate as 50Hz, so I have to set RefreshRate=60 in ~/.config/kwinrc. This is another bug introduced in 5.3, and hopefully it's reported elsewhere. I have replaced the 670 with a 980 and double buffering works as pre-5.3. I don't export __GL_YIELD and have triple buffering disabled; all I have to do is the usual voodoo dance of re-enabling vsync. Actually, can we have a checkbox for double-buffered vsync? E.g. not checking for triple buffering and simply doing swaps without blocking on retraces (as I understand it is doing now), seeing as I don't need to export __GL_YIELD.

Thanks, updating my kwinrc fixed it. I added RefreshRate=110 and MaxFPS=120, and now it's running smoothly again.
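For reference, those overrides live in ~/.config/kwinrc. A sketch of the entries, assuming they belong in the [Compositing] group (the group placement is an assumption; check your Plasma version):

```ini
[Compositing]
# Override the misdetected refresh rate (Hz); 110 matches the custom-EDID monitor
RefreshRate=110
# Raise kwin's internal frame cap above the refresh rate
MaxFPS=120
```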
> I have replaced 670 with 980 and double buffering works as pre 5.3.
This really has no impact from our side (i.e. you have a faster GPU now, that's all; both should, however, be overdimensioned for KWin's needs).
Without triple buffering, synced swaps will _always_ block.
What you may want is "ignore that the driver performs a busy wait", but that's actually silly: just export __GL_YIELD=USLEEP
I linked another bug which may be related; something randomly causes slow fence syncing (w/ triple buffering, though).
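To illustrate what the busy wait means in practice, a sketch modeled on NVIDIA's documented __GL_YIELD semantics (illustrative only; waitForRetrace and retraceDone are made-up names, not driver code):

```cpp
// Illustrative model of the driver's wait loop during a synced swap.
// Default: sched_yield() per iteration, so one core stays busy polling;
// __GL_YIELD=USLEEP: usleep() per iteration, so the core can actually sleep.
#include <cstring>
#include <sched.h>
#include <unistd.h>

void waitForRetrace(bool (*retraceDone)(), const char *yieldMode) {
    while (!retraceDone()) {
        if (yieldMode && std::strcmp(yieldMode, "USLEEP") == 0)
            usleep(100);   // sleeps: near-zero CPU cost
        else
            sched_yield(); // yields: still burns the timeslice polling
    }
}
```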
The refresh rate detection is a bit odd.
The 50Hz "issue" was originally a design defect in the nvidia driver, due to their TwinView implementation. We worked around that by avoiding xrandr and calling into xf86vm (xvidmode) or even asking nvidia-settings as last resort.
Then nvidia introduced full xrandr 1.3 support and "xrandr -q" actually still reports proper refresh rates here.
So after a long time we removed the workarounds.
And now, lately, the xcb connections return that 50Hz again ... :-(
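For anyone who wants to check what the server reports, a minimal standalone query via the Xlib XRandR interface; this fetches the screen-level rate, i.e. exactly the value that can come back as a bogus 50Hz on such setups:

```cpp
// Build with: g++ rate.cpp -o rate -lX11 -lXrandr
#include <X11/Xlib.h>
#include <X11/extensions/Xrandr.h>
#include <cstdio>

int main() {
    Display *dpy = XOpenDisplay(nullptr);
    if (!dpy) { std::fprintf(stderr, "no X display\n"); return 1; }
    XRRScreenConfiguration *cfg = XRRGetScreenInfo(dpy, DefaultRootWindow(dpy));
    if (cfg) {
        short rate = XRRConfigCurrentRate(cfg); // screen rate, not per-output
        std::printf("reported refresh rate: %d Hz\n", rate);
        XRRFreeScreenConfigInfo(cfg);
    }
    XCloseDisplay(dpy);
    return 0;
}
```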
(In reply to Thomas Lübking from comment #21)
> that's actually silly: just export __GL_YIELD=USLEEP
I would be happy to, but if I do that everything is laggy and kwin timings become sawtoothy, as in attachment https://bugs.kde.org/attachment.cgi?id=92104 (left is just triple buffering enabled, right is only __GL_YIELD="USLEEP"), as I said before.

Yes, triple buffering is supposed to produce better results, but I thought it does not for you, thus you *want* to use double buffering?

For a detailed explanation of the behavior (a sketch after this list illustrates the swap cadence in these cases):
----------------------------------------------------------
a) If triple buffering is enabled in the driver AND (not! "or") this is correctly detected or enforced, KWin will swap buffers every ~16ms, or as fast as possible (if painting takes longer), and rely on the driver to move the 3rd buffer to scanout in sync with the screen.
b) If triple buffering is disabled AND this is correctly detected or enforced AND __GL_YIELD=USLEEP is exported, KWin will swap buffers faster (10ms) to be early for the retrace and rely on the buffer swap to block until the retrace is finished.
c) If triple buffering is assumed to be not available (correctly or not) AND __GL_YIELD=USLEEP is NOT exported, vsync will be disabled. Whether for this reason or by explicit configuration, the behavior is exactly the same as for triple buffering, just that you'll get tearing.
d) If triple buffering is enabled, but KWin falsely assumes "permitted" (__GL_YIELD=USLEEP) double buffering, KWin will flood the buffer, i.e. swap ~every 10ms, and the driver will block the third swap for a frame.
e) If triple buffering is disabled, but KWin falsely assumes it enabled, it will swap too slowly. It'll be too late for almost every frame and then spend nearly the entire next frame waiting for the next retrace. The FPS will drop to 30Hz.

Your latest comment would fit the (d) condition.
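To make that cadence concrete, a minimal sketch (names and structure are illustrative, not kwin's actual scheduler; the 16ms/10ms figures come from the explanation above):

```cpp
// Sketch of the swap cadence described in cases (a)-(e) above.
#include <chrono>

std::chrono::milliseconds nextSwapDelay(bool tripleBuffered,
                                        std::chrono::milliseconds vblankInterval) {
    if (tripleBuffered) {
        // (a) one swap per frame (~16ms at 60Hz); the driver moves the
        // third buffer to scanout in sync with the screen.
        return vblankInterval;
    }
    // (b) double buffered: swap early (~10ms) and let the swap itself block
    // until the retrace; sane only with __GL_YIELD=USLEEP, otherwise the
    // driver busy-waits for the rest of every frame.
    return std::chrono::milliseconds(10);
}
// Mismatches give (d): flooding ~every 10ms against a triple-buffered driver,
// or (e): swapping too slowly and missing almost every retrace (~30 fps).
```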
There are currently 3 scenarios I encounter:
1) I set "TripleBuffer" "True" in xorg.conf, no __GL_YIELD or anything. VSYNC works out of the box. Satisfying, but triple buffering is undesirable for video playback or maybe games.
2) Empty xorg.conf, no __GL_YIELD. I have vsync but it gets disabled; I flip the backend OGL3->OGL2 and vice versa and VSYNC is back. Timings in the fps plugin are the same as in (1), i.e. everything is smooth (with the 670 on 5.2 and before, and the 980 on 5.3, currently). It is the best scenario, except for having to manually re-enable vsync.
3) Empty xorg.conf, __GL_YIELD="USLEEP": I get lag and sawtooth timings.
Given what you said and what is happening, I don't see any use in setting __GL_YIELD="USLEEP", as kwin correctly detects the lack of triple buffering, and double-buffered vsync simply works after re-enabling. Then why does kwin assume the hacky behaviour as default and make the user suffer?

Created attachment 92474 [details]
Workaround for double buffered case
(In reply to Alexey Dyachenko from comment #26)
> Created attachment 92474 [details]
> Workaround for double buffered case
Re-enabling vsync on each login is getting on my nerves, so I decided to hack around the problem. As one can see from the support information above, re-enabling vsync after kwin automatically disables it in the double-buffered case sets BlocksForRetrace to false. So instead of disabling vsync and re-enabling it manually later, I made that if-block simply set BlocksForRetrace to false. Now vsync works OOB in the double-buffered case like in any other compositor ^ ^
Check the attachment for details.

That patch makes *zero* sense.
You claim triple buffering to be disabled (thus swapping does actually block).
In total, you're enforcing case (e) => unless you artificially boost the refresh rate, KWin will systematically swap too slowly/late and then has to wait for an entire frame. You'll end up with something between 30-60Hz (which is exactly what I get here for such a change).

You'll find the buffer swap one line UP from the patch position; add

```cpp
     m_swapProfiler.begin();
 }
+QElapsedTimer profile;
+static quint64 swaptime = 0;
+static uint swapcounter = 0;
+profile.start();
 glXSwapBuffers(display(), glxWindow);
+swaptime += profile.nsecsElapsed();
+if (++swapcounter == 100) {
+    // nanoseconds summed over 100 swaps, printed as a per-swap average in ms
+    qDebug() << "average swap time" << swaptime / 100000000.0 << "ms";
+    swaptime = swapcounter = 0;
+}
 if (gs_tripleBufferNeedsDetection) {
```

to print out how much time you spend waiting for the vertical retrace. If that number is very small, your swaps don't block, i.e. you _are_ triple buffering. If it's very large (> 1ms), you're losing frames for pretty much sure.

Please deactivate the FPS counter plugin and actually don't care too much about its output. It pollutes what it measures and is rather unmaintained.

(In reply to Thomas Lübking from comment #28)
> That patch makes *zero* sense.
I understand very little of what I'm doing; however, the patch works for me. KWIN says TB is not present and that I should export __GL_YIELD. As I do that (your case (b)), I get massive lag, so something is very wrong somewhere.

> You claim triple buffering to be disabled (thus swapping does actually block)
> In total, you're enforcing case (e)
It is indeed disabled. NVIDIA docs say it's off by default, but just in case, I have it disabled. Here is my /etc/X11/xorg.conf.d/20-nvidia.conf:

```
Section "Monitor"
    Identifier "HTHG100233"
    DisplaySize 600 340 # In millimeters
EndSection

Section "Device"
    Identifier "Default Nvidia Device"
    Option "NoLogo" "True"
    Option "TripleBuffer" "False"
    Option "UseEdidDpi" "False"
    Option "DPI" "108 x 108"
    Option "Monitor-DP-4" "HTHG100233"
EndSection
```

and KWIN says:
kwin_core: Triple buffering detection: "NOT available" - Mean block time: 8.47859 ms

> => Unless you artificially boost the refreshrate, KWin will systematically
> swap too slow/late and then has to wait for an entire frame.
> You'll end up w/ sth. between 30-60Hz (what is exactly what I get here for
> such change)
I have RefreshRate=60 in kwinrc:
kwin_core: Vertical Refresh rate 60 Hz

> [swap-time instrumentation quoted above]
Here is the output:

average swap time 0.0362707 ms
average swap time 0.0408241 ms
average swap time 0.0406007 ms
average swap time 0.0330792 ms
average swap time 0.0294599 ms
kwin_core: Triple buffering detection: "NOT available" - Mean block time: 8.47859 ms
average swap time 0.0622002 ms
average swap time 0.0973264 ms
average swap time 0.0755968 ms
average swap time 0.0808665 ms
average swap time 0.0717898 ms

Just to be clear, the output above is with my patch; you can actually see the moment BlocksForRetrace gets set to false. Here is vanilla 5.3.0 without triple buffering WITH __GL_YIELD=USLEEP:

average swap time 0.235458 ms
average swap time 0.23607 ms
average swap time 0.237488 ms
average swap time 0.24853 ms
average swap time 0.261053 ms
kwin_core: Triple buffering detection: "NOT available" - Mean block time: 6.06375 ms
average swap time 4.94924 ms
average swap time 4.88566 ms
average swap time 4.93876 ms
average swap time 5.32374 ms
average swap time 5.02611 ms

I got a second nvidia system (with a 960), and it exhibits exactly the same behaviour with a fresh install of 5.4.1 and nvidia 355.11, no custom configs or any options set. Setting BlocksForRetrace to false heals vsync when no triple buffering is available. I have no other nvidia system to test this, but I'm going to guess that something is buggy in kwin, a regression, or it never worked properly with the blob at all.

Whatever the problem is: it *can*not be worked around by your patch; the resolution will be some side effect. If you want to try, try scratching the usleep requirement (so you can have swap control enabled on double buffering w/o usleeping), and if you can, ping me a reminder at ~20:00 UTC to disable triple buffering here ;-)

I never meant my patch to be a proper workaround; it only leads to the same result as if the user hits OpenGL3->2->Apply->3->Apply. As a matter of fact, I'm too lazy to rebuild the package every time it changes, so I'm still doing it by hand on every boot.

> if you want to try, try scratching the usleep requirement (so you can have
> enabled swapcontrol on double buffering w/o usleeping)
I'm not sure what you mean; I totally don't know what happens inside KWIN, as I don't have time to study it properly. Are you talking about __GL_YIELD here? If so, then I'm not. The whole point here is that __GL_YIELD=USLEEP causes massive lag (see comment #31) but is supposed to be the official workaround. Sorry for the double posting.
Just to bring some clarity to the topic, this is what vanilla KWIN does by itself when vsync gets broken and I then change backend OpenGL versions:

$ qdbus org.kde.KWin /KWin supportInformation | grep block
Painting blocks for vertical retrace: yes

(change backend version and back)

$ qdbus org.kde.KWin /KWin supportInformation | grep block
Painting blocks for vertical retrace: no

If this is relevant, I use 'Re-use screen content'.

This patch:

```diff
diff --git a/eglonxbackend.cpp b/eglonxbackend.cpp
index 314bfb2..7f68424 100644
--- a/eglonxbackend.cpp
+++ b/eglonxbackend.cpp
@@ -344,7 +344,7 @@ void EglOnXBackend::present()
         gs_tripleBufferUndetected = gs_tripleBufferNeedsDetection = false;
         if (result == 'd' && GLPlatform::instance()->driver() == Driver_NVidia) {
             // TODO this is a workaround, we should get __GL_YIELD set before libGL checks it
-            if (qstrcmp(qgetenv("__GL_YIELD"), "USLEEP")) {
+            if (false && qstrcmp(qgetenv("__GL_YIELD"), "USLEEP")) {
                 options->setGlPreferBufferSwap(0);
                 eglSwapInterval(eglDisplay(), 0);
                 qCWarning(KWIN_CORE) << "\nIt seems you are using the nvidia driver without triple buffering\n"
diff --git a/glxbackend.cpp b/glxbackend.cpp
index 0abb1e3..acbd64a 100644
--- a/glxbackend.cpp
+++ b/glxbackend.cpp
@@ -635,7 +635,7 @@ void GlxBackend::present()
         gs_tripleBufferUndetected = gs_tripleBufferNeedsDetection = false;
         if (result == 'd' && GLPlatform::instance()->driver() == Driver_NVidia) {
             // TODO this is a workaround, we should get __GL_YIELD set before libGL checks it
-            if (qstrcmp(qgetenv("__GL_YIELD"), "USLEEP")) {
+            if (false && qstrcmp(qgetenv("__GL_YIELD"), "USLEEP")) {
                 options->setGlPreferBufferSwap(0);
                 setSwapInterval(0);
                 qCWarning(KWIN_CORE) << "\nIt seems you are using the nvidia driver without triple buffering\n"
```

What I meant is that the direct impact of your change can hardly improve things; thus I assume an indirect effect will do.

Hmmm - no problems with double-buffered compositing and __GL_YIELD=USLEEP here. Can you elaborate on your test case/scenario (CPU load, running glxgears, moving windows around, etc.)? Once again, and to be sure: hands off the FPS counter effect for this.

Alright, I have very good news. First, I am unable to reproduce comment #31 anymore. Second, I falsely assumed BlocksForRetrace=false to be the cure; instead, as you suggested, removing the whole if-block (if you trace back to my original patch, it also did this very thing), so that swaps are not disabled, is enough, at least for my machines:

if (result == 'd' && GLPlatform::instance()->driver() == Driver_NVidia)

The resulting kwin CPU usage is 0-2%.
You insist BlocksForRetrace=false is improper in such a case, but changing backend OpenGL versions changes this variable from YES (the starting value) to NO (comment #35), so you should probably look into that behaviour.

No, actually nothing's good in double buffering. After figuring out that usleep was actually not ultimately exported to kwin, fixing that, and fixing the redetection after suspend/resume, dragging windows feels like working in jelly.
The cause seems to be that very often waitTime in setCompositeTimer ends up being very small, so we basically trigger a swap every other event cycle (which will then block for quite some time).
Also, we seem to skew against the vblank rate, which causes regular frame skips (other bugs reported). I assume the cause is that the last swap time is included in the frame rendering time. We'll see whether that causes the skew as well.
Of course, blocking purely randomly (~30%) too long is better than blocking too long systematically.
I'll have a closer look and patches tonight. LOL.

This patch should
a) redetect double/triple buffering on suspend/resume cycles
b) properly hint whether painting currently blocks (says "no" if swap control is turned off)
c) improve double-buffered rendering "snappiness" (I think we still skew; gonna take another look)

The problem is that nvidia seems to manage to not block on double-buffered swapping (and we then flood the queue). Ultimately this makes it "safe" for nvidia users to enforce

export KWIN_TRIPLE_BUFFER=1

but that's of course non-reliable behavior. But good to know.

```diff
diff --git a/glxbackend.cpp b/glxbackend.cpp
index 0abb1e3..c767eef 100644
--- a/glxbackend.cpp
+++ b/glxbackend.cpp
@@ -119,6 +119,9 @@ GlxBackend::GlxBackend()
     init();
 }
 
+static bool gs_tripleBufferUndetected = true;
+static bool gs_tripleBufferNeedsDetection = false;
+
 GlxBackend::~GlxBackend()
 {
     if (isFailed()) {
@@ -129,6 +132,9 @@ GlxBackend::~GlxBackend()
     cleanupGL();
     doneCurrent();
 
+    gs_tripleBufferUndetected = true;
+    gs_tripleBufferNeedsDetection = false;
+
     if (ctx)
         glXDestroyContext(display(), ctx);
 
@@ -142,9 +148,6 @@ GlxBackend::~GlxBackend()
     delete m_overlayWindow;
 }
 
-static bool gs_tripleBufferUndetected = true;
-static bool gs_tripleBufferNeedsDetection = false;
-
 void GlxBackend::init()
 {
     initGLX();
@@ -629,8 +632,8 @@ void GlxBackend::present()
         m_swapProfiler.begin();
     }
     glXSwapBuffers(display(), glxWindow);
+    glXWaitGL();
     if (gs_tripleBufferNeedsDetection) {
-        glXWaitGL();
         if (char result = m_swapProfiler.end()) {
             gs_tripleBufferUndetected = gs_tripleBufferNeedsDetection = false;
             if (result == 'd' && GLPlatform::instance()->driver() == Driver_NVidia) {
@@ -638,6 +641,8 @@ void GlxBackend::present()
                 if (qstrcmp(qgetenv("__GL_YIELD"), "USLEEP")) {
                     options->setGlPreferBufferSwap(0);
                     setSwapInterval(0);
+                    // hint proper behavior
+                    result = 0;
                     qCWarning(KWIN_CORE) << "\nIt seems you are using the nvidia driver without triple buffering\n"
                         "You must export __GL_YIELD=\"USLEEP\" to prevent large CPU overhead on synced swaps\n"
                         "Preferably, enable the TripleBuffer Option in the xorg.conf Device\n"
```

Could you test the latest patch?

(In reply to Thomas Lübking from comment #41)
> Could you test the latest patch?
Yes, I started testing it immediately. As I understand it, KWIN_TRIPLE_BUFFER=1 makes it bypass that if(nvidia) block, so no more hacks are necessary to run double buffered out of the box. Anyway, I've been running with the variable exported, and it works well. Still, in my humble opinion, considering that TripleBuffer with nvidia is non-default driver behavior and requires the user to edit the xorg config, it is currently one workaround too many to get vsync working with nvidia out of the box. Since TripleBuffer is non-default, and KWIN is able to properly detect the buffering type used, couldn't KWIN behave as if double buffering were the default (as far as my hardware goes: 670, 980, 960, __GL_YIELD optional)?

How does the patched version behave for you (with *only* __GL_YIELD=USLEEP exported)?

> As I understand KWIN_TRIPLE_BUFFER=1 makes it bypass that if(nvidia) block,
> so no more hacks necessary to run double buffered out of the box.
You should not have to export KWIN_TRIPLE_BUFFER, and you should (usually; we're in a special condition atm) BY NO MEANS set it to one if you're actually double buffering. This *only* worked because nvidia didn't block on glXSwapBuffers, but with the patch it *will* block on glXWaitGL.

> Since TripleBuffering is non-default, as KWIN is able to properly detect the
> buffering type used, couldn't KWIN behave as if double buffering is the default
This makes no sense. Aside from there actually being no reliable way to detect the buffer count: if KWin can detect it, the default does absolutely not matter (except for the initial 500 frames during the detection phase, which is negligible).

Okay, here are the testing results:
-- __GL_YIELD=USLEEP works as well as KWIN_TRIPLE_BUFFER=1, seriously.
-- With no __GL_YIELD set and no TripleBuffer enabled, it is no longer possible to re-enable vsync as before (by switching GL versions). As I see it, this is intended by the patch. Now, to have vsync in the latter case, I commented out

// options->setGlPreferBufferSwap(0);
// setSwapInterval(0);

(In reply to Alexey Dyachenko from comment #44)
> no longer possible to re-enable vsync as before (by switching GL versons).
> As I see this is intended by patch.
Yes, this is intended; the former behavior was clearly a bug.

> Now, to have vsync in the latter case I commented out
> // options->setGlPreferBufferSwap(0);
> // setSwapInterval(0);
The altered patch allows you to enforce KWIN_TRIPLE_BUFFER=1 on double buffering (as *atm* the nvidia blob doesn't seem to block on swapping, which is the relevant aspect for kwin).

Git commit 8bea96d7018d02dff9462326ca9456f48e9fe9fb by Thomas Lübking.
Committed on 11/11/2015 at 21:18.
Pushed by luebking into branch 'master'.

wait for GL after swapping
otherwise at least on the nvidia blob the swapping doesn't block even for double buffering

REVIEW: 125659
Related: bug 351700
FIXED-IN: 5.5

M +4 -0 glxbackend.cpp

http://commits.kde.org/kwin/8bea96d7018d02dff9462326ca9456f48e9fe9fb
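The essence of the fix, reduced to the two calls that matter (a sketch; dpy and glxWindow stand in for kwin's actual variables):

```cpp
// Sketch of the fixed present path from commit 8bea96d. On the nvidia blob,
// glXSwapBuffers() could return immediately even for a vsynced double-buffered
// swap, which broke both the triple-buffering detection and the swap pacing.
glXSwapBuffers(dpy, glxWindow); // queues the swap; may not block at all
glXWaitGL();                    // blocks until the GL stream, including the
                                // swap, has completed; so with double
                                // buffering we now really wait for the retrace
```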