Bug 307965 - Upper part of windows tears when moving it left/right ONLY in upper part of display
Status: CLOSED FIXED
Alias: None
Product: kwin
Classification: Unclassified
Component: scene-opengl
Version: 4.9.2
Platform: Ubuntu Packages Linux
Priority/Severity: NOR normal with 3 votes
Target Milestone: 4.11
Assignee: KWin default assignee
URL: http://www.kubuntuforums.net/showthre...
Keywords:
Duplicates: 308332
Depends on:
Blocks:
 
Reported: 2012-10-06 09:49 UTC by John van Spaandonk
Modified: 2013-10-19 13:56 UTC
CC: 24 users

See Also:
Latest Commit:
Version Fixed In: 4.11


Attachments
my xorg log file (38.71 KB, text/plain)
2012-10-06 09:49 UTC, John van Spaandonk
Details
Christmas time (2.50 KB, patch)
2012-10-15 20:55 UTC, Thomas Lübking
Details
Hanukkah (9.38 KB, patch)
2012-10-17 20:26 UTC, Thomas Lübking
Details
Hanukkah #2 (9.42 KB, patch)
2012-10-18 03:48 UTC, Thomas Lübking
Details
low-copy buffer swap (6.35 KB, patch)
2012-10-28 11:52 UTC, Ralf Jung
Details

Description John van Spaandonk 2012-10-06 09:49:06 UTC
The problem occurs with Kubuntu 12.04 and got worse with Kubuntu 12.10 beta 2.
Default settings (using a USB image to try out Kubuntu 12.10).

Using Intel Sandy Bridge graphics (2500K processor).
(The problem does not occur on my Dell Studio XPS laptop with Intel i965 graphics.)
When I move a window horizontally, the top part is displaced 2-8 mm depending on the speed of the movement.
The problem only occurs in the part of the window that is above a certain height on the screen:
Kubuntu 12.04: the problem occurs only at the very top of the display (about the height of a window title bar).
Kubuntu 12.10: the problem occurs in the top 1/6th of the screen.

With Kubuntu 12.10 the problem becomes much more visible:
you notice tearing on a large part of a window that is in the upper part of the display.

The effect disappears when I turn off vsync in System Settings - Desktop Effects.
(But then moving windows on other parts of the screen is not as smooth as with vsync on.)

Reproducible: Always

Steps to Reproduce:
1. Start KDE.
2. Drag a window horizontally across the top part of the screen.
Actual Results:  
tearing

Expected Results:  
smooth
Comment 1 John van Spaandonk 2012-10-06 09:49:53 UTC
Created attachment 74373 [details]
my xorg log file

Just so you know what driver etc.
Comment 2 John van Spaandonk 2012-10-06 10:06:56 UTC
Tested it also with a recent Fedora beta (USB boot) and the problem does not occur there.
It seems KDE-specific. I realize there is not a lot of information in this remark, but I guess
all observations are useful :-)
Thanks for the great job on kwin; I cannot say enough how happy I am with it!
Comment 3 Martin Flöser 2012-10-06 10:16:41 UTC
I do not understand comment #2. If it does not occur with Fedora it rather seems Kubuntu specific, doesn't it?
Comment 4 John van Spaandonk 2012-10-06 10:23:42 UTC
Hello Martin,

I was not clear. I meant to say: Fedora with a prerelease of gnome 3.6.
So the problem seems KDE related, not specifically Kubuntu-related.
Comment 5 John van Spaandonk 2012-10-06 12:21:22 UTC
I just downloaded and tried Fedora 18 alpha, with KDE 4.9.0.0.
Same problem as with Kubuntu 12.10.
We can now say that it is a KDE (kwin?) / Intel driver problem.
Comment 6 Thomas Lübking 2012-10-06 12:24:15 UTC
About the same as the nvidia-related tearing?

@John, can you record a video of this?

This https://bugzilla.gnome.org/show_bug.cgi?id=651312#c37
claims that you cannot blit during the retrace - if that is true, we can either move to full repaints (eventually by copying the front to the backbuffer for the undamaged parts?) and buffer swapping, or mark all tearing bugs as WONTFIX.

ftr:
I have doubts about this claim: the memory amount would be rather small, and the only thing I get tearing in the upper fraction with is GL content. There are claims about videos as well, but not here. However, I have *never* seen tearing from just moving windows; yet we've seen a video where this happened, and the "tearing" was actually more like "paints with a 500ms delay" - and that's not tearing.
Comment 7 Thomas Lübking 2012-10-06 12:37:13 UTC
WOW, wait:
[    22.590] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
[    22.590] (II) Module intel: vendor="X.Org Foundation"
[    22.590] 	compiled for 1.11.3, module version = 2.17.0

Can you please try a somewhat recent driver for a sandybridge chip (2.17 is nearly a year old) - ideally an SNA one?
Comment 8 John van Spaandonk 2012-10-06 15:01:32 UTC
I created a video with webcamoid, handholding my webcam.
It shows the tearing problem at the very top of the display.
And no tearing somewhat below this.
This is for kubuntu 12.04.

You can find the video at:
http://www.vanspaandonk-koks.nl/open_source/tearing%20problem%20intel%20sandy%20bridge%20kubuntu%2012.04.webm

perhaps you need to replace the %20 by spaces.
I used firefox - save page as - and then watched the video offline.

I hope you trust me when I say that on kubuntu 12.04 the tearing starts at a lower part of the screen.
I will start fedora alfa 18 kde from my usb now and report back the version of the video driver used there.
Comment 9 John van Spaandonk 2012-10-06 15:08:58 UTC
Fedora 18 alpha, booted clean from USB; excerpt from /var/log/Xorg.0.log.
BTW this uses kernel 3.6RC2

FYI: the "tearing" I showed on the video is (vertically) twice as large as
with kubuntu 12.04.
If you need any more just let me know, I am anxious to help solve this
problem. BTW do any of you use sandy bridge graphics???

[    20.584] (II) LoadModule: "intel"
[    20.585] (II) Loading /usr/lib64/xorg/modules/drivers/intel_drv.so
[    20.631] (II) Module intel: vendor="X.Org Foundation"
[    20.631]    compiled for 1.12.99.903, module version = 2.20.2
[    20.632]    Module class: X.Org Video Driver
[    20.632]    ABI class: X.Org Video Driver, version 13.0
[    20.632] (II) LoadModule: "vesa"
[    20.632] (II) Loading /usr/lib64/xorg/modules/drivers/vesa_drv.so
[    20.638] (II) Module vesa: vendor="X.Org Foundation"
[    20.638]    compiled for 1.12.99.902, module version = 2.3.2
[    20.638]    Module class: X.Org Video Driver
[    20.638]    ABI class: X.Org Video Driver, version 13.0
[    20.638] (II) LoadModule: "modesetting"
[    20.638] (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
[    20.643] (II) Module modesetting: vendor="X.Org Foundation"
[    20.643]    compiled for 1.12.99.902, module version = 0.4.0
[    20.643]    Module class: X.Org Video Driver
[    20.643]    ABI class: X.Org Video Driver, version 13.0
[    20.643] (II) LoadModule: "fbdev"
[    20.644] (II) Loading /usr/lib64/xorg/modules/drivers/fbdev_drv.so
[    20.650] (II) Module fbdev: vendor="X.Org Foundation"
[    20.650]    compiled for 1.12.99.902, module version = 0.4.3
Comment 10 John van Spaandonk 2012-10-06 15:22:02 UTC
In comment #8 I intended to say: "Hope that you trust me when I say that on Kubuntu 12.10 the tearing starts at a lower part of the screen"
Comment 11 John van Spaandonk 2012-10-07 11:32:35 UTC
I now tried Fedora 18 preview again, with SNA enabled.
At this point I do not know anything else to try so I will
wait for you guys to think of something :-)

I created a file /etc/X11/xorg.conf with these contents:

Section "Device"
 Identifier     "Intel Graphics"
 Driver "intel"
 Option "AccelMethod" "sna" 
EndSection

And restarted the x-server.

The tearing is exactly as before, so this did not help.

Excerpts from /var/log/Xorg.0.log:

...
     80 [   164.490] (II) LoadModule: "glx"
     81 [   164.490] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
     82 [   164.490] (II) Module glx: vendor="X.Org Foundation"
     83 [   164.490]    compiled for 1.12.99.904, module version = 1.0.0
     84 [   164.491]    ABI class: X.Org Server Extension, version 6.0
     85 [   164.491] (==) AIGLX enabled
     86 [   164.491] Loading extension GLX
     87 [   164.491] (II) LoadModule: "intel"
     88 [   164.491] (II) Loading /usr/lib64/xorg/modules/drivers/intel_drv.so
     89 [   164.491] (II) Module intel: vendor="X.Org Foundation"
     90 [   164.491]    compiled for 1.12.99.903, module version = 2.20.2
     91 [   164.491]    Module class: X.Org Video Driver
     92 [   164.491]    ABI class: X.Org Video Driver, version 13.0
...
    259 [   165.023] (==) Depth 24 pixmap format is 32 bpp
    260 [   165.034] (II) intel(0): SNA initialized with SandyBridge backend
    261 [   165.034] (==) intel(0): Backing store disabled
    262 [   165.034] (==) intel(0): Silken mouse enabled
    263 [   165.034] (II) intel(0): HW Cursor enabled
    264 [   165.034] (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
    265 [   165.034] (==) intel(0): DPMS enabled
    266 [   165.041] (II) intel(0): [DRI2] Setup complete
....
Comment 12 Thomas Lübking 2012-10-13 18:36:04 UTC
This patch might avoid tearing IN FULLSCREEN WINDOWS ONLY.
The more feedback we can get on it, the more likely it'll get into the next release.

https://git.reviewboard.kde.org/r/106833/
Comment 13 Thomas Lübking 2012-10-13 20:20:12 UTC
*** Bug 308332 has been marked as a duplicate of this bug. ***
Comment 14 Ralf Jung 2012-10-14 09:18:28 UTC
On my system, this fixes tearing (using the teartest from http://ompldr.org/iYXBldg-hide) in full-screen mplayer (with the OpenGL backend). It does however *not* fix tearing with full-screen VLC (OpenGL mode as well), nor in VLC maximized in windowed mode - the latter was to be expected, but I do this often. The reason may be that my screen is 16:10, the video is 16:9, and VLC is smart enough to actually not damage the black bars at the top and bottom - so KWin does not see a full-screen damage event.
If you decide to adjust the repaint logic to compensate for that, please keep in mind that there are people using a 5:4 screen (1280x1024) watching 16:9 videos - and movies for cinema are even wider.

Btw, why is this bug still unconfirmed?
Comment 15 Thomas Lübking 2012-10-14 09:41:52 UTC
(In reply to comment #14)
> On my system, this fixes tearing (using the teartest from
> http://ompldr.org/iYXBldg-hide) in full-screen mplayer (with the OpenGL
> backend). It does however *not* fix tearing with the full-screen VLC (OpenGL
> mode as well), nor in VLC maximized in windowed mode

Can you please enable the "Show Paint" plugin and check the paint behavior of vlc?
Also check another video output (xv, vdpau, x11)

> If you decide to adjust the repaint logic to compensate for that, please
> have in mind there are people using a 5:4 screen (1280x1024) watching 16:9
> videos - and movies for cinema are even wider.
Irrelevant since the black bars will be wider than the post retrace area anyway?

> Btw, why is this bug still unconfirmed?
We don't even 100% know what it is, thus whether we have to, or can, solve it.
The status quo is that glXWaitVideoSync in different implementations *might* sync to the end of the retrace - or sync to the wrong device (unlikely but possible - do you have a multiscreen setup?) or ...
And in that case it would be invalid (or "upstream" etc.) anyway.

Personally, btw, I don't give a shit whether a bug is marked unconfirmed or new - there is an issue to solve for at least one person unless the bug is marked resolved.
Comment 16 Ralf Jung 2012-10-14 09:57:18 UTC
(In reply to comment #15)
> Can you please enable the "Show Paint" plugin and check the paint behavior
> of vlc?
> Also check another video output (xv, vdpau, x11)
Indeed the paint plugin shows that "mplayer -vo gl" does full-screen repaints, while VLC does not.
vdpau does not work here (mplayer doesn't support it and the VLC framerate drops too much to be useful). I will test the other backends later or tomorrow when I get the time.

> > If you decide to adjust the repaint logic to compensate for that, please
> > have in mind there are people using a 5:4 screen (1280x1024) watching 16:9
> > videos - and movies for cinema are even wider.
> Irrelevant since the black bars will be wider than the post retrace area
> anyway?
That might or might not work - the tearing line moves down at least one quarter, sometimes even a third, of the screen here.
For a 16:9 movie on a 5:4 screen, the movie will be 720 pixels high, meaning the black bars total 304 pixels (152 at the top), which is less than a quarter of the screen, so if it tears like it does on my system it would be visible. If this depends more on the absolute pixels than on the relative sizes... I don't know^^

> > Btw, why is this bug still unconfirmed?
> We don't even 100% know what it is, thus whether we have to or can solve it.
> Status quo is that glWaitVideoSync in different implementations *might* sync
> to the retrace end - or sync to the wrong device (unlikely but possible - do
> you have a multiscreen setup?) or ...
I see.
I am using an external screen connected to the HDMI connector of my laptop. The laptop-internal screen is disabled. I do also have a 2nd GPU by NVidia in this machine, but it is turned off all the time during these tests and therefore should not interfere.
Comment 17 Thomas Lübking 2012-10-14 10:18:39 UTC
(In reply to comment #16)

> I do also have a 2nd GPU by NVidia in this machine, but it is turned
> off all the time during these tests and therefore should not interfere.

         DOES ANYONE HERE HAVE AN NVIDIA FREE SYSTEM?

Sorry for shouting, but could be quite relevant ;-)

> If this depends more on the absolute pixels than on the relative sizes...
In case the problem is syncing to the end of the retrace, it's absolute pixels and relates more to the memory speed (and pot. the screen width)

> I am using an external screen connected to the HDMI connector of my laptop.
Do you get the same (or similar) behavior on the internal screen?
Comment 18 Ralf Jung 2012-10-14 10:21:35 UTC
I've had some more time than expected, so here are my test results:

x = tears heavily (i.e. a tearing line is visible at least half the time)
~ = tears a bit (i.e. a tearing line is sometimes visible)
- = does not tear at all

W = Window (maximized)
FS = Fullscreen

I did not include mplayer with the X11 backend as it does not do upscaling, so the image is small on my screen
and can't be compared to the others.

master
            | GL | Xv | X11
VLC W       | x  | ~  | x
VLC FS      | x  | ~  | x
mplayer W   | x  | ~  |
mplayer FS  | x  | ~  |

master+patch

            | GL | Xv | X11
VLC W       | x  | ~  |  x
VLC FS      | x  | ~  |  x
mplayer W   | x  | ~  |
mplayer FS  | -  | ~  |


I added a debug statement to kwin telling me when the swap case is used, and indeed this correlates with the visual observations: When using mplayer -vo gl in fullscreen, every frame is swapped. For any other case, sometimes a frame is swapped, but most of the time it is not. This also correlates with the output of the paint plugin: Only mplayer -vo gl is actually performing full-screen redraws.

(In reply to comment #17)
>          DOES ANYONE HERE HAVE AN NVIDIA FREE SYSTEM?
> 
> Sorry for shouting, but could be quite relevant ;-)
I can feel your pain ;-) but, you see, I want to do gaming on Linux, and then there is not much choice.
On topic though, as I said, the card is turned off and the nvidia kernel module is not even loaded, nor is the NVidia GL implementation.


> > I am using an external screen connected to the HDMI connector of my laptop.
> Do you get the same (or similar) behavior on the internal screen?
Will test tomorrow. The internal screen is 16:9 though and much smaller, so I expect (a) your patch to kick in for all backends in full-screen, as now all pixels *have* to be damaged, and (b) that it's much harder to see small tearing.
Comment 19 Ralf Jung 2012-10-14 10:23:09 UTC
Argh, bugs.kde.org seems to use a strange font which doesn't have a proper tilde...
Xv was always "tearing a bit", and mplayer FS with the patch applied was the only case with no tearing at all.
Comment 20 Thomas Lübking 2012-10-14 11:11:10 UTC
(In reply to comment #18)
> I can feel your pain ;-)

That's not the point, but we've always had tearing issue reports with nvidia boards and recently got some for intel.
Now, if all those intel reporters actually use optimus, that smells suspicious to me ;-)

> I want to do gaming on Linux
I've an nvidia GPU + the blob as well, i'm completely agnostic on this topic and in general the system works nicely. I just want to pin down this issue.

> On topic though, as I said, the card is turned off and the nvidia kernel
> module is not even loaded, nor is the NVidia GL implementation.
How's the HDMI chip wired? Connected to the nvidia chip or to the intel one (where the nvidia one copies the framebuffer)?
 
> Will test tomorrow. The internal screen is 16:9 though
-> cinemascope / panavision / cinemascope55

However the much more relevant question is whether the non swapping sync works with the internal screen.
Comment 21 John van Spaandonk 2012-10-14 15:34:42 UTC
(In reply to comment #20)
I can confirm that my main system is NVIDIA-FREE!!
This is the system I reported (and filmed) the tearing issue on.
The system has an ASUS Maximus IV Gene-Z blahblah board with
a 2500K Sandy Bridge, and I am perfectly happy with the speed of the graphics.
I just use a single monitor, really the setup is as simple as can be :-)

In addition I want to report the following:
- The teartest (movie) mentioned earlier also tears on this system.
  So I think there is at least some chance that if a person reports video tearing solved, my problem is solved as well.
- I cannot apply a patch - I don't have the time to compile etc.
- I have a 24" Dell monitor with a resolution of 1920x1200...
- My laptop, with pre-Sandy Bridge graphics and a 1920x1080 screen, does NOT show tearing in the tear video test.
  This correlates with NO tearing when moving windows manually on that system.
Comment 22 John van Spaandonk 2012-10-14 15:39:03 UTC
(In reply to comment #21)
Another note: I did not play the video full screen, just in a window.
Comment 23 Ralf Jung 2012-10-15 13:42:38 UTC
> vdpau does not work here (mplayerdoesn't support it and VLC framerate drops
> down too much to be useful).
I mixed two things up here: VA-API does not work. vdpau I did not even test; no idea how to get that working with optimus.

> How's the HDMI chip wired? Connected to the nvidia chip or to the intel one (where the nvidia one copies the framebuffer)?
HDMI is connected to the Intel card. To use the NVidia card, I use bumblebee which performs rendering on the 2nd card and copies stuff back. But all the video tests mentioned here were done directly on the Intel card.

>> Will test tomorrow. The internal screen is 16:9 though
> cinemascope / panavision / cinemascope55
Sorry, I do not understand - what are you saying?

> However the much more relevant question is whether the non swapping sync works > with the internal screen.
I only tested with the GL backend: In mplayer, there is no tearing in full-screen mode, but windowed mode has tearing. In VLC, there is tearing in both modes - even though the video covers the entire screen. The fullRepaint condition is still not met, and if I enable the paint plugin the tearing line is visible across the entire width - the "paint" colours also tear.
So there's actually no difference here between the internal and the HDMI panel, even though the video aspect ratio matches the internal screen and *not* the HDMI screen.
Comment 24 Thomas Lübking 2012-10-15 15:01:55 UTC
(In reply to comment #23)
> I mixed two things up here: VA-API does not work. vdpau I did not even test,
> no idea how to get that working with optimus.
vdpau is only available with nvidia; xvmc should work, though.
 
> >> Will test tomorrow. The internal screen is 16:9 though
> > cinemascope / panavision / cinemascope55
> Sorry, I do not understand - what are you saying?
those are cinematic aspect ratios which are all much narrower than 16:9 - so you'll get black bars.
 
> I only tested with the GL backend: In mplayer, there is no tearing in
> full-screen mode, but windowed mode has tearing. In VLC, there is tearing in
> both modes - even though the video covers the entire screen. The fullRepaint
> condition is still not met
Have you checked this (with a debug out in the glXSwapBuffer branch?)
Could mean that vlc tears internally and causes partial damage events crossing the retrace spot.
Comment 25 Ralf Jung 2012-10-15 15:09:45 UTC
(In reply to comment #24)
> those are cinematic aspects which are all much more narrow than 16:9 - so
> you'll get black bars.
I see - but I got this teartest video only in one resolution ;-) . However, VLC has an option to stretch the video, and I guess mplayer has one, too. Not that it's needed currently.

> > I only tested with the GL backend: In mplayer, there is no tearing in
> > full-screen mode, but windowed mode has tearing. In VLC, there is tearing in
> > both modes - even though the video covers the entire screen. The fullRepaint
> > condition is still not met
> Have you checked this (with a debug out in the glXSwapBuffer branch?)
Yes, I used a debug output in that branch. When using mplayer, I get a *lot* of them - as is to be expected, about one per frame. For VLC, there are just 10 to 20 full-screen redraws during the entire (30 sec) video.
With compositing disabled, VLC produces some tearing, but much less than with compositing (comparable to using the XVideo backend with compositing).
Comment 26 Thomas Lübking 2012-10-15 20:55:52 UTC
Created attachment 74570 [details]
Christmas time

The attached x-mas patch enforces the glXSwapBuffers branch, writing the front buffer back into the back buffer.

Please NOTICE:
- a fullscreen memory copy is expensive; not as expensive as a fullscreen repaint, but it is. No problem on dedicated GPUs with memory speed beyond good and evil, but I have NOT tested this on an integrated system so far
- the code assumes we can glXSwapBuffers; rather don't run it if you can't (or revert it if freaky things happen ;-)
- the patch can be streamlined (i.e. it's not required to turn the swapInterval on and off, and we can also omit some extra drop in/out frames when switching between swapBuffers and partial copies)

Happy testing - there *will* be no tearing.... prettyprettyplease not :-)
Comment 27 Ralf Jung 2012-10-16 20:35:01 UTC
Indeed, I can't see any tearing with this patch - neither when dragging a plasma widget horizontally at the top of the screen, nor when playing a video either in windowed or in full-screen mode. :)

However, the framerate drops to 30-40fps. I wonder why this is, since I can run some old games (and demos like glxspheres) at 60fps on the NVidia GPU, which also involves copying the full screen to the Intel card for each frame.
Comment 28 Thomas Lübking 2012-10-16 20:55:07 UTC
- The two GPUs will not share the same memory? (r -> w ./. r/w)
- Also the memory might just get boosted when the nvidia GPU is up.
- Last but not least, the driver may ("tear free") perform a sync itself - have you tried deactivating kwin's vsync?

I tried to only copy back the dirty memory, but that fails here (happy flicker) - might be related to triple buffering :-(

Other things tested, for the record: glXWaitVideoSync *is* broken (on nvidia).
Waiting for the sync and then swapping the buffer (instead of using glXSwapInterval) causes tearing similar to the present one, despite the swap/copy-back patch being otherwise the same.
Comment 29 Brian Hill 2012-10-16 21:32:29 UTC
I backported the patch to 4.9.2 and can confirm that it absolutely kills performance on my system with integrated graphics :

Arch Linux
Mesa 9.0 
xf86-video-intel 2.20.9
Core i7 2630QM
8 Gigabytes DDR3
Intel HD Graphics 3000

With the patch applied, I get an FPS in the teens to mid twenties and there is noticeable lag.
Comment 30 valdikss 2012-10-16 21:34:23 UTC
Please test with mesa 8.0.4, there are some performance issues with 9.0 on HD3000. See https://bugs.freedesktop.org/show_bug.cgi?id=55998 and https://bugs.kde.org/show_bug.cgi?id=308385
Comment 31 Brian Hill 2012-10-17 00:23:24 UTC
With Mesa 8, the speed improves however I am getting graphical corruption with the patch.  It appears to be a problem with the alpha blending when using the feature where you drag windows to the edges of the screen to maximize them; with the patch applied, it flickers.
Comment 32 Ralf Jung 2012-10-17 16:11:12 UTC
(In reply to comment #28)
> - The two GPUs will not share the same memory? (r -> w ./. r/w)
No, they won't. Neither the Xorg server nor the Intel driver I have installed supports DMA-BUF, let alone the NVidia driver ;-) So the image is copied two times: it's read back from the NVidia GPU, and then sent to the Intel GPU.

> - Also the memory might just get boosted when the nvidia GPU is up.
I turned on the NVidia card, then tried again - no changes.

> - Last but not least, the driver may ("Tear free") perform a sync itself -
> tried deactivating kwin v'syncing?
Indeed I can't see any tearing even after disabling vsync in KWin's configuration. The framerate goes up to 50-55fps, but it doesn't reach the 60 it has without the patch.
Comment 33 Ralf Jung 2012-10-17 18:59:18 UTC
With the "old" KDE 4.9.2 compositor, I get 60 fps even when playing a Full-HD video full-screen (using mplayer or VLC, GL backend). So the bandwidth is there, somehow.
Comment 34 Thomas Lübking 2012-10-17 20:26:12 UTC
Created attachment 74612 [details]
Hanukkah

If the back and front buffer are in different memory, that means an _additional_ copy for the glXSwapBuffers call.

The new patch has some performance improvements:
1st, it does not copy back the entire buffer but only the required parts.
2nd, after a short time (24 frames, 1s for a cinematic movie) it enters a performance mode, assuming the fullscreen mode will continue (and it's thus not necessary to copy things back).

"2" obviously does not work with VLC, but one could - theoretically - relax that and check whether it is always the same region that is updated.

The patch includes a code refactoring; I'm not sure how easy or possible it will be to port it back to 4.9 :-(
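The "performance mode" heuristic described above can be sketched in a few lines of state. This is a toy model, not the actual Hanukkah patch code; the class and method names are made up for illustration:

```cpp
#include <cassert>

// Toy sketch (not the actual patch): after 24 consecutive fullscreen
// repaints the compositor assumes fullscreen updates will continue and
// skips the front-to-back copy until a partial repaint shows up again.
class FullscreenHeuristic {
public:
    // Call once per painted frame; returns true when the copy-back
    // may be skipped ("performance mode").
    bool framePainted(bool fullscreenDamage) {
        m_streak = fullscreenDamage ? m_streak + 1 : 0;
        return m_streak >= 24; // 24 frames ~ 1s of cinematic video
    }
private:
    int m_streak = 0; // consecutive fullscreen repaints seen so far
};
```

A VLC-style stream that only damages the video area (not the black bars) never builds up the streak, which is why "2" does not kick in there.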
Comment 35 Thomas Lübking 2012-10-18 03:48:18 UTC
Created attachment 74617 [details]
Hanukkah #2

Forgot to restart the syncdelay timer - the first Hanukkah patch could feel chewy. Sorry in case someone has already tried it.
Comment 36 Ralf Jung 2012-10-19 16:18:40 UTC
(In reply to comment #34)
> The new patch has some performance improvements.
> 1st, it does not copy back the entire buffer but only the required parts
To be honest I do not understand why anything has to be copied back, as it was kwin that copied it to the GPU to start with - but I don't know OpenGL well, so whatever ;-)

> 2nd, after a short time (24 frames, 1s for a cinematic movie) it enters a
> performance mode, assuming the fullscreen mode will continue (and it's thus
> not necessary to copy things back)
> 
> "2" does obviously not work with VLC, but one could -theoretically- relax
> that and check whether always the same region is updated.
Framerate for a still desktop is 57-58 fps now (as opposed to 59-62 in KDE 4.9.2). It drops to ~50fps in full-screen video playback - I could not see any difference between mplayer and VLC. There was no tearing whatsoever :)
While logging in, the screen went black instead of showing the splash screen - not sure if that's related to the patch.
Comment 37 Thomas Lübking 2012-10-19 18:43:02 UTC
(In reply to comment #36)
> (In reply to comment #34)
> Framerate for a still desktop is 57-58 fps now (opposed to 59-62 in KDE
> 4.9.2).

That will likely be due to differences in glXSwapInterval and the broken glXWaitVideoSync - 62fps on a 60fps screen should not be possible in the first place.

> It drops to ~50fps in full-screen video playback

also with mplayer, or only with vlc? (because on constant fullscreen updates, the back-copying overhead should be omitted and the difference would be _purely_ from the usage of glXSwapInterval)

> any difference between mplayer and VLC. There was no tearing whatsoever :)
At least it works.

> While logging in, the screen went black instead of showing the splash 
> screen - not sure if that's related to the patch.
Might fall into the same category as the "flickering" effect frames (the problem is that they paint on top of the otherwise undamaged backbuffer, i.e. the content below them was not repainted but they are. Would need adaptation of that code, but it is likely fixable)


--- slightly OT ---

> To be honest I do not understand why anything has to be copied back as it
> was kwin who copied it to the GPU to start with - but I don't know OpenGL
> well, so whatever ;-)
There are two buffers, front and back.
You paint into the backbuffer, which is currently not visible (otherwise flicker and tearing would be unavoidable), and when you're done, move the backbuffer content to the frontbuffer (which powers the screen pixels).
That move can be done by copying parts of the buffer, copying the entire buffer, or ("flipping") by just telling the GPU that the front is now the backbuffer and vice versa (the last is obviously fastest, but the buffers must be in the same memory)

Because usually only a minor fraction of the screen is actually changed, kwin renders only that fraction into the backbuffer and then copies the resulting part into the frontbuffer.
Now, if you want to swap buffers (for whatever reason, ours being that the glXWaitVideoSync extension is apparently broken) you have to ensure that the entire backbuffer is sane, ie. that the part you did not alter in this pass matches the current frontbuffer part.
For that to happen, you have to ensure the backbuffer is a pixel-perfect copy of the _current_ frontbuffer (and not the frontbuffer one or two passes ago); therefore it's necessary to copy the differing parts from the front into the backbuffer before starting the actual painting (of the not-to-update screen part)
Comment 38 Loïc Yhuel 2012-10-21 18:52:10 UTC
(In reply to comment #37)
> For that to happen, you have to ensure the backbuffer is a pixelperfect copy
> of the _current_ frontbuffer (and not the frontbuffer one or two passes
> ago), therefore it's necessary to copy the differing parts from the front
> into the backbuffer before starting the actual painting (of the no to update
> screen part)
With triple buffering, you have to copy regions updated in the last 2 frames, instead of the last frame only, so the difference from a full screen composition may be lower when there aren't many windows.
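The copy-back bookkeeping described here can be sketched with a tiny model (a sketch only: damage is simplified to a bitmask over screen tiles, and all names are invented, not kwin's):

```cpp
#include <deque>

// Damage modeled as a bitmask over screen "tiles" (invented simplification).
using Region = unsigned;

// After a swap with N buffers, the back buffer you just acquired last showed
// frame (current - N + 1), so it is missing everything damaged in the last
// N-1 frames. damageHistory[0] holds the most recent frame's damage.
Region copyBackRegion(const std::deque<Region> &damageHistory, int bufferCount)
{
    Region missing = 0;
    const int staleFrames = bufferCount - 1; // double -> 1 frame, triple -> 2
    for (int i = 0; i < staleFrames && i < static_cast<int>(damageHistory.size()); ++i)
        missing |= damageHistory[i];
    return missing;
}
```

With double buffering only the previous frame's damage needs to be copied back; with triple buffering it is the union of the last two frames' damage, which is what the `lastDamage() | secondLastDamage` in the patch computes.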

Btw, see ickle's comments in http://phoronix.com/forums/showthread.php?74598-Intel-Linux-Driver-Still-Working-To-Address-Tearing
There is no hardware support of synchronized updates on Sandy Bridge or later IGPs, so page flipping is mandatory to avoid tearing.
"MESA_copy_sub_buffer was originally created as an optimisation and all the compositors were encouraged to use it. In retrospect, it was a bad idea, hurting the bandwidth constrained IGP devices the most."
I wonder why back to front buffer copy would use more bandwidth than front to back copy, or full screen composition.
Comment 39 root 2012-10-21 19:16:21 UTC
I would really appreciate complete (read sole) page flipping support.
Because right now with a multi-monitor setup and recent intel graphics only one monitor is ever tear-free.
Comment 40 Thomas Lübking 2012-10-21 19:25:51 UTC
(In reply to comment #38)

> With triple buffering, you have to copy regions updated in the last 2
Yes, the patch does so. But I didn't want to make things even more complex ;-)

> so the difference with a full screen composition may be lower
No.
1. Because really most of the time a very minor fraction is updated, and often it's even the same (eg. a line in a text editor or an animated icon)
2. Because fullscreen composition would mean actually repainting the entire screen all the time, which includes all effect processing and all blurring (Philip would not waste any time on discussion, but just kill me if I tried to wipe all his optimizations in that regard. And he would be right -while not successful ;-)- in doing so.)

> I wonder why back to front buffer copy would use more bandwidth than front
> to back copy
It does not (or should not) - the buffers should be equally fast on read and write.
But copying back to front would require a working waitVideoSync (to avoid tearing), which is apparently no longer the case on Intel chips and for a long time was not on NVIDIA (and likely not on AMD either) - it appears to me the syncing happens at the end of the retrace, but given that GPUs _want_ buffer swapping and glXSwapInterval (eg. for power savings) that doesn't matter either.

> or full screen composition.
It will not. Reprocessing the entire event chain consumes CPU and GPU + VRAM (to write the raster result) - if that should still be cheaper than copying a memory fraction, I'm lost (but given bug reports about some plasmoids causing full desktop repaints, I doubt this is -or at least "was"- the case)
Comment 41 Thomas Lübking 2012-10-21 19:28:28 UTC
(In reply to comment #39)
> I would really appreciate complete (read sole) page flipping support.
> Because right now with a multi-monitor setup and recent intel graphics only
> one monitor is ever tear-free.

I would not expect the IGP to be able to sync to more than one screen at all.
You can try the patch to check, it does that.
Comment 42 root 2012-10-21 19:41:15 UTC
I would need a 4.9 backport since my knowledge of C++ and graphics is very limited and in 4.9 there simply is no glxbackend.cpp.

Apart from that, the HD3000 can sync at least two screens; it works on Windows 7 (Aero only though, I think that is normal).
Comment 43 Loïc Yhuel 2012-10-21 20:07:20 UTC
(In reply to comment #40)
> (In reply to comment #38)
> > or full screen composition.
> Will not. Reprocessing the entire event chain consumes CPU and GPU + VRAM
> (to write the raster result) - if that should still be cheaper than copying
> a memory fraction, i'm lost (but given bug reports for some plasmoids
> causing full desktop repaints, i doubt this is -or at least "was"- the case)
I wouldn't be surprised if that were the case when the composition is minimal (for example with a full-screen or maximized window) on some GPUs: textures and back buffers don't necessarily have the same tiling format, so the texture-to-backbuffer copy could be faster (or slower) than the front-to-backbuffer copy.
In the future, with PRIME offloading, the front-to-backbuffer copy may be slow when kwin is running on the discrete GPU with front and back buffers in IGP memory. Deciding when it's better to render a full frame, copy from the front buffer, or perhaps render to an intermediate surface will become more difficult (hopefully the hardware getting faster means fewer optimizations are necessary, but if high-DPI screens become popular that won't be the case soon).
Comment 44 Loïc Yhuel 2012-10-21 21:01:44 UTC
Looking at your patch, the "damage = lastDamage() | secondLastDamage;" is done for buffer swap case, but without double/triple buffer testing, so double buffering suffers from having a bigger than necessary copy.

One thing which may matter too : to be able to copy from front to back buffer, you wait for the rendering to be finished with glXWaitGL. It means that even with triple buffering, only one frame is processed at a time. It would be better to do the copy asynchronously, or at the end of the rendering if possible to allow more parallelism.
Comment 45 Ralf Jung 2012-10-22 16:53:00 UTC
(In reply to comment #37)
> > It drops to ~50fps in full-screen video playback
> 
> also with mplayer or only with vlc (because on constant fullscreen updates,
> the back-copying overhead should be omitted and the difference would be
> _purely_ for the usage of glxSwapInterval)
With both players. Debugging shows that the fullRepaintCounter actually never exceeds 9.
Would 0-region updates in between hurt? The video file of course has fewer than 60 frames per second.

Also, I noticed that in the fps counter, these green bars (which I assume measure some kind of time-per-frame?) are all as high as the first black bar. In KDE 4.9, they are actually even smaller, less than half of the first black bar. Is that to be expected, or does it indicate a problem?
Comment 46 Thomas Lübking 2012-10-22 20:10:00 UTC
(In reply to comment #44)
> Looking at your patch, the "damage = lastDamage() | secondLastDamage;" is
> done for buffer swap case, but without double/triple buffer testing, so
> double buffering suffers from having a bigger than necessary copy.

There's no such branching (aside from the fact that the patch is absolutely not ready for shipping - it lacks an option and better handling if glXSwapInterval isn't available) because I failed to figure out a "legal" way to determine whether triple buffering is in place (aside from running a couple of glXSwapBuffers calls and seeing how long they take...)

> One thing which may matter too : to be able to copy from front to back
> buffer, you wait for the rendering to be finished with glXWaitGL. It means
> that even with triple buffering, only one frame is processed at a time.
The call right after the copy should be superfluous with triple buffering altogether, but the (pre-existing) one at the end is required - or you get laggy processing.
Reason: see above.
Comment 47 Thomas Lübking 2012-10-22 20:24:16 UTC
(In reply to comment #45)
> (In reply to comment #37)
> > > It drops to ~50fps in full-screen video playback
> > 
> > also with mplayer or only with vlc (because on constant fullscreen updates,
> > the back-copying overhead should be omitted and the difference would be
> > _purely_ for the usage of glxSwapInterval)

> With both players. Debugging shows that the fullRepaintCounter actually
> never exceeds 9.
Stupid me - you've got the FPS counter active -> it triggers partial repaints every now and then.
We could use an offline FPS option (ie. do not actively repaint the thing, but print to stdout or so)

> Also, I noticed that in the fps counter, these green bars (which I assume
> measure some kind of time-per-frame?)
I have _no_ idea what that thing does ;-)
However the green line is always on the bottom for me.
But right now I've got the "glXWaitGL();" deactivated (see last comment)
Comment 48 Loïc Yhuel 2012-10-22 20:46:05 UTC
(In reply to comment #46)
> (In reply to comment #44)
> > One thing which may matter too : to be able to copy from front to back
> > buffer, you wait for the rendering to be finished with glXWaitGL. It means
> > that even with triple buffering, only one frame is processed at a time.
> The call right after the copy should be superflous in the triple buffering
> altogether, but the (pre-existing) one on the end is required - or you get
> laggy processing.
> Reason: see above.
The added glXWaitGL is before the glCopyPixels, not after. I'm not sure how you could just remove it since glXSwapBuffers should return immediately in triple buffering mode. If the glCopyPixels schedules the copy to start after the end of the previous frame, it will work, but it probably will delay what you do after (rendering of updated regions) which doesn't depend on the previous frame, so it's still not optimal.
I'm more familiar with EGL than GLX, so perhaps I'm wrong, but IMHO the best synchronous solution (something better could perhaps be done with threads) would be:
 - render updated regions of frame N
 - copy back unchanged parts from frame N-1 into frame N
 - render updated regions of frame N+1
 - copy back unchanged parts from frame N into frame N+1
 - ...
With this order, you can start rendering frame N+1 (at least GL calls, and perhaps GPU parallelism) when frame N completes.
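The ordering proposed above can be illustrated with stubs that merely record their call sequence (a sketch with invented names, not kwin code) - the point being that the GL calls for frame N are issued before the copy-back that depends on frame N-1 having completed:

```cpp
#include <string>
#include <vector>

// Build the call sequence for the proposed pipeline: render frame n first,
// then copy the unchanged parts of frame n-1 into it. Since rendering frame n
// does not depend on frame n-1's contents, the GPU can start on it right away;
// only the copy-back has to wait for the previous frame to finish.
std::vector<std::string> proposedSchedule(int frames)
{
    std::vector<std::string> log;
    for (int n = 0; n < frames; ++n) {
        log.push_back("render updated regions of frame " + std::to_string(n));
        log.push_back("copy unchanged parts of frame " + std::to_string(n - 1) +
                      " into frame " + std::to_string(n));
    }
    return log;
}
```

The existing patch effectively does the reverse per frame (wait, copy back, then render), which serializes each frame behind the previous one's completion.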
Comment 49 Ralf Jung 2012-10-24 15:20:06 UTC
(In reply to comment #47)
> > With both players. Debugging shows that the fullRepaintCounter actually
> > never exceeds 9.
> Stupid me - you got the FPS counter active -> triggers partial repaints
> every now and then.
> We could need a offline FPS option (ie. do not actively repaint the thing,
> but print to stdout or so)
I disabled the FPS counter, but even then the debug output shows that the full screen repaint counter does not exceed 5.

> > Also, I noticed that in the fps counter, these green bars (which I assume
> > measure some kind of time-per-frame?)
> I have _no_ idea what that thing does ;-)
> However the green line is always on the bottom for me.
How far to the bottom?

> But right now i've got the "glXWaitGL();" deactivated (see last comment)
There are a bunch of them in that function, which one are you referring to?

Also, with this patch applied I cannot log out properly anymore. Or maybe logout is entirely broken in master, I do not know...
Comment 50 Ralf Jung 2012-10-27 12:42:35 UTC
I did some more - crude - experimenting and changed SceneOpenGL::paint to always issue a full-screen repaint. That increased the framerate to a stable 60fps even when playing a video full-screen. However, the diagram shown in the left part of the FPS window is not always green now, but alternates between short green and yellow phases (each ~5-10 pixels wide). Whatever that means, I believe it is related to frame latency. Disabling v-sync makes this much worse: the framerate stays at 60fps, but the latency (or whatever) bar is very high now, always yellow, sometimes even green. The logout effect works fine in both cases, as do the overlays for snapping a window to a screen edge.

So somehow, copying stuff back and forth is actually slower than always re-rendering it... The compiz developers made the same experience, see https://bugs.launchpad.net/ubuntu/+source/compiz/+bug/901097
Comment 51 Martin Flöser 2012-10-27 13:25:51 UTC
given that it should be possible that we add an option to always do full 
repaints.

What I would like to see are tests with the OpenGL on EGL case which is 
possible with 4.10, though I think that it doesn't make any difference
Comment 52 Thomas Lübking 2012-10-27 14:06:52 UTC
(In reply to comment #50)
> I did some more - crude - experimenting and changed SceneOpenGL::paint to
> always issue a full-screen repaint. That increased the framerate to stable
> 60fps even when playing a full-screen window.

Actually, especially that case is hardly surprising, given that you stated it does not enter the pure-swapping path and the fullscreen window will bypass most processing.

What happens if you play a windowed movie, eg 640x480 and actually have blurring regions (panels, windows) on the screen and what impact next to the (capped) framerate does that have on the CPU load / GPU temperature?


> Whatever that means, I believe it is related to frame latency.
The right area corresponds to the repainted area, whereas the left area reflects the time required to paint a frame - large bars mean "long paint", ie. they're responsible for lowering the FPS in the effects (ie. the buffer flush, and with it the back-copying, is *not* included here)
A high bar here however lowers the mean FPS.

> Disabling v-sync makes this much worse: The framerate stays at 60fps, but the latency (or whatever)
> bar is very high now
That actually makes little sense - w/o seeing your actual changes this is hard to explain.

> So somehow, copying stuff back and forth is actually slower than always
> re-rendering it... The compiz developers made the same experience, see
> https://bugs.launchpad.net/ubuntu/+source/compiz/+bug/901097

I'd be very careful with such observations because they do not include relevant system information like the GPU used and the screen size.
Just repainting everything can free some CPU load (because several region intersections etc. don't have to happen) by adding GPU overhead (pixel processing) - so if you've got a weak CPU (eg. a P4) alongside a powerful GPU (nvidia already had the 6000 series at that time, didn't they?) this can indeed be a nice trade-off, but it might severely punish weak GPU systems (IGP, budget GPU)
Comment 53 Ralf Jung 2012-10-27 14:12:59 UTC
> given that it should be possible that we add an option to always do full 
> repaints.
Maybe that should be bound to the v-sync option, as this bug suggests that v-sync can't be done reliably without some form of full-screen update.
Is there potential for simplification in the code managing the region? I don't know how expensive all these region operations are.

Another idea I had: Instead of copying the changed parts of the (new) front buffer to the back buffer, and then painting the next frame there (which also makes assumptions about where that backbuffer comes from), maybe it is better to copy the parts that are *outside* the damage region from the current front buffer to the back buffer before doing the SwapBuffers. That may copy identical data, but at least it won't copy the full screen twice if the damage region is almost the entire screen. I'll try to implement that.

> What I would like to see are tests with the OpenGL on EGL case which is 
> possible with 4.10, though I think that it doesn't make any difference
I do have master compiled and the mesa libegl (including -dev) installed. How can I test the EGL backend?
Also, I saw some comments saying that v-sync is not supported with EGL, is that still true?
Comment 54 Ralf Jung 2012-10-27 14:42:07 UTC
(In reply to comment #52)
> What happens if you play a windowed movie, eg 640x480 and actually have
> blurring regions (panels, windows) on the screen and what impact next to the
> (capped) framerate does that have on the CPU load / GPU temperature?
I moved the video window underneath the panel and the expanded calendar (the largest blurred regions I have), with no change in framerate.
Using the "Grid" Alt-Tab-Switcher (which is almost a full-screen blur region) with the video playing behind it maximized, framerate drops below 20fps - but that happens on my KDE 4.9 installation as well. Using a still background, it's 30fps for full-screen repaints and 60fps for KDE 4.9.

> > Disabling v-sync makes this much worse: The framerate stays at 60fps, but the latency (or whatever)
> > bar is very high now
> That actually makes little sense - w/o seeing your actual changes this is
> hard to explain.
The yellow peaks (with v-sync) going beyond the lowest bar are a bit below the 20fps line, so I think that's 17fps - the time for one frame. Actually I was surprised to find out that usually, that time is much lower: Since the fps plugin measures the time between pre- and post-paint, I'd have expected it to include the swap buffers blocking for the v-sync. But then I noticed there is complicated timer logic in composite.cpp to somehow manually sync (?), and I may have broken it when I patched around, as I don't understand it - I call startRenderTimer directly after swapping buffers. I changed setCompositeTimer to always wait <=1ms (which may be complete nonsense), and now the "draw time" bar is at a constant 17fps with v-sync enabled (I also changed the fps widget to move the lowest line to 17fps, so I can easily see that).
I'll soon start to experiment with a simple test app to learn more about GLX and v-sync etc.
Should kwin avoid blocking in glXSwapBuffers, or is that code just an attempt to get working v-sync even without glXSwapBuffers?

> I'd be very careful on such observations becasue they do not include
> relevant system information like used GPU and screensize.
> Just repainting everything can free some CPU load (because several region
> intersections etc. don't have to happen) by adding GPU overhead (pixel
> processing) - so if you've a weak CPU (eg. P4) alongside a powerfull GPU
> (nvidia already had the 6000 at that time, did they?) this can indeed be a
> nice trade-off, but might severely punish weak GPU systems (IGP, budget GPU)
That was just an observation for my specific system, with a 1920x1200 screen and the HD 3000 built into a 2nd-gen Core i5.
I read quite often recently that always doing a buffer-swap is "the way to go", e.g. in some Wayland talks and in the Intel bugreport about the v-sync issues (where they say that Windows Vista+ always does a buffer swap, so the hardware guys are optimizing for that case, and it actually saves noticeable amounts of power). I don't have the knowledge to say how true that actually is, so I just do my guesses and experiments, restricted to my own system. I am, however, using an IGP and therefore operating at the lower part of the GPU computational power spectrum, even though it's a quite recent IGP.
Comment 55 Martin Flöser 2012-10-27 14:42:48 UTC
> Is there potential for simplification in the code managing the region? I
> don't know how expensive all these region operations are.
when running callgrind I can see that we spend most of our CPU time on region 
intersections, translations etc. But it's very difficult to say whether that 
matters at all, given that the GPU usage cannot be profiled with callgrind. 
Optimizing that part is non-trivial, I have some ideas but so far not yet the 
time to implement them.
> 
> Another idea I had: Instead of copying the changed parts of the (new) front
> buffer to the back buffer, and then painting the next frame there (which
> also makes assumptions about where that backbuffer comes from), maybe it is
> better to copy the parts that are *outside* the damage region from the
> current front buffer to the back buffer before doing the SwapBuffers. That
> may copy identical data, but at least it won't copy twice the full screen
> if the damage region is almost the entire screen. I'll try to implement
> that.
In most cases that will in fact be more. I'm just typing in an editor, the 
only changes are new characters being added. Given that this is a very small 
area (max 100x100) compared to the 3200x1080 of my setup it doesn't sound 
healthy ;-)
> 
> > What I would like to see are tests with the OpenGL on EGL case which is
> > possible with 4.10, though I think that it doesn't make any difference
> 
> I do have master compiled and the mesa libegl (including -dev) installed.
> How can I test the EGL backend?
KWIN_OPENGL_INTERFACE=egl kwin --replace &

It should tell somewhere:
KWin::SceneOpenGL::createScene: Forcing EGL native interface through 
environment variable

Be aware that we had many reports about non functional EGL backend with Mesa 
9.0
> Also, I saw some comments saying that v-sync is not supported with EGL, is
> that still true?
yes, we do not v-sync in the EGL case.
Comment 56 Ralf Jung 2012-10-27 14:59:09 UTC
(In reply to comment #55)
> For most cases that will be more in fact. I'm just typing in an editor, the 
> only changes are new characters being added. Given that this is a very small 
> area (max 100x100) compared to the 3200x1080 of my setup it doesn't sound 
> healthy ;-)
Doesn't a full-screen repaint do the same thing, i.e. copy the full window texture again and again for each frame?


> KWIN_OPENGL_INTERFACE=egl kwin --replace &
> 
> It should tell somewhere:
> KWin::SceneOpenGL::createScene: Forcing EGL native interface through 
> environment variable
> 
> Be aware that we had many reports about non functional EGL backend with Mesa 
> 9.0
Debian testing is still at Mesa 8.something, and this is working fine. There's an error message that libEGL "failed to create a pipe screen", but everything looks just normal. However, there's of course heavy tearing without v-sync...
Comment 57 Thomas Lübking 2012-10-27 15:08:20 UTC
(In reply to comment #54)
> Using a still background, it's 30fps for full-screen repaints and 60fps for KDE 4.9
And that's actually the relevant case - ie. you'll usually have the panel with a static background, maybe some NastyNotification (tm) and constant screen updates somewhere else ("PointlessProcessPrintingPlasmoid" (c), "F***FancyFlashFrame" (c)) - ie. this test exposes that you actually will suffer from repainting fullscreen all the time.

How is the performance here using the buffer-copying variant? If (likely) much better, what should happen is to enter the pure buffer-swapping render path more easily (eg. whenever there's a constantly updating region OR (not! XOR) a fullscreen repaint, and trigger the latter when the updated screen region is > 90% of the screen)


> But then I noticed there is complicated timer logic

=)

> in composite.cpp to somehow manually sync (?)
The purpose is to stay in event processing as long as possible and don't waste too much time on the GL code, esp. not waiting for the vertical sync.
What might happen here is that the granted padding is too small for your IGP and the buffer copying (thus you might lose a frame and eventually even block 16ms for it).

What you want to try for this is to raise the VBlankTime config option
kwriteconfig --file kwinrc --group Compositing --key VBlankTime n
the default for n is 6144 (about the slowest retrace i found in HW)

> and I may have broken it when I patched around, as I
> don't understand it - I call startRenderTimer directly after swapping
> buffers. I changed setCompositeTimer to always wait <=1ms (which may be
> complete nonsense)
you're passing a lot of time to the renderer and none to the event processing, which might lead to laggy input handling.

> Should kwin avoid blocking in glXSwapBuffers, or is that code just an
> attempt to get working v-sync even without glXSwapBuffers?
the timer call is required as long as rendering does not happen in an extra thread.
It maintains the framerate in the unsynced case and avoids long blocks in the synced one.
 
> is, so I just do my guesses and experiments, restricted to my own system. I
> am, however, using an IGP and therefore operating at the lower part of the
> GPU computational power spectrum, even though it's a quite recent IGP.
The HD Graphics are MUCH MUCH MUCH MORE powerful than the former GMA chips.
Comment 58 Thomas Lübking 2012-10-27 15:13:06 UTC
(In reply to comment #56)
> Doesn't a full-screen repaint do the same thing, i.e. copy the full window
> texture again and again for each frame?

Yes, but copying the buffer might not have the same performance as copying a texture and - more importantly - it should not be the goal to turn the costs of a blinking cursor into those of a scrolling fullscreen browser.

> Debian testing is still at Mesa 8.something, and this is working fine.
> There's an error message that libEGL "failed to create a pipe screen",
afaics that means you get indirect rendering, thus no syncing. Virtual machine?
Comment 59 Martin Flöser 2012-10-27 15:24:00 UTC
> > Debian testing is still at Mesa 8.something, and this is working fine.
> > There's an error message that libEGL "failed to create a pipe screen",
> 
> afaics that means you get indirect rendering, thus no syncing. Virtual
> machine?
Nope, that warning seems normal. I get it on my Sandybridge (and Debian 
testing), too. But I do not get it on the radeon.
Comment 60 Ralf Jung 2012-10-28 11:52:40 UTC
Created attachment 74843 [details]
low-copy buffer swap

(In reply to comment #57)
> > don't understand it - I call startRenderTimer directly after swapping
> > buffers. I changed setCompositeTimer to always wait <=1ms (which may be
> > complete nonsense)
> you're passing a lot of time to the renderer and none to the event
> processing what might lead to laggy input handling.
Any feedback from the input handling goes through the renderer, so how could I notice a difference? I assume if kwin blocks in the buffer swap, the events are queued up, all processed at once when the frame is done, and then the next frame comes.
Even with the timer logic, I sometimes see almost 15msec spent in the renderer while being at 59-61 fps. Does that mean the sleep time is calculated too short?

(In reply to comment #58)
> Yes, but copying the buffer might not have the same performance as coyping a
> texture and - more important - it should not be the goal to turn the costs
> of a blinking cursor into those of a scrolling fullscreen browser.
You are right, it feels really ugly to copy huge loads of data for a little blinking cursor. I don't know how much memory bandwidth an average system has today, but constantly wasting ~500 MByte/s (assuming 4 bytes per pixel) for a still full-HD screen should not be necessary.
I made an attempt at a solution which copies only data that changed in the last frame, but not in the current one - so when playing a full-screen video, no unnecessary copies are done, and when little changes on the screen, only small copies are necessary. I needed to copy some more (possibly for the same reason you needed the secondLastDamage in your patch), but the attached patch seems to work:
* I get 60fps when playing a full-screen video
* I get 60fps when using the "Grid" Alt-Tab switcher in a still background
* Framerate of course still drops when the blurred background changes - is my GPU really not capable of blurring 60 frames a second?

The rectangles for indicating when a window snaps to a side of the screen are broken: They fill up to become completely white. This can be "fixed" by always doing a full-screen repaint, which of course has other issues. They might make assumptions about front- and backbuffer which no longer hold.
I have some commented-out code in the patch to perform a full-screen repaint for each large repaint, which however does not seem to make a performance difference. A possible (CPU-side) optimization might be to pass a NULL pointer (instead of a region) to the paintScreen function when a full-screen repaint is done - in this case, it could skip all the QRegion computations. That would need changes in all effects though.
It'd be interesting to see how this patch performs on other machines.
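The region arithmetic behind this low-copy approach can be sketched as follows (a simplification with damage as a tile bitmask; the function name is invented, not from the patch):

```cpp
// Damage as a bitmask over screen "tiles" (invented simplification).
using Region = unsigned;

// Region to copy from front to back buffer just before the swap: parts the
// previous frame updated (which the stale back buffer therefore lacks) but
// that the current frame does not repaint anyway.
Region preSwapCopyRegion(Region lastDamage, Region currentDamage)
{
    return lastDamage & ~currentDamage;
}
```

When the same region updates every frame (full-screen video, blinking cursor) the copy is empty; only regions that stop updating get copied once.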
Comment 61 Ralf Jung 2012-10-30 17:28:29 UTC
Some notes on the EGL backend:
* vsync does not work as KWin requests buffer preservation. That means that SwapBuffers actually copies the back- to the frontbuffer, which is not only horribly inefficient but also not synchronized. After turning buffer preservation off, eglSwapInterval works as expected (with my drivers and hardware).
* There's an interesting extension: EGL_NOK_swap_region. It is not official, but implemented by Mesa. I implemented a patch using this extension. Unfortunately, the Mesa implementation does not synchronize this function. The official documentation for that extension was somewhere on symbian.org, but I can't find it anymore, so I don't know what the exact intention was. So, we are again stuck with SwapBuffers as the only reliable way to synchronize with the vblank, which means we must have a complete backbuffer for each frame.
I think the patch I wrote for the GLX backend works (in exactly the same manner) for EGL as well: my observations show that both use three buffers here. It could even be put into scene_opengl. However, like the Hanukkah patch, it relies on the driver actually re-using the backbuffers, and there is no good way for KWin to detect otherwise. If that's a reasonable assumption, the code could be greatly simplified, as the only remaining backend-specific action is the actual {glX,egl}SwapBuffers call - except for the single-buffered GLX case.

What's the reason for the flush/swap happening in prepareRenderingFrame, i.e. one frame too late? Is that used for the timing calculation, that the SwapBuffers will be the very first thing done in Scene::paint, so we can sleep until right before it? I can of course change my patch to perform the back-to-front copying in endRenderingFrame, so flushBuffer starts with glXSwapBuffers. This doesn't prevent the constant yellow spikes (~12-16ms) in the fps diagram though.
Comment 62 Martin Flöser 2012-10-30 17:39:56 UTC
On Tuesday 30 October 2012 17:28:29 you wrote:
> https://bugs.kde.org/show_bug.cgi?id=307965
> 
> --- Comment #61 from Ralf Jung <post@ralfj.de> ---
> Some notes on the EGL backend:
> * vsync does not work as KWin requests buffer preservation. That means that
> SwapBuffers actually copies the back- to the frontbuffer, which is not only
> horribly inefficient but also not synchronized. After turning buffer
> preservation off, eglSwapInterval works as expected (with my drivers and
> hardware).
I'm fine with changing the buffer preservation. Not exactly sure why I used it 
at all - might be useful for embedded devices.
> * There's an interesting extension: EGL_NOK_swap_region.
never heard of this extension, in fact never heard of any _NOK_ extension.
> 
> What's the reason for the flush/swap happening in prepareRenderingFrame,
> i.e. one frame too late? Is that used for the timing calculation, that the
> SwapBuffers will be the very first thing done in Scene::paint, so we can
> sleep until right before it? I can of course change my patch to perform the
> back-to-front copying in endRenderingFrame, so flushBuffer starts with
> glXSwapBuffers. This doesn't prevent the constant yellow spikes (~12-16ms)
> in the fps diagram though.
it's mostly relevant for the GLX code with the waitSync, as it's an active 
wait. It would block the complete event processing causing the next frame to 
take too long
Comment 63 Ralf Jung 2012-10-30 18:00:34 UTC
(In reply to comment #62)
> > * vsync does not work as KWin requests buffer preservation. That means that
> > SwapBuffers actually copies the back- to the frontbuffer, which is not only
> > horribly inefficient but also not synchronized. After turning buffer
> > preservation off, eglSwapInterval works as expected (with my drivers and
> > hardware).
> I'm fine with changing the buffer preservation. Not exactly sure why I used
> it 
> at all - might be useful for embedded devices.
Currently, flushBuffer relies on it: it copies the bounding rect of the damaged region from the back- to the frontbuffer, which wouldn't work otherwise, as the rest (the non-damaged part) of the backbuffer can have any old garbage in it. Also, if the PostSubBuffer extension is not available, flushBuffer always performs a SwapBuffers, which again relies on a completely intact backbuffer.

> it's mostly relevant for the GLX code with the waitSync, as it's an active 
> wait. It would block the complete event processing causing the next frame to 
> take too long
Isn't eglSwapBuffers (with working v-sync) an "active" wait as well, in the sense that it blocks?
Comment 64 Thomas Lübking 2012-10-30 18:31:11 UTC
(In reply to comment #63)

> Isn't eglSwapBuffers (with working v-sync) an "active" wait as well, in the
> sense that it blocks?

waitVideoSync will block unconditionally, while glSwapBuffers() with glSwapInterval(1) will block in the double-buffered case but (should) not with triple buffering - so it depends on what EGL says about that part.
Comment 65 Ralf Jung 2012-11-04 12:10:18 UTC
What's the way to go forward here? I think the most interesting question is: Is it acceptable/a good idea to rely on the driver re-using the backbuffer? The extension https://www.opengl.org/registry/specs/EXT/glx_buffer_age.txt would actually allow KWin to know when a backbuffer is re-used, but it is not implemented in Mesa 8.

Looking at other compositors: if I understand the Clutter code properly, it performs a full-screen redraw each time. By only adding the full screen to the damage area very late in KWin's paintSimpleScreen (after calling prePaintWindow), I was able to significantly boost performance even with a full-screen blur region: it's ~45fps now. Maybe the blur effect can be further optimized.

Weston seems to rely on backbuffer re-usage. It does not copy front-buffer pixels to the backbuffer though, but instead keeps track of a damaged region for each backbuffer (which, however, is equivalent to remembering the last two or three damages) to know what to actually re-paint. The problem is that KWin has effect plugins which can extend the damaged region, so it is not straightforward to implement that here: we need a place where the full region that will be re-painted for the current frame is known, but nothing has been drawn yet, so the region to be re-drawn can still be extended arbitrarily. I implemented this locally; the observable behaviour is similar to the "copy from frontbuffer" patch. I am unsure which version is preferable, but extending the re-drawn area sounds cleaner to me (with my very limited GLX/OpenGL experience...).
Comment 66 Martin Flöser 2012-11-04 13:38:36 UTC
I think following Weston's approach is probably the best one as I assume that 
Kristian knows what is best with Mesa's drivers.

We should know the actual damage area after the effects have modified it, once 
it goes back into the scene. So it should be possible to keep a queue of the 
last frames' damage areas.

Btw. we have quite some time to get it right as I think that's already too 
late for 4.10 (we are past soft feature freeze).
Comment 67 Ralf Jung 2012-11-04 13:46:32 UTC
(In reply to comment #66)
> I think following Weston's approach is probably the best one as I assume
> that 
> Kristian knows what is best with Mesa's drivers.
I will open a review request later today.

> We should know the actual damage area after the effects have modified it,
> once 
> it goes back into the scene. So it should be possible to keep a queue of the 
> last frames damage areas.
I think I found the right spot in paintSimpleScreen. If not, you'll probably complain during review ;-)

> Btw. we have quite some time to get it right as I think that's already too 
> late for 4.10 (we are past soft feature freeze).
So you think this is a feature? That's not for me to decide, of course.
Do you think a simpler patch which just fixes tearing for full-screen repaints could still go into 4.10? Something similar to Thomas' first patch (https://git.reviewboard.kde.org/r/106833/).
Comment 68 Martin Flöser 2012-11-04 13:55:03 UTC
(In reply to comment #67)
> > Btw. we have quite some time to get it right as I think that's already too 
> > late for 4.10 (we are past soft feature freeze).
> So you think this is a feature?
It's not really a feature, neither really a bug fix. It changes the way how we render and it could introduce regressions for non-Mesa drivers, which is why I might consider it as a "feature". In the end it depends on how the code looks like.

And "tear-free" is a feature, isn't it ;-)
Comment 69 Thomas Lübking 2012-11-04 14:37:56 UTC
(In reply to comment #68)
> we render and it could introduce regressions for non-Mesa drivers

FTR: the nvidia blob is the one driver on which glWaitVideoSync has been broken ever since.

With the slight refactor of the Hanukkah patch it should be possible to preserve the old behavior and introduce the buffer copying as an option (even secretly, ie. w/o GUI ;-)

@Ralf
The major aspect in your patch is the "inverted" damage calculation, correct?
Comment 70 Ralf Jung 2012-11-04 14:50:47 UTC
"Has 1 bug less" is also a feature... ;-)

But I agree it needs thorough testing. I'll split the patch into two, one which I consider safe, and one with the actual "always page-flip" magic. You can find the review request at https://git.reviewboard.kde.org/r/107194/ .

(In reply to comment #69)
> @Ralf
> The major aspect in your patch is the "inverted" damage calculation, correct?
In the one I posted here, right. That was necessary to get 60fps during video playback. I think the problem was that your patch always copied loads of stuff back after the swapping, even if most of that would be overwritten by the next frame. That's avoided by my inverted logic.
The patch I submitted for review however works completely differently: it makes sure the actual rendering is done on a slightly larger area to fix up the current backbuffer. That's the approach taken by Weston as well. It kind of replaces pixel copying with texture drawing - I don't know which is faster ;-) . It also tries to inject the additional damage late in the drawing process, so that caching effects (like the blur effect) only get the "real" damage.
Comment 71 RussianNeuroMancer 2012-11-04 14:56:06 UTC
> But I agree it needs thorough testing. 
So maybe implement this optionally, like Thomas proposed? That would exactly allow for thorough testing.
Comment 72 Ralf Jung 2012-11-04 15:05:34 UTC
(In reply to comment #71)
> > But I agree it needs thorough testing. 
> So maybe implement this optionally, like Thomas propose? It exactly allow to
> do thorough testing.
Sure, if the approach is accepted, one could make it optional.
Comment 73 Thomas Lübking 2012-11-04 18:10:44 UTC
https://git.reviewboard.kde.org/r/107198/
patch optionally preserving current behavior and providing the back-copying one in addition.

This obsoletes the Hanukkah patch as well as the low-copy buffer swapping one (as it's essentially a merge of them)
Comment 74 valdikss 2012-11-04 18:18:44 UTC
Thank you so much! You did a great deal of work. Thanks again.
Comment 75 John van Spaandonk 2012-11-13 15:18:38 UTC
Works for me after upgrading to kubuntu 4.9.3 via backports PPA
Devels, thx for the effort, from a happy user!
Not sure if I should set this to resolved now.
Comment 76 Ralf Jung 2012-11-13 15:57:00 UTC
On my system, nothing at all changed from 4.9.2 to 4.9.3.
Comment 77 Thomas Lübking 2012-11-13 16:38:25 UTC
Ubuntu might have either injected one of the patches, or the copySubBuffer / waitVideoSync / glSyncInterval situation might have been altered in the driver (in case KDE wasn't the only thing upgraded)
Comment 78 root 2012-11-13 18:12:35 UTC
Unfortunately, the mentioned backports PPA does not contain KDE 4.9.3 for 12.10 (quantal), yet.
So I can not test it.
Ubuntu changelog does not mention any patches related to KWin, though.
Comment 79 Martin Flöser 2012-11-13 18:28:55 UTC
> Ubuntu changelog does not mention any patches related to KWin, though.
that's not surprising as the KDE SC 4.9.3 changelog doesn't mention this 
change either. The Kubuntu developers are probably not aware of it.
Comment 80 RussianNeuroMancer 2012-11-13 18:55:56 UTC
> Unfortunately, the mentioned backports PPA does not contain KDE 4.9.3 for 12.10 (quantal), yet.
https://launchpad.net/~kubuntu-ppa/+archive/ppa?field.series_filter=quantal
Comment 81 John van Spaandonk 2012-11-13 19:03:33 UTC
On 11/13/2012 07:55 PM, RunetMember wrote:
> https://bugs.kde.org/show_bug.cgi?id=307965
>
> --- Comment #80 from RunetMember <runetmember@gmail.com> ---
>> Unfortunately, the mentioned backports PPA does not contain KDE 4.9.3 for 12.10 (quantal), yet.
> https://launchpad.net/~kubuntu-ppa/+archive/ppa?field.series_filter=quantal
>
I used this.
deb http://ppa.launchpad.net/kubuntu-ppa/ppa/ubuntu quantal main

(Sorry, it's the kubuntu ppa) http://www.kubuntu.org/news/kde-sc-4.9.3
Comment 82 root 2012-11-13 19:49:59 UTC
Ah, I must have somehow gotten into the wrong repo, there were only three packages in there.

Now I can confirm that this bug is fixed for me on Intel i5 2500k, HD 3000, intel_drv.so 2.20.9, Ubuntu 12.10+Kubuntu backports PPA

/usr/lib/xorg/modules/drivers/intel_drv.so belongs to xserver-xorg-video-intel, not the KDE backport.

Thank You!
To whomever :D
Comment 84 Ralf Jung 2012-11-13 21:07:33 UTC
Maybe Ubuntu enabled the "TearFree" option in the Intel X driver?
That'd be just a crude work-around though, that option adds a driver-side compositor (even if the desktop already uses a compositor), which means a lot of additional copies.
Comment 85 valdikss 2012-11-13 21:12:45 UTC
(In reply to comment #84)
> Maybe Ubuntu enabled the "TearFree" option in the Intel X driver?
> That'd be just a crude work-around though, that option adds a driver-side
> compositor (even if the desktop already uses a compositor), which means a
> lot of additional copies.

sudo grep -i tear /var/log/Xorg.0.log
Please
Comment 86 root 2012-11-13 21:15:54 UTC
(In reply to comment #85)
> sudo grep -i tear /var/log/Xorg.0.log
> Please

Nothing for me. I didn't restart X or reload the driver, but there's no tearing; I just updated KDE.
Comment 87 Martin Flöser 2012-11-13 21:24:51 UTC
> Don't see any kwin changes
> http://bazaar.launchpad.net/~kubuntu-packagers/kubuntu-packaging/kde-workspace/changes
Rev 699: New upstream release (LP: #1074747)
Comment 88 valdikss 2012-11-13 21:36:39 UTC
(In reply to comment #87)
> Rev 699: New upstream release (LP: #1074747)

files modified:
debian/changelog
debian/control
Comment 89 root 2012-11-16 09:12:41 UTC
In order to clarify: there is still tearing in the upper part!
Just far far far less.
Comment 90 John van Spaandonk 2012-11-16 16:33:04 UTC
On 11/16/2012 10:12 AM, root@jurathek.de wrote:
> https://bugs.kde.org/show_bug.cgi?id=307965
>
> --- Comment #89 from root@jurathek.de ---
> In order to clarify: there is still tearing in the upper part!
> Just far far far less.
>
I concur.

Tearing now only occurs in the top few mm, making it difficult to see.
Thanks for the fix, but no home run quite yet :-(
Comment 91 valdikss 2012-11-16 16:40:20 UTC
It could be that RC6 is disabled. If you enable RC6, it would occur just a little lower, but it would be pretty visible.
Comment 92 Thomas Lübking 2012-11-16 17:23:24 UTC
There has been no fix from our side unless you patched in https://git.reviewboard.kde.org/r/107198/ and selected a full repaint variant 

   kwriteconfig --file kwinrc --group Compositing --key <?>

<?> being one of n, e, p, c for "No fullrepaints", "Extend (nearly full) to full repaints", "full rePaints" and "Copy buffer full repaints".

"c" is only reasonable on the nvidia blob and eventually fglrx.
Do NOT use it with MESA and the Open Source drivers.
Comment 93 Andrei Borisochkin 2012-11-22 07:39:14 UTC
I can confirm this behavior on up-to-date Archlinux (KDE 4.9.3, nvidia blob 310.19, video card - gtx650 ti). This tearing shows up regardless of vsync in nvidia-settings. When I turn on vsync in kwin, tearing comes up - however NOT immediately, only after a few seconds.

I didn't encounter this behavior with older card (7900GS) the day before yesterday (nvidia blob 304.xx).
Comment 94 Andrei Borisochkin 2012-12-06 04:53:36 UTC
https://devtalk.nvidia.com/default/topic/525074/linux/image-tear-with-kwin-compositing-and-vsync-on/

"glWaitVideoSync() works as intended, but doesn't really provide a way to present in a tear-free way. The proper way to fix this on the KWin side would be to implement support for the new GLX_EXT_buffer_age extension."
Comment 95 Thomas Lübking 2012-12-06 14:30:52 UTC
Whatever was the intention of "glWaitVideoSync" (apparently not the way it was utilized in compositors) - "glxinfo | grep GLX_EXT_buffer_age" is void on nvidia 310.19 and as reported in the review request on intel/mesa as well.
So is "grep GLX_EXT_buffer_age /usr/share/doc/nvidia/NVIDIA_Changelog".

That is why we attempt always full repaints and swapping the buffer.
https://git.reviewboard.kde.org/r/107198/
Comment 96 Pierre-Loup A. Griffais 2012-12-06 23:40:47 UTC
Hi Thomas,

The problem with glWaitVideoSync() is that it waits on the CPU, so any interval of time can happen between performing that wait and issuing the blit, which means tear-free presentation isn't guaranteed.

We briefly discussed this with Fredrik a few days ago, and contrary to what I said GLX_EXT_buffer_age isn't yet exposed on current 310 drivers (as you found out). However we should be rolling out a new release series shortly, which will include support for it.

The intention behind the change you linked is good, but without the functionality provided by  GLX_EXT_buffer_age you have no guarantees about how many buffers are in the flip queue of the driver; it's usually a trivial cycle between 2 or 3 buffers depending whether Triple Buffering is enabled, but other implementations might have their own specific flipping mechanism.

Thanks,
 - Pierre-Loup
Comment 97 Brandon Watkins 2012-12-07 18:40:10 UTC
Does anyone know if a fix for this (for intel cards) ever make it into kde 4.9.x? Or will it only come later on in 4.10? I really want to switch to KDE but this is a dealbreaker for me :(.
Comment 98 Thomas Lübking 2012-12-07 19:03:59 UTC
(In reply to comment #97)
> Does anyone know if a fix for this (for intel cards) ever make it into kde
> 4.9.x?

Actually the intel IGPs should operate on the glXCopySubBuffer path, and that should not cause tearing except with some broken MESA versions.

The pending patch at https://git.reviewboard.kde.org/r/107198/ will *hopefully* still make it into 4.10 - it's however a far too massive change for a bugfix release (ie. 4.9) sorry.
Comment 99 Thomas Lübking 2012-12-07 19:15:22 UTC
Hi Pierre-Loup

(In reply to comment #96)
> The problem with glWaitVideoSync() is that it waits on the CPU
Ok, thanks for the information - i guess it's no misassumption that we could, in a further patch, completely scratch that path then?

> The intention behind the change you linked is good, but without the
> functionality provided by  GLX_EXT_buffer_age you have no guarantees 

Actually the current approach either forces full repaints or completes the backbuffer from the frontbuffer (we scratched re-using the backbuffer for its undefined state, plus it will require some (but minor) changes on clipping artificial elements), but the latter has severe performance issues on the MESA stack (so we'll mostly require GLX_EXT_buffer_age there, esp. since the weaker IGPs also typically suffer most from effects like blurring etc.)

Many thanks for your assistance on the issue.
Comment 100 Brandon Watkins 2012-12-07 19:49:04 UTC
(In reply to comment #98)
> (In reply to comment #97)
> > Does anyone know if a fix for this (for intel cards) ever make it into kde
> > 4.9.x?
> 
> Actually the intel IGPs should operate on the glXCopySubBuffer path and that
> should not cause tearing but with some broken MESA versions.
> 
> The pending patch at https://git.reviewboard.kde.org/r/107198/ will
> *hopefully* still make it into 4.10 - it's however a far too massive change
> for a bugfix release (ie. 4.9) sorry.

I'm on ivybridge and I had tried kubuntu 12.10 which has pretty recent mesa/kernel/driver versions (9.0, 3.5, 2.20.9).

AFAIK the problem is that currently sandybridge and ivybridge hardware simply cannot get fully tear-free output unless the compositor *always* pageflips (Ubuntu has changed compiz to behave this way by default in ubuntu 12.10 and I get absolutely no tearing anywhere there). I can also achieve totally tear-free output in gnome-shell by adding CLUTTER_PAINT=disable-clipped-redraws:disable-culling to /etc/environment.

Intel is working on adding "legacy" vsync in kernel 3.8 AFAIK, but if this is used it will cause significantly more power usage, since it keeps the intel card out of its power-saving state. So the proper way on intel will still be to only use page-flipping, which is why that pending patch you linked is probably the only thing that will give sandybridge/ivybridge users totally tear-free output.

It is possible to get tear-free fullscreen video currently if you use unredirect fullscreen windows in kwin + opengl output in the video player, but even that doesn't always work (it depends on the application; some video players don't seem to page-flip even with opengl output - for example, see this vlc bug: https://trac.videolan.org/vlc/ticket/7702 )

The best tear-free video experience on ivybridge is an always page-flipping compositor with unredirect fullscreen windows disabled; this seems to get rid of any and all tearing.
Comment 101 Brandon Watkins 2013-02-06 15:39:27 UTC
I see KDE 4.10 is out today, can anyone tell me if tearing still occurs on ivybridge in kde 4.10?
Comment 102 Thomas Lübking 2013-02-06 17:25:39 UTC
None of the discussed "more buffer swapping" patches has been applied to kde 4.10
Comment 103 John van Spaandonk 2013-02-25 20:37:41 UTC
still present in kde 4.10 on kubuntu 12.10, as expected.
Comment 104 Brandon Watkins 2013-02-26 21:50:29 UTC
Can we expect it to make it into kde 4.11 then? I'm currently using the intel xorg "tearfree" option and have kwins vsync disabled, which works but has poor performance and higher power draw. This is a really big usability issue that makes kwin rather unusable on intel graphics!
Comment 105 valdikss 2013-02-26 21:53:19 UTC
Yes, it will be included in 4.11
Comment 106 Thomas Lübking 2013-03-05 19:19:28 UTC
Git commit 6072b4feb8c90024aa24b2e9cb8a21ab2140412c by Thomas Lübking.
Committed on 18/02/2013 at 23:17.
Pushed by luebking into branch 'master'.

support a permanent glSwapBuffer

either by
- forcing fullrepaints unconditionally
- turning a repaint to a full one beyond a threshhold
- completing the the backbuffer from the frontbuffer after the paint
FIXED-IN: 4.10
REVIEW: 107198

M  +1    -11   kwin/composite.cpp
M  +5    -4    kwin/eglonxbackend.cpp
M  +60   -57   kwin/glxbackend.cpp
M  +29   -0    kwin/options.cpp
M  +13   -0    kwin/options.h
M  +35   -10   kwin/scene.cpp
M  +3    -0    kwin/scene.h
M  +76   -1    kwin/scene_opengl.cpp
M  +1    -0    kwin/scene_opengl.h

http://commits.kde.org/kde-workspace/6072b4feb8c90024aa24b2e9cb8a21ab2140412c
Comment 107 Mark 2013-03-09 15:16:44 UTC
(In reply to comment #106)
> Git commit 6072b4feb8c90024aa24b2e9cb8a21ab2140412c by Thomas Lübking.
> Committed on 18/02/2013 at 23:17.
> Pushed by luebking into branch 'master'.
> 
> support a permanent glSwapBuffer
> 
> either by
> - forcing fullrepaints unconditionally
> - turning a repaint to a full one beyond a threshhold
> - completing the the backbuffer from the frontbuffer after the paint
> FIXED-IN: 4.10
> REVIEW: 107198
> 
> M  +1    -11   kwin/composite.cpp
> M  +5    -4    kwin/eglonxbackend.cpp
> M  +60   -57   kwin/glxbackend.cpp
> M  +29   -0    kwin/options.cpp
> M  +13   -0    kwin/options.h
> M  +35   -10   kwin/scene.cpp
> M  +3    -0    kwin/scene.h
> M  +76   -1    kwin/scene_opengl.cpp
> M  +1    -0    kwin/scene_opengl.h
> 
> http://commits.kde.org/kde-workspace/6072b4feb8c90024aa24b2e9cb8a21ab2140412c

Hi,

The "version fixed in" field in this bug report notes 4.11.
Yet the actual commit message says "FIXED-IN: 4.10"

...

Which version will have this fix? the just released 4.10.1?

Cheers,
Mark
Comment 108 Ralf Jung 2013-03-09 15:46:42 UTC
The fix will land in 4.11. The danger of regressions is too big to put it into the stable branch.
Comment 109 Thomas Lübking 2013-03-09 15:48:22 UTC
4.11
Sorry, my bad.
The commit took some time, initially pointed 4.10 but didn't make it and i forgot to edit the message.

The patch (and the pending following ones) are far too invasive to be added to minor releases.
Comment 110 Mark 2013-03-09 15:52:47 UTC
(In reply to comment #109)
> 4.11
> Sorry, my bad.
> The commit took some time, initially pointed 4.10 but didn't make it and i
> forgot to edit the message.
> 
> The patch (and the pending following ones) are far to invasive to be added
> to minor releases.

Ok, that's fine. Just wanted to make sure since i'm also experiencing the issue described in this bug.
Comment 111 Andreas Hermann 2013-03-28 22:54:08 UTC
I applied this patch against kwin 4.10.1. My system has an nvidia card, using the binary blob.
My results:

- with 'c' mode: zero tearing, but kwin constantly has 60% cpu usage. Performance is ok, but i think this is because of the powerful gpu/cpu (gtx460 + core i7)

- with 'e' mode: amazing performance, but again tearing on the second monitor (upper part)

- with 'p' mode: same as c

However, tearing generally disappears if i force my two monitors to have the same resolution.
Comment 112 Thomas Lübking 2013-03-28 23:17:41 UTC
Pretty much fixed. Tearing on "c" will remain until we remove the "rather not so helpful" waitSync calls from the pixel-copying branch (patch here, RR pending for the moment)

As for the CPU usage: the patch was never tested against 4.10, nor would it apply cleanly.
If you can provide a callgrind dump, i can look into it, but there's little reason for CPU overhead on the buffer swapping paths (memory throughput, esp. w/o flipping support, but that's not CPU related)

It rather sounds as if you run into constant repaints - for whatever reason - and, for 60% load on an i7 core, rather unblocked ones (ie. beyond 60 FPS)
Comment 113 Andreas Hermann 2013-03-29 13:21:22 UTC
Thank you Thomas for the quick response.
So there is no easy way to test this on 4.10?
Comment 114 Thomas Lübking 2013-03-29 13:28:11 UTC
No. It would require a serious and complete backport to ensure that everything is in shape (ie. every behaviour that the patch relies on is present)
And that'd be quite some work for a quick singleton test.

The most simple way would actually be to get the git master of kde-workspace and compile & install only kwin (that should actually work)
Comment 115 Thomas Lübking 2013-03-30 17:43:42 UTC
(In reply to comment #113)
> Thank yout Thomas for the quick response.
> So there is no easy way to to test this on 4.10?

Just a short re-confirmation: you did not get this CPU load while using the FPS counter, did you?
That's pretty much to be expected and unrelated.
Comment 116 Andreas Hermann 2013-03-31 10:39:29 UTC
(In reply to comment #115)
> (In reply to comment #113)
> > Thank yout Thomas for the quick response.
> > So there is no easy way to to test this on 4.10?
> 
> Just a short re-confirmation: you did not get this CPU load while using the
> FPS counter, did you?
> That's pretty much expectable and unrelated.

Do you mean the Show FPS effect? I did not use that. I just looked at the process list with htop. That showed kwin constantly using around 50-60% cpu when running with 'c'.
As soon as i have some time, i will try it again with kwin from git-master.
Comment 117 Selopo 2013-09-14 11:51:39 UTC
I'm glad to see that the problem is fixed in 4.11. I upgraded Mint to it and it works like a charm (note that Vsync settings have to be changed to full scene repaints). I can finally watch movies without tearing; this was a no-go for me to continue with KDE.
Comment 118 John van Spaandonk 2013-10-19 13:56:37 UTC
Works with the new KDE, 4.11.2.