445412 – Zoom effect causes screen to stop updating

Summary: Zoom effect causes screen to stop updating

Status:	RESOLVED FIXED

Alias:	None

Product:	kwin
Classification:	Plasma
Component:	wayland-generic (other bugs)
Version First Reported In:	git master
Platform:	Other Linux

Importance:	VHI normal
Target Milestone:	---
Assignee:	KWin default assignee

URL:
Keywords:	regression

Duplicates (1):	446054
Depends on:
Blocks:

Reported:	2021-11-13 06:21 UTC by Nate Graham
Modified:	2021-12-03 08:33 UTC
CC List:	3 users

See Also:
Latest Commit:	https://invent.kde.org/plasma/kwin/commit/2b628ea412442f8628b6b3e3b6e754f653d83792
Version Fixed In:
Sentry Crash Report:

Description Nate Graham 2021-11-13 06:21:16 UTC

This is a recent regression in the Plasma Wayland session (have not tried it on X11 yet) with everything KDE from git master.

Pressing the Meta+plus shortcut to trigger the zoom effect makes the entire system hang. Input is no longer accepted. A hard reboot is necessary.

Comment 1 Nate Graham 2021-11-13 06:56:12 UTC

Additional information:

Input is in fact accepted, it's just that the screen contents do not change. I can use the keyboard to open KRunner and run `kwin_wayland --replace` to recover without rebooting the machine.

VT switching works. When I VT switch and come back to the graphical session, KWin will display the image zoomed in. But thereafter, the screen contents will not change anymore. The screen's refresh rate appears to be tied to the frequency of VT switching. :)

Comment 2 Nate Graham 2021-11-14 14:47:58 UTC

Works fine on X11.

Comment 3 Duncan 2021-11-16 02:45:59 UTC

Seeing this myself (also on live-git, here from the gentoo/kde overlay live-git packages) and was just searching bugzi to see if the bug had been already filed before filing it myself.  =:^)  Highly frustrating as I use zoom often enough it's instinctual, and find myself forgetting I can't ATM because it's broken!  (I'm wayland-only but for xwayland, having uninstalled xorg, tho I'm comfortable at the CLI and have weston installed as a backup wayland compositor if plasma/kwin_wayland live-git gets too broken.)

Additional information on dual monitor behavior:

On zoom, one monitor seems to continue working normally, while the other freezes and is subject to VT-switching-locked refresh-rate, as you put it. =:^)

My keyboard has three keys I have dedicated to zoom, in/out/actual.  After zooming in, I can zoom out, and *often* the frozen monitor comes back to life and starts refreshing normally.  However, the actual-size key is broken and won't return me to normal/actual, even on the still-refreshing monitor -- I have to use the zoom-out key.

I haven't fully qualified when I can get the second monitor back working and when not, but it seems that if I move the pointer to the frozen-while-zoomed monitor, it stays frozen.  Sometimes the cursor disappears and I can't get it back even moving back over the otherwise working monitor, other times it stays visible on the otherwise frozen monitor and the cursor changes if I move it over something where it would normally change, even while the rest of that monitor (including a conky sysmon I'd normally see update every second) stays frozen.

Since xorg isn't installed I can't test behavior there, tho you already did.  But I'm considering trying one of the other zoom effects (I'm on full-display zoom ATM) to see if it's broken too -- guessing it is.

Comment 4 Duncan 2021-11-16 03:10:54 UTC

(In reply to Duncan from comment #3)
> I'm considering trying one of the other zoom effects (I'm on
> full-display zoom ATM) to see if it's broken too -- guessing it is.

Can't get looking glass effect to work at all (not sure whether it was before the recent changes or not).

Magnifier works but is bugged with an apparently unrelated bug that could have been there all along on wayland (three rectangles instead of one, center is the expected size but outline-only, unmagnified inside the outline, with two half-size rectangles splitting the actual magnified area to the left and right of the center rectangle).  Guess I'll see about filing one on that after this bug gets resolved, if it's still there.

But with looking-glass apparently broken entirely (not triggering at all), looks like I can set that to disable zoom so I don't keep forgetting it's broken and end up disabling my desktop until I kwin_wayland --replace.  =:^)

Comment 5 Duncan 2021-11-16 05:55:08 UTC

Attempting bisect, regressed kwin back to (commit) bad575211, committed Nov 10, which kills eglstream, and zoom is still broken.  f91ae3e97 is the immediately previous commit and won't build with an error about missing eglstream.h (or similar) in kwayland-server, so regressing kwin to before bad575211 seems to require regressing kwayland-server back to when it supported eglstream as well.  Not sure whether I'll try regressing both together any further, but that should eliminate /some/ possibilities.

(FWIW I'm on amdgpu so eglstream itself, being nvidia, shouldn't directly affect me at all.  Glad to be rid of its complications in kwin, but the removal is complicating this bisect.)

Comment 6 Duncan 2021-11-16 06:26:59 UTC

Bisected both and got a working hit:
* kwayland-server back to 6cc372683, immediately b4 eglstreams removal
* kwin back to efa08b1f3, before scenes move.

That works!  So assuming the breakage is in kwin not kwayland-server, we're looking at the range efa08b1f3 (Nov 8) to bad575211 (Nov 10).  Looks like ~36 commits.
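The narrowing described above is a binary search over an ordered commit range, which is what `git bisect` automates. A minimal sketch of the idea follows, assuming the range is indexed oldest to newest, the first index is known good, the last is known bad, and every commit after the culprit is also bad. The names `bisect` and `BisectResult` and the index-based predicate are hypothetical illustrations, not part of git or KWin.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical result type: index of the first bad commit and the number
// of commits that had to be built and tested to find it.
struct BisectResult {
    std::size_t firstBad;
    int builds;
};

// Binary search between a known-good and a known-bad index.
// Assumption: isBad(i) is false up to some commit and true from there on.
BisectResult bisect(std::size_t good, std::size_t bad,
                    const std::function<bool(std::size_t)> &isBad)
{
    assert(good < bad);
    int builds = 0;
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        ++builds; // each probe stands for one build-and-test cycle
        if (isBad(mid)) {
            bad = mid;  // culprit is at mid or earlier
        } else {
            good = mid; // culprit is after mid
        }
    }
    return {bad, builds};
}
```

With a range of about 36 commits, this kind of search needs at most six build-and-test cycles, which is why the bisect finished within a few hours even with two repositories to keep in sync.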

Comment 7 Duncan 2021-11-16 07:45:33 UTC

Bisect results, kwin on top of kwayland-server 6cc372683 to get eglstreams back:

e2a086384 works, a07aae828 broken and thus the culprit!

* commit a07aae828
| Author:     Xaver Hugl <xaver.hugl@gmail.com>
| AuthorDate: Fri Oct 8 10:52:01 2021 +0200
| Commit:     Xaver Hugl <xaver.hugl@gmail.com>
| CommitDate: Tue Nov 9 22:15:31 2021 +0100
|
|     platforms/drm: delay presentation for modesets
|
|     Currently KWin is combining modesets with presentation, which causes problems
|     when multiple monitors are used and crtcs need to be switched around, because
|     taking away a CRTC from another output causes the driver to disable the
|     other output. In order to avoid such problems, delay presentation until
|     all pipelines are ready to present and then do a modeset with a single atomic
|     commit. To process the resulting page flip events properly this commit also
|     ports KWin to page_flip_handler2 and changes how the pageFlipped and
|     notifyFrameFailed signals are processed.

Comment 8 Duncan 2021-11-16 08:05:24 UTC

Unfortunately a07aae828 won't direct-revert on top of current, so I can't easily double-check the bisect result on current.  Not being a dev, with the size of the patch and patch reporting multiple failing chunks, I'm afraid attempting to do it manually is out of my league.  But at least we have a culprit commit to work with, now, which is dramatic progress.

Comment 9 Zamundaaa 2021-11-16 09:15:49 UTC

Thank you for bisecting, that's always a bit of a hassle with the split between kwin and kwayland-server. Do always put the email with the offending commit into CC though so that the author gets notified directly.

From my wayland-session.log when reproducing this it seems like buffer swap is failing with EGL_BAD_SURFACE, but why it does that I can't explain yet. The only cause for the error I can see in Mesa is when gbm_surface_lock_front_buffer doesn't get called after eglSwapBuffers - but it is getting called every time. The gbm surface has a free buffer every time we call eglSwapBuffers, too, and KWin isn't even doing a modeset when the effect gets activated, or creating new test buffers or anything else that I think could be caused by the commit directly.

Comment 10 Duncan 2021-11-26 01:32:00 UTC

(In reply to Zamundaaa from comment #9)
> From my wayland-session.log when reproducing this it seems like buffer swap
> is failing with EGL_BAD_SURFACE, but why it does that I can't explain yet.
> The only cause for the error I can see in Mesa is when
> gbm_surface_lock_front_buffer doesn't get called after eglSwapBuffers - but
> it is getting called every time. The gbm surface has a free buffer every
> time we call eglSwapBuffers, too, and KWin isn't even doing a modeset when
> the effect gets activated, or creating new test buffers or anything else
> that I think could be caused by the commit directly.

Consider the non-duplicated multi-monitor/CRTC case.

Some time ago now kwin_wayland split the formerly monolithic global surface into separate surfaces per CRTC/monitor, and that has been working.

But when that first happened it broke zoom with somewhat similar but not as bad behavior (it was possible to trigger updating again with that bug) until a later commit fixed it.  That was bug #429377, fixed back in February by
https://invent.kde.org/plasma/kwin/commit/523ad8e25c34eb0e683f6e29ad15c3b9a7cdad31
I suspect the same faulty assumption is behind both bugs.

Keep in mind that once zoomed, part of the desktop will appear on a different monitor than at 100% and presumably need to be drawn to a different CRTC.  If the code assumes it's on the same one, that would explain the bad surface errors, correct?

Nate's original report doesn't mention multi-monitor, tho, so I'm assuming this bug's occurring on single-monitor as well, and I'm not at all sure this explains the single-monitor case, unless the zoom triggers bad-surface for now-off-screen area as well, not just on-other-screen area.

Meanwhile, admin's intuition (as I'm not a dev) says the problem is in the page-flip processing changes mentioned in the commit message as fixups to the modeset/delay-presentation changes that seem to have been the main thrust of the commit.  That'd be why it's happening in the absence of the modesets that the culprit commit was "about".

Comment 11 Bug Janitor Service 2021-11-29 10:51:48 UTC

A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/1726

Comment 12 Vlad Zahorodnii 2021-11-29 13:09:22 UTC

Git commit 1f318a2245f9f887a2bf8aa320dc905a012842df by Vlad Zahorodnii.
Committed on 29/11/2021 at 12:48.
Pushed by vladz into branch 'master'.

effects/zoom: Rework how cursor texture is managed

Update the cursor texture on demand to avoid changing the current opengl
context in the middle of compositing cycle.

M  +42   -43   src/effects/zoom/zoom.cpp
M  +5    -5    src/effects/zoom/zoom.h

https://invent.kde.org/plasma/kwin/commit/1f318a2245f9f887a2bf8aa320dc905a012842df
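The "update on demand" approach in the commit message above can be illustrated with a dirty flag: the cursor-changed handler only records that the texture is stale, and the texture is rebuilt later from the paint path, where the compositor's OpenGL context is already current. This is a minimal sketch of that pattern, not the actual zoom effect code; `ZoomCursorSketch`, its methods, and the string standing in for a GL texture are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of lazy cursor texture management.
class ZoomCursorSketch
{
public:
    // Called whenever the cursor image changes, possibly in the middle of
    // a compositing cycle. No GL work happens here, so the current
    // context is never touched from this path.
    void cursorChanged(const std::string &shape)
    {
        m_shape = shape;
        m_dirty = true;
    }

    // Called from the paint path, where the right context is assumed to
    // be current already. Rebuilds the texture only if it is stale.
    const std::string &ensureTexture()
    {
        if (m_dirty) {
            m_texture = "texture:" + m_shape; // stands in for a GL upload
            ++m_uploads;
            m_dirty = false;
        }
        return m_texture;
    }

    int uploads() const { return m_uploads; }

private:
    std::string m_shape;
    std::string m_texture;
    bool m_dirty = false;
    int m_uploads = 0;
};
```

A side effect of this design is that several cursor changes between two frames collapse into a single upload.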

Comment 13 Josep Febrer 2021-11-29 23:36:25 UTC

*** Bug 446054 has been marked as a duplicate of this bug. ***

Comment 14 Duncan 2021-11-30 08:30:10 UTC

(In reply to Vlad Zahorodnii from comment #12)
> Git commit 1f318a2245f9f887a2bf8aa320dc905a012842df by Vlad Zahorodnii.
> Committed on 29/11/2021 at 12:48.
> Pushed by vladz into branch 'master'.
> 
> effects/zoom: Rework how cursor texture is managed

Confirming that fixes it (as well as older and less severe bug #429377, which as bug filer I just set RESOLVED/FIXED) here.

I'll let Nate or Vlad do the honors here.

Comment 15 Duncan 2021-11-30 12:42:49 UTC

BTW, whatever optimizations you've done to kwin_wayland since the a07 commit above, have had a *BIG* positive effect.  While my decade-old AMD bulldozer-1 fx6100 (2011) with half-decade-old (2016) Radeon rx460 graphics could (with some stress) do 4k60 in vlc, in firefox on youtube not so much -- I could get 4k50 with some stuttering, but @4k60, the image would freeze on many videos and often not come back until I hit pause momentarily to let it resync.

With the recent optimizations I'm now doing 4k60 with some stuttering, same firefox, about the same stuttering as 4k50 before, so on my system the optimizations have been good for ~20%!

FWIW this is the video I've been using for testing, a 5 minute 4k60 Costa Rica nature video: https://youtu.be/LXb3EKWsInQ  Again, while I still get some stutter, for the first time I could play it clear through without having to pause the video to resync -- it will still freeze occasionally now but much less and now it always comes back without me having to touch pause, where before it'd stay frozen until I paused to resync, and it'd do that repeatedly, making it effectively unplayable at the full 4k resolution unless I slowed it down.

So it's a BIG difference! =:^)

Comment 16 Nate Graham 2021-11-30 19:12:34 UTC

Confirmed fixed! Thanks a lot!

Comment 17 Vlad Zahorodnii 2021-12-03 08:33:20 UTC

Git commit 2b628ea412442f8628b6b3e3b6e754f653d83792 by Vlad Zahorodnii.
Committed on 03/12/2021 at 08:06.
Pushed by vladz into branch 'master'.

backends/drm: Mark frame failed if presenting null buffer

If eglSwapBuffers() fails, there won't be a buffer and so we need to
mark the frame as failed. Otherwise, the screen can be frozen.

eglSwapBuffers() can fail if some effect calls makeOpenGLContext()
between RenderBackend::beginFrame() and RenderBackend::endFrame(), which
is the case with the zoom effect. It can set wrong draw surface in
ZoomEffect::recreateTexture()

M  +1    -0    src/backends/drm/drm_output.cpp
M  +1    -6    src/backends/drm/drm_pipeline.cpp

https://invent.kde.org/plasma/kwin/commit/2b628ea412442f8628b6b3e3b6e754f653d83792
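The reasoning in this commit message can be sketched as follows: once a frame has begun, the render loop waits for either a completed page flip or an explicit failure before it starts the next frame. If a failed buffer swap yields no buffer and nothing reports the failure, the loop waits forever and the screen appears frozen. The sketch below is a simplified model under that assumption; `RenderLoopSketch`, `Buffer`, and `present` are hypothetical names and do not mirror KWin's real DRM backend classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a scanout buffer.
struct Buffer {
    int id;
};

// Hypothetical render loop: at most one frame may be in flight.
class RenderLoopSketch
{
public:
    void beginFrame()
    {
        assert(!m_pending);
        m_pending = true;
    }
    void notifyFrameCompleted() { m_pending = false; }
    void notifyFrameFailed()
    {
        m_pending = false;
        ++m_failedFrames;
    }
    bool canStartFrame() const { return !m_pending; }
    int failedFrames() const { return m_failedFrames; }

private:
    bool m_pending = false;
    int m_failedFrames = 0;
};

// Present step. A null buffer models a failed buffer swap.
// Returns true if a page flip was queued for the buffer.
bool present(RenderLoopSketch &loop, const std::shared_ptr<Buffer> &buffer)
{
    if (!buffer) {
        // Without this call the loop would stay pending indefinitely.
        loop.notifyFrameFailed();
        return false;
    }
    // A real backend would queue a page flip here; completion is
    // reported later through notifyFrameCompleted().
    return true;
}
```

In this model, a failed swap releases the loop immediately so the next frame can be attempted, while a successful one keeps it pending until the page flip is reported.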