Bug 443341 - Flickering and hanging on Nvidia starting at kwin_x11 5.22.90
Status: RESOLVED DUPLICATE of bug 443951
Alias: None
Product: kwin
Classification: Plasma
Component: compositing
Version: 5.22.90
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2021-10-05 08:33 UTC by nyanpasu64
Modified: 2021-10-25 23:25 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Screenshot of a partly black screen, taken while the screen was flickering. (65.42 KB, image/png)
2021-10-24 00:14 UTC, nyanpasu64
Description nyanpasu64 2021-10-05 08:33:54 UTC
SUMMARY
Around the time I upgraded to Plasma beta 5.22.90 (and nvidia 470.74), KWin became substantially more unstable. Opening some apps caused black flickering on most of the screen, and the screen froze when I re-enabled compositing after some time spent with it off.

STEPS TO REPRODUCE
For screen hang:

1. Keep compositing at OpenGL 2.0.
2. Turn off compositing using alt+shift+f12 or the equivalent shortcut. I find kwin much more stable in this state.
3. Use the computer for some time.
4. Enable compositing using the keyboard shortcut.

OBSERVED RESULT
Screen stops updating except for cursor. If I switch to TTY and back, screen is black except for cursor. I can still interact with windows despite screen not redrawing, and restarting kwin_x11 (from Konsole or TTY) fixes the issue. (I'm using systemd user boot for Plasma, so that's systemctl --user restart plasma-kwin_x11.)
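
The restart workaround above can be sketched as a small helper. This is a minimal sketch, not KWin's own tooling: the unit name is the one given in this report, `kwin_x11 --replace` is assumed as the fallback for sessions not started through systemd, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumption: unit name as given in this report (systemd user boot for Plasma).
KWIN_UNIT="plasma-kwin_x11.service"

# kwin_restart_cmd MODE: print the restart command for MODE.
# MODE is "systemd" for a systemd-managed session; anything else means manual.
kwin_restart_cmd() {
    if [ "$1" = "systemd" ]; then
        echo "systemctl --user restart $KWIN_UNIT"
    else
        # Fallback for sessions where kwin_x11 is not a systemd user unit.
        echo "kwin_x11 --replace"
    fi
}

# detect_mode: print "systemd" if the user unit is currently active, else "manual".
detect_mode() {
    if command -v systemctl >/dev/null 2>&1 &&
       systemctl --user is-active --quiet "$KWIN_UNIT" 2>/dev/null; then
        echo "systemd"
    else
        echo "manual"
    fi
}
```

Running `kwin_restart_cmd "$(detect_mode)"` only prints the command, so it can be inspected before being run. From a TTY, the manual form will likely also need `DISPLAY` set to the running X session.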

Full-screen flickering is more random, and is triggered by launching certain application windows (generally GPU-accelerated ones like QML-based System Settings, or gtk4-demo's "Automatic Scrolling"). If you're unlucky, the rest of the screen will flash between the intended contents and black every 2 or 3 frames, until the window is closed. (IIRC dragging the window around stopped the flickering in whatever area you dragged the window over.) Reopening the window will generally trigger the bug again. Restarting kwin_x11 will prevent the bug from happening for some time, even with the same apps.

Sometimes KDE will turn off compositing because it detects OpenGL-related crashes.

EXPECTED RESULT
Compositing works.

SOFTWARE/OS VERSIONS
Operating System: Arch Linux
KDE Plasma Version: 5.22.90
KDE Frameworks Version: 5.86.0
Qt Version: 5.15.2
Kernel Version: 5.14.8-zen1-1-zen (64-bit)
Graphics Platform: X11
Processors: 12 × AMD Ryzen 5 5600X 6-Core Processor
Memory: 15.6 GiB of RAM
Graphics Processor: NVIDIA GeForce GT 730/PCIe/SSE2

pacman says I installed nvidia 470.74 around the same time as kwin 5.22.90. I'm not sure which caused the issue.

ADDITIONAL INFORMATION
Bug 424311 has been filed for a while now, but it seems similar.

I have KWIN_DRM_USE_EGL_STREAMS set, but this didn't change between "before" and "after" I saw the bug.
I don't think I have a kwin_env.sh in place, and I don't see __GL_NO_DSO_FINALIZER set. (I know I experimented with it in the past, in a failed attempt to fix the KWin shutdown hang: when you log out or shut down, KWin burns a CPU core because it's stuck in Nvidia's /usr/lib/libGLX_nvidia.so.0 in a lock cmpxchg loop, which I suspect is waiting on a spinlock.)

Should I report the KWin shutdown hang here, or to Nvidia? I'm not sure if it still occurs.
Comment 1 Vlad Zahorodnii 2021-10-05 08:42:44 UTC
> Around the time I upgraded to Plasma beta 5.22.90 (and nvidia 470.74)
Can you downgrade nvidia driver?
Comment 2 nyanpasu64 2021-10-05 11:43:00 UTC
The bug still occurs on 470.63, which I uninstalled a few days before installing Plasma 5.23 Beta. (For some reason Arch ships *many* different package releases of the same version, and I installed the last one I had installed.) I did not test downgrading kwin to 5.22.x.

Revised instructions:

1. Turn off compositing using alt+shift+f12 or by unchecking "Enable compositor on startup".
2. Sleep, wake, unlock.
3. Turn on compositing using alt+shift+f12, and the screen never redraws with compositing on (is hung at the last frame with compositing off, without shadows).

I've reproduced the bug multiple times in a row with these instructions, on both OpenGL 2.0 and 3.1.

The bug does not occur if I sleep/wake with compositing on, nor if I sleep/wake and then toggle compositing off/on afterwards, nor if I toggle compositing off/on and then sleep/wake.
Comment 3 Antonio Rojas 2021-10-05 11:54:36 UTC
(In reply to nyanpasu64 from comment #2)
> For some reason Arch ships *many* different
> package releases of the same version, and I installed the last one I had
> installed.

The reason being that the driver needs to be recompiled for every kernel update. You can't just downgrade nvidia as the kernel will just not load it, you need to also downgrade the kernel to the matching version (the one nvidia was compiled against).
Comment 4 nyanpasu64 2021-10-05 11:57:33 UTC
> The reason being that the driver needs to be recompiled for every kernel
> update. You can't just downgrade nvidia as the kernel will just not load it,
> you need to also downgrade the kernel to the matching version (the one
> nvidia was compiled against).

Welp, I'm using linux-zen rather than linux, and on every kernel/nvidia update, nvidia-dkms recompiles nvidia (for linux-zen I think?). So I suppose I dodged a bullet here.
Comment 5 Antonio Rojas 2021-10-05 11:59:12 UTC
Yep, if you're using dkms then you're fine
Comment 6 nyanpasu64 2021-10-23 23:26:55 UTC
I'm on Plasma 5.23.1 now, on nvidia-dkms 470.74-1. When I sleep and resume with compositing off (which corrupts VRAM because lol nvidia http://download.nvidia.com/XFree86/Linux-x86_64/470.74/README/powermanagement.html), I cannot enable compositing anymore. If I run kwin_x11 in a terminal, I get the following output (but no display hang) when attempting to enable compositing after sleep-wake:

OpenGL vendor string:                   NVIDIA Corporation
OpenGL renderer string:                 NVIDIA GeForce GT 730/PCIe/SSE2
OpenGL version string:                  3.1.0 NVIDIA 470.74
OpenGL shading language version string: 1.40 NVIDIA via Cg compiler
Driver:                                 NVIDIA
Driver version:                         470.74
GPU class:                              Unknown
OpenGL version:                         3.1
GLSL version:                           1.40
X server version:                       1.20.13
Linux kernel version:                   5.14.14
Requires strict binding:                no
GLSL shaders:                           yes
Texture NPOT support:                   yes
Virtual Machine:                        no
libkwinglutils: GL error: context lost
kwin_scene_opengl: OpenGL 2 compositing setup failed
kwin_scene_opengl: OpenGL driver recommends QPainter based compositing. Falling back to QPainter.
kwin_scene_opengl: To overwrite the detection use the environment variable KWIN_COMPOSE
kwin_scene_opengl: For more information see https://community.kde.org/KWin/Environment_Variables#KWIN_COMPOSE
kwin_core: Failed to initialize compositing, compositing disabled
qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 5156, resource id: 671, major code: 3 (GetWindowAttributes), minor code: 0
qt.qpa.xcb: QXcbConnection: XCB error: 9 (BadDrawable), sequence: 5157, resource id: 671, major code: 14 (GetGeometry), minor code: 0
qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 5163, resource id: 65011714, major code: 18 (ChangeProperty), minor code: 0

Oddly, it mentions "OpenGL 2 compositing setup failed" regardless of whether I'm in OpenGL 2.0 or 3.1 mode.

If I kill and restart kwin_x11, I *can* enable compositing.
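
To tell this failure apart from other compositing problems, the kwin_x11 output can be scanned for the two messages quoted above. The following is a rough sketch: the function name and the returned labels are hypothetical, and the matched strings are copied from the log in this comment.

```shell
#!/bin/sh
# classify_kwin_log: read kwin_x11 output on stdin and print one label.
# The labels ("context-lost", "compositing-failed", "ok") are illustrative only.
classify_kwin_log() {
    log=$(cat)
    case "$log" in
        # The GL context was lost, e.g. after sleep/wake with compositing off.
        *"GL error: context lost"*) echo "context-lost" ;;
        # Compositing failed for some other reason.
        *"Failed to initialize compositing"*) echo "compositing-failed" ;;
        *) echo "ok" ;;
    esac
}
```

For example, `kwin_x11 --replace 2>&1 | tee kwin.log` captures the output, and `classify_kwin_log < kwin.log` then reports whether the context-lost message appeared.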

If I enable compositing then sleep/wake, compositing seems to continue working properly, and I can then disable and reenable compositing *after* sleep-wake. I haven't reproduced the flickering so far in my limited testing, so I'm not sure if it can still happen.
Comment 7 nyanpasu64 2021-10-24 00:00:20 UTC
I've also gotten corrupted VRAM when sleeping my system. I fixed it by creating `/etc/modprobe.d/nvidia.conf` with contents `options nvidia NVreg_PreserveVideoMemoryAllocations=1`, then running `sudo systemctl enable nvidia-suspend` and rebooting. However, I still can't enable compositing after sleep-wake with compositing off, and I get the same error:

libkwinglutils: GL error: context lost
kwin_scene_opengl: OpenGL 2 compositing setup failed
...

Additionally, if I sleep and wake with compositing on or off, Firefox and System Settings (the "Quick Settings" screen's bottom bar, or the entire window) sometimes still turn or flicker black.

I'm not sure if it's a KWin or Nvidia issue.
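
A modprobe.d file takes module parameters in the form `options <module> <param>=<value>`, so the setting in this comment only takes effect when it is written as an `options nvidia ...` line. The check below is a minimal sketch with hypothetical function names. It inspects only the file, not whether the running driver picked the value up.

```shell
#!/bin/sh
# preserve_option_line: print the modprobe.d line for the setting in this comment.
preserve_option_line() {
    echo "options nvidia NVreg_PreserveVideoMemoryAllocations=1"
}

# has_preserve_option FILE: succeed if FILE contains a well-formed
# "options nvidia ..." line that sets NVreg_PreserveVideoMemoryAllocations=1.
# A bare "NVreg_...=1" line without the "options nvidia" prefix does not match.
has_preserve_option() {
    grep -Eq '^[[:space:]]*options[[:space:]]+nvidia[[:space:]].*NVreg_PreserveVideoMemoryAllocations=1' "$1"
}
```

Usage: `has_preserve_option /etc/modprobe.d/nvidia.conf && echo "option present"`. The module must still be reloaded (or the machine rebooted) before the option applies.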
Comment 8 nyanpasu64 2021-10-24 00:14:49 UTC
Created attachment 142804
Screenshot of a partly black screen, taken while the screen was flickering.

The flickering black screen bug is still present.
Comment 9 Nate Graham 2021-10-25 23:25:51 UTC

*** This bug has been marked as a duplicate of bug 443951 ***