Bug 472676

Summary: Plasmashell gets corrupted by the NVidia driver losing graphics memory on sleep, and KWin doesn't handle that well and crashes
Product: [Plasma] kwin Reporter: istasi
Component: generic-crash Assignee: KWin default assignee <kwin-bugs-null>
Status: CONFIRMED ---    
Severity: crash CC: edmund.laugasson, kndevl, nate, traceydick, xaver.hugl
Priority: NOR    
Version First Reported In: 5.27.6   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Journalctl --user-unit=plasma-kwin_wayland --boot 0
drm_info
image of how the error looks on wake
neofetch, not sure if it matters? but more information is better i guess
dmesg-drm-debug.log
kwin-drm-debug.tmp
drm-debug.20230807.log
drm-debug.20230809.log
screenshot of display after boot, with window moved around abit

Description istasi 2023-07-26 17:13:32 UTC
Created attachment 160545 [details]
Journalctl --user-unit=plasma-kwin_wayland --boot 0

SUMMARY
Whenever I put my PC to sleep and wake it up again, after a short while I get a "locked/crashed" desktop. This only happens when I'm using KDE Wayland, not when I'm using KDE X11.

***
NOTE: If you are reporting a crash, please try to attach a backtrace with debug symbols.
See https://community.kde.org/Guidelines_and_HOWTOs/Debugging/How_to_create_useful_crash_reports
***


STEPS TO REPRODUCE
1.  Put PC to sleep
2.  Wake PC
3.  See error

OBSERVED RESULT
Unusable state of desktop ui

EXPECTED RESULT
Usable state of desktop ui

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: NixOS 23.11pre506474.12303c652b8 (Tapir) x86_64
KDE Plasma Version: 5.27.6
KDE Frameworks Version: Not sure
Qt Version: Not sure

ADDITIONAL INFORMATION
I'm new to Linux and wanted to throw myself in at the deep end after having used Windows for ages.
If I've filled this out wrong, or information is missing, please tell me what is needed and how to get it, and I'll do my best to provide it.
Comment 1 istasi 2023-07-26 17:13:57 UTC
Created attachment 160546 [details]
drm_info
Comment 2 istasi 2023-07-26 17:14:24 UTC
Created attachment 160547 [details]
image of how the error looks on wake
Comment 3 istasi 2023-07-26 17:14:57 UTC
Created attachment 160548 [details]
neofetch, not sure if it matters? but more information is better i guess
Comment 4 Zamundaaa 2023-08-04 12:51:41 UTC
This is the problem:

Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: kwin_scene_opengl: A graphics reset attributable to the current GL context occurred.
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: OpenGL vendor string:                   NVIDIA Corporation
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: OpenGL renderer string:                 NVIDIA GeForce RTX 4080/PCIe/SSE2
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: OpenGL version string:                  3.1.0 NVIDIA 535.86.05
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: OpenGL shading language version string: 1.40 NVIDIA via Cg compiler
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: Driver:                                 NVIDIA
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: Driver version:                         535.86.5
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: GPU class:                              Unknown
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: OpenGL version:                         3.1
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: GLSL version:                           1.40
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: X server version:                       1.23.1
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: Linux kernel version:                   6.1.39
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: Requires strict binding:                no
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: GLSL shaders:                           yes
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: Texture NPOT support:                   yes
Jul 26 18:54:40 malene kwin_wayland_wrapper[1424]: Virtual Machine:                        no
Jul 26 18:54:44 malene kwin_wayland_wrapper[1424]: kwin_wayland_drm: Atomic commit failed! Invalid argument
Jul 26 18:54:44 malene kwin_wayland_wrapper[1424]: kwin_wayland_drm: Presentation failed! Invalid argument

The first part about the graphics reset is afaik sort of expected with NVidia after wakeup, but the second part about the commit failing as a result should never happen. Can you try recording more verbose debug logging from the kernel when this happens? How to do that is described at https://invent.kde.org/plasma/kwin/-/wikis/Debugging-DRM-issues
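
To check whether a given boot shows the same pair of messages, the journal can be filtered for them. This is a minimal sketch: the function name find_resume_errors is made up for illustration, and the patterns are copied from the excerpt above.

```shell
# Keep only the lines that signal the failure seen in the excerpt above.
# find_resume_errors is an illustrative name, not part of KWin or systemd.
find_resume_errors() {
    grep -E 'graphics reset|Atomic commit failed|Presentation failed'
}
```

It would be fed from the same journal query used for the first attachment, for example: journalctl --user-unit=plasma-kwin_wayland --boot 0 | find_resume_errors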
Comment 5 istasi 2023-08-04 19:42:43 UTC
Created attachment 160745 [details]
dmesg-drm-debug.log

Okay, I'm not sure if I did it right, since the script described in the link seems hard to run across a sleep cycle, so what I did was:

login as normal, ctrl-alt-1 to get to tty1

run these
export QT_LOGGING_RULES="kwin_*.debug=true"
export LC_ALL=C

echo 0x1FF | sudo tee /sys/module/drm/parameters/debug

back into kde, put pc in sleep, wake pc again, and then run this one
sudo dmesg -w | grep drm > dmesg-drm-debug.log

I'm not sure if I can do this part, since KDE is already started when I boot:
kwin_wayland > kwin-drm-debug.tmp 2>&1 &
Comment 6 istasi 2023-08-04 19:46:02 UTC
Oh, after running the dmesg -w command I also switched back into KDE and moved the mouse around a bit, in the buggy state shown in the screenshot.
Comment 7 istasi 2023-08-04 20:00:48 UTC
Created attachment 160746 [details]
kwin-drm-debug.tmp

i tried to do 
systemctl stop display-manager.service

and then the 
kwin_wayland > kwin-drm-debug.tmp 2>&1 &

but all I get is a black screen on both monitors. The monitor reports 165fps though, vs the 60 I normally get in the tty, so it did do something to the display. Not sure it matters, but here's the .tmp file for that, even though I didn't get to put the PC to sleep.

not sure if it helps
Comment 8 istasi 2023-08-05 16:22:22 UTC
I've tested GNOME, and it seems to have a similar problem: not exactly the same, with the trailing mouse, but certainly not a functioning desktop either.
Comment 9 Zamundaaa 2023-08-07 12:30:35 UTC
That is already very useful, and the problem is visible in the log - the driver claims the output needs a modeset. It seems to be missing the cause though; could you record the log again, but this time start recording before suspending the system? For that you can execute
echo 0x1FF | sudo tee /sys/module/drm/parameters/debug
and
sudo dmesg -w | grep drm > dmesg-drm-debug.log
(leave it running)
then suspend, wake up again and cancel the command
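
The steps above could be wrapped in one small script, roughly as sketched here. The names DRM_DEBUG_PARAM, filter_drm and record_drm_log are made up for illustration. Resetting the parameter to 0 afterwards and using grep's --line-buffered option (so lines are written as they arrive instead of sitting in a buffer when the pipeline is interrupted) are additions that are not part of the instructions above.

```shell
# Path of the DRM debug parameter, as used in the commands above.
DRM_DEBUG_PARAM=/sys/module/drm/parameters/debug

# Keep only DRM-related kernel messages, written out line by line.
filter_drm() {
    grep --line-buffered drm
}

# Enable verbose DRM logging and record until Ctrl+C is pressed.
# Start this before suspending and stop it after waking up.
record_drm_log() {
    log_file="${1:-dmesg-drm-debug.log}"
    echo 0x1FF | sudo tee "$DRM_DEBUG_PARAM" > /dev/null
    # Catch Ctrl+C in this shell so the cleanup below still runs;
    # dmesg and grep are still interrupted as usual.
    trap : INT
    sudo dmesg -w | filter_drm > "$log_file"
    trap - INT
    # Assumption: 0 is the value the parameter had before (verbose logging off).
    echo 0 | sudo tee "$DRM_DEBUG_PARAM" > /dev/null
}
```

Calling record_drm_log with no argument writes to dmesg-drm-debug.log in the current directory, the same file name as in the command above.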
Comment 10 istasi 2023-08-07 13:05:10 UTC
Created attachment 160797 [details]
drm-debug.20230807.log

What i did
logged out of kde to the login screen.

ctrl alt to tty 1, run the 
echo 0x1FF | sudo tee /sys/module/drm/parameters/debug
and
sudo dmesg -w | grep drm > dmesg-drm-debug.log

ctrl alt back to login screen, selected wayland as the session, logged in, stopped steam from opening.
put pc in sleep, wait a few seconds, start pc again

wait for display to come alive, takes a few, ctrl alt to tty 1, ctrl c the dmesg thing
Comment 11 istasi 2023-08-07 13:11:09 UTC
Probably not relevant, but I don't know.
This is part of my configuration.nix file. I have cut it down to what seems relevant (to me at least) to either NVIDIA or KDE/Plasma. If it turns out to be a NixOS problem, I'll take it there instead.

  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
  };

  nixpkgs.config = {
    allowUnfree = true;
  };

  nixpkgs.config.allowUnfreePredicate = pkg:
    builtins.elem (lib.getName pkg) [
      "steam"
      "steam-original"
      "steam-run"
      "nvidia-x11"
      "nvidia-settings"
    ];


  services.xserver.videoDrivers = ["nvidia"];

  hardware.nvidia = {
    modesetting.enable = true;

    open = true;

    nvidiaSettings = true;
    forceFullCompositionPipeline = true;

    package = config.boot.kernelPackages.nvidiaPackages.stable;
  };

  # Enable the Plasma 5 Desktop Environment.
  services.xserver.displayManager.sddm.enable = true;
  services.xserver.desktopManager.plasma5.enable = true;
  #services.xserver.displayManager.defaultSession = "plasmawayland";

  services.xrdp.enable = true;
  services.xrdp.defaultWindowManager = "startplasma-x11";
  services.xrdp.openFirewall = true;
Comment 12 istasi 2023-08-09 10:08:33 UTC
Created attachment 160856 [details]
drm-debug.20230809.log

There was a big update and I'm running Plasma 5.27.7 now. Not sure if KWin also got updated, but here is a new drm-debug log.
It behaves differently when it comes up from sleep now: it's interactive, unlike before. I can move windows around and such, but it's not exactly usable.

what i did was login, check the command to run, do the 
ctrl alt to tty1

echo 0x1FF | sudo tee /sys/module/drm/parameters/debug

sudo dmesg -w | grep drm > dmesg-drm-debug.log

ctrl alt to kde/plasma

put system to sleep, wait a bit, then wake again, wait a bit to see the screen come alive, move window around a bit

ctrl alt to tty1, and stop the dmesg

Gzip: the file was too big to be uploaded raw, so I had to compress it. There are probably parts I could cut out to make the size fit, but I'm not sure what's important in the file.
Comment 13 istasi 2023-08-09 10:10:07 UTC
Created attachment 160857 [details]
screenshot of display after boot, with window moved around abit

If you ignore all the flickering, which affects everything outside of the focused window, it's nearly usable.
Comment 14 Zamundaaa 2023-08-09 12:27:32 UTC
Okay, the original issue is gone.
What happens if you kill plasmashell when you see these glitches?
Comment 15 istasi 2023-08-09 12:58:00 UTC
What I did: rebooted so I had a clean start, then at the login screen ctrl alt to tty 1
run
echo 0x1FF | sudo tee /sys/module/drm/parameters/debug

sudo dmesg -w | grep drm > dmesg-drm-debug.log

then back into kde/plasma, put it in sleep, wait a bit, wake it, confirm it was still "usable". It was, looking roughly as seen in the last screenshot attached.

ctrl alt to tty 4
ps ax | grep plasma

didn't save the output of this, but plasmashell had --no-respawn on it
killed it with a plain kill pid; it died, so I didn't have to do kill -9

ctrl alt to kde/plasma again

and the screen was locked, as in there were no updates on the screen. The session still seems to be running, because it was loading a webpage, and if I ctrl alt to tty4 and back again, the webpage had updated, but it never updates while I'm on the tty with the display. Also the 2nd monitor was just black, without even the mouse trail from moving around between wake and kill.

I would attach the dmesg log, but even with gzip it's 4.4 MB, which is above what I can attach here. Are there parts I can cut out if needed?
Comment 16 Zamundaaa 2023-08-09 13:10:29 UTC
The dmesg log is not needed for this.
To kill plasmashell in the running session (so that we can exclude potential issues being triggered by tty switching), you can just use ctrl+meta+esc and click on the panel.
Comment 17 istasi 2023-08-09 13:26:48 UTC
Okay, so what I did: rebooted to have a clean slate, logged in using wayland, opened firefox so I have something open after the sleep... slept pc, waited a bit, woke again.

First I tried the ctrl meta escape. Because firefox was fullscreen, I just clicked on the desktop on the secondary monitor. The primary monitor seemed to work fine with firefox, no glitches or anything, just a black background, with only firefox open and nothing else. Firefox seemed to work normally.

Second, I repeated the above, but did the ctrl meta escape on the primary monitor instead of the secondary, in case the glitching happens on the monitor I used the kill on. It turned out exactly the same as the first time: secondary monitor in full glitch mode, and primary working fine, except for all UI elements being gone and only firefox being open.
Comment 18 istasi 2023-08-09 13:29:12 UTC
I think I wrote that badly. The first time, after ctrl meta escape on the secondary monitor, the primary screen worked as I would expect: firefox was responsive, with no glitching when moving it around, but that was all. The secondary monitor was in full glitch mode though, which is why I wanted to test whether the glitching was tied to the monitor I used ctrl meta escape on.
Comment 19 Zamundaaa 2023-08-10 13:19:40 UTC
hmm, that is unexpected, it suggests that not all Plasma desktop windows get closed... It does confirm what's happening though - plasmashell gets corrupted by the NVidia driver losing graphics memory on standby, and KWin unfortunately doesn't handle drawing on top of that too well.

As a workaround for NVidia losing graphics memory, you could install nvidia-persistenced. It should restore things after suspend
Comment 20 istasi 2023-08-10 15:17:08 UTC
https://www.youtube.com/watch?v=W_Dwz3nMyyQ

Took a video of it coming up from boot, sleep, then wake, and finally the ctrl meta escape kill of plasmashell.

This is with nvidia-persistenced installed, I think correctly; the daemon is running at least, and it shows up in ps ax | grep nvidia
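
The same check can be written so that it only succeeds when the daemon itself is in the process list. This is a minimal sketch; persistenced_running is a made-up name for illustration.

```shell
# Succeeds if the process listing on stdin mentions nvidia-persistenced.
# The bracket in the pattern keeps grep from matching its own command line.
persistenced_running() {
    grep -q '[n]vidia-persistenced'
}
```

For example: ps ax | persistenced_running && echo "daemon is running"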
Comment 21 Zamundaaa 2023-08-16 17:33:53 UTC
*** Bug 473393 has been marked as a duplicate of this bug. ***
Comment 22 istasi 2023-08-20 16:30:59 UTC
I've now tested without nvidia open (the open = true setting; no idea what that is), and it's the same issue, both with and without nvidia-persistenced, although the PC comes up from sleep a lot faster without nvidia open. No idea if this makes any sort of difference.
Comment 23 istasi 2023-08-20 16:46:35 UTC
(In reply to Zamundaaa from comment #19)
> hmm, that is unexpected, it suggests that not all Plasma desktop windows get
> closed... It does confirm what's happening though - plasmashell gets
> corrupted by the NVidia driver losing graphics memory on standby, and KWin
> unfortunately doesn't handle drawing on top of that too well.
> 
> As a workaround for NVidia losing graphics memory, you could install
> nvidia-persistenced. It should restore things after suspend

With this in /etc/nixos/configuration.nix, it does indeed restore things.
hardware.nvidia = {
    nvidiaPersistenced = true;
    powerManagement.enable = true;
};

It does not restore things without powerManagement.enable = true; or with open = true;

Discord doesn't even crash every time system comes up from sleep now, wooh \o/
Comment 24 Edmund Laugasson 2024-03-11 01:55:46 UTC
I'm having basically the same situation: after sleep plasmashell gets corrupted and the system freezes for 5-6 seconds. I'm using the somewhat newer KDE 6.0.1 with Wayland (https://pastebin.com/wqshDftL). Currently worked around by using X11 instead of Wayland.