Bug 490452

Summary: Constant Crashes Since Plasma 6.0 update
Product: [Plasma] kwin Reporter: ogamal523
Component: wayland-generic Assignee: KWin default assignee <kwin-bugs-null>
Status: CLOSED UPSTREAM    
Severity: crash CC: xaver.hugl
Priority: NOR Flags: ogamal523: nouveau+
ogamal523: NVIDIA+
Version: 6.1.2   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Screen Corruption on Discrete GPU usage(Doesn't happen on x11)

Description ogamal523 2024-07-18 13:17:48 UTC

SUMMARY
Every time I use an application that uses the GPU, my system hangs and I need to reboot. Sometimes screen corruption happens as well; that has been reduced significantly in recent updates, but the system hangs still happen a lot. It doesn't happen on GNOME or Hyprland. I tried enabling DRM modesetting and fbdev, but it made no difference. I also tried reinstalling Arch multiple times and switched to Fedora, still with no difference.
Clarification: This has been happening for a couple of months, since KDE 6 was still in beta, and it happens only in the Wayland session; X11 is fine.

STEPS TO REPRODUCE
1. Launch a graphically demanding application (e.g. a game or a hardware-accelerated browser) and use it for a while.

OBSERVED RESULT
The system hangs and I am unable to do anything unless I force a reset.


SOFTWARE/OS VERSIONS/HARDWARE
Linux/KDE Plasma: Arch Linux (Kernel: 6.9.9)
KDE Plasma Version: 6.1.2 (Wayland)
KDE Frameworks Version: 6.4.0
Qt Version: 6.7.2
GPU: RTX 3070 (Driver: 555.58.02)
CPU: i7 12700H 

ADDITIONAL INFORMATION
This is the systemd log with the message that always shows up when the system crashes:
-- Boot 22de9ce13c6f4b2a9a580b0a2e4976cc --
Jul 18 15:44:31 TheDevil kernel:
Jul 18 15:44:32 TheDevil kernel: iwlwifi 0000:00:14.3: WRT: Invalid buffer destination
Jul 18 15:44:32 TheDevil kernel: iwlwifi 0000:00:14.3: WRT: Invalid buffer destination
Jul 18 15:44:33 TheDevil kernel: iwlwifi 0000:00:14.3: WRT: Invalid buffer destination
Jul 18 15:44:33 TheDevil kernel: Bluetooth: hci0: Malformed MSFT vendor event: 0x02
Jul 18 15:44:36 TheDevil systemd-coredump[1685]: [🡕] Process 1565 (cups-proxyd) of user 0 dumped core.

                                                 Stack trace of thread 1565:
                                                 #0  0x000061d770f67d75 update_next_proxy_printer (cups-proxyd + 0x9d75)
                                                 #1  0x0000772bd16212c8 n/a (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.7200.4 + 0x562c8)
                                                 ELF object binary architecture: AMD x86-64
Jul 18 15:44:42 TheDevil org_kde_powerdevil[2102]: busno=16, sleep-multiplier =  1.30. Testing for supported feature 0x10 returned Error_Info[EIO in ddc_write_read_with_retry, causes: EIO]

I sometimes get these two messages as well:
Jul 18 14:15:05 TheDevil kwin_wayland[2103]: kwin_scene_opengl: Invalid framebuffer status:  "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT"
Jul 18 15:30:12 TheDevil kwin_wayland[1861]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug

It seems the crashes don't show up in the logs, because these logs don't mention the GPU at all.
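One way to check whether a saved log mentions the GPU at all is to filter it for driver-related keywords. The following is only a minimal sketch: the function name filter_gpu_lines is hypothetical and the keyword list is an assumption, not an exhaustive set of GPU driver messages.

```shell
# Sketch: keep only GPU-related lines from a log read on stdin.
# filter_gpu_lines and its keyword list are assumptions for illustration.
filter_gpu_lines() {
    # -i: case-insensitive, -E: extended regex with alternation
    grep -iE 'nvidia|nvrm|i915|drm|pageflip'
}

# Example (previous boot): journalctl -b -1 | filter_gpu_lines
```

An empty result would suggest that the hang was never logged, rather than that the GPU is uninvolved.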
Comment 1 Zamundaaa 2024-07-18 14:02:37 UTC
Please attach the output of
> sudo dmesg
and
> journalctl --user-unit plasma-kwin_wayland --boot 0
after triggering one of those hangs (and before rebooting).
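
If the machine can only be recovered by rebooting, the same information can usually be read from the previous boot instead, assuming journald keeps its logs across reboots. A minimal sketch follows; the function name collect_logs and the output file names are hypothetical.

```shell
# Sketch: save kernel and KWin logs from a given boot into a directory.
# Assumes journald uses persistent storage so earlier boots are available.
# collect_logs and the output file names are hypothetical.
collect_logs() {
    boot="$1"      # 0 = current boot, -1 = previous boot
    outdir="$2"
    mkdir -p "$outdir" || return 1
    # Kernel messages for that boot (what dmesg showed at the time)
    journalctl -k -b "$boot" > "$outdir/kernel.log"
    # KWin Wayland compositor messages from the user session
    journalctl --user-unit plasma-kwin_wayland -b "$boot" > "$outdir/kwin_wayland.log"
}

# Example: collect_logs -1 "$HOME/kwin-hang-logs"
```

The two resulting files can then be attached to the report.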
Comment 2 ogamal523 2024-07-18 19:03:53 UTC
(In reply to Zamundaaa from comment #1)
> after triggering one of those hangs (and before rebooting).

I will attach the outputs of the commands you wrote ASAP, but there is something I didn't manage to get across: once these hangs happen, I can't do anything on my system unless I reboot. I can't even switch to a TTY, so I can't provide those logs before rebooting.
Comment 3 ogamal523 2024-07-18 19:53:51 UTC
This is the plasma-kwin_wayland log; I used the --boot -1 parameter:
Jul 18 22:30:01 TheDevil kwin_wayland[1894]: kf.windowsystem: static bool KX11Extras::mapViewport() may only be used on X11
Jul 18 22:30:02 TheDevil kwin_wayland[1894]: kwin_screencast: PipeWire remote error:  connection error
Jul 18 22:30:10 TheDevil kwin_wayland[1894]: QUnifiedTimer::stopAnimationDriver: driver is not running
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: trying to show an empty dialog
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: This plugin does not support setting window masks
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: This plugin does not support setting window masks
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: This plugin does not support raise()
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:45 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:46 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:32:46 TheDevil kwin_wayland[1894]: kwin_scene_opengl: 0x1: GL_INVALID_OPERATION in glDrawBuffers(unsupported buffer GL_BACK_LEFT)
Jul 18 22:34:52 TheDevil kwin_wayland[1894]: qml: Krohnkite: Screen(output):eDP-1, layouts: 0.TileLayout, ,1.ThreeColumnLayout, ,2.SpiralLayout, ,3.FloatingLayout ,
Jul 18 22:34:52 TheDevil kwin_wayland[1894]: qml: Krohnkite: Screen(output):HDMI-A-1, layouts: 0.TileLayout, ,1.ThreeColumnLayout, ,2.SpiralLayout, ,3.FloatingLayout ,
Jul 18 22:35:08 TheDevil kwin_wayland[1894]: QUnifiedTimer::stopAnimationDriver: driver is not running
Jul 18 22:41:55 TheDevil kwin_wayland[1894]: QUnifiedTimer::stopAnimationDriver: driver is not running
Jul 18 22:42:21 TheDevil kwin_wayland[1894]: kf.windowsystem: static bool KX11Extras::mapViewport() may only be used on X11
Jul 18 22:42:22 TheDevil kwin_wayland[1894]: kf.windowsystem: static bool KX11Extras::mapViewport() may only be used on X11
Comment 4 ogamal523 2024-07-18 19:56:49 UTC
This is the dmesg log; I used journalctl -b -1 -k to get it from the previous session:

Jul 18 22:29:57 TheDevil systemd-journald[456]: File /var/log/journal/3ee6304472224eee918dc9d133bce8e3/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
Jul 18 22:29:57 TheDevil kernel: block nvme0n1: No UUID available providing old NGUID
Jul 18 22:29:59 TheDevil kernel: input: soundcore P25i (AVRCP) as /devices/virtual/input/input34
Jul 18 22:29:59 TheDevil kernel: wlan0: authenticate with b0:ac:d2:34:3c:4b (local address=84:7b:57:be:8e:d0)
Jul 18 22:29:59 TheDevil kernel: wlan0: send auth to b0:ac:d2:34:3c:4b (try 1/3)
Jul 18 22:29:59 TheDevil kernel: wlan0: authenticated
Jul 18 22:29:59 TheDevil kernel: wlan0: associate with b0:ac:d2:34:3c:4b (try 1/3)
Jul 18 22:29:59 TheDevil kernel: wlan0: RX AssocResp from b0:ac:d2:34:3c:4b (capab=0x411 status=0 aid=1)
Jul 18 22:29:59 TheDevil kernel: wlan0: associated
Jul 18 22:29:59 TheDevil kernel: warning: `kdeconnectd' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211
Jul 18 22:29:59 TheDevil kernel: userif-4: sent link up event.
Jul 18 22:30:00 TheDevil kernel: userif-4: sent link up event.
Jul 18 22:30:00 TheDevil kernel: kauditd_printk_skb: 35 callbacks suppressed
Jul 18 22:30:00 TheDevil kernel: audit: type=1400 audit(1721331000.476:118): apparmor="DENIED" operation="open" class="file" profile="mariadbd_akonadi" name="/sys/block/" pid=2420 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jul 18 22:30:00 TheDevil kernel: audit: type=1400 audit(1721331000.493:119): apparmor="DENIED" operation="open" class="file" profile="mariadbd_akonadi" name="/proc/2420/cgroup" pid=2420 comm="mysqld" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
Jul 18 22:30:00 TheDevil kernel: audit: type=1400 audit(1721331000.784:120): apparmor="DENIED" operation="exec" class="file" profile="mariadbd_akonadi" name="/usr/bin/mariadb" pid=2449 comm="sh" requested_mask="x" denied_mask="x" fsuid=1000 ouid=0
Jul 18 22:30:00 TheDevil kernel: audit: type=1400 audit(1721331000.784:121): apparmor="DENIED" operation="open" class="file" profile="mariadbd_akonadi" name="/usr/bin/mariadb" pid=2449 comm="sh" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jul 18 22:30:01 TheDevil kernel: audit: type=1400 audit(1721331001.156:122): apparmor="DENIED" operation="open" class="file" profile="/usr/bin/akonadiserver" name="/var/lib/flatpak/exports/share/mime/mime.cache" pid=2402 comm="NotificationMan" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jul 18 22:30:01 TheDevil kernel: audit: type=1400 audit(1721331001.156:123): apparmor="DENIED" operation="open" class="file" profile="/usr/bin/akonadiserver" name="/var/lib/flatpak/exports/share/mime/packages/" pid=2402 comm="NotificationMan" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jul 18 22:30:01 TheDevil kernel: audit: type=1400 audit(1721331001.156:124): apparmor="DENIED" operation="open" class="file" profile="/usr/bin/akonadiserver" name="/nix/store/s5dmrppv5byr7r1lndh98w14c3i38j8s-home-manager-path/share/mime/mime.cache" pid=2402 comm="NotificationMan" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jul 18 22:30:01 TheDevil kernel: audit: type=1400 audit(1721331001.156:125): apparmor="DENIED" operation="open" class="file" profile="/usr/bin/akonadiserver" name="/nix/store/s5dmrppv5byr7r1lndh98w14c3i38j8s-home-manager-path/share/mime/packages/" pid=2402 comm="NotificationMan" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Jul 18 22:30:01 TheDevil kernel: userif-4: sent link up event.
Jul 18 22:30:05 TheDevil kernel: input: soundcore P25i (AVRCP) as /devices/virtual/input/input35
Jul 18 22:42:21 TheDevil kernel: usb 3-1: USB disconnect, device number 2
Jul 18 22:42:22 TheDevil kernel: usb 3-1: new high-speed USB device number 7 using xhci_hcd
Jul 18 22:42:23 TheDevil kernel: usb 3-1: New USB device found, idVendor=18d1, idProduct=4ee7, bcdDevice= 4.40
Jul 18 22:42:23 TheDevil kernel: usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Jul 18 22:42:23 TheDevil kernel: usb 3-1: Product: Pixel 5a
Jul 18 22:42:23 TheDevil kernel: usb 3-1: Manufacturer: Google
Comment 5 Zamundaaa 2024-07-19 12:35:11 UTC
> kwin_screencast: PipeWire remote error:  connection error
That's pretty odd. Other than that, there's nothing out of place in those logs.

(In reply to ogamal523 from comment #2)
> I will attach the outputs of the commands you wrote asap, but there
> something I didn't manage to get across once these hangs happen I can't do
> anything on my system unless I reboot, I can't even switch to the tty, so I
> can't provide those logs before reboot
Can you ssh in from a different device when that's the case?
Comment 6 ogamal523 2024-07-19 22:02:13 UTC
Created attachment 171805 [details]
Screen Corruption on Discrete GPU usage(Doesn't happen on x11)

> Can you ssh in from a different device when that's the case?
Sadly no, it seems that everything dies once one of these hangs happens. However, I managed to get a picture of the screen corruption that I mentioned; it had been happening frequently until the Plasma 6.1 update, after which it happened a lot less.
Comment 7 Zamundaaa 2024-07-20 00:15:18 UTC
Okay, then the issue is in the kernel. It happening in a way that there's nothing in the system log about it probably makes it very hard to track down, but given you see that corruption it's most likely somewhere in the GPU driver. As it happens on the internal display, probably in i915, so you might be able to get help for it at https://gitlab.freedesktop.org/drm/i915/kernel/-/issues
Comment 8 ogamal523 2024-07-20 17:59:23 UTC
> Okay, then the issue is in the kernel. It happening in a way that there's
I will try the LTS kernel and see if it happens again.

> down, but given you see that corruption it's most likely somewhere in the
> GPU driver. As it happens on the internal display, probably in i915, so you
> might be able to get help for it at
> https://gitlab.freedesktop.org/drm/i915/kernel/-/issues
Thank you for your time. I will see if this GitLab helps.
Comment 9 ogamal523 2024-07-20 21:50:41 UTC
> Okay, then the issue is in the kernel. It happening in a way that there's
You were right, the issue was in the kernel. I tried the LTS kernel and it worked fine. While I was filing an issue on the i915 GitLab, it turned out someone had already reported the same issue and it has been fixed in Linux 6.10. I installed it from the testing repo and it has been working fine.
Thank you so much, you don't know how much you have helped me.