Bug 452435

Summary: [Wayland] SDL Applications crash with error "wl_registry@2: error 0: invalid global wp_drm_lease_device_v1 (50)" when external display is unplugged
Product: [Plasma] kwin
Reporter: VesperLlama <kde>
Component: platform-drm
Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED FIXED
Severity: crash
CC: kde, nate, thetabbycatty, wengxt
Priority: NOR
Version: 5.24.4
Target Milestone: ---
Platform: Arch Linux
OS: Linux
Latest Commit:
Version Fixed In: 5.24.5
Sentry Crash Report:
Attachments: Backtrace for XWayland
Backtrace for Steam
Log when opening Steam
wayland-session log
XWayland Backtrace

Description VesperLlama 2022-04-09 14:04:51 UTC
Created attachment 148068 [details]
Backtrace for XWayland

SUMMARY
When I open SDL applications running under Xwayland, such as Steam, they crash with the error "Fatal IO error 22 (Invalid argument) on X server :1.", and wayland-session.log shows the error "wl_registry@2: error 0: invalid global wp_drm_lease_device_v1 (50)".

After this happens, all other Xwayland apps stop working as well, although they work fine beforehand. The issue also occurs randomly while using the PC even when I am not running an SDL app, but it is always reproducible by opening Steam or another SDL app.

I am on a laptop. When an external monitor is connected, these apps work fine; they only fail when the external monitor is unplugged.

STEPS TO REPRODUCE
1. Run an SDL application such as Steam in a Plasma Wayland session while the external monitor is unplugged.

OBSERVED RESULT
It crashes and the log shows the above errors.

EXPECTED RESULT
The application should work properly.

SOFTWARE/OS VERSIONS
Linux: 5.17.1-zen
KDE Plasma Version: 5.24.4
KDE Frameworks Version: 5.92.0
Qt Version: 5.15.3

ADDITIONAL INFORMATION
iGPU: AMD Renoir Vega 8
dGPU: AMD RX6600M
Mesa: 22.0.1
Wayland: 1.20.0
XWayland: 22.1.1
Comment 1 VesperLlama 2022-04-09 14:05:30 UTC
Created attachment 148069 [details]
Backtrace for Steam
Comment 2 VesperLlama 2022-04-09 14:06:10 UTC
Created attachment 148070 [details]
Log when opening Steam
Comment 3 David Edmundson 2022-04-11 10:05:04 UTC
Can you include output of 

WAYLAND_DEBUG=1  someAppThatCrashes

kwin is killing the app, but it all implies that the app is doing something wrong.
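
WAYLAND_DEBUG=1 makes libwayland print each protocol message to stderr, so the output is easiest to attach when it is redirected to a file. Below is a minimal sketch of one way to do that; the helper name run_with_wayland_debug and the log file name are hypothetical. Only processes that speak Wayland themselves produce this output, so for an X11 client the interesting traffic would be expected in Xwayland's own log rather than in the app's.

```python
import os
import subprocess


def run_with_wayland_debug(cmd, log_path):
    """Run cmd with WAYLAND_DEBUG=1 and write its stderr to log_path.

    cmd is an argument list, e.g. ["someAppThatCrashes"].
    Returns the exit code of the process.
    """
    # Copy the current environment and add the debug switch.
    env = dict(os.environ, WAYLAND_DEBUG="1")
    with open(log_path, "wb") as log:
        # libwayland writes its protocol trace to stderr.
        return subprocess.run(cmd, env=env, stderr=log).returncode


# Example call (hypothetical file name):
# run_with_wayland_debug(["someAppThatCrashes"], "wayland-debug.log")
```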
Comment 4 VesperLlama 2022-04-11 11:30:25 UTC
(In reply to David Edmundson from comment #3)
> Can you include output of 
> 
> WAYLAND_DEBUG=1  someAppThatCrashes

Doing this shows the same output as before. Is this because it's an XWayland app?

Output - 

WAYLAND_DEBUG=1 steam-runtime
steam.sh[1796]: Running Steam on arch rolling 64-bit
steam.sh[1796]: STEAM_RUNTIME is enabled automatically
setup.sh[1872]: Steam runtime environment up-to-date!
steam.sh[1796]: Steam client's requirements are satisfied
WARNING: setlocale('en_US.UTF-8') failed, using locale: 'C'. International characters may not work.
[2022-04-11 16:24:46] Startup - updater built Mar 14 2022 19:48:46
Installing breakpad exception handler for appid(steam)/version(1647446817)
[2022-04-11 16:24:46] Loading cached metrics from disk (/home/shreyansh/.local/share/Steam/package/steam_client_metrics.bin)
[2022-04-11 16:24:46] Using the following download hosts for Public, Realm steamglobal
[2022-04-11 16:24:46] 1. https://cdn.cloudflare.steamstatic.com, /client/, Realm 'steamglobal', weight was 100, source = 'update_hosts_cached.vdf'
[2022-04-11 16:24:46] 2. https://cdn.akamai.steamstatic.com, /client/, Realm 'steamglobal', weight was 100, source = 'update_hosts_cached.vdf'
[2022-04-11 16:24:46] 3. http://media.steampowered.com, /client/, Realm 'steamglobal', weight was 1, source = 'baked in'
Installing breakpad exception handler for appid(steam)/version(1647446817)
[2022-04-11 16:24:46] Verifying installation...
[2022-04-11 16:24:46] Verification complete
Loaded SDL version 2.0.21-7140709
Fatal IO error 22 (Invalid argument) on X server :1.

wayland-session.log -
Got an error
Got an error
error in client communication (pid 824)
(EE)
Fatal server error:
(EE) wl_registry@2: error 0: invalid global wp_drm_lease_device_v1 (50)
(EE)
The X11 connection broke (error 1). Did the X11 server die?
The X11 connection broke (error 1). Did the X11 server die?
The X11 connection broke (error 1). Did the X11 server die?
The X11 connection broke (error 1). Did the X11 server die?
Comment 5 VesperLlama 2022-04-13 18:26:59 UTC
XWayland also crashes when I unplug the external monitor while an SDL application is running. This crash dump is giving more information than before so maybe it will help.
Comment 6 VesperLlama 2022-04-13 18:27:46 UTC
Created attachment 148145 [details]
wayland-session log
Comment 7 VesperLlama 2022-04-13 18:28:39 UTC
Created attachment 148146 [details]
XWayland Backtrace
Comment 8 VesperLlama 2022-04-13 19:19:43 UTC
I found the exact cause of the crash. My laptop has a dedicated GPU, the AMD RX 6600M, which is asleep most of the time, and an integrated GPU, the AMD Vega 8 in the Ryzen 5800H.

Whenever the dGPU wakes up from sleep, XWayland crashes. This is always reproducible: running DRI_PRIME=1 'anyapp' wakes up the dGPU, and XWayland then crashes. SDL applications wake up the dGPU automatically, which is why they crash.

When I plug in an external monitor, it connects directly to the dGPU. I am not able to change the display configuration while logged in (I think because the dGPU wakes up when I plug in the HDMI cable and XWayland crashes). I have to log out; the display then switches to the external monitor and all apps work fine.

I haven't been able to get any useful logs for this. Journalctl doesn't show anything related (at least nothing that I think is related). If there is a way to get logs for this, I would be glad to provide them.

In short, the dGPU waking up mid-session crashes XWayland, but if the dGPU is already awake before I log in, XWayland works fine.
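
One way to confirm the sleep/wake correlation described above is to poll the runtime power state of each GPU before and after starting the app. This is a minimal sketch, assuming the kernel exposes the standard runtime PM attribute power/runtime_status under /sys/class/drm/cardN/device; the function name gpu_runtime_status is hypothetical.

```python
from pathlib import Path


def gpu_runtime_status(drm_root="/sys/class/drm"):
    """Map each DRM card (card0, card1, ...) to its runtime PM status.

    Typical values are "active" and "suspended"; "unknown" is returned
    when the attribute cannot be read (an assumption of this sketch).
    """
    statuses = {}
    for card in sorted(Path(drm_root).glob("card[0-9]*")):
        # Skip connector entries such as card0-HDMI-A-1.
        if "-" in card.name:
            continue
        status_file = card / "device" / "power" / "runtime_status"
        try:
            statuses[card.name] = status_file.read_text().strip()
        except OSError:
            statuses[card.name] = "unknown"
    return statuses


if __name__ == "__main__":
    for name, status in gpu_runtime_status().items():
        print(f"{name}: {status}")
```

Running it once while idle and once right after launching something with DRI_PRIME=1 should show whether the dGPU changed state at the moment XWayland died.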

System information -
CPU: AMD Ryzen 5800H
iGPU: AMD Renoir Vega 8
dGPU: AMD RX6600M
Mesa: 22.0.1
Comment 9 Luiz Gustavo 2022-04-13 23:07:40 UTC
This issue happens when binding a GPU to the system.
I have a Ryzen 7 5700G with built-in graphics, to which my monitor is attached, and an RX 580 with no display attached. The RX 580 is not bound to any kernel module on boot, so KDE does not recognize it.
When I manually bind the RX 580 to amdgpu, Xwayland crashes with the same error message.

I'd assume this also affects external GPU setups hotplugged over Thunderbolt: when the GPU is initialized and the amdgpu driver is bound to it, Xwayland would crash in a similar manner. However, I cannot confirm this since I don't have that hardware.

This issue seems to be KDE specific, as GNOME is not affected by this.

System Info
Fedora 36
Plasma Version 5.24.4
Frameworks Version 5.91.0
Qt Version 5.15.3
Kernel 5.17.2
Mesa 22.0.1
Wayland 1.20.0
Xwayland 22.1.1
Comment 10 Weng Xuetian 2022-04-16 03:56:27 UTC
KWIN_XWAYLAND_DEBUG=1

Here is the relevant log from Xwayland; there seems to be a race.

[1937782.511] wl_registry@15.global_remove(52)
[1937782.581] wl_registry@2.global_remove(52)
[1937782.734]  -> zxdg_output_v1@31.destroy()
[1937782.754] wl_registry@15.global_remove(51)
[1937782.761] wl_registry@2.global_remove(51)
[1937782.768] wl_registry@15.global_remove(50)
[1937782.776] wl_registry@2.global_remove(50)
[1937782.785]  -> wp_drm_lease_device_v1@29.release()
[1937782.792] wl_registry@15.global(53, "wp_drm_lease_device_v1", 1)
[1937782.810] wl_registry@2.global(53, "wp_drm_lease_device_v1", 1)
[1937782.834]  -> wl_registry@2.bind(53, "wp_drm_lease_device_v1", 1, new id [unknown]@32)
[1937782.861] wl_registry@15.global_remove(53)
[1937782.868] wl_registry@2.global_remove(53)
[1937782.876]  -> wp_drm_lease_device_v1@32.release()
error in client communication (pid 43035)
[1937783.583] wl_display@1.delete_id(31)
[1937783.603] wl_display@1.delete_id(29)
[1937783.611] wl_display@1.error(wl_registry@2, 0, "invalid global wp_drm_lease_device_v1 (53)")
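
In the trace above, Xwayland sends bind(53) and receives global_remove(53) a fraction of a millisecond later, which is the signature of the race. The following is a rough sketch for spotting that pattern in a longer log; it assumes the bracketed timestamps are milliseconds, and the function name short_lived_binds and the 1 ms window are arbitrary choices made for illustration.

```python
import re

# Matches registry bind requests ("->" prefix) and global_remove events.
EVENT = re.compile(
    r"\[\s*(?P<ms>\d+\.\d+)\]\s+(?P<out>->\s+)?"
    r"wl_registry@\d+\.(?P<op>global_remove|bind)\((?P<name>\d+)"
)

# Excerpt copied from the trace above.
TRACE = """\
[1937782.810] wl_registry@2.global(53, "wp_drm_lease_device_v1", 1)
[1937782.834]  -> wl_registry@2.bind(53, "wp_drm_lease_device_v1", 1, new id [unknown]@32)
[1937782.861] wl_registry@15.global_remove(53)
[1937782.868] wl_registry@2.global_remove(53)
"""


def short_lived_binds(log_text, window_ms=1.0):
    """Return (global name, delay in ms) for each global whose removal
    arrived within window_ms after the client sent a bind for it."""
    bind_time = {}
    hits = []
    for line in log_text.splitlines():
        m = EVENT.search(line)
        if not m:
            continue
        t, name = float(m["ms"]), int(m["name"])
        if m["op"] == "bind" and m["out"]:
            bind_time[name] = t
        elif m["op"] == "global_remove" and name in bind_time:
            delay = t - bind_time.pop(name)
            if delay <= window_ms:
                hits.append((name, round(delay, 3)))
    return hits


if __name__ == "__main__":
    for name, delay in short_lived_binds(TRACE):
        print(f"global {name} removed {delay} ms after bind")
```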
Comment 11 Weng Xuetian 2022-04-16 08:43:48 UTC
There are two bugs, one in xwayland and one in kwayland-server.

This merge request tries to fix the race condition triggered by kwayland-server: https://invent.kde.org/plasma/kwayland-server/-/merge_requests/370

Xwayland also has an invalid pointer access bug that may crash it: https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/894
Comment 12 Luiz Gustavo 2022-04-16 16:24:18 UTC
(In reply to Weng Xuetian from comment #11)
> There are two bugs, in xwayland and kwayland-server.
> 
> This https://invent.kde.org/plasma/kwayland-server/-/merge_requests/370
> tries to fix the race condition triggered by kwayland-server.
> 
> And xwayland has a bug of invalid pointer access that may crash xwayland.
> https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/894

I tested these two patches and they appear to have fixed the problem. Thank you very much.
Comment 13 VesperLlama 2022-04-16 18:02:57 UTC
(In reply to Weng Xuetian from comment #11)
> There are two bugs, in xwayland and kwayland-server.
> 
> This https://invent.kde.org/plasma/kwayland-server/-/merge_requests/370
> tries to fix the race condition triggered by kwayland-server.
> 
> And xwayland has a bug of invalid pointer access that may crash xwayland.
> https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/894

Thank you for the quick fix.
Comment 14 Weng Xuetian 2022-04-16 20:14:46 UTC
Git commit cdc9dcfb0a1f0445c46ee2cefa114329a2408555 by Weng Xuetian.
Committed on 16/04/2022 at 07:01.
Pushed by xuetianweng into branch 'master'.

Fix race in wp_drm_lease_v1.

Basically this is a well known issue in wayland for globals. If bind
comes after destroyed, it will raise a invalid global error. The common
practice is to delay the destroy of global. Similar technique is also
applied to wl_output.

M  +5    -9    src/server/drmleasedevice_v1_interface.cpp
M  +1    -1    src/server/drmleasedevice_v1_interface_p.h

https://invent.kde.org/plasma/kwayland-server/commit/cdc9dcfb0a1f0445c46ee2cefa114329a2408555
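
The idea in the commit message can be illustrated with a toy model. This is only a sketch of the general "announce removal now, destroy later" technique and not the kwayland-server code; the class ToyRegistry, its methods, and the helper late_bind are all hypothetical.

```python
class ToyRegistry:
    """Toy model of a compositor-side registry (not a real Wayland API)."""

    def __init__(self, defer_destroy):
        self.defer_destroy = defer_destroy
        self.bindable = set()  # names the server still accepts binds for
        self.removed = set()   # names announced as removed, not yet destroyed

    def add_global(self, name):
        self.bindable.add(name)

    def remove_global(self, name):
        # In both modes the client is told the global is gone.
        if self.defer_destroy:
            # Keep it bindable so requests already in flight still succeed.
            self.removed.add(name)
        else:
            # Destroy immediately: a late bind becomes a protocol error.
            self.bindable.discard(name)

    def flush_removed(self):
        # Called later, once in-flight binds have had time to arrive.
        self.bindable -= self.removed
        self.removed.clear()

    def bind(self, name):
        if name not in self.bindable:
            raise RuntimeError(f"invalid global ({name})")
        return f"resource-for-{name}"


def late_bind(defer_destroy):
    """Simulate a bind that reaches the server just after the removal."""
    registry = ToyRegistry(defer_destroy)
    registry.add_global(53)
    registry.remove_global(53)
    try:
        registry.bind(53)
        return "ok"
    except RuntimeError:
        return "protocol error"


if __name__ == "__main__":
    print("immediate destroy:", late_bind(False))
    print("deferred destroy:", late_bind(True))
```

With immediate destruction the late bind fails, which corresponds to the "invalid global" error that kills Xwayland; with deferred destruction the bind succeeds and the client can release the object normally.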
Comment 15 Weng Xuetian 2022-04-16 20:18:40 UTC
Git commit dc09ce85f00b3a790e2817888067c3826280dd8e by Weng Xuetian.
Committed on 16/04/2022 at 20:18.
Pushed by xuetianweng into branch 'Plasma/5.24'.

Fix race in wp_drm_lease_v1.

Basically this is a well known issue in wayland for globals. If bind
comes after destroyed, it will raise a invalid global error. The common
practice is to delay the destroy of global. Similar technique is also
applied to wl_output.
(cherry picked from commit cdc9dcfb0a1f0445c46ee2cefa114329a2408555)

M  +5    -9    src/server/drmleasedevice_v1_interface.cpp
M  +1    -1    src/server/drmleasedevice_v1_interface_p.h

https://invent.kde.org/plasma/kwayland-server/commit/dc09ce85f00b3a790e2817888067c3826280dd8e
Comment 16 Nate Graham 2022-04-19 16:44:07 UTC
The KDE side is fixed now, and it looks like the XWayland fix is accepted and should be merged soon.