Bug 478921

Summary: Black/Blurred external screen on Plasma/KWin 6.0 Beta 1 + Wayland + AMD iGPU + eGPU
Product: [Plasma] kwin
Reporter: fxzjshm <fxzjshm>
Component: wayland-generic
Assignee: KWin default assignee <kwin-bugs-null>
Status: RESOLVED FIXED
Severity: normal
CC: nate, xaver.hugl
Priority: NOR
Keywords: multiscreen, qt6
Version First Reported In: git master
Target Milestone: ---
Platform: Ubuntu
OS: Linux
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description fxzjshm 2023-12-23 08:52:28 UTC
(Originally posted at https://discuss.kde.org/t/black-blurred-screen-on-plasma-kwin-6-0-beta-1-wayland-amd-egpu/8947 )


SUMMARY

On an AMD laptop running KDE Plasma 6.0 Beta 1, the screen attached to the external GPU does not display properly by default in a Wayland session. If I drag a window to it from the internal display, no part of the window appears; the external screen shows only random blurred/fuzzy content or stays black. The internal display works normally.


STEPS TO REPRODUCE

1. Compile and install KDE Plasma 6.0 Beta 1
2. Run a Wayland session of it, with external screens attached to the eGPU


OBSERVED RESULT

* random blurred/fuzzy screen or black screen on external monitor
* internal display works normally


EXPECTED RESULT

* all screens work


SOFTWARE/OS VERSIONS

Linux/KDE Plasma: Ubuntu 22.04
KDE Plasma Version: 5.90.90
KDE Frameworks Version: 5.247.0
Qt Version: 6.6.2


ADDITIONAL INFORMATION

GPU setup:
```
➜  ~ lspci | grep VGA
37:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] (rev c1)
63:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt (rev d8)
```

The eGPU Radeon VII is `/dev/dri/card0` and the iGPU Radeon 680M (of a Ryzen 6800HS) is `/dev/dri/card1`.

The driver is the one in the upstream kernel; I tried kernel versions `6.6.7-1-liquorix-amd64` and `6.5.0-1010-oem` (from Ubuntu 22.04).

Other behaviors noticed:

* Mouse cursor can be displayed properly on the external screen (which seems strange?)
* Spectacle can capture all desktops correctly.
* No kernel/driver panic or warning appears in the kernel log
* This problem does not appear in an Xorg session
* This problem does not appear on KDE Plasma/KWin 5.27.x
* According to this post ( [Plasma breaks on Wayland with NVIDIA proprietary drivers eGPU](https://discuss.kde.org/t/plasma-breaks-on-wayland-with-nvidia-proprietary-drivers-egpu/8749) ), the default setup seems equivalent to `export KWIN_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0`, and swapping the two devices fixes it.

Current workaround: use the eGPU to render by adding

```
export KWIN_DRM_DEVICES=/dev/dri/card0:/dev/dri/card1
```

to `$KDE_PREFIX/usr/lib/x86_64-linux-gnu/libexec/plasma-dev-prefix.sh`.
Comment 1 Bug Janitor Service 2024-01-11 18:28:54 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/4886
Comment 2 Zamundaaa 2024-01-11 20:36:07 UTC
Git commit 451b878bb93c96acd6af728b61de5f26bbd51392 by Xaver Hugl.
Committed on 11/01/2024 at 20:07.
Pushed by zamundaaa into branch 'master'.

backends/drm: don't allow implicit modifiers for multi gpu transfers

As we translate DRM_FORMAT_MOD_LINEAR to implicit modifiers + linear flag, the
egl import path should still work without implicit modifiers too.

M  +2    -0    src/backends/drm/drm_egl_layer_surface.cpp

https://invent.kde.org/plasma/kwin/-/commit/451b878bb93c96acd6af728b61de5f26bbd51392
Comment 3 Vlad Zahorodnii 2024-01-12 12:25:28 UTC
Git commit d1d1b52bed8d02dd314e0fad6fe54986bd57e8a9 by Vlad Zahorodnii, on behalf of Xaver Hugl.
Committed on 12/01/2024 at 13:17.
Pushed by vladz into branch 'Plasma/6.0'.

backends/drm: don't allow implicit modifiers for multi gpu transfers

As we translate DRM_FORMAT_MOD_LINEAR to implicit modifiers + linear flag, the
egl import path should still work without implicit modifiers too.


(cherry picked from commit 451b878bb93c96acd6af728b61de5f26bbd51392)

M  +2    -0    src/backends/drm/drm_egl_layer_surface.cpp

https://invent.kde.org/plasma/kwin/-/commit/d1d1b52bed8d02dd314e0fad6fe54986bd57e8a9
Comment 4 fxzjshm 2024-01-23 13:55:19 UTC
Sorry for the delay; I was away for work and unable to test this change.
Unfortunately, commit d1d1b52 ("backends/drm: don't allow implicit modifiers for multi gpu transfers") does not seem to change the behavior.

After some debugging, it turns out that I need to explicitly disable DRM modifiers (`export KWIN_DRM_USE_MODIFIERS=0`). I am not sure why that commit doesn't work.
Comment 5 fxzjshm 2024-01-27 14:13:14 UTC
Further investigation shows that, when rendering on the iGPU, KWin keeps re-creating the surface (`EglGbmLayerSurface::createSurface(size, formats)`) because `EglGbmLayerSurface::checkSurface(size, formats)` always fails.

The cause of the check failure is in
```c++
std::unique_ptr<EglGbmLayerSurface::Surface> EglGbmLayerSurface::createSurface(const QSize &size, uint32_t format, const QList<uint64_t> &modifiers, MultiGpuImportMode importMode, BufferTarget bufferTarget) const;
```
Inside this function, `ret->gbmSwapchain` is created using the local variable `renderModifiers`, which is derived from `importDrmFormat.allModifiers` and is not directly related to the function parameter `modifiers`.
But `EglGbmLayerSurface::checkSurface` checks against the same formats that are passed in as the function parameter `modifiers`, so the two lists may not be compatible.

In my case, `renderModifiers` is `QList(144115188075858177, 0)`, and `144115188075858177` is chosen when creating `gbmSwapchain`.
But `modifiers` is `QList(144115188151360001, 144115188075858433, 0)`, which does not contain that value,
so `checkSurface` always fails.
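
For anyone reading the decimal values above, here is a minimal sketch that decodes them. It assumes the standard DRM modifier layout from `drm_fourcc.h` (vendor ID in the top 8 bits, `0x02` for AMD, `0` for `DRM_FORMAT_MOD_LINEAR`). The helper names `vendorOf` and `contains` are made up for illustration and are not KWin code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: layout as in drm_fourcc.h, vendor ID lives in the top 8 bits.
constexpr std::uint64_t vendorOf(std::uint64_t modifier) { return modifier >> 56; }

constexpr std::uint64_t kVendorAmd = 0x02; // DRM_FORMAT_MOD_VENDOR_AMD
constexpr std::uint64_t kLinear = 0;       // DRM_FORMAT_MOD_LINEAR

// Values copied from the debug output quoted in this comment.
constexpr std::array<std::uint64_t, 2> renderModifiers{144115188075858177ULL, 0ULL};
constexpr std::array<std::uint64_t, 3> modifiers{144115188151360001ULL,
                                                 144115188075858433ULL, 0ULL};

// The same values in hex: three AMD tiled layouts plus linear.
static_assert(renderModifiers[0] == 0x0200000000000901ULL, "render-side tiled modifier");
static_assert(modifiers[0] == 0x0200000004801A01ULL, "first accepted tiled modifier");
static_assert(modifiers[1] == 0x0200000000000A01ULL, "second accepted tiled modifier");
static_assert(vendorOf(renderModifiers[0]) == kVendorAmd, "vendor is AMD");
static_assert(vendorOf(modifiers[0]) == kVendorAmd, "vendor is AMD");
static_assert(renderModifiers[1] == kLinear && modifiers[2] == kLinear, "both lists end in linear");

// Hypothetical helper: true if `wanted` occurs in `list`.
template <std::size_t N>
bool contains(const std::array<std::uint64_t, N> &list, std::uint64_t wanted)
{
    return std::find(list.begin(), list.end(), wanted) != list.end();
}
```

With these values, `contains(modifiers, renderModifiers[0])` is false while `contains(modifiers, kLinear)` is true: the tiled modifier picked for the swapchain is not in the accepted list, and linear is the only entry both lists share.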

I suggest the intersection of `renderModifiers` and `modifiers` should be used to create `gbmSwapchain`, i.e.
```diff
  renderModifiers = filterModifiers(importDrmFormat.allModifiers,
                                    drmFormat.nonExternalOnlyModifiers);
+ // use the intersection, or checkSurface will fail
+ renderModifiers = filterModifiers(renderModifiers, modifiers);
  // transferring non-linear buffers with implicit modifiers between GPUs is likely to yield wrong results
  renderModifiers.removeAll(DRM_FORMAT_MOD_INVALID);
```
I have only one multi-GPU setup, so I don't know whether this would break other setups.
Does this fix look reasonable? If so, I think an MR should be made.
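
To illustrate what the suggested intersection would yield for the values reported in this comment, here is a standalone sketch. `intersectModifiers` and `usableRenderModifiers` are hypothetical stand-ins written for this example; KWin's real `filterModifiers` is not reproduced here and may behave differently. `std::vector` replaces `QList` so the snippet builds without Qt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// DRM_FORMAT_MOD_INVALID from drm_fourcc.h: marks "implicit modifier".
constexpr std::uint64_t kModInvalid = 0x00ffffffffffffffULL;

// Hypothetical stand-in for an intersection filter: keeps the entries of
// `candidates` that also occur in `allowed`, preserving candidate order.
std::vector<std::uint64_t> intersectModifiers(const std::vector<std::uint64_t> &candidates,
                                              const std::vector<std::uint64_t> &allowed)
{
    std::vector<std::uint64_t> result;
    std::copy_if(candidates.begin(), candidates.end(), std::back_inserter(result),
                 [&allowed](std::uint64_t m) {
                     return std::find(allowed.begin(), allowed.end(), m) != allowed.end();
                 });
    return result;
}

// Mirrors the order of operations in the diff above: intersect first,
// then drop the implicit modifier.
std::vector<std::uint64_t> usableRenderModifiers(const std::vector<std::uint64_t> &renderModifiers,
                                                 const std::vector<std::uint64_t> &modifiers)
{
    std::vector<std::uint64_t> result = intersectModifiers(renderModifiers, modifiers);
    result.erase(std::remove(result.begin(), result.end(), kModInvalid), result.end());
    return result;
}
```

Called with `{144115188075858177, 0}` and `{144115188151360001, 144115188075858433, 0}`, `usableRenderModifiers` returns a list containing only `0` (`DRM_FORMAT_MOD_LINEAR`), so under this sketch the swapchain would be created with a modifier that the later check also accepts.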
Comment 6 Bug Janitor Service 2024-02-06 13:42:16 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/5124
Comment 7 Zamundaaa 2024-02-06 13:43:25 UTC
I tested that MR with a hack in KWin to pretend the secondary GPU doesn't support modifiers, so it *should* work. As there are still subtle differences from a GPU that actually doesn't support them, it would be good if you could test it to make sure it fixes the problem.
Comment 8 fxzjshm 2024-02-06 13:55:21 UTC
OK, will test it once I get back. Thanks in advance.
Comment 9 Zamundaaa 2024-02-07 14:47:52 UTC
Git commit d3a2e070024fe2621619576f83ea471f3d7fb1bb by Xaver Hugl.
Committed on 07/02/2024 at 14:35.
Pushed by zamundaaa into branch 'master'.

backends/drm: fix multi gpu transfers with mixed modifiers and implicit modifiers usage

M  +6    -6    src/backends/drm/drm_egl_layer_surface.cpp
M  +1    -2    src/backends/drm/drm_egl_layer_surface.h

https://invent.kde.org/plasma/kwin/-/commit/d3a2e070024fe2621619576f83ea471f3d7fb1bb
Comment 10 Zamundaaa 2024-02-07 14:48:00 UTC
Git commit 1583b2c717100dad87c95e1b9db607550ec5af33 by Xaver Hugl.
Committed on 07/02/2024 at 14:35.
Pushed by zamundaaa into branch 'master'.

backends/drm: fix EglGbmLayerSurface::doesSurfaceFit with multi gpu

M  +16   -6    src/backends/drm/drm_egl_layer_surface.cpp

https://invent.kde.org/plasma/kwin/-/commit/1583b2c717100dad87c95e1b9db607550ec5af33
Comment 11 Vlad Zahorodnii 2024-02-07 15:30:08 UTC
Git commit e4244c8056f563b4696c385927c30c890514b9ce by Vlad Zahorodnii, on behalf of Xaver Hugl.
Committed on 07/02/2024 at 15:21.
Pushed by vladz into branch 'Plasma/6.0'.

backends/drm: fix EglGbmLayerSurface::doesSurfaceFit with multi gpu
(cherry picked from commit 1583b2c717100dad87c95e1b9db607550ec5af33)

M  +16   -6    src/backends/drm/drm_egl_layer_surface.cpp

https://invent.kde.org/plasma/kwin/-/commit/e4244c8056f563b4696c385927c30c890514b9ce
Comment 12 Vlad Zahorodnii 2024-02-07 15:30:16 UTC
Git commit 590055e8ff2bb216f57b698e14397c63d1d8bc4e by Vlad Zahorodnii, on behalf of Xaver Hugl.
Committed on 07/02/2024 at 15:21.
Pushed by vladz into branch 'Plasma/6.0'.

backends/drm: fix multi gpu transfers with mixed modifiers and implicit modifiers usage
(cherry picked from commit d3a2e070024fe2621619576f83ea471f3d7fb1bb)

M  +6    -6    src/backends/drm/drm_egl_layer_surface.cpp
M  +1    -2    src/backends/drm/drm_egl_layer_surface.h

https://invent.kde.org/plasma/kwin/-/commit/590055e8ff2bb216f57b698e14397c63d1d8bc4e
Comment 13 fxzjshm 2024-02-22 12:22:25 UTC
I can confirm that this bug has been fixed. Many thanks to you all!