Bug 471517 - Whole-screen diagonal-stripes glitch on external display with hybrid graphics
Status: RESOLVED DOWNSTREAM
Alias: None
Product: kwin
Classification: Plasma
Component: wayland-generic
Version: 5.27.6
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-27 23:01 UTC by Ivan D Vasin
Modified: 2023-08-16 22:29 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
photo of the distorted screen (471.88 KB, image/jpeg)
2023-06-27 23:01 UTC, Ivan D Vasin
drm_info -j (215.81 KB, application/json)
2023-06-27 23:03 UTC, Ivan D Vasin
kscreen-console bug (26.80 KB, text/plain)
2023-06-27 23:03 UTC, Ivan D Vasin
KWin Support Information (7.02 KB, text/plain)
2023-06-27 23:04 UTC, Ivan D Vasin
dmesg-drm-debug.log (1020.00 KB, text/plain)
2023-06-27 23:05 UTC, Ivan D Vasin
kwin-drm-debug.log (8.24 KB, text/plain)
2023-06-27 23:06 UTC, Ivan D Vasin
kscreen.log (1.95 KB, text/x-log)
2023-06-27 23:07 UTC, Ivan D Vasin
KScreen output 2627fe60f26afd08b11e318517ade0ae (378 bytes, text/plain)
2023-06-27 23:08 UTC, Ivan D Vasin
KScreen output 2f4e4ef6ae9112f2683b157615664340 (400 bytes, text/plain)
2023-06-27 23:08 UTC, Ivan D Vasin
edid-decode < /sys/class/drm/card1-DP-2/edid (3.92 KB, text/plain)
2023-06-27 23:20 UTC, Ivan D Vasin
log what stride is used (729 bytes, text/plain)
2023-08-10 12:37 UTC, Zamundaaa

Description Ivan D Vasin 2023-06-27 23:01:09 UTC
Created attachment 159935 [details]
photo of the distorted screen

SUMMARY
With the default configuration running on Wayland, the entire screen's image on my external display is distorted in a way that makes that screen totally unusable and has an appearance of diagonal stripes from the top right to the bottom left.  Looking closely at the distorted image and moving the mouse cursor around on that screen, it seems that each subsequent row of pixels is effectively being shifted to the left by some number of pixels.  That is to say, perhaps all of the screen's pixels are being rendered, but the rows are all so misaligned as to be incoherent.  Screenshotting with Spectacle doesn't capture the distortion.  A photo of the distorted screen is attached.
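The row-by-row shift described above is what one would expect if the buffer is written with one row stride and read with another. The following is only a toy sketch of that effect in Python, not a reproduction of what KWin or the drivers do; the sizes, the stride values, and the helper names `make_buffer` and `read_rows` are all made up for illustration.

```python
def make_buffer(height, stride, capacity, bar_column):
    """Write a one-pixel-wide vertical bar into a flat buffer whose rows start every `stride` pixels."""
    buf = ["."] * capacity
    for y in range(height):
        buf[y * stride + bar_column] = "#"
    return buf


def read_rows(buf, width, height, stride):
    """Read `height` rows of `width` pixels, assuming rows start every `stride` pixels."""
    return ["".join(buf[y * stride : y * stride + width]) for y in range(height)]


WIDTH, HEIGHT = 8, 4
WRITER_STRIDE = 8  # hypothetical: the producer packs rows tightly
READER_STRIDE = 9  # hypothetical: the consumer expects one pixel of padding per row

# Allocate enough room for the larger of the two strides so every read stays in bounds.
buf = make_buffer(HEIGHT, WRITER_STRIDE, READER_STRIDE * HEIGHT, bar_column=5)
correct = read_rows(buf, WIDTH, HEIGHT, WRITER_STRIDE)  # bar stays in column 5
sheared = read_rows(buf, WIDTH, HEIGHT, READER_STRIDE)  # bar drifts one column left per row

print("\n".join(correct))
print()
print("\n".join(sheared))
```

With the reader's stride larger than the writer's, the vertical bar turns into a diagonal running from top right to bottom left; with the strides swapped it would lean the other way.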

This system has NVIDIA Optimus hybrid graphics with Intel UHD Graphics 630 as the integrated GPU (DRI card0) and NVIDIA GeForce GTX 1070 Mobile as the discrete GPU (DRI card1).  The internal display is wired via Embedded DisplayPort to the iGPU; there are Mini DisplayPort and HDMI ports that are both wired to the dGPU.  The internal display (which renders as expected) is an AU Optronics panel connected via eDP and has a native resolution of 3840×2160.  The external display (affected by this issue) is an ASUS ROG PG348Q connected via mDP and has a native resolution of 3440×1440.

The issue manifests identically under all these conditions:
• the display is connected via mDP, HDMI, or an mDP-to-HDMI adapter
• the display refresh rate is set to any value between 50 and 100 Hz via KScreen
• the two screens are rearranged in any manner via KScreen
• the scaling factor for either screen is adjusted to any value via KScreen
• either screen is chosen as the primary display via KScreen
• the EDID for both screens is retrieved via Windows 11 and applied during initramfs startup via the drm.edid_firmware kernel parameter

The issue goes away if I set the external display's resolution to anything below 3440×1440 via KScreen.  Strangely, KScreen only offers 1024×768, 800×600, and 640×480—the same resolutions as are listed by /sys/class/drm/card1-DP-2/modes—whereas in actuality the display supports many other resolutions, as one would expect, and which xrandr sees just fine.  Even more oddly, KScreen offers all the expected resolutions for my internal display, even though /sys/class/drm/card0-eDP-1/modes only lists 3840×2160.
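For comparing what the kernel reports against what KScreen offers, the per-connector `modes` files can be dumped in one go. This is a minimal Python sketch that assumes the usual `/sys/class/drm/cardN-CONNECTOR/modes` layout; the function name `list_connector_modes` is made up for illustration.

```python
from pathlib import Path


def list_connector_modes(drm_root="/sys/class/drm"):
    """Return {connector_name: [mode, ...]} for every connector directory with a `modes` file."""
    result = {}
    for modes_file in sorted(Path(drm_root).glob("card*-*/modes")):
        lines = (line.strip() for line in modes_file.read_text().splitlines())
        # Drop blank lines and repeated entries while keeping the kernel's ordering.
        result[modes_file.parent.name] = list(dict.fromkeys(l for l in lines if l))
    return result
```

Calling `list_connector_modes()` on the affected machine and printing the result should show the same short list for card1-DP-2 as quoted above.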

Setting KWIN_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0 resolves the issue completely.  That is, I have to configure KWin to prioritize my dGPU (NVIDIA) first and my iGPU (Intel) second.  I do this by putting the following line in /etc/environment:

export KWIN_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0

Doing this in ~/.config/plasma-workspace/env/kwin.sh instead also works to fix the Plasma session, but I prefer to put it in /etc/environment so that SDDM also picks it up.  I've configured SDDM to run on Wayland using KWin as its compositor; in this configuration, SDDM suffers the same glitch in the absence of the above setting in /etc/environment.

KWin's default behavior seems to be equivalent to KWIN_DRM_DEVICES=/dev/dri/card0:/dev/dri/card1 (iGPU first, dGPU second).  Setting KWIN_DRM_DEVICES=/dev/dri/card0 causes only the internal display to be operational (correctly).  Setting KWIN_DRM_DEVICES=/dev/dri/card1 causes only the external display to be operational (correctly).
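Since the workaround depends on which card number belongs to which GPU, the value can also be derived from the driver bound to each card instead of being hard-coded. A rough Python sketch, assuming each `/sys/class/drm/cardN/device/driver` is a symlink to the bound kernel driver; the function name `drm_device_order` and the driver names passed to it are illustrative assumptions, and nothing here is part of KWin.

```python
import os
import re
from pathlib import Path


def drm_device_order(preferred_driver, sys_root="/sys/class/drm", dev_root="/dev/dri"):
    """Build a KWIN_DRM_DEVICES-style value with cards bound to `preferred_driver` listed first."""
    cards = []
    for entry in sorted(Path(sys_root).iterdir()):
        # Skip connector entries such as card1-DP-2; keep only cardN.
        if not re.fullmatch(r"card\d+", entry.name):
            continue
        driver_link = entry / "device" / "driver"
        driver = os.path.basename(os.path.realpath(driver_link)) if driver_link.exists() else ""
        cards.append((entry.name, driver))
    # Stable sort: preferred-driver cards move to the front, the rest keep their order.
    cards.sort(key=lambda card: card[1] != preferred_driver)
    return ":".join(f"{dev_root}/{name}" for name, _ in cards)
```

On the system described here, `drm_device_order("nvidia")` would be expected to yield `/dev/dri/card1:/dev/dri/card0`, matching the setting that resolves the issue.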

The issue is not reproducible using Hyprland or Mutter with their default configurations.


STEPS TO REPRODUCE
1. Launch a Plasma Wayland session with the external display connected.

OBSERVED RESULT
The entire screen on the external display is severely distorted.  The screen on the internal display is rendered correctly.

EXPECTED RESULT
Both screens are rendered correctly.


SOFTWARE/OS VERSIONS
Operating System: CachyOS Linux
KDE Plasma Version: 5.27.6
KDE Frameworks Version: 5.107.0
Qt Version: 5.15.10
Kernel Version: 6.3.9-1-cachyos (64-bit)
Graphics Platform: Wayland
Processors: 12 × Intel® Core™ i7-8750H CPU @ 2.20GHz
Memory: 31.2 GiB of RAM
Graphics Processor: NVIDIA GeForce GTX 1070 with Max-Q Design/PCIe/SSE2
Manufacturer: GIGABYTE
Product Name: AERO 15XV8


ADDITIONAL INFORMATION

$ inxi -Gazy
Graphics:
  Device-1: Intel CoffeeLake-H GT2 [UHD Graphics 630] vendor: Gigabyte
    driver: i915 v: kernel arch: Gen-9.5 process: Intel 14nm built: 2016-20
    ports: active: eDP-1 empty: DP-1 bus-ID: 00:02.0 chip-ID: 8086:3e9b
    class-ID: 0300
  Device-2: NVIDIA GP104M [GeForce GTX 1070 Mobile] vendor: Gigabyte
    driver: nvidia v: 535.54.03 alternate: nouveau,nvidia_drm non-free: 530.xx+
    status: current (as of 2023-05) arch: Pascal code: GP10x process: TSMC 16nm
    built: 2016-21 pcie: gen: 1 speed: 2.5 GT/s lanes: 16 link-max: gen: 3
    speed: 8 GT/s ports: active: none off: DP-2 empty: HDMI-A-1 bus-ID: 01:00.0
    chip-ID: 10de:1ba1 class-ID: 0300
  Device-3: Sunplus Innovation HD WebCam driver: uvcvideo type: USB rev: 2.0
    speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-9:5 chip-ID: 1bcf:2c6b
    class-ID: 0e02
  Display: wayland server: X.org v: 1.21.1.8 with: Xwayland v: 23.1.2
    compositor: kwin_wayland driver: X: loaded: modesetting,nvidia
    alternate: fbdev,intel,nouveau,nv,vesa dri: iris gpu: i915,nvidia
    d-rect: 4459x2112 display-ID: 0
  Monitor-1: DP-2 pos: top-right res: 2752x1152 size: N/A modes: N/A
  Monitor-2: eDP-1 pos: bottom-l res: 1707x960 size: N/A modes: N/A
  API: OpenGL v: 4.6 Mesa 23.1.3 renderer: Mesa Intel UHD Graphics 630 (CFL
    GT2) direct-render: Yes

$ kscreen-doctor --outputs
Output: 1 eDP-1 enabled connected priority 1 Panel Modes: 0:3840x2160@60*! 1:1600x1200@60 2:1280x1024@60 3:1024x768@60 4:2560x1600@60 5:1920x1200@60 6:1280x800@60 7:3840x2160@60 8:3200x1800@60 9:2880x1620@60 10:2560x1440@60 11:1920x1080@60 12:1600x900@60 13:1368x768@60 14:1280x720@60 Geometry: 0,192 1707x960 Scale: 2.25 Rotation: 1 Overscan: 0 Vrr: incapable RgbRange: Full
Output: 2 DP-2 enabled connected priority 2 DisplayPort Modes: 0:3440x1440@60! 1:3440x1440@100* 2:3440x1440@95 3:3440x1440@90 4:3440x1440@85 5:3440x1440@80 6:3440x1440@50 7:1024x768@60 8:800x600@60 9:640x480@60 Geometry: 1707,0 2752x1152 Scale: 1.25 Rotation: 1 Overscan: 0 Vrr: incapable RgbRange: unknown
Comment 1 Ivan D Vasin 2023-06-27 23:03:10 UTC
Created attachment 159936 [details]
drm_info -j
Comment 2 Ivan D Vasin 2023-06-27 23:03:54 UTC
Created attachment 159937 [details]
kscreen-console bug
Comment 3 Ivan D Vasin 2023-06-27 23:04:37 UTC
Created attachment 159938 [details]
KWin Support Information
Comment 4 Ivan D Vasin 2023-06-27 23:05:48 UTC
Created attachment 159939 [details]
dmesg-drm-debug.log
Comment 5 Ivan D Vasin 2023-06-27 23:06:17 UTC
Created attachment 159940 [details]
kwin-drm-debug.log
Comment 6 Ivan D Vasin 2023-06-27 23:07:17 UTC
Created attachment 159941 [details]
kscreen.log
Comment 7 Ivan D Vasin 2023-06-27 23:08:14 UTC
Created attachment 159942 [details]
KScreen output 2627fe60f26afd08b11e318517ade0ae
Comment 8 Ivan D Vasin 2023-06-27 23:08:29 UTC
Created attachment 159943 [details]
KScreen output 2f4e4ef6ae9112f2683b157615664340
Comment 9 Ivan D Vasin 2023-06-27 23:20:30 UTC
Created attachment 159944 [details]
edid-decode < /sys/class/drm/card1-DP-2/edid
Comment 10 Ivan D Vasin 2023-06-28 00:42:27 UTC
To be clear: this issue appears to be specific to KWin on Wayland.  It does not occur with other Wayland compositors (at least, not with Hyprland or Mutter in their default configurations), nor with KWin on X11.
Comment 11 Bug Janitor Service 2023-06-28 13:39:31 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/4219
Comment 12 Zamundaaa 2023-07-04 17:42:03 UTC
Git commit e698cafa2737904255d47d1e1710ee857cde9afd by Xaver Hugl.
Committed on 04/07/2023 at 15:33.
Pushed by zamundaaa into branch 'master'.

backends/drm: handle mismatching stride with CPU copying

M  +12   -12   src/backends/drm/drm_egl_layer_surface.cpp

https://invent.kde.org/plasma/kwin/-/commit/e698cafa2737904255d47d1e1710ee857cde9afd
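
For readers unfamiliar with the term, "CPU copying" with mismatching stride generally means copying row by row, so that each row lands at the destination's row offset rather than the source's. The Python sketch below only illustrates that general idea; it is not the code from the commit, and the function name `copy_rows` and its parameters are made up.

```python
def copy_rows(src, src_stride, dst_stride, row_bytes, height):
    """Copy `height` rows of `row_bytes` bytes from `src` into a new buffer laid out with `dst_stride`."""
    if row_bytes > min(src_stride, dst_stride):
        raise ValueError("a row must fit within both strides")
    dst = bytearray(dst_stride * height)
    src_view = memoryview(src)
    for y in range(height):
        s = y * src_stride  # where row y starts in the source
        d = y * dst_stride  # where row y must start in the destination
        dst[d:d + row_bytes] = src_view[s:s + row_bytes]
    return dst
```

When both strides are equal, the loop degenerates into a plain copy of the whole buffer, which is why a single bulk copy is only safe in that case.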
Comment 13 Zamundaaa 2023-07-05 09:21:10 UTC
Git commit f1f7e2697d1ac4ebe9099006775717e5fd6f5777 by Xaver Hugl.
Committed on 05/07/2023 at 09:11.
Pushed by zamundaaa into branch 'Plasma/5.27'.

backends/drm: handle mismatching stride with CPU copying

M  +7    -6    src/backends/drm/drm_buffer_gbm.cpp
M  +7    -1    src/backends/drm/drm_buffer_gbm.h
M  +12   -8    src/backends/drm/drm_egl_layer_surface.cpp

https://invent.kde.org/plasma/kwin/-/commit/f1f7e2697d1ac4ebe9099006775717e5fd6f5777
Comment 14 Ivan D Vasin 2023-08-07 01:35:07 UTC
Unfortunately, the issue persists even after updating to Plasma 5.27.7.  Unsetting KWIN_DRM_DEVICES brings back the same distortion as before.  Are there any other diagnostics I can provide that might help pin this down?  I also have the KDE/Plasma 5 sources set up for building.  Happy to try out patches without waiting for a release.
Comment 15 Zamundaaa 2023-08-10 12:37:57 UTC
Created attachment 160889 [details]
log what stride is used

I can't reproduce the problem myself; with 5.27.6 I see the problem with the 1680x1050 resolution, but with 5.27.7 it's gone.

Could you please run KWin with the attached patch, and see what strides the buffers have?
Comment 16 Stefan Springer 2023-08-13 16:43:17 UTC
That f1f7e2697d1ac4ebe9099006775717e5fd6f5777 commit totally wrecks VRR, especially when using DisplayPort in my case. Selectively reverting it fixes the issue with HDMI (which was less affected) and DP.

My screen will flicker whenever the refresh rate goes below a certain threshold (~78fps for me w/ Gsync pendulum and MangoHud enforcing VRR). It will also do so seemingly at random when operating the computer, and it occasionally produces a glitched bar at the bottom of the screen, showing elements that should be at the top (e.g. my top panel).

Just guessing here, but could it be that some timing or buffer handling is off here or that this was data already belonging to the next frame placed at some weird offset?

Anyhow, I'd recommend reverting f1f7e2697d1ac upstream ASAP, since for anyone affected, their GUI becomes practically unusable and the flickering may cause seizures in people with epilepsy. Some may also think that they got HW damage… Thank you!

Plasma 5.27.7, Gentoo, tried Kernel 6.1.45-lts, 6.3.13 and 6.4.10.
Radeon 6800XT, Mesa 23.1.5, libdrm 2.4.115
AOC Q24G2 1440p@165Hz (144Hz on HDMI, which somehow made the flickers happen less often).
Comment 17 Stefan Springer 2023-08-13 16:47:03 UTC
(In reply to Stefan Springer from comment #16)

Just a small addition. KWin's VRR setting (Auto/Disabled/Always) didn't matter much, although Disabled reduced the likelihood of occurrence the most. What did work around the problem entirely, without reverting the commit, was using the monitor's OSD to completely disable VRR support (it's called "G-Sync compatible" there).
Comment 18 Zamundaaa 2023-08-16 13:15:06 UTC
(In reply to Stefan Springer from comment #16)
> My screen will flicker whenever the refresh rate goes below a certain
> threshold (~78fps for me w/ Gsync pendulum and MangoHud enforcing VRR). It
> will also do so seemingly at random when operating the computer, and it
> occasionally produces a glitched bar at the bottom of the screen, showing
> elements that should be at the top (e.g. my top panel).

Did you actually revert the commit and test that, and what's your second GPU? Because if you do have mismatching stride, then your output would've been completely unusable before the commit. And if you don't, then the commit doesn't change anything.

(In reply to Ivan D Vasin from comment #14)
> Unfortunately, the issue persists even after updating to Plasma 5.27.7. 
> Unsetting KWIN_DRM_DEVICES brings back the same distortion as before.  Are
> there any other diagnostics I can provide that might help pin this down?  I
> also have the KDE/Plasma 5 sources set up for building.  Happy to try out
> patches without waiting for a release.

Ping on the patch. I can't really do anything without narrowing the problem down.
Comment 19 Stefan Springer 2023-08-16 22:05:02 UTC
(In reply to Zamundaaa from comment #18)
> Did you actually revert the commit and test that, and what's your second
> GPU? Because if you do have mismatching stride, then your output would've
> been completely unusable before the commit. And if you don't, then the
> commit doesn't change anything.

Long story short: I can now say with certainty that my specific issue is not KDE related. Sorry for making such a fuss!

So after reverting the patch, things seemed fine for two days and a reboot, but then the issue got triggered again out of seemingly nowhere. After some more digging around, and even trying an old 5.19 LTS kernel, I decided to create a new, blank user account to see if it also exhibited the issue. Surprisingly, it wasn't present on a fresh account with default settings. I then reconstructed the settings of my main account step by step to see when it reoccurred.

The trigger was touching mclk_od on the 6800XT (and probably also on the RX 470, thinking about the weird behavior on my other PC; for some reason [HBM?] the Vega 56 is fine). Even a change of just 5 MHz up or down reliably triggers the problem. I manually played around with the pp_od_clk_voltage sysfs interface to confirm this, and ultimately disabled the part of my OC script that adjusts mclk. Now everything works across multiple reboots, logging off and on again, the gsync pendulum at any arbitrary refresh rate, etc.

I was using a dual-monitor 144Hz config before. Then, after moving, I used a single monitor at 144Hz over HDMI because I had grabbed the wrong cable. Only when I finally got around to getting a DP cable (3 in fact, since it looked like I got two defective ones at first), which allowed me to drive my monitor at 165Hz (still single-monitor use now), was the issue unmasked. That's what originally had me sifting through a month's worth of system updates and commits.

Currently, judging by the AMDGPU issue tracker, there is fundamental restructuring going on in the DC (Display Core) code, with known issues around mclk handling. It actually broke badly in Kernel 6.4, such that desktop RDNA2 simply won't increase mclk at all: for some OOTB, for others with specific monitors/resolutions/refresh rates, and for others once touching mclk_od. There are also problems with calculating the required bandwidth for a given monitor config (res, hz, multi-display, VRR), so this seems related to that.
Comment 20 Nate Graham 2023-08-16 22:29:28 UTC
Thanks, the fact that you're overclocking your GPU is highly relevant here. :)