Bug 433107 - Wayland: iGPU/AMDGPU multi-monitor keeps displaying the SDDM screen if iGPU-DisplayPort is connected
Status: RESOLVED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: wayland-generic
Version: 5.21.0
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Zamundaaa
URL:
Keywords: wayland
Depends on:
Blocks:
 
Reported: 2021-02-17 19:34 UTC by Andrew Nowa Ammerlaan
Modified: 2022-08-13 18:44 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In: 5.23
Sentry Crash Report:


Attachments
wayland session log (69.44 KB, text/x-log)
2021-02-17 19:34 UTC, Andrew Nowa Ammerlaan
wayland session log with env variables (94.96 KB, application/gzip)
2021-02-18 14:18 UTC, Andrew Nowa Ammerlaan
wayland log with the patch (20.33 KB, text/x-log)
2021-04-11 08:38 UTC, Andrew Nowa Ammerlaan
SIGSEGV backtrace kwin_wayland (7.02 KB, text/plain)
2021-04-11 18:43 UTC, Andrew Nowa Ammerlaan
wayland session log (patch applied) (470.39 KB, text/x-log)
2021-04-11 18:44 UTC, Andrew Nowa Ammerlaan
wayland session log (patch applied) (2.49 MB, text/x-log)
2021-04-11 20:31 UTC, Andrew Nowa Ammerlaan
wayland session log (patch applied) (330.94 KB, text/plain)
2021-04-12 15:25 UTC, Andrew Nowa Ammerlaan
wayland-session-log (485.07 KB, text/plain)
2021-07-12 19:34 UTC, Andrew Nowa Ammerlaan
kwin_wayland backtrace (5.24 KB, text/plain)
2021-07-13 10:32 UTC, Andrew Nowa Ammerlaan
wayland-session-log (566.08 KB, text/plain)
2021-07-15 13:50 UTC, Andrew Nowa Ammerlaan
wayland session log before applying patch (169.40 KB, text/plain)
2021-07-19 16:10 UTC, Andrew Nowa Ammerlaan
wayland session log after applying the patch (254.46 KB, application/x-tar)
2021-07-19 16:12 UTC, Andrew Nowa Ammerlaan
wayland-session-log (patch applied) (830.40 KB, text/plain)
2021-07-20 14:54 UTC, Andrew Nowa Ammerlaan
wayland-session-log-kwin-5.22.5 (557.16 KB, text/x-log)
2021-09-13 18:06 UTC, Andrew Nowa Ammerlaan

Description Andrew Nowa Ammerlaan 2021-02-17 19:34:01 UTC
Created attachment 135787
wayland session log

SUMMARY

After upgrading to 5.21.0, monitors connected to the iGPU are detected (yay) (bug 417323), but all they display is the SDDM screen, which stays there after login. The cursor is visible and I can move windows onto those monitors, but the SDDM screen is still drawn over them, so they are not visible.

Rearranging the monitors works, but the SDDM screen is still displayed on top of everything else (including the taskbar etc). Trying to change the resolution makes *everything* crash (kwin, plasmashell, xwayland, etc.) without re-spawning.

STEPS TO REPRODUCE
1. Start a wayland session with a monitor connected to one of the iGPU's ports

OBSERVED RESULT

The SDDM screen stays visible on the monitors connected to the iGPU, monitors connected to the AMDGPU work fine. I cannot interact with this screen, nor can I see anything that is put below it.

EXPECTED RESULT

The monitors connected to the iGPU work the same as those connected to the AMDGPU. The SDDM screen is replaced by the splash screen and eventually the desktop.

SOFTWARE/OS VERSIONS
Operating System: Gentoo Linux
KDE Plasma Version: 5.21.0
KDE Frameworks Version: 5.79.0
Qt Version: 5.15.2
Kernel Version: 5.11.0-gentoo
OS Type: 64-bit
Graphics Platform: X11
Processors: 12 × Intel® Core™ i7-8700K CPU @ 3.70GHz
Memory: 31.2 GiB of RAM
Graphics Processor: Radeon RX 590 Series

ADDITIONAL INFORMATION

~/.local/share/sddm/wayland-session.log attached as requested.

How do I obtain the backtrace of kwin_wayland?
Comment 1 Zamundaaa 2021-02-17 21:28:09 UTC
Okay, that is weird:
> Kwin exited with code 0

So you're probably not gonna be able to get a backtrace... KWin doesn't crash. What does crash is "plasma_session", though; if you have it installed with debug symbols, you should be able to use "coredumpctl debug plasma_session" and show the backtrace with "bt".

Could you try the session again with
QT_LOGGING_RULES="kwin_*.debug=true;kwin_libinput.debug=false"
as an environment variable? That could give more information on what's happening, at least what's happening with the outputs.
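To see why that rule string turns on KWin's debug output while keeping the noisy libinput category quiet, here is a rough Python model of how Qt evaluates logging rules. It assumes the documented Qt semantics: each rule is `category[.type]=true|false`, rules in `QT_LOGGING_RULES` are separated by semicolons, `*` may appear only at the start and/or end of the category, and later rules override earlier ones. The function names are made up for illustration, and the `default` argument is an assumption standing in for each category's built-in threshold, which this sketch does not model.

```python
from typing import List, Optional, Tuple

MSG_TYPES = ("debug", "info", "warning", "critical")

# (category pattern, message type or None for "all types", enabled)
Rule = Tuple[str, Optional[str], bool]


def parse_rules(spec: str) -> List[Rule]:
    """Split a QT_LOGGING_RULES-style string ("a.debug=true;b=false") into rules."""
    rules: List[Rule] = []
    for raw in spec.split(";"):
        lhs, sep, value = raw.partition("=")
        value = value.strip().lower()
        if not sep or value not in ("true", "false"):
            continue  # empty or malformed entries are skipped in this sketch
        category, msg_type = lhs.strip(), None
        for t in MSG_TYPES:
            if category.endswith("." + t):
                category, msg_type = category[: -len(t) - 1], t
                break
        rules.append((category, msg_type, value == "true"))
    return rules


def category_matches(pattern: str, category: str) -> bool:
    """'*' is only honoured as the first and/or last character of the pattern."""
    core = pattern.strip("*")
    if pattern.startswith("*") and pattern.endswith("*"):
        return core in category
    if pattern.startswith("*"):
        return category.endswith(core)
    if pattern.endswith("*"):
        return category.startswith(core)
    return category == pattern


def is_enabled(spec: str, category: str, msg_type: str, default: bool = False) -> bool:
    """Apply every matching rule in text order; the last match wins."""
    enabled = default
    for pattern, rule_type, value in parse_rules(spec):
        if rule_type in (None, msg_type) and category_matches(pattern, category):
            enabled = value
    return enabled


RULES = "kwin_*.debug=true;kwin_libinput.debug=false"
print(is_enabled(RULES, "kwin_wayland_drm", "debug"))  # True
print(is_enabled(RULES, "kwin_libinput", "debug"))     # False
```

Because `kwin_libinput.debug=false` comes after `kwin_*.debug=true`, it wins for that one category only; swapping the two rules would re-enable libinput debug output.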
Comment 2 Andrew Nowa Ammerlaan 2021-02-18 14:15:58 UTC
I re-tested with the environment variables you suggested, and also recompiled kwin, kscreen and libinput with debug symbols.

This time it did not crash when trying to change the resolution. However, there was no visible effect of changing the resolution on the monitors connected to the iGPU. It did crash when I tried to enable a monitor connected to the iGPU that I had previously disabled. However, this time it did respawn.

Something I noticed now, that I did not notice last time, is that the monitor configuration is not what the kscreen configuration window says it is. I set it to:

              ________________
             [                ]
             [                ]
             [     AMDGPU2    ]
             [                ]
             [________________]

____________ _________________  ___________
            ][                ][           ]
            ][                ][           ]
   iGPU1    ][      AMDGPU1   ][    iGPU2  ]
            ][                ][           ]
____________][________________][___________]

              ________________
             [                ]
             [                ]
             [     iGPU3      ]
             [                ]
             [________________]

But got instead:

              ________________
             [                ]
             [                ]
             [     AMDGPU2    ]
             [                ]
             [________________]

____________ _________________  ___________
            ][                ][           ]
            ][                ][           ]
   iGPU3    ][      AMDGPU1   ][    iGPU1  ]
            ][                ][           ]
____________][________________][___________]

              ________________
             [                ]
             [                ]
             [     iGPU1      ]
             [                ]
             [________________]

Disabling a monitor connected to the iGPU has no effect on what is shown on that monitor; it does, however, prevent the mouse from moving onto a monitor. The catch is that the monitor set as disabled is not the one the mouse can no longer move onto: e.g. disabling iGPU1 prevents the mouse from moving onto iGPU2, while the mouse can actually still move onto iGPU1.

To me it looks like the monitors are somehow mixed up: the information that is shown about each monitor is correct (e.g. name, resolution, refresh rate), but applying operations such as disabling or moving to a monitor in the configuration window actually applies them to a different monitor.

I will attach the new log. Something that caught my eye is this:
kwin_wayland_drm: Atomic request failed to commit:  Invalid argument
and this:
kwin_wayland_drm: Atomic request failed to commit:  Permission denied
The log is littered with these.
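Since the log is full of these lines, a quick way to see which failure dominates is to tally them by reason. The Python sketch below assumes the exact `kwin_wayland_drm: Atomic request failed to commit:` prefix quoted above and the log path mentioned in the report; the helper names are illustrative only. It also maps each message back to an errno name using `os.strerror`, which on Linux yields "Invalid argument" for `EINVAL` and "Permission denied" for `EACCES`.

```python
import errno
import os
import re
from collections import Counter
from typing import Iterable, Optional

# Matches e.g. "kwin_wayland_drm: Atomic request failed to commit:  Invalid argument"
ATOMIC_FAILURE = re.compile(r"kwin_wayland_drm: Atomic request failed to commit:\s*(.+?)\s*$")


def count_atomic_failures(lines: Iterable[str]) -> Counter:
    """Tally failed atomic commits by the error message KWin printed."""
    reasons: Counter = Counter()
    for line in lines:
        match = ATOMIC_FAILURE.search(line)
        if match:
            reasons[match.group(1)] += 1
    return reasons


def errno_name(message: str) -> Optional[str]:
    """Reverse-map a strerror() text (e.g. 'Invalid argument') to its errno name."""
    for code, name in sorted(errno.errorcode.items()):
        if os.strerror(code) == message:
            return name
    return None


def summarize(path: str) -> None:
    """Print one line per distinct failure reason, most frequent first."""
    with open(path, errors="replace") as log:
        for reason, count in count_atomic_failures(log).most_common():
            print(f"{count:6d}  {reason}  ({errno_name(reason)})")
```

For example, `summarize(os.path.expanduser("~/.local/share/sddm/wayland-session.log"))` prints the counts for the session log referred to in the original report.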
Comment 3 Andrew Nowa Ammerlaan 2021-02-18 14:18:31 UTC
Created attachment 135852
wayland session log with env variables
Comment 4 Andrew Nowa Ammerlaan 2021-02-18 14:23:28 UTC
Small correction, I got:

              ________________
             [                ]
             [                ]
             [     AMDGPU2    ]
             [                ]
             [________________]

____________ _________________  ___________
            ][                ][           ]
            ][                ][           ]
   iGPU2    ][      AMDGPU1   ][    iGPU3  ]
            ][                ][           ]
____________][________________][___________]

              ________________
             [                ]
             [                ]
             [     iGPU1      ]
             [                ]
             [________________]

My previous sketch had iGPU1 twice, which is of course wrong.
Comment 5 Zamundaaa 2021-02-18 16:29:15 UTC
3 displays is the maximum an Intel iGPU can drive (at least for generations older than Tiger Lake), so it could be that KWin picks a combination of DRM objects that doesn't work; we don't handle that gracefully yet. Could you try what happens if you only plug in one monitor to the iGPU?
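The "wrong combination of DRM objects" failure can be illustrated with a toy model. In DRM, each encoder advertises a `possible_crtcs` bitmask of the CRTCs that can drive it, and no more outputs can be lit at once than there are CRTCs. The Python sketch below is hypothetical and is not KWin code: the port names and masks are made up. It shows how a naive first-fit assignment can fail even though a working assignment exists, and how a fourth output cannot be driven with three CRTCs.

```python
from typing import Dict, Optional


def first_fit(possible_crtcs: Dict[str, int], num_crtcs: int) -> Optional[Dict[str, int]]:
    """Give each connector, in order, the lowest-numbered free CRTC its mask allows.

    Returns None as soon as one connector finds no free compatible CRTC. There is
    no backtracking, so an earlier choice can block a later connector.
    """
    used = set()
    assignment: Dict[str, int] = {}
    for connector, mask in possible_crtcs.items():
        for crtc in range(num_crtcs):
            if mask & (1 << crtc) and crtc not in used:
                used.add(crtc)
                assignment[connector] = crtc
                break
        else:
            return None  # no free CRTC this connector can use
    return assignment


# Made-up example: three CRTCs; "PORT-B" can only be driven by CRTC 0.
print(first_fit({"PORT-A": 0b111, "PORT-B": 0b001}, 3))  # None: PORT-A already took CRTC 0
print(first_fit({"PORT-B": 0b001, "PORT-A": 0b111}, 3))  # {'PORT-B': 0, 'PORT-A': 1}
# Four connectors but only three CRTCs can never all be lit at once.
print(first_fit({f"PORT-{i}": 0b111 for i in range(4)}, 3))  # None
```

The only difference between the first two calls is the order in which the connectors are visited, which is exactly the kind of outcome a smarter search has to avoid.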
Comment 6 Andrew Nowa Ammerlaan 2021-02-18 17:32:15 UTC
 
By (un)plugging some of the monitors I found something interesting.

The problem is with HDMI-1-2 (which, contrary to what the name suggests, is a DisplayPort): if this iGPU port is connected, I run into the problem I described.
- If this port is not connected, everything works fine (2 remaining on the iGPU, 4 total).
- If I move the monitor that was connected to this port to a port on the AMDGPU, everything is also fine (same total number of monitors, 5).
- If I remove one of the other monitors connected to the iGPU, I still have the same problem.
- If I remove all monitors connected to the iGPU but leave only this port connected, I still have the same problem.
- If I connect a different monitor to this port, I still have the same problem (so it is not specific to that monitor; it is the port).

I don't know if this is related, but I've had other issues with this specific port as well. If this port is connected *and* the iGPU is the boot GPU (no matter what other things are connected), the system POST is not displayed at all, and the BIOS and GRUB are inaccessible. In this situation the monitors only start showing something once the Linux kernel takes over the framebuffer.

And for some reason the naming of the ports on the iGPU is all messed up (and has been for as long as I have had this computer): DisplayPort is named HDMI-1-2, VGA is named DP-1-5, and DVI is named HDMI-1-3, but that is probably a kernel thing and unrelated.

I should add that this port is connected through a DisplayPort to HDMI adapter; since I do not have a DisplayPort-capable monitor, I cannot test without it. So whenever I say that this port 'is connected', it is implied that this is through this adapter.

That being said, this port works just fine in X, so there is still some bug here.
Comment 7 Zamundaaa 2021-02-18 17:45:58 UTC
That does sound like it is indeed the problem of kwin_wayland not being smart enough when driving the iGPU. I've been working on some changes that should enable us to fix that but they won't go into 5.21.

The thing with the port having problems overall is interesting but most likely won't cause problems once we handle failures.
Comment 8 Bug Janitor Service 2021-04-10 14:09:42 UTC
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/844
Comment 9 Zamundaaa 2021-04-10 15:35:22 UTC
The linked merge request will most likely solve your problem, but it would be great if you could test it.
Comment 10 Andrew Nowa Ammerlaan 2021-04-11 08:37:37 UTC
To apply the patch successfully I had to use the latest version from git (for kwin and some other packages). The patch does not apply to the 5.21.4 release. 

After upgrading and applying the patch, an X11 session still works. However, now wayland does not work at all: it just flashes the screen a bit on the monitors connected to the AMDGPU and then crashes. The monitors on the iGPU continue to display the SDDM screen; this now happens irrespective of whether the problematic port is connected or not.

I am not sure if the problem is with the patch from the Merge Request, or with the upgrade to the latest version from git.

The log doesn't seem to show anything helpful, but I'll attach it anyway.
Comment 11 Andrew Nowa Ammerlaan 2021-04-11 08:38:18 UTC
Created attachment 137498
wayland log with the patch
Comment 12 Andrew Nowa Ammerlaan 2021-04-11 08:58:32 UTC
I just tested without the patch, and it still crashes. So the problem does not necessarily seem to be with the patch, but with some other change in the live git version. Perhaps I just got unlucky and synced it while it was in a broken state.

I'm not sure how I could test this more properly, any suggestions?
Comment 13 Zamundaaa 2021-04-11 12:30:49 UTC
You can try using
> coredumpctl debug kwin_wayland
and then use "bt" in gdb to get the stacktrace of the last crash.
If you use the environment variable
> QT_LOGGING_RULES="kwin_*.debug=true;kwin_libinput.debug=false"
again then there could be something useful in the wayland-session.log, too.
Comment 14 Andrew Nowa Ammerlaan 2021-04-11 18:43:48 UTC
Created attachment 137511
SIGSEGV backtrace kwin_wayland
Comment 15 Andrew Nowa Ammerlaan 2021-04-11 18:44:22 UTC
Created attachment 137512
wayland session log (patch applied)

I hope this is useful
Comment 16 Zamundaaa 2021-04-11 19:03:07 UTC
Yes, very useful. https://invent.kde.org/plasma/kwin/-/merge_requests/847 should fix that
Comment 17 Andrew Nowa Ammerlaan 2021-04-11 20:31:22 UTC
Created attachment 137515
wayland session log (patch applied)

(In reply to Zamundaaa from comment #16)
> Yes, very useful. https://invent.kde.org/plasma/kwin/-/merge_requests/847
> should fix that

Awesome, this indeed fixed it :)

However, the original issue is not quite fixed yet (though there is definitely some progress with this patch). If I apply the patch the problem becomes sort of inverted. Now the monitors connected to the iGPU work just fine (even the problematic port!). However, the monitors connected to the AMDGPU turn off after logging in. The kscreen config window does detect and display them correctly and they are marked as enabled (though the monitor itself says "No Signal"). If I disable and enable those monitors in the kscreen config window, they do turn on but the desktop completely freezes shortly afterwards.

Without the patch the original behaviour described in the first comment is restored (AMDGPU monitors work, but iGPU monitors display SDDM screen).

I don't have a coredump this time because there is no crash, though I have attached the wayland session log.
Comment 18 Zamundaaa 2021-04-11 21:18:34 UTC
Hmm, the log says that the tests reported the outputs as working but presentation fails afterwards... I added a commit that might fix that.
Comment 19 Andrew Nowa Ammerlaan 2021-04-12 15:15:58 UTC
Good News!

Your patch works! I am writing this in a fully functioning wayland session. All monitors now work as expected.

There seems to be a small issue on shutdown which may or may not be related. Sometimes the monitors connected to the iGPU retain their contents after the session has quit and the shutdown process has started. Some services fail to stop (e.g. bluetooth), which indicates to me that the session might be (partially) re-spawning when it shouldn't. But this could very well be an unrelated issue.

Anyway with your patch my setup works with wayland! Looks like I will finally be able to join the rest of the world in the future that is wayland :D

Thanks!
Comment 20 Andrew Nowa Ammerlaan 2021-04-12 15:25:00 UTC
Created attachment 137532
wayland session log (patch applied)
Comment 21 Zamundaaa 2021-04-12 15:47:10 UTC
Cool. While the log unexpectedly still doesn't show any tests failing for the AMD GPU, I don't think that's something to worry about.

> Sometimes the monitors connected to the iGPU retain their contents after the session has quit and the shutdown process has started

That is a separate issue but should definitely be fixed as well. Properly blanking the monitors before exit shouldn't be hard to do.

The session not closing everything properly is probably https://bugs.kde.org/show_bug.cgi?id=433293
Comment 22 Zamundaaa 2021-07-10 21:52:10 UTC
Patches the merge request depend on have been merged now, and I updated the merge request to the new code. Could you test it again? It should in theory work the same but it would be good to make sure.
Comment 23 Andrew Nowa Ammerlaan 2021-07-12 19:34:52 UTC
Created attachment 140013
wayland-session-log

(In reply to Zamundaaa from comment #22)
> Patches the merge request depend on have been merged now, and I updated the
> merge request to the new code. Could you test it again? It should in theory
> work the same but it would be good to make sure.

After upgrading to the latest live version, wayland is refusing to start (both with and without the patch from your Merge Request). The screen flashes fast and frequently, and eventually returns to SDDM (it does not matter whether the port that was problematic before is connected or not). Wayland session log is attached.

X does still work though.
Comment 24 Zamundaaa 2021-07-12 22:36:10 UTC
KWin crashes, apparently when starting to render. Can you provide a backtrace?
Comment 25 Andrew Nowa Ammerlaan 2021-07-13 10:32:00 UTC
Created attachment 140018
kwin_wayland backtrace

(In reply to Zamundaaa from comment #24)
> KWin crashes, apparently when starting to render. Can you provide a
> backtrace?

Here's the backtrace, I hope it is helpful
Comment 26 Zamundaaa 2021-07-15 11:43:15 UTC
Git commit afcef2a6f822c46d3167f4b49bb8a3b13696c8c8 by Xaver Hugl.
Committed on 15/07/2021 at 11:42.
Pushed by zamundaaa into branch 'master'.

platforms/drm: fix crash with secondary GPUs and buffer age

M  +2    -1    src/plugins/platforms/drm/abstract_egl_drm_backend.h
M  +24   -12   src/plugins/platforms/drm/egl_gbm_backend.cpp
M  +4    -3    src/plugins/platforms/drm/egl_gbm_backend.h
M  +1    -1    src/plugins/platforms/drm/egl_stream_backend.cpp

https://invent.kde.org/plasma/kwin/commit/afcef2a6f822c46d3167f4b49bb8a3b13696c8c8
Comment 27 Zamundaaa 2021-07-15 11:44:08 UTC
crash should be fixed (at least it works with vkms)
Comment 28 Andrew Nowa Ammerlaan 2021-07-15 13:50:41 UTC
Created attachment 140080
wayland-session-log

(In reply to Zamundaaa from comment #27)
> crash should be fixed (at least it works with vkms)

Yes, the crash is fixed now, thanks.

After applying the patch from your Merge Request, I get a proper display on one of the monitors connected to the iGPU; the other two continue to display the SDDM screen. The monitor connected to the AMDGPU (DVI-D-1) turns black. I can move my mouse onto those monitors, and they are properly shown in Display Settings (correct configuration, correct resolution, etc.). I see some "Invalid argument" errors in the logs, so perhaps that is the cause.
Comment 29 Zamundaaa 2021-07-18 09:47:59 UTC
I rebased the MR to include a related bugfix from master, could you test again?

If it still fails in some way, could you also use the environment variable
QT_LOGGING_RULES="kwin_*.debug=true;kwin_libinput.debug=false"
again? On errors, the output is rather verbose, but debug output still helps.
Comment 30 Andrew Nowa Ammerlaan 2021-07-19 16:09:48 UTC
We're getting close,

Without the patch I get a working display on the monitor connected to the AMDGPU, and on *one* of the monitors connected to the iGPU (the one that was problematic before, but that might be a coincidence).

With the patch, I get a working display on *all* monitors connected to the iGPU. However, the monitor connected to the AMDGPU stays black. If I go to the Display Settings, the monitor is there. If I disable and re-enable this monitor, all of the displays hang/freeze, but I do see the desktop pop up on that monitor before it freezes and the screen gets corrupted.
Comment 31 Andrew Nowa Ammerlaan 2021-07-19 16:10:11 UTC
Created attachment 140193
wayland session log before applying patch
Comment 32 Andrew Nowa Ammerlaan 2021-07-19 16:12:16 UTC
Created attachment 140194
wayland session log after applying the patch
Comment 33 Zamundaaa 2021-07-20 12:50:39 UTC
What seems to have caused the issue is that the display turns off because it doesn't receive new frames while KWin is initializing - I think it takes too long to render the first frame. Maybe we can push the old frame again or something like that...

There was, however, also a bug in KWin's handling of taking control over the display, which prevented it from recovering from that situation. I added a fix to the MR.
Comment 34 Andrew Nowa Ammerlaan 2021-07-20 14:54:59 UTC
Created attachment 140219
wayland-session-log (patch applied)

Awesome, now it works!

The monitor connected to the AMDGPU does still go to black before eventually showing the splash screen. It stays black for about a second (the other monitors are already showing the splash screen at this stage).

Hot(un)plugging works. However, when hotunplugging a monitor all windows are moved to the monitor connected to the AMDGPU (which is marked as the "primary" monitor, i.e. monitor 1). And when hotplugging a monitor all windows are moved to the monitor that just connected. (But maybe this behaviour is intentional?)

There is also Bug 438508, which is possibly related since it only occurs when using iGPU multi-monitor. If I recall correctly, it didn't happen when I tested yesterday, but it is happening now, so perhaps it is related to these changes?

Thank you for working on this!
Comment 35 Andrew Nowa Ammerlaan 2021-07-20 15:05:51 UTC
And one other small thing I noticed is that after hotunplugging and hotplugging one of the monitors, the widgets and taskbar on that monitor are gone (in other words, it detects it as if it were a new monitor and it doesn't restore the settings).
Comment 36 Andrew Nowa Ammerlaan 2021-07-20 15:11:29 UTC
(In reply to Andrew Ammerlaan from comment #35)
> And one other small thing I noticed is that after hotunplugging and
> hotplugging one of the monitors the widgets and taskbar on that monitor are
> gone (in other words, it detects it as if it were a new monitor and it doesn't
> restore the settings)

After looking at the logs, it appears that it detects the hotplugged monitor (HDMI-A-2-HKC-TV) multiple times instead.
Comment 37 Zamundaaa 2021-07-20 15:16:01 UTC
Nice!

> Hot(un)plugging works. However, when hotunplugging a monitor all windows are moved to the monitor connected to the AMDGPU (which is marked as the "primary" monitor, i.e. monitor 1). And when hotplugging a monitor all windows are moved to the monitor that just connected.

Yes, that is intentional. I think there is an MR that implements moving the windows back to where they were before hot-unplugging, though.

> The monitor connected to the AMDGPU does still go to black before eventually showing the splash screen. It stays black for about a second (the other monitors are already showing the splash screen at this stage).

I'll see what I can do about that - the kernel likely expects that we immediately push a frame once we take over. Maybe we can push an empty frame (so that it shows the same image again) as opposed to a black image for example to not cause flickering.

> And one other small thing I noticed is that after hotunplugging and hotplugging one of the monitors the widgets and taskbar on that monitor are gone (in other words, it detects it as if it were a new monitor and it doesn't restore the settings)

That's unfortunately a known bug in plasmashell.
Comment 38 Zamundaaa 2021-07-25 21:09:20 UTC
Could you check if the monitor still turns off on login now? I added a commit that might resolve that
Comment 39 Andrew Nowa Ammerlaan 2021-07-26 19:54:48 UTC
(In reply to Zamundaaa from comment #38)
> Could you check if the monitor still turns off on login now? I added a
> commit that might resolve that

Perfect, it works flawlessly now. There's a very nice transition from SDDM to black to a fade-in of the splash screen, and it is the same on all monitors!

Thank you very much for your efforts
Comment 40 Zamundaaa 2021-09-08 00:45:17 UTC
Git commit b38bb416982babdae9941d41fa5b34717e5cae97 by Xaver Hugl.
Committed on 08/09/2021 at 00:44.
Pushed by zamundaaa into branch 'master'.

Test DrmPipelines for outputs

Not all combinations of connectors, crtcs and planes will work
on all hardware, so we need to test the pipelines before using
them.
Related: bug 435265

M  +170  -162  src/plugins/platforms/drm/drm_gpu.cpp
M  +5    -7    src/plugins/platforms/drm/drm_gpu.h
M  +0    -1    src/plugins/platforms/drm/drm_object_connector.h
M  +13   -0    src/plugins/platforms/drm/drm_output.cpp
M  +4    -0    src/plugins/platforms/drm/drm_output.h
M  +81   -78   src/plugins/platforms/drm/drm_pipeline.cpp
M  +15   -4    src/plugins/platforms/drm/drm_pipeline.h

https://invent.kde.org/plasma/kwin/commit/b38bb416982babdae9941d41fa5b34717e5cae97
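The idea in this commit message (try combinations and keep only one the hardware accepts) can be sketched as a backtracking search. The Python below is a simplified, hypothetical model rather than KWin's `DrmPipeline` code: it only assigns CRTCs to connectors (real pipelines also involve planes), and `test_commit` is a stand-in for an atomic test-only commit of the whole configuration, which in libdrm terms would be `drmModeAtomicCommit` with the `DRM_MODE_ATOMIC_TEST_ONLY` flag. The connector names and the hardware quirk are made up.

```python
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, int]  # connector name -> CRTC index


def find_working_pipelines(
    connectors: List[str],
    possible_crtcs: Dict[str, int],
    num_crtcs: int,
    test_commit: Callable[[Assignment], bool],
) -> Optional[Assignment]:
    """Backtracking search for a connector->CRTC assignment the hardware accepts.

    test_commit is a hypothetical hook standing in for a test-only atomic commit;
    it is not a KWin or libdrm API.
    """

    def search(index: int, current: Assignment, used: set) -> Optional[Assignment]:
        if index == len(connectors):
            # Every connector has a CRTC: ask the "hardware" whether it works.
            return dict(current) if test_commit(current) else None
        connector = connectors[index]
        for crtc in range(num_crtcs):
            if crtc in used or not (possible_crtcs[connector] & (1 << crtc)):
                continue
            current[connector] = crtc
            used.add(crtc)
            found = search(index + 1, current, used)
            if found is not None:
                return found
            # Undo this choice and try the next CRTC for this connector.
            del current[connector]
            used.discard(crtc)
        return None

    return search(0, {}, set())


# Hypothetical hardware quirk: DP-1 does not work when driven by CRTC 0.
def quirky_hw(config: Assignment) -> bool:
    return config.get("DP-1") != 0


print(find_working_pipelines(["DP-1", "HDMI-1"], {"DP-1": 0b11, "HDMI-1": 0b11}, 2, quirky_hw))
# {'DP-1': 1, 'HDMI-1': 0}
```

Testing the complete configuration rather than each connector in isolation is a deliberate choice in this sketch: some limits may only show up when several outputs are lit together.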
Comment 41 Andrew Nowa Ammerlaan 2021-09-13 18:06:05 UTC
Created attachment 141519
wayland-session-log-kwin-5.22.5

After upgrading to version 5.22.5, this is sadly still not quite working (though it was working earlier when I tested the MR). The monitor connected to the AMDGPU works, as does one of the monitors on the iGPU (the right one), but the others still show the SDDM screen after logging in. Wayland log attached.

I'm seeing a bunch of these:
kwin_core: Provided presentation timestamp is invalid: 154666 (current: 154675)
kwin_wayland_drm: Atomic request failed to commit:  Invalid argument
kwin_wayland_drm: Atomic test commit failed. Aborting present.

Furthermore, the issue where the monitor on the AMDGPU turns black for about a second or two after logging in is back :(
Comment 42 Zamundaaa 2021-09-13 20:05:24 UTC
The commit is only in git master / upcoming 5.23
Comment 43 Andrew Nowa Ammerlaan 2021-09-14 06:07:51 UTC
(In reply to Zamundaaa from comment #42)
> The commit is only in git master / upcoming 5.23

Alrighty, I see, I'll wait a bit more then and test again later :D
Comment 44 Andrew Nowa Ammerlaan 2022-06-16 09:02:06 UTC
In version 5.25, this is broken again. After login, the monitor connected to the AMDGPU turns itself off, and the monitor connected to the iGPU continues to display the SDDM login screen. If I switch to a tty and back, I get a black screen on all monitors, but the mouse is now visible.