SUMMARY I maintain Fedora's instance of an automated test system called openQA: https://openqa.fedoraproject.org . openQA allows for "human-like" testing of graphical applications by matching areas of screenshots and performing mouse and keyboard input operations via a VNC connection. We run a suite of openQA tests on Fedora composes and updates. Since KDE/Plasma 5.23.90 landed in Fedora Rawhide, the test that runs a Fedora installation from the KDE live image is failing. The test boots the live image, double-clicks an icon on the desktop to launch the installer - which runs maximized and without a window title, see attached screenshot - then starts trying to navigate through it with keyboard and mouse inputs. For example, the first page of the installer is a language selection screen (as shown in the screenshot). The test first checks that it's at this screen and clicks on a non-active area of it (the word "FEDORA" in "WELCOME TO FEDORA 36") to ensure the window is active. Then it clicks in the text entry box at bottom left and types the word "english". Then it looks for "English (United States)" on the right-hand side and clicks on that. Then it looks for the button labelled "Continue" and clicks on that. Ordinarily this would indeed continue the installation process, and that's what happens on the same test run on the GNOME live image, and on the same test run on our dedicated installer images (which use gnome-kiosk), and what used to happen when running the test on KDE. But with Plasma 5.23.90, when openQA clicks on "Continue", nothing happens. We've established a few things by experimentation. It seems that once we reach the broken state, clicking on anything in the installer no longer works. For instance, I tried modifying the test to click on a different language, which should lead to that language being selected - but it doesn't. It seems the first one or two clicks in the installer window work OK, but after that, they stop working. However, clicking on something outside the installer window - e.g. the KDE "kicker" icon - works fine. We also found that if we hijack the VNC connection the openQA test runner uses to inject input events, we can click on things in the installer successfully by hand. The bug for some reason only affects the events as input by openQA's test runner. Here are some more details about exactly how this is implemented in os-autoinst (the openQA test runner). The test is run on a qemu VM, with the built-in VNC server enabled. os-autoinst sends mouse and keyboard events via this VNC server. This is precisely what happens when it wants to "click on something": * Match the relevant area and determine its centre point * Set the cursor to that precise position; it does not "move" the cursor the way a human would, it just zaps it there with a single VNC event (see https://github.com/os-autoinst/os-autoinst/blob/master/consoles/VNC.pm#L745-L760) * Send a 'mouse button down' event * Wait 0.15 seconds * Send a 'mouse button up' event * Wait 1 second * Set the cursor to 1023,767: this is called a "mouse hide" event, the purpose is to position the cursor somewhere we hope it won't happen to interfere with a subsequent match attempt (i.e. the bottom right corner of the screen) Notably, this means that between clicks, the cursor is instantly repositioned to 1023x767 and then instantly positioned to the click target location. This is the most obvious part of the runner's behaviour that a human cannot replicate - when we take over the VNC connection and interact with the VM manually, obviously, we can't instantly zap the cursor wherever we want, we're just moving it there with our human hands and human mice. So this repositioning behaviour may be significant (it was in a similar bug we found in GNOME recently). STEPS TO REPRODUCE This is trivial to reproduce if you have an instance of openQA deployed with Fedora's test configuration, and probably close to impossible to reproduce if you don't, unfortunately. At least you'd probably have to replicate the whole 'instant cursor repositioning' thing. I can reproduce it on demand and provide whatever debugging information would be useful. I'm available on libera.chat IRC and chat.fedoraproject.org (they're bridged) as adamw. OBSERVED RESULT Mouse clicks stop working shortly after installer runs. EXPECTED RESULT Mouse clicks should keep working. SOFTWARE/OS VERSIONS Linux/KDE Plasma: Fedora Rawhide (available in About System) KDE Plasma Version: 5.23.90 KDE Frameworks Version: 5.23.90 Qt Version: 5.15.2 ADDITIONAL INFORMATION
Created attachment 146003 [details] how the screen we're trying to interact with looks
I tested a build of current git 5.24 head, same bug. I'll try and bisect back between 5.23.5 and 5.23.90, though it will take a bit of time to do.
Bisecting is turning out to be hard because older commits just don't build against current (everything else), like kwayland-server :( Can't build anything before f2b29e35 by the looks of it due to an incompatibility with `setSupportedFormatsAndModifiers`, and f2b29e35 doesn't build because of a "static assertion failed: Signal and slot arguments are not compatible." error I haven't totally traced out yet. But even just those two mean I'm missing like several months of history in the bisect attempt, so it's not likely to work out well.
Can you link to a failed openQA job? With Plasma 5.24 (and git master), something vaguely related happened in our testing as well. Until https://invent.kde.org/plasma/kwin/-/merge_requests/1946, kwin didn't use the cursor plane with virtio_gpu, so the cursor was visible to openQA as well. This uncovered that the mouse_hide call did not have any effect at all: https://openqa.opensuse.org/tests/2160675#step/start_wayland_plasma5/26 The cursor moved fine when connecting to VNC manually.
Sure, here's one of the most recent failures - https://openqa.fedoraproject.org/tests/1114561 . Logs and video at https://openqa.fedoraproject.org/tests/1114561#downloads . You should be able to see from the autoinst-log.txt log that the mouse_hide calls do seem to take effect - there are several moves to 1023, 767 which are mouse_hide events (either explicit ones or the ones that happen as part of assert_and_click ). I did run a test where I disabled the mouse_hide that happens at the end of assert_and_click , and the result of that is that the test gets one step further - the Continue button works, but it fails at the next step. The next thing that happens is a child window pops up with a warning about this being a pre-release, with a button you have to click to continue. It looks like this (from the Workstation live test): https://openqa.fedoraproject.org/tests/1114249#step/_boot_to_anaconda/12 clicking on the "I want to proceed" button there doesn't work. It does seem potentially interesting that that button is in a different window. I wonder if I could get past it somehow, would subsequent clicks back in the main installer window continue to work. Next week I'll see if we can get past it with keyboard inputs (I haven't actually checked that yet).
I just realized I didn't mention, so in case it's not obvious - the installer app in use here is built on GTK+.
I think I have another related failure: https://openqa.opensuse.org/tests/2167096 Dolphin thinks the cursor is hovering over the folder icon there. (In reply to Adam Williamson from comment #5) > Sure, here's one of the most recent failures - > https://openqa.fedoraproject.org/tests/1114561 . Logs and video at > https://openqa.fedoraproject.org/tests/1114561#downloads . You should be > able to see from the autoinst-log.txt log that the mouse_hide calls do seem > to take effect - there are several moves to 1023, 767 which are mouse_hide > events (either explicit ones or the ones that happen as part of > assert_and_click ). In the video the scrollbar of the right locale selection box is visible when the cursor is inside, so GTK is at least aware of the cursor position there. I don't see any hover effect on the continue button, but that might be normal. (In reply to Adam Williamson from comment #6) > I just realized I didn't mention, so in case it's not obvious - the > installer app in use here is built on GTK+. Using wayland? If you can reproduce the issue with WAYLAND_DEBUG=1, that might be helpful. I'll try it with dolphin as well.
Created attachment 146129 [details] WAYLAND_DEBUG=1 dolphin (In reply to Fabian Vogt from comment #7) > I think I have another related failure: > https://openqa.opensuse.org/tests/2167096 > Dolphin thinks the cursor is hovering over the folder icon there. > ... > I'll try it with dolphin as well. Journal with WAYLAND_DEBUG=1 output of dolphin in the failed openQA test https://openqa.opensuse.org/tests/2170688#step/dolphin/23 is attached. Only issue I can spot is that KWin sends wl_pointer@4.leave(1624, nil), even though the surface parameter does not have allow-null="true".
is it possible to run anaconda with the WAYLAND_DEBUG=1 environment variable?
Um. I think maybe not, because it seems when running as the live installer, we actually force anaconda to run with the X11 backend: https://github.com/rhinstaller/anaconda/blob/master/data/liveinst/liveinst#L134-L135 liveinst is a wrapper script which is what gets called to run anaconda on live images - it does various customizations specific to the live environment, like that one, before running anaconda itself.
Can you set the KWIN_XWAYLAND_DEBUG environment variable in that case and capture kwin's log output?
Yeah, I can give that a shot tomorrow. Thanks.
Created attachment 146234 [details] KWIN_XWAYLAND_DEBUG=1 messages from anaconda This is /var/log/messages from an openQA failure with KWIN_XWAYLAND_DEBUG=1 set via /etc/profile.d. The relevant messages should be around the 13:09:21 mark in the log - the messages from Anaconda about setting the locale to en_US.UTF-8 should come from the point where the test types 'english' into the entry box and clicks on "English (United States)", right before the point where it tries to click Continue and it doesn't work. The timestamps on the logs from the VM itself and the logs from the test runner don't always match precisely, but FWIW, here are the logs of the critical point from the test runner: [2022-02-03T18:09:27.871Z] [debug] tests/_boot_to_anaconda.pm:167 called testapi::assert_and_click [2022-02-03T18:09:27.871141Z] [debug] <<< testapi::assert_and_click(mustmatch="anaconda_select_install_lang_filtered", timeout=30) [2022-02-03T18:09:27.871998Z] [debug] clicking at 686/206 [2022-02-03T18:09:27.872119Z] [debug] tests/_boot_to_anaconda.pm:167 called testapi::assert_and_click [2022-02-03T18:09:27.872211Z] [debug] <<< testapi::mouse_set(x=686, y=206) [2022-02-03T18:09:27.872740Z] [debug] mouse_move 686, 206 [2022-02-03T18:09:27.872876Z] [debug] send_pointer_event 0, 686, 206, 1 [2022-02-03T18:09:27.873272Z] [debug] tests/_boot_to_anaconda.pm:167 called testapi::assert_and_click [2022-02-03T18:09:27.873381Z] [debug] <<< testapi::mouse_click(button="left", cursor_down="0.15") [2022-02-03T18:09:27.873867Z] [debug] pointer_event 1 686, 206 [2022-02-03T18:09:27.874004Z] [debug] send_pointer_event 1, 686, 206, 1 [2022-02-03T18:09:28.024935Z] [debug] pointer_event 0 686, 206 [2022-02-03T18:09:28.025082Z] [debug] send_pointer_event 0, 686, 206, 1 [2022-02-03T18:09:29.025684Z] [debug] tests/_boot_to_anaconda.pm:167 called testapi::assert_and_click [2022-02-03T18:09:29.025885Z] [debug] <<< testapi::mouse_set(x=1023, y=767) [2022-02-03T18:09:29.026785Z] [debug] mouse_move 1023, 767 [2022-02-03T18:09:29.026920Z] [debug] send_pointer_event 0, 1023, 767, 1 [2022-02-03T18:09:29.027420Z] [debug] tests/_boot_to_anaconda.pm:168 called testapi::assert_screen [2022-02-03T18:09:29.027602Z] [debug] <<< testapi::assert_screen(mustmatch="anaconda_select_install_lang_selected", timeout=10) [2022-02-03T18:09:29.038653Z] [debug] no match: 9.9s, best candidate: install_lang_english_selected-testonlycursor-20220128 (0.93) [2022-02-03T18:09:30.173707Z] [debug] >>> testapi::_handle_found_needle: found install_lang_english_selected-20200116, similarity 0.98 @ 605/197 [2022-02-03T18:09:30.173913Z] [debug] tests/_boot_to_anaconda.pm:175 called testapi::assert_and_click [2022-02-03T18:09:30.174063Z] [debug] <<< testapi::assert_screen(mustmatch="anaconda_select_install_lang_continue", timeout=30) [2022-02-03T18:09:31.196994Z] [debug] >>> testapi::_handle_found_needle: found install_lang_continue-20210714, similarity 0.99 @ 926/684 [2022-02-03T18:09:31.197187Z] [debug] tests/_boot_to_anaconda.pm:175 called testapi::assert_and_click [2022-02-03T18:09:31.197326Z] [debug] <<< testapi::assert_and_click(mustmatch="anaconda_select_install_lang_continue", timeout=30) [2022-02-03T18:09:31.198252Z] [debug] clicking at 970/696 [2022-02-03T18:09:31.198373Z] [debug] tests/_boot_to_anaconda.pm:175 called testapi::assert_and_click [2022-02-03T18:09:31.198478Z] [debug] <<< testapi::mouse_set(x=970, y=696) [2022-02-03T18:09:31.199030Z] [debug] mouse_move 970, 696 [2022-02-03T18:09:31.199168Z] [debug] send_pointer_event 0, 970, 696, 1 [2022-02-03T18:09:31.199546Z] [debug] tests/_boot_to_anaconda.pm:175 called testapi::assert_and_click [2022-02-03T18:09:31.199650Z] [debug] <<< testapi::mouse_click(button="left", cursor_down="0.15") [2022-02-03T18:09:31.200157Z] [debug] pointer_event 1 970, 696 [2022-02-03T18:09:31.200271Z] [debug] send_pointer_event 1, 970, 696, 1 [2022-02-03T18:09:31.351157Z] [debug] pointer_event 0 970, 696 [2022-02-03T18:09:31.351279Z] [debug] send_pointer_event 0, 970, 696, 1 [2022-02-03T18:09:32.351940Z] [debug] tests/_boot_to_anaconda.pm:175 called testapi::assert_and_click [2022-02-03T18:09:32.352249Z] [debug] <<< testapi::mouse_set(x=1023, y=767) [2022-02-03T18:09:32.354309Z] [debug] mouse_move 1023, 767 [2022-02-03T18:09:32.354443Z] [debug] send_pointer_event 0, 1023, 767, 1 That's clicking on "English (US)" (at 686/206), moving mouse to 1023/767, then clicking on the continue button (970/696), then moving mouse to 1023/767 again.
2022-02-03T13:09:29.767650-05:00 fedora kwin_wayland_wrapper[1554]: [ 152321.189] wl_pointer@22.enter(78, wl_surface@37, 970.00000000, 696.00000000) 2022-02-03T13:09:29.768280-05:00 fedora kwin_wayland_wrapper[1554]: [ 152322.090] -> wl_pointer@22.set_cursor(78, wl_surface@17, 4, 4) 2022-02-03T13:09:29.768728-05:00 fedora kwin_wayland_wrapper[1554]: [ 152322.589] -> wl_surface@17.attach(wl_buffer@47, 0, 0) 2022-02-03T13:09:29.769154-05:00 fedora kwin_wayland_wrapper[1554]: [ 152322.978] -> wl_surface@17.damage_buffer(0, 0, 32, 32) 2022-02-03T13:09:29.769432-05:00 fedora kwin_wayland_wrapper[1554]: [ 152323.393] -> wl_surface@17.frame(new id wl_callback@130) 2022-02-03T13:09:29.769649-05:00 fedora kwin_wayland_wrapper[1554]: [ 152323.701] -> wl_surface@17.commit() 2022-02-03T13:09:29.769839-05:00 fedora kwin_wayland_wrapper[1554]: [ 152323.892] wl_pointer@22.frame() 2022-02-03T13:09:29.770067-05:00 fedora kwin_wayland_wrapper[1554]: [ 152324.114] wl_pointer@22.frame() 2022-02-03T13:09:29.783520-05:00 fedora kwin_wayland_wrapper[1554]: [ 152337.239] wl_pointer@22.button(79, 88815, 272, 1) 2022-02-03T13:09:29.783802-05:00 fedora kwin_wayland_wrapper[1554]: [ 152337.843] wl_pointer@22.frame() 2022-02-03T13:09:29.854266-05:00 fedora kwin_wayland_wrapper[1554]: [ 152408.065] wl_pointer@22.button(80, 88901, 272, 0) 2022-02-03T13:09:29.854581-05:00 fedora kwin_wayland_wrapper[1554]: [ 152408.597] wl_pointer@22.frame() kwin seems to send relevant pointer events to xwayland
> So this repositioning behaviour may be significant (it was in a similar bug we found in GNOME recently). what was the bug?
https://gitlab.gnome.org/GNOME/mutter/-/issues/2117 , and the fix was https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2257 . Definitely not the same bug as it's mutter and mutter isn't involved here, but it was an odd coincidence that they showed up around the same time.
Could you check whether the issue can be reproduced with the latest kwin 5.24 tars? kwin's tar was respun and it included a fix for a similar bug
As mentioned above I tried a git snapshot on 2022-01-28. If the fix landed after that, though, it's a possibility. I'll try in the morning.
Yes, it landed a couple of days ago
I just checked, and it looks like we've had 5.24.0 in Rawhide for a few days. Unfortunately the bug is still present :(
Created attachment 146454 [details] my debugging I cleaned up the attached logs and don't see anything suspicious in the logs, kwin sends button click and all button press events have matching release events what does happen if the test clicks the window the second time at 970x696 after some interval?
Now that Plasma 5.24.0 is exposed to more tests as part of Tumbleweed QA, this issue appears in two more places now: Clicking the connect button in hexchat (GTK2, so Xwayland) does nothing: https://openqa.opensuse.org/tests/2180598#step/hexchat/27 Clicking the play button in VLC (Qt!) does nothing: https://openqa.opensuse.org/tests/2180598#step/vlc/24
Oh well, that answers a question I was about to look into, which was "could this be to do with https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/4123 ?" - it sounds like the answer is 'no'. I'm double-checking the answer to Vlad's question to me right now, I'm pretty sure the answer is "clicking twice doesn't help at all" but just double checking that.
Yeah, confirmed, clicking twice in the same place does not help at all. Neither with nor without the "mouse hide" operation in between clicks.
I can reproduce the issue by sending just the simulated click to the QEMU VNC server of an openQA test with a python script. I could narrow it down to the following scenario: send_pointer_event(0, 1021, 766) time.sleep(1) send_pointer_event(0, 634, 584) Even though the cursor is still in the xev area, it got a leave event! EnterNotify event, serial 36, synthetic NO, window 0xc00001, root 0x51c, subw 0xc00002, time 8670092, (31,38), root:(634,584), mode NotifyNormal, detail NotifyVirtual, same_screen YES, focus YES, state 0 KeymapNotify event, serial 36, synthetic NO, window 0x0, keys: 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 MotionNotify event, serial 36, synthetic NO, window 0xc00001, root 0x51c, subw 0xc00002, time 8670092, (31,38), root:(634,584), state 0x0, is_hint 0, same_screen YES LeaveNotify event, serial 36, synthetic NO, window 0xc00001, root 0x51c, subw 0xc00002, time 8670092, (418,220), root:(1021,766), mode NotifyNormal, detail NotifyVirtual, same_screen YES, focus YES, state 0 Feb 23 11:07:53 localhost.localdomain kwin_wayland_wrapper[5363]: [2868737.319] wl_pointer@22.leave(733, wl_surface@34) Feb 23 11:07:53 localhost.localdomain kwin_wayland_wrapper[5363]: [2868737.357] wl_pointer@22.frame() Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869657.957] wl_pointer@22.enter(734, wl_surface@34, 31.00000000, 67.00000000) Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869658.037] wl_pointer@22.frame() Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869658.055] wl_pointer@22.frame() Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869662.349] -> wl_pointer@22.set_cursor(734, wl_surface@17, 1, 1) Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869662.392] -> wl_surface@17.attach(wl_buffer@27, 0, 0) Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869662.419] -> wl_surface@17.damage_buffer(0, 0, 10, 16) Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869662.451] -> wl_surface@17.frame(new id wl_callback@30) Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869662.468] -> wl_surface@17.commit() Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869667.220] wl_display@1.delete_id(30) Feb 23 11:07:54 localhost.localdomain kwin_wayland_wrapper[5363]: [2869667.274] wl_callback@30.done(8670095) Disabling DataBridge::create(this); did not help.
A possibly relevant merge request was started @ https://invent.kde.org/plasma/kwin/-/merge_requests/2064
Git commit aaa07f0605d605ae2472b1a8f7f4bd07f2138cb2 by Vlad Zahorodnii. Committed on 23/02/2022 at 20:03. Pushed by vladz into branch 'master'. Revert "Remove mysterious s_cursorUpdateBlocking boolean flag in pointer_input.cpp" This reverts commit 3d0bdc56a48fe0d5cc0889ab098f6c6c922d20c7. seat->setFocusedPointerSurface() before notifyPointerMotion() is needed to prevent sending a motion event that's outside the previously focused surface. M +11 -0 src/pointer_input.cpp https://invent.kde.org/plasma/kwin/commit/aaa07f0605d605ae2472b1a8f7f4bd07f2138cb2
Git commit bac4f4ed62017bb4aeb81078f5ec2cd915af5de4 by Vlad Zahorodnii. Committed on 23/02/2022 at 20:21. Pushed by vladz into branch 'Plasma/5.24'. Revert "Remove mysterious s_cursorUpdateBlocking boolean flag in pointer_input.cpp" This reverts commit 3d0bdc56a48fe0d5cc0889ab098f6c6c922d20c7. seat->setFocusedPointerSurface() before notifyPointerMotion() is needed to prevent sending a motion event that's outside the previously focused surface. (cherry picked from commit aaa07f0605d605ae2472b1a8f7f4bd07f2138cb2) M +11 -0 src/pointer_input.cpp https://invent.kde.org/plasma/kwin/commit/bac4f4ed62017bb4aeb81078f5ec2cd915af5de4