Summary: | Segmentation faults and invalid reads/writes in powerdevil when logging out of Plasma 5.15.5 on Wayland in Fedora 30 | ||
---|---|---|---|
Product: | [Frameworks and Libraries] kwayland | Reporter: | Matt Fagnani <matt.fagnani> |
Component: | general | Assignee: | Martin Flöser <mgraesslin> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | rdieter |
Priority: | NOR | ||
Version: | 5.59.0 | ||
Target Milestone: | --- | ||
Platform: | Fedora RPMs | ||
OS: | Linux | ||
URL: | https://bugzilla.redhat.com/show_bug.cgi?id=1713467 https://bugzilla.redhat.com/show_bug.cgi?id=1727470 | ||
Latest Commit: | https://phabricator.kde.org/D27538 | Version Fixed In: | 5.68 |
Sentry Crash Report: | |||
Attachments: |
valgrind --log-file=valgrind-powerdevil-3.txt /usr/libexec/org_kde_powerdevil & output with invalid reads/writes after logging out of Plasma on Wayland
gdb output with full trace of all threads from segmentation fault of org_kde_powerdevil when logging out of Plasma on Wayland
coredumpctl gdb output of segmentation fault in powerdevil when logging of Plasma on Wayland |
Description
Matt Fagnani
2019-06-10 21:15:33 UTC
Created attachment 120766 [details]
gdb output with full trace of all threads from segmentation fault of org_kde_powerdevil when logging out of Plasma on Wayland
Crash in comment #1 is because of QtWaylandClient::QWaylandDisplay::exitWithError(). The description also says that "The Wayland connection broke. Did the Wayland compositor die?"

On logout, KWin of course "dies", but I have no idea how the Wayland protocol can inform clients that the compositor is no longer present, and how they could react.

Reassigning to KWin developers for inspection.

Created attachment 121409 [details]
coredumpctl gdb output of segmentation fault in powerdevil when logging of Plasma on Wayland

Thanks Christoph. I think that if the segmentation faults in powerdevil were fixed then the aborts of drkonqi and the restarted powerdevil after the Wayland compositor connection was broken wouldn't happen.

I saw another segmentation fault in powerdevil when I logged out of Plasma 5.15.5 on Wayland. sddm didn't show up and the screen stayed blank which I've seen many times before when logging out of Plasma on Wayland. I pressed sysrq+alt+e, sysrq+alt+i which terminated then killed most of the userspace processes. sddm restarted after that. This segmentation fault occurred at about the same time that the screen went blank. coredumpctl gdb showed that tc_victim->fd in _int_malloc at malloc.c:3623 was an inaccessible address.

Core was generated by `/usr/libexec/org_kde_powerdevil'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f0d44dcadac in _int_malloc (av=av@entry=0x7f0d2c000020, bytes=bytes@entry=65) at malloc.c:3622
3622	          if (SINGLE_THREAD_P)
[Current thread is 1 (Thread 0x7f0d33b86700 (LWP 1559))]
(gdb) list
3617
3618	          /* While bin not empty and tcache not full, copy chunks.  */
3619	          while (tcache->counts[tc_idx] < mp_.tcache_count
3620	                 && (tc_victim = *fb) != NULL)
3621	            {
3622	              if (SINGLE_THREAD_P)
3623	                *fb = tc_victim->fd;
3624	              else
3625	                {
3626	                  REMOVE_FB (fb, pp, tc_victim);
(gdb) p tc_victim->fd
Cannot access memory at address 0xa10000556b
(gdb) p tc_victim
$2 = (mchunkptr) 0xa10000555b

A signal indicating a crash appeared after #13 in tcache_get at malloc.c:2952. KCrash::defaultCrashHandler in #11 showed errors like "Cannot access memory at address 0x7" which might indicate memory corruption. Qt string conversions involving "org.kde.kglobalaccel" happened at #16-19. I've seen many aborts of kglobalaccel5 when logging out of Plasma on Wayland and X as reported at https://bugzilla.redhat.com/show_bug.cgi?id=1701485

I've attached the coredumpctl gdb output of the crash with the full backtrace of all threads etc. I reported this crash in more detail at https://bugzilla.redhat.com/show_bug.cgi?id=1727470

Should I create a new report on bugs.kde.org since the trace is different?

The versions used in the crash I reported in comment 3 were
glib2-0:2.60.4-1.fc30.x86_64
glibc-0:2.29-15.fc30.x86_64
kf5-kwayland-0:5.59.0-2.fc30.x86_64
kwayland-integration-0:5.15.5-1.fc30.x86_64
kwin-wayland-0:5.15.5-2.fc30.x86_64
libwayland-client-0:1.17.0-1.fc30.x86_64
powerdevil-0:5.15.5-1.fc30.x86_64
qt5-qtbase-0:5.12.4-1.fc30.x86_64
qt5-qtwayland-5.12.4-2.fc30.x86_64

coredumpctl has 27 entries for aborts of drkonqi due to powerdevil segmentation faults and of the restarted powerdevil each.
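
The failure at `*fb = tc_victim->fd` in the gdb output above is what one would expect if an earlier write through a dangling pointer had overwritten the link that the allocator stores inside a freed chunk. The following is a minimal C++ sketch of that mechanism under stated assumptions, not glibc's allocator: `ToyHeap`, its methods, `kNil`, and `staleWriteCorruptsFreeList` are hypothetical names, chunks are vector slots, and links are slot indices so that a corrupt link can be detected instead of dereferenced. The value 0xa10000555b is borrowed from the gdb output only as an example of a garbage link.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel marking the end of the free list.
constexpr std::uint64_t kNil = ~std::uint64_t{0};

// Toy model of a free list whose links live inside the freed chunks
// themselves, similar in spirit to a fastbin or tcache list.
class ToyHeap {
public:
    explicit ToyHeap(std::size_t chunks) : words_(chunks, 0) {
        // Every chunk starts out free; push in reverse so chunk 0 is on top.
        for (std::size_t i = chunks; i-- > 0;) release(i);
    }

    // Pop the head of the free list into `out`. Returns false if the list is
    // empty or if the stored link is not a valid chunk index (corruption).
    bool allocate(std::size_t& out) {
        if (head_ == kNil || head_ >= words_.size()) return false;
        out = static_cast<std::size_t>(head_);
        head_ = words_[out];  // analogous to "*fb = tc_victim->fd"
        return true;
    }

    // Push a chunk: its single word is reused to hold the next link.
    void release(std::size_t chunk) {
        words_[chunk] = head_;
        head_ = chunk;
    }

    // User data shares the word that holds the link once the chunk is free,
    // so a write after release() silently replaces the link.
    void write(std::size_t chunk, std::uint64_t value) {
        assert(chunk < words_.size());
        words_[chunk] = value;
    }

private:
    std::vector<std::uint64_t> words_;
    std::uint64_t head_ = kNil;
};

// Returns true if a stale write through a dangling handle breaks the list.
bool staleWriteCorruptsFreeList() {
    ToyHeap heap(4);
    std::size_t a = 0;
    if (!heap.allocate(a)) return false;
    heap.release(a);                 // chunk is free; its word now holds a link
    heap.write(a, 0xa10000555bULL);  // use-after-free write clobbers the link
    std::size_t again = 0;
    std::size_t next = 0;
    bool first = heap.allocate(again);  // still succeeds and returns chunk a
    bool second = heap.allocate(next);  // follows the clobbered link and fails
    return first && again == a && !second;
}
```

In this model `staleWriteCorruptsFreeList()` returns true: the allocation that hands out the clobbered chunk still succeeds and only the one after it fails. That would be consistent with a crash inside malloc that is far removed from the code that performed the invalid write.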
The segmentation faults of powerdevil often occurred about the same time as blank screens occurred which I reported at https://bugzilla.redhat.com/show_bug.cgi?id=1727482

The black screen problem seems to have been the one reported at https://bugs.kde.org/show_bug.cgi?id=372789

A patch to fix this issue for kwayland-integration was written by David Edmundson for Plasma 5.16.3
https://cgit.kde.org/kwayland-integration.git/commit/?id=bfce3c6727cdc58a2b8ba33c933df05e21914876
https://bugs.kde.org/show_bug.cgi?id=372789#c46

I've noticed similarities in the first invalid read at wl_proxy_unref (wayland-client.c:229) I reported and invalid reads starting at wayland-client.c:229 in
plasmashell https://bugs.kde.org/show_bug.cgi?id=409021#c1
konsole https://bugs.kde.org/show_bug.cgi?id=408971
kglobalaccel5 and akonadi_sendlater_agent

The address freed had the following common functions and source lines and was 44 bytes inside a block of size 72 free'd

==4203==  Address 0x1934ea3c is 44 bytes inside a block of size 72 free'd
==4203==    at 0x4839A0C: free (vg_replace_malloc.c:540)
==4203==    by 0x1949F844: destroy (wayland_pointer_p.h:63)
==4203==    by 0x1949F844: KWayland::Client::Registry::Private::globalSync(void*, wl_callback*, unsigned int) (registry.cpp:539)
==4203==    by 0x485CB27: ffi_call_unix64 (in /usr/lib64/libffi.so.6.0.2)
==4203==    by 0x485C338: ffi_call (in /usr/lib64/libffi.so.6.0.2)
==4203==    by 0x172C3606: wl_closure_invoke (connection.c:1014)
==4203==    by 0x172BFF17: dispatch_event.isra.0 (wayland-client.c:1430)
==4203==    by 0x172C146B: dispatch_queue (wayland-client.c:1576)
==4203==    by 0x172C146B: wl_display_dispatch_queue_pending (wayland-client.c:1818)
==4203==    by 0x172C18AA: wl_display_roundtrip_queue (wayland-client.c:1241)
==4203==    by 0x194887C3: KWayland::Client::ConnectionThread::roundtrip() (connection_thread.cpp:290)

Functions in those stacks might have freed the pointer before the other programs used it.
KWayland::Client::Registry::Private::globalSync (registry.cpp:539) might be where the freeing was done too early.

(gdb) list registry.cpp:533,540
533	void Registry::Private::globalSync(void* data, wl_callback* callback, uint32_t serial)
534	{
535	    Q_UNUSED(serial)
536	    auto r = reinterpret_cast<Registry::Private*>(data);
537	    Q_ASSERT(r->callback == callback);
538	    r->handleGlobalSync();
539	    r->callback.destroy();
540	}

Memory corruption due to the use-after-free errors might have led to the segmentation faults I saw. I'm reassigning this to frameworks-kwayland based on the above. kwayland-integration or libwayland-client are other possible packages involved.

#6 QMessageLogger::fatal (this=this@entry=0x7fffd70c5ba0, msg=msg@entry=0x7ff994ac00b8 "The Wayland connection broke. Did the Wayland compositor die?") at global/qlogging.cpp:893

This means that the compositor crashed. Due to a Qt issue, when this happens, the app using it will crash too. KDE developers submitted a fix, but sadly it was not merged. See https://codereview.qt-project.org/c/qt/qtwayland/+/308984.

Until we get better handling of this in Qt, the best we can do is debug why the compositor crashed in the first place. So can you please get a backtrace of the crash in kwin_wayland and then file a new bug report with it on kwin | wayland-generic? Thanks!

You may be able to use the `coredumpctl` utility to retrieve the backtrace. See https://community.kde.org/Guidelines_and_HOWTOs/Debugging/How_to_create_useful_crash_reports#Retrieving_a_backtrace_using_coredumpctl

(In reply to Nate Graham from comment #6)
> #6 QMessageLogger::fatal (this=this@entry=0x7fffd70c5ba0,
> msg=msg@entry=0x7ff994ac00b8 "The Wayland connection broke. Did the Wayland
> compositor die?") at global/qlogging.cpp:893
>
> This means that the compositor crashed. Due to a Qt issue, when this
> happens, the app using it will crash too. KDE developers submitted a fix,
> but sadly it was not merged. See
> https://codereview.qt-project.org/c/qt/qtwayland/+/308984.
>
> Until we get better handling of this in Qt, the best we can do is debug why
> the compositor crashed in the first place. So can you please get a backtrace
> of the crash in kwin_wayland and then file a new bug report with it on kwin
> | wayland-generic? Thanks!
>
> You may be able to use the `coredumpctl` utility to retrieve the backtrace.
> See
> https://community.kde.org/Guidelines_and_HOWTOs/Debugging/How_to_create_useful_crash_reports#Retrieving_a_backtrace_using_coredumpctl

Nate, I think that kwin_wayland stopped normally during logout before powerdevil segmentation faulted and then powerdevil tried to restart and drkonqi aborted, which led to the errors like "The Wayland connection broke. Did the Wayland compositor die?" I didn't mention any kwin_wayland crashes in my report.

The first powerdevil segmentation faults were due to the use-after-free errors in wl_proxy_unref (wayland-client.c:229) in libwayland-client. I think those errors were fixed by Daniel Vrátil in kwayland 5.68 whose message mentioned invalid read/write use-after-free errors in wl_proxy_unref (wayland-client.c:230) also involving KWayland::Client::Registry::Private::globalSync in the commit https://phabricator.kde.org/R127:4ceb35672dfa3378776a926c452b9f83ffe2bc41

Registry: don't destroy the callback on globalsync

Summary:
Instead just unref it, because the wl_display_dispatch_queue_pending will try to destroy the callback afterwards as well, leading to invalid read/write.
Fixes Valgrind warnings when running KScreen tests:

==460922== Invalid read of size 4
==460922==    at 0x5CE5B34: wl_proxy_unref (wayland-client.c:230)
==460922==    by 0x5CE5C33: destroy_queued_closure (wayland-client.c:292)
==460922==    by 0x5CE74AB: dispatch_queue (wayland-client.c:1591)
==460922==    by 0x5CE74AB: wl_display_dispatch_queue_pending (wayland-client.c:1833)
==460922==    by 0x4E0240D: KWayland::Client::EventQueue::dispatch() (src/frameworks/kwayland/src/client/event_queue.cpp:96)
==460922==  Address 0x17233aac is 44 bytes inside a block of size 80 free'd
==460922==    at 0x483B9F5: free (vg_replace_malloc.c:540)
==460922==    by 0x4E15B60: destroy (src/frameworks/kwayland/src/client/wayland_pointer_p.h:63)
==460922==    by 0x4E15B60: KWayland::Client::Registry::Private::globalSync(void*, wl_callback*, unsigned int) (src/frameworks/kwayland/src/client/registry.cpp:548)
...
==460922==    by 0x5CE74AB: dispatch_queue (wayland-client.c:1591)
==460922==    by 0x5CE74AB: wl_display_dispatch_queue_pending (wayland-client.c:1833)
==460922==    by 0x4E0240D: KWayland::Client::EventQueue::dispatch() (src/frameworks/kwayland/src/client/event_queue.cpp:96)

I haven't seen these powerdevil crashes or those involving similar invalid read/write errors in plasmashell, konsole, etc. I mentioned in comment 5 since KF 5.68.0. The qtwayland fix you mentioned could resolve the aborts of KDE programs after kwin_wayland stopped when logging out. Alternatively, kwin_wayland could be made to wait until the other KDE programs have stopped before it is stopped maybe using the systemd integration. Thanks. |
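
The commit summary quoted above contrasts destroying the callback inside the handler with merely dropping a reference to it. The C++ sketch below models that difference under stated assumptions and is not libwayland or KWayland code: `ToyDisplay`, `createProxy`, `rawFree`, `unref`, `dispatch`, and the two helper functions are hypothetical names. It assumes, as the Valgrind stack through destroy_queued_closure suggests, that the dispatcher holds its own reference while the handler runs and releases it afterwards. Proxies are integer ids in a map, so a touch after free is counted rather than being undefined behaviour.

```cpp
#include <cassert>
#include <functional>
#include <map>

// Toy event dispatcher with reference-counted proxies identified by id.
class ToyDisplay {
public:
    using Handler = std::function<void(ToyDisplay&, int)>;

    // New proxy with a single reference owned by the client.
    int createProxy() {
        int id = nextId_++;
        assert(refcounts_.find(id) == refcounts_.end());
        refcounts_[id] = 1;
        return id;
    }

    // Drop one reference; the proxy is freed when the count reaches zero.
    // Touching an already freed proxy is recorded as an invalid access.
    void unref(int id) {
        auto it = refcounts_.find(id);
        if (it == refcounts_.end()) {
            ++invalidAccesses_;
            return;
        }
        if (--it->second == 0) refcounts_.erase(it);
    }

    // Free the proxy regardless of outstanding references, like a bare free().
    void rawFree(int id) { refcounts_.erase(id); }

    // Deliver one event: hold a reference across the handler, then drop it.
    void dispatch(int id, const Handler& handler) {
        auto it = refcounts_.find(id);
        if (it == refcounts_.end()) {
            ++invalidAccesses_;
            return;
        }
        ++it->second;        // the queued event keeps the proxy alive
        handler(*this, id);  // the handler may release the client's reference
        unref(id);           // dispatcher drops its reference afterwards
    }

    bool isLive(int id) const { return refcounts_.count(id) != 0; }
    int invalidAccesses() const { return invalidAccesses_; }

private:
    std::map<int, int> refcounts_;
    int nextId_ = 1;
    int invalidAccesses_ = 0;
};

// Handler frees the proxy outright: the dispatcher then touches freed state.
int invalidAccessesWithRawFree() {
    ToyDisplay display;
    int callback = display.createProxy();
    display.dispatch(callback, [](ToyDisplay& d, int id) { d.rawFree(id); });
    return display.invalidAccesses();
}

// Handler only drops its own reference: the dispatcher's unref frees it.
int invalidAccessesWithUnref() {
    ToyDisplay display;
    int callback = display.createProxy();
    display.dispatch(callback, [](ToyDisplay& d, int id) { d.unref(id); });
    return display.invalidAccesses();
}
```

In this model `invalidAccessesWithRawFree()` reports one invalid access and `invalidAccessesWithUnref()` reports none, and in the second case the proxy is still released once the dispatcher drops its reference, so nothing leaks.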