Bug 479679

Summary: when built with clang/libc++, plasmashell exits with -1 on right mouse click with layershellqt: Cannot attach popup of unknown type
Product: [Plasma] plasmashell
Reporter: Timur Mangliev <tigrmango>
Component: Panel
Assignee: Plasma Bugs List <plasma-bugs>
Status: RESOLVED UPSTREAM
Severity: normal
CC: asturm, dougshaw77, egger.m, fabian, hugegameartgd, kde, me, nate, nekonexus, niccolo.venerandi, parona, sam, w.iron.zombie, xarblu
Priority: NOR
Keywords: qt6
Version: 5.92.0   
Target Milestone: 1.0   
Platform: Gentoo Packages   
OS: Linux   
Latest Commit:
Version Fixed In:
Sentry Crash Report:
Attachments: plasmashell clang ubsan logs

Description Timur Mangliev 2024-01-12 05:07:11 UTC
SUMMARY


STEPS TO REPRODUCE
1. RMB on plasmashell panel anywhere

OBSERVED RESULT
exit with 255

EXPECTED RESULT
showing popup

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Gentoo 2.14/5.92.0
(available in About System)
KDE Plasma Version: 5.92.0
KDE Frameworks Version: 5.248.0
Qt Version: 6.6.1

ADDITIONAL INFORMATION
Comment 1 Doug 2024-01-12 05:37:10 UTC
I cannot reproduce on KDE Neon Unstable.  Is your panel's configuration default, or have you changed anything?
Comment 2 Timur Mangliev 2024-01-12 08:39:15 UTC
(In reply to Doug from comment #1)
> I cannot reproduce on KDE Neon Unstable.  Is your panel's configuration
> default, or have you changed anything?

Just tried removing ~/.config/plasmashellrc, still happens
Last three lines are, in order:
layershellqt: Cannot attach popup of unknown type
xdg_wm_base@3: error 3: no xdg_popup parent surface has been specified
The Wayland connection experienced a fatal error: Protocol error
Comment 3 Bernard 2024-01-13 13:31:13 UTC
I'm having this issue on gentoo too but with kde's master branch; has this really been fixed or is this due to gentoo's packaging?
Comment 4 Michael 2024-01-26 10:49:10 UTC
I'm having this exact issue on Gentoo.

Right clicking anywhere on the plasmashell panel or desktop results in this behavior. 

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Gentoo 2.14/5.92.0
(available in About System)
KDE Plasma Version: 5.92.0
KDE Frameworks Version: 5.248.0
Qt Version: 6.6.1

Relevant logs:
Jan 26 11:45:16 gentoo3 kwin_wayland_wrapper[1419]: error in client communication (pid 3515)
Jan 26 11:45:16 gentoo3 plasmashell[3515]: xdg_wm_base@3: error 3: no xdg_popup parent surface has been specified
Jan 26 11:45:16 gentoo3 systemd[1276]: plasma-plasmashell.service: Main process exited, code=exited, status=255/EXCEPTION
Jan 26 11:45:16 gentoo3 systemd[1276]: plasma-plasmashell.service: Failed with result 'exit-code'.
Jan 26 11:45:16 gentoo3 systemd[1276]: plasma-plasmashell.service: Consumed 2.103s CPU time.
Jan 26 11:45:16 gentoo3 systemd[1276]: plasma-plasmashell.service: Scheduled restart job, restart counter is at 2.
Jan 26 11:45:16 gentoo3 systemd[1276]: Starting KDE Plasma Workspace...
Jan 26 11:45:16 gentoo3 systemd[1276]: Started KDE Plasma Workspace.


Is there a patch that I can apply to test it, since this is marked as fixed?
Comment 5 Bernard 2024-01-27 21:41:19 UTC
Changing the status since it is actually not fixed
Comment 6 Nate Graham 2024-01-29 17:01:19 UTC
Cannot reproduce either with built-from-source Plasma on top of Fedora 39. I strongly suspect this is a packaging issue. Maybe the layershell protocol definition is out of date? Either way please follow up with Gentoo folks about it. Thanks!
Comment 7 Andreas Sturmlechner 2024-01-29 20:12:38 UTC
No problem on my Gentoo Plasma 5.92 installation, so....?

(In reply to Nate Graham from comment #6)
> I strongly suspect this is a packaging issue. Maybe the layershell protocol
> definition is out of date?
We're shipping KF-5.248/Plasma-5.92 completely as-is, so where do you suggest we start looking?
Comment 8 Andreas Sturmlechner 2024-01-29 20:31:58 UTC
Timur, what fixed it for you back on 2024-01-12 09:39:15 CET?
Comment 9 Andreas Sturmlechner 2024-01-29 22:07:47 UTC
(In reply to Bernard from comment #3)
> I'm having this issue on gentoo too but with kde's master branch;
...just finished upgrading to 6.0 stable branch and still can't reproduce that bug in Gentoo.
Comment 10 Jonas Rakebrandt 2024-01-29 22:38:59 UTC
(In reply to Bernard from comment #3)
> I'm having this issue on gentoo too but with kde's master branch

Same here. Plasmashell crashes on right click.

I'm on a clang(+libcxx) system. Might be interesting to hear what the others experiencing this issue are using.
Just to see if it might be a clang vs. gcc issue or if it happens with either toolchain.
Comment 11 Andreas Sturmlechner 2024-01-29 22:56:21 UTC
(In reply to Jonas Rakebrandt from comment #10)
> I'm on a clang(+libcxx) system. Might be interesting to hear what the others
> experiencing this issue are using.
From downstream report I can see not clang but: -O3 -flto=thin
Comment 12 Michael 2024-01-29 23:05:27 UTC
> From downstream report I can see not clang but: -O3 -flto=thin
I should have been more clear on this, I'm also on a clang(+libcxx) system.
Compiling frameworks, plasma and qt packages with gcc right now (w/o lto and -O3) to see if it makes a difference
Comment 13 Sam James 2024-01-30 10:01:04 UTC
Please try to be explicit about what makes your system interesting when using e.g. clang, -O3, and so on. Even if things should work in such a configuration, identifying what's _different_ is important.

Anyway, for runtime misbehaviour, try UBSAN and alternating compiler (so GCC if using Clang).
Comment 14 Sam James 2024-01-30 10:01:28 UTC
... also generally worth starting downstream for such things to not bother upstream until we've tried some basic stuff like that.
Comment 15 Bernard 2024-01-30 14:08:04 UTC
(In reply to Jonas Rakebrandt from comment #10)
> (In reply to Bernard from comment #3)
> > I'm having this issue on gentoo too but with kde's master branch
> 
> Same here. Plasmashell crashes on right click.
> 
> I'm on a clang(+libcxx) system. Might be interesting to hear what the others
> experiencing this issue are using.
> Just to see if it might be a clang vs. gcc issue or if it happens with
> either toolchain.

I was also using the clang compiler, as part of Gentoo's glibc clang profile.
Comment 16 Michael 2024-01-30 21:48:45 UTC
Created attachment 165367 [details]
plasmashell clang ubsan logs

I was not able to recompile it with gcc due to many unresolved symbols. However I compiled frameworks/plasma/gear with clang and -march=native -pipe -O2 -g -fsanitize=undefined and the issue still persists.

Attached is the UBSAN log for the whole wayland plasma session until the crash.
There are some undefined behavior warnings but only during startup. After right clicking to trigger the crash no new UBSAN logs are written.
Comment 17 Andreas Sturmlechner 2024-01-31 12:43:31 UTC
(In reply to Michael from comment #16)
> I was not able to recompile it with gcc due to many unresolved symbols.
Define "it"? Rebuilding Qt should go first in such cases, then Frameworks, then Plasma/Gear.

(In reply to Bernard from comment #15)
> I was also using the clang compiler part of gentoo's glibc clang profile
Optimisations?
Comment 18 Bernard 2024-01-31 17:10:29 UTC
(In reply to Andreas Sturmlechner from comment #17)
> > I was also using the clang compiler part of gentoo's glibc clang profile
> Optimisations?

I used the "-march=native -O3 -pipe" flags, however as Michael said the issue persists with "-O2"
Comment 19 Michael 2024-02-03 20:15:32 UTC
(In reply to Andreas Sturmlechner from comment #17)
> (In reply to Michael from comment #16)
> > I was not able to recompile it with gcc due to many unresolved symbols.
> Define "it"? Rebuilding Qt should go first in such cases, then Frameworks,
> then Plasma/Gear.(In reply to Bernard from comment #15)
> 
> > (In reply to Jonas Rakebrandt from comment #10)
> > I was also using the clang compiler part of gentoo's glibc clang profile
> Optimisations?

I also tried to compile Qt, frameworks, plasma, gear (in this order) with gcc and managed to succeed, but due to ABI incompatibility(?) between libstdc++ and libc++ in many other system components I ended up with a broken desktop environment after restarting.

This issue still persists in RC2 with sane default flags (-O2 / no lto) on a systemwide clang/libc++ system.
Comment 20 Michael 2024-02-08 17:20:47 UTC
I compiled qt / frameworks / plasma with "-O2 -ftrapping-math -fsemantic-interposition -ffp-contract=fast" to get a behavior as close as possible to gcc, but the same issue still persists.
Please let me know if you have further ideas to track this down.

Should this issue be reopened / moved? Both issues (upstream and downstream) are closed as resolved, but this still affects people with a clang/libcxx setup.
Comment 21 Andreas Sturmlechner 2024-02-12 10:06:54 UTC
This just came up again in #plasma with clang/libc++, I think we have gathered enough cases to establish this is somehow caused or made visible by use of that particular toolchain, not downstream packaging.
Comment 22 David Redondo 2024-02-12 13:46:53 UTC
The issue seems related to a mismatch in type_info across objects making the any_cast in layershell-qt fail
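
To illustrate the kind of cast involved, here is a minimal sketch. It is not the actual layer-shell-qt code; `popupFromAny` and `wrapPopup` are hypothetical helper names. It stores a pointer to the forward-declared `xdg_popup` in a `std::any` and reads it back with `std::any_cast`. Within a single binary the round trip succeeds. The failure discussed here would only show up when the value is wrapped in one shared object and unwrapped in another whose `type_info` for `xdg_popup *` does not compare equal.

```cpp
#include <any>
#include <cassert>

// Forward declaration only: this code never sees the definition,
// so xdg_popup remains an incomplete type here.
struct xdg_popup;

// Hypothetical helper: wrap an opaque address as an xdg_popup pointer in a std::any.
std::any wrapPopup(void *opaque)
{
    return std::any(static_cast<xdg_popup *>(opaque));
}

// Hypothetical helper: return the stored popup pointer, or nullptr if the
// std::any is empty or its type_info does not match xdg_popup *.
xdg_popup *popupFromAny(const std::any &value)
{
    // The pointer overload of std::any_cast returns nullptr on a type mismatch
    // instead of throwing std::bad_any_cast.
    if (auto *stored = std::any_cast<xdg_popup *>(&value)) {
        return *stored;
    }
    return nullptr;
}
```

A caller that gets `nullptr` back from such a helper has no popup to attach to, which would be consistent with the "Cannot attach popup of unknown type" message in the logs above.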
Comment 23 Fabian Vogt 2024-02-13 10:09:41 UTC
(In reply to David Redondo from comment #22)
> The issue seems related to a mismatch in type_info across objects making the
> any_cast in layershell-qt fail

libc++ uses pointer equivalence for typeinfo comparison by default.

1. struct xdg_popup is an incomplete type in layer-shell-qt. This triggers an LLVM bug: https://github.com/llvm/llvm-project/issues/36746. The result of that bug is that the typeinfo symbols (typeinfo + name) are local to the .so file and won't bind to other objects. Thus pointer equivalence checks fail.

If this gets fixed it'd probably work already. I don't see that being worked on though, so I suggest building libc++ with -DLIBCXX_TYPEINFO_COMPARISON_IMPLEMENTATION=2, which mirrors libstdc++ behavior.

(More details for interested readers below)

There are also other ways this can fail, which don't happen here (yet):

2. layer-shell-qt is loaded with RTLD_LOCAL. That means typeinfo can't be shared between separately loaded plugins, just within (not runtime) linked libraries (i.e. what ldd shows). (ref. https://lists.llvm.org/pipermail/llvm-dev/2014-June/073469.html)
3. struct xdg_popup is also used by the executable itself. Unless the typeinfo is global by either building it with -rdynamic or linking it against a library which also exports the typeinfo, this can't be shared either. (ref. https://lists.llvm.org/pipermail/llvm-dev/2014-June/073487.html)

For those reasons, libstdc++ switched to using string comparison for typeinfo equality checks: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/libsupc%2B%2B/typeinfo;h=fcc3077d06091339175c7af5d3acd92fd5416acd;hb=HEAD#l51
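
The difference between the two comparison strategies can be sketched as follows. This is a simplified model, not the libc++ or libstdc++ source: `FakeTypeInfo`, `equalByAddress` and `equalByName` are hypothetical names, and the two global descriptors stand in for the separate typeinfo copies that end up in two different .so files.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a compiler-emitted type descriptor.
struct FakeTypeInfo {
    const char *mangledName;
};

// Pointer equivalence: two descriptors are equal only if they are the
// very same object in memory.
bool equalByAddress(const FakeTypeInfo &a, const FakeTypeInfo &b)
{
    return &a == &b;
}

// Name comparison: descriptors are also equal if their mangled names match,
// even when each shared object carries its own copy.
bool equalByName(const FakeTypeInfo &a, const FakeTypeInfo &b)
{
    return &a == &b || std::strcmp(a.mangledName, b.mangledName) == 0;
}

// Two copies describing the same type (xdg_popup *), as if each had been
// emitted with local visibility in a different shared object.
static const FakeTypeInfo popupInfoInLibA{"P9xdg_popup"};
static const FakeTypeInfo popupInfoInLibB{"P9xdg_popup"};
```

With these definitions, `equalByAddress(popupInfoInLibA, popupInfoInLibB)` is false while `equalByName(popupInfoInLibA, popupInfoInLibB)` is true, which is the behavioral gap that the string-comparison mode is meant to close.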

1 is clearly a bug in clang. https://refspecs.linuxbase.org/cxxabi-1.83.html#rtti explains how typeinfo for pointers to incomplete types are supposed to work.

2 is technically intended behavior, just totally non-obvious. 3 is similar: technically, every executable which might call dlopen has to be built with -rdynamic.
Comment 24 Nicolas Fella 2024-03-08 11:36:21 UTC
*** Bug 482856 has been marked as a duplicate of this bug. ***
Comment 25 Marco Rebhan 2024-10-05 17:28:53 UTC
(In reply to Fabian Vogt from comment #23)
> I suggest to build libc++ with
> -DLIBCXX_TYPEINFO_COMPARISON_IMPLEMENTATION=2, which mirrors libstdc++
> behavior.

Can confirm this works. Note that you need to rebuild everything against the new libc++, or at least the relevant packages. I don't know exactly which those are: rebuilding plasma-workspace and layer-shell-qt alone did not work for me, so I just rebuilt every dependency of plasma-workspace.