Bug 415872 - kwin_wayland random segfault libQt5Qml.so.5.14.0[7fe09a171000+307000]
Summary: kwin_wayland random segfault libQt5Qml.so.5.14.0[7fe09a171000+307000]
Status: RESOLVED WORKSFORME
Alias: None
Product: kwin
Classification: Plasma
Component: scripting
Version: 5.17.4
Platform: Other Linux
Importance: HI normal
Target Milestone: ---
Assignee: KWin default assignee
URL:
Keywords:
Duplicates: 393871 419579 419714 421120 424762
Depends on:
Blocks:
 
Reported: 2020-01-04 14:46 UTC by Tom B
Modified: 2023-01-03 12:34 UTC
CC List: 10 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Reproduction Video (Under X11) (507.93 KB, video/mp4)
2020-04-27 08:19 UTC, karl

Description Tom B 2020-01-04 14:46:20 UTC
SUMMARY

kwin_wayland crashes intermittently and sends me back to the SDDM login screen. Generally around every 6-8 hours of use, sometimes sooner. 

It seems to only be a matter of time.

It usually happens when manipulating windows, and sometimes when switching virtual desktops.
Unfortunately, I cannot find anything that triggers it consistently.

STEPS TO REPRODUCE
1. Use KDE on Wayland
2. Interact with the desktop

OBSERVED RESULT

Jan 04 13:30:38 desktop audit[13920]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=9 pid=13920 comm="kwin_wayland" exe="/usr/bin/kwin_wayland" sig=11 res=1
Jan 04 13:30:41 desktop systemd-coredump[14565]: Process 13920 (kwin_wayland) of user 1000 dumped core.
Jan 04 13:30:52 desktop audit[14916]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=10 pid=14916 comm="kwin_wayland" exe="/usr/bin/kwin_wayland" sig=11 res=1
Jan 04 13:30:52 desktop kernel: kwin_wayland[14916]: segfault at 28 ip 00007fe09a17760c sp 00007ffc135bd040 error 4 in libQt5Qml.so.5.14.0[7fe09a171000+307000]
Jan 04 13:30:52 desktop kernel: audit: type=1701 audit(1578144652.676:500): auid=1000 uid=1000 gid=1000 ses=10 pid=14916 comm="kwin_wayland" exe="/usr/bin/kwin_wayland" sig=11 res=1
Jan 04 13:30:55 desktop systemd-coredump[15607]: Process 14916 (kwin_wayland) of user 1000 dumped core.
Jan 04 13:31:26 desktop audit[15949]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=11 pid=15949 comm="kwin_wayland" exe="/usr/bin/kwin_wayland" sig=11 res=1
Jan 04 13:31:26 desktop kernel: kwin_wayland[15949]: segfault at 28 ip 00007fdb376d760c sp 00007fffae5e2430 error 4 in libQt5Qml.so.5.14.0[7fdb376d1000+307000]
Jan 04 13:31:26 desktop kernel: audit: type=1701 audit(1578144686.282:568): auid=1000 uid=1000 gid=1000 ses=11 pid=15949 comm="kwin_wayland" exe="/usr/bin/kwin_wayland" sig=11 res=1
Jan 04 13:31:29 desktop systemd-coredump[16639]: Process 15949 (kwin_wayland) of user 1000 dumped core.
Jan 04 14:29:53 desktop audit[23599]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=16 pid=23599 comm="kwin_wayland" exe="/usr/bin/kwin_wayland" sig=11 res=1
Jan 04 14:29:53 desktop kernel: kwin_wayland[23599]: segfault at 28 ip 00007f95bfb4e60c sp 00007fff63a4d910 error 4 in libQt5Qml.so.5.14.0[7f95bfb48000+307000]
Jan 04 14:29:53 desktop kernel: audit: type=1701 audit(1578148193.434:849): auid=1000 uid=1000 gid=1000 ses=16 pid=23599 comm="kwin_wayland" exe="/usr/bin/kwin_wayland" sig=11 res=1
Jan 04 14:29:55 desktop systemd-coredump[78101]: Process 23599 (kwin_wayland) of user 1000 dumped core.


This is the segfault I get. It seems random, but dragging and dropping Xorg windows seems to be the primary trigger; the most recent culprit was Firefox.
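
The three kernel lines above can be compared by subtracting the mapping start (the address in square brackets) from the instruction pointer. The sketch below does only that arithmetic. `parseSegfault` is a hypothetical helper, not part of any KDE, Qt or systemd tool, and the offset it prints is relative to the start of the mapping the kernel reported, which is not necessarily the file offset inside libQt5Qml.so.

```javascript
// Hypothetical helper: extract fields from a kernel "segfault at" log line and
// compute ip minus the mapping start printed in square brackets.
var SEGFAULT_RE =
  /segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) error (\d+) in ([^\s\[]+)\[([0-9a-f]+)\+([0-9a-f]+)\]/;

function parseSegfault(line) {
  var m = SEGFAULT_RE.exec(line);
  if (m === null) {
    return null; // not a segfault line
  }
  // BigInt keeps the 64-bit addresses exact.
  var ip = BigInt("0x" + m[2]);
  var mappingStart = BigInt("0x" + m[6]);
  return {
    faultAddress: "0x" + m[1],
    errorCode: Number(m[4]),
    mappedFile: m[5],
    offset: "0x" + (ip - mappingStart).toString(16)
  };
}

// The three kernel lines from the log above, shortened to the relevant part.
var lines = [
  "segfault at 28 ip 00007fe09a17760c sp 00007ffc135bd040 error 4 in libQt5Qml.so.5.14.0[7fe09a171000+307000]",
  "segfault at 28 ip 00007fdb376d760c sp 00007fffae5e2430 error 4 in libQt5Qml.so.5.14.0[7fdb376d1000+307000]",
  "segfault at 28 ip 00007f95bfb4e60c sp 00007fff63a4d910 error 4 in libQt5Qml.so.5.14.0[7f95bfb48000+307000]"
];

lines.forEach(function (line) {
  var r = parseSegfault(line);
  console.log(r.mappedFile, "fault at", r.faultAddress, "offset", r.offset);
});
```

All three crashes give the same offset (0x660c) and the same fault address (0x28), which suggests that the same instruction faults each time even though the library is mapped at a different address in each process.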

EXPECTED RESULT

No crash

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:  Arch Linux / Plasma 5.17.4


Let me know if there's anything else I can do to debug this.


This might be related to bug 401350, but that one reports a different error.
Comment 1 Tom B 2020-01-05 16:35:00 UTC
I've worked out that this is related to the scripting engine. It only appears to happen when the kwin tiling extension is enabled. I've not yet tracked down exactly what causes the segfault, but I now have steps that reproduce it:

1. Enable the kwin tiling extension https://github.com/kwin-scripts/kwin-tiling
2. Apply the wayland fix: https://github.com/kwin-scripts/kwin-tiling/issues/117#issuecomment-541903299
3. Open two sublime_text windows
4. Drag one of the windows around the screen for a while

For me this causes a segfault within several seconds.
Comment 2 Tom B 2020-01-13 17:47:47 UTC
It seems to be something to do with how often the script triggers signals. 

If I rate limit the events (using the system clock to keep track of how often they are fired) to once per frame it works fine. See my workaround for the script here:

https://github.com/TRPB/kwin-tiling/commit/f69bdfc68989aeb6d15124a1106154c050067b4b

I have still had this crash randomly but not in a reproducible way and far less frequently. 

The issue is in handlers connected to signals, e.g.


     client.clientStepUserMovedResized.connect(function() { /* ... */ });


The crash happens when such a handler does something to the client (or other windows) too frequently: more than once per frame, as best as I can work out. With a rate limit of one call per 1 ms it crashes; with one call per 25 ms it does not (one frame being about 16 ms). This is likely a symptom, not the underlying cause.
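
A minimal sketch of that kind of rate limit, assuming plain JavaScript and the system clock via `Date.now`. This is not the code from the linked commit. `throttle` is a hypothetical helper name, and the injectable `now` parameter exists only so the behaviour can be checked with a fake clock.

```javascript
// Hypothetical helper: wrap fn so it runs at most once per minIntervalMs.
// Calls that arrive too soon are dropped, not queued.
function throttle(fn, minIntervalMs, now) {
  now = now || Date.now; // system clock unless a fake clock is injected
  var lastRun = -Infinity;
  return function () {
    var t = now();
    if (t - lastRun < minIntervalMs) {
      return; // too soon since the last run: drop this call
    }
    lastRun = t;
    return fn.apply(this, arguments);
  };
}

// Demonstration with a fake clock: a handler invoked once per millisecond
// for 100 ms, limited to one run per 25 ms.
var fakeTime = 0;
var runs = 0;
var handler = throttle(function () { runs += 1; }, 25, function () { return fakeTime; });
for (fakeTime = 0; fakeTime < 100; fakeTime += 1) {
  handler();
}
console.log(runs); // 4 (runs at t = 0, 25, 50 and 75)

// Inside a KWin script the wrapper would be passed to connect, e.g.
//   client.clientStepUserMovedResized.connect(throttle(function () { /* ... */ }, 25));
```

Because dropped calls are discarded rather than deferred, the last event of a drag may never reach the handler; a script that needs the final geometry would have to handle that separately.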

While I worked out what caused it in this instance, other events seem to cause the same problem when triggered too frequently.

It would be better if:

A) The 1 event per frame rate limit was imposed by the scripting engine
or
B) It allows more than 1 event/frame but doesn't segfault


I will point out that this is likely a workaround rather than a fix. The segfault is probably caused by two things happening out of their intended order, and rate limiting the events significantly reduces the chance of that. That said, after applying the rate limit to the script I could never trigger the crash when dragging windows.

I'll try rate limiting every event registered by the script to 25ms and see if I get any other random crashes. 

Rate limiting the calls to events registered in the `.connect` function could be a very simple fix in the scripting engine if it works.
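
The engine-level idea can be emulated at script level to show the intended behaviour. The sketch below is an illustration under assumptions, not a proposed KWin patch: `rateLimitConnect` and `makeFakeSignal` are hypothetical names, the signal is a plain JavaScript stand-in, and whether `connect` can be replaced like this on a real signal object inside KWin is not confirmed.

```javascript
// Hypothetical: replace signal.connect so that every handler registered
// afterwards is rate limited. Shown on a fake signal only.
function rateLimitConnect(signal, minIntervalMs, now) {
  now = now || Date.now;
  var originalConnect = signal.connect;
  signal.connect = function (handler) {
    var lastRun = -Infinity; // each handler gets its own timestamp
    return originalConnect.call(signal, function () {
      var t = now();
      if (t - lastRun < minIntervalMs) {
        return; // emitted too soon after the last delivery: skip
      }
      lastRun = t;
      return handler.apply(this, arguments);
    });
  };
}

// Minimal stand-in for a signal object so the sketch runs outside KWin.
function makeFakeSignal() {
  var handlers = [];
  return {
    connect: function (fn) { handlers.push(fn); },
    emit: function () {
      var args = arguments;
      handlers.forEach(function (fn) { fn.apply(null, args); });
    }
  };
}

var clock = 0;
var moved = makeFakeSignal();
rateLimitConnect(moved, 25, function () { return clock; });

var delivered = [];
moved.connect(function (x) { delivered.push(x); });
for (clock = 0; clock <= 60; clock += 10) {
  moved.emit(clock); // emitted every 10 ms of fake time
}
console.log(delivered); // [0, 30, 60]
```

The script author's code is unchanged here: handlers are still registered with `connect`, and the limit is applied once per signal rather than in every handler.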
Comment 3 Vlad Zahorodnii 2020-01-29 13:04:55 UTC
> Generally around every 6-8 hours of use, sometimes sooner.
It would be really helpful if you could retrieve the backtrace of the crash.
Comment 4 karl 2020-01-30 11:50:22 UTC
This one is possibly connected/the same as 416826, if it helps. I'm running X rather than Wayland. With Tom's patch to kwin-tiling my issue is reduced, but it still occurs, and I can pretty much reproduce it on demand.

Can I do anything that would help pinpoint if they're connected issues? I'm not a big C person these days and I'm a little stuck where to start.
Comment 5 Tom B 2020-02-07 17:13:59 UTC
I've been trying to replicate this in KDE Neon in Virtualbox so I could provide a .vdi where you can see the issue yourself. However, I have so far been unable to reproduce the issue.

I have noticed one difference. In Arch, Sublime Text is drawn with window decorations. In Neon, Sublime Text is given no decorations. I have no idea if this is related, but I mention it because the reproducible crash I described before happens when using the window decorations to drag the Sublime Text window around the screen.

I have since switched to X.org on my Arch box and the crash happens there too. It's less severe, as it only crashes the compositor and not the whole session, but it does happen, albeit less frequently.
Comment 6 Tom B 2020-02-20 16:47:58 UTC
Backtrace courtesy of David Strobach on the kwin tiling github issue:

https://github.com/kwin-scripts/kwin-tiling/issues/192

#7  0x00007f00b288297e in QV4::MemoryManager::collectRoots(QV4::MarkStack*) () at /usr/lib/libQt5Qml.so.5
#8  0x00007f00b2882b7e in QV4::MemoryManager::mark() () at /usr/lib/libQt5Qml.so.5
#9  0x00007f00b2884672 in  () at /usr/lib/libQt5Qml.so.5
#10 0x00007f00b2886aaa in QV4::MemoryManager::allocData(unsigned long) () at /usr/lib/libQt5Qml.so.5
#11 0x00007f00b298101a in QV4::QObjectMethod::create(QV4::ExecutionContext*, QObject*, int) () at /usr/lib/libQt5Qml.so.5
#12 0x00007f00b2983b87 in QV4::QObjectWrapper::getQmlProperty(QQmlContextData*, QV4::String*, QV4::QObjectWrapper::RevisionMode, bool*, bool) const () at /usr/lib/libQt5Qml.so.5
#13 0x00007f00b2983d4f in QV4::QObjectWrapper::virtualGet(QV4::Managed const*, QV4::PropertyKey, QV4::Value const*, bool*) () at /usr/lib/libQt5Qml.so.5
#14 0x00007f00b29b9a1d in QV4::Runtime::CallProperty::call(QV4::ExecutionEngine*, QV4::Value const&, int, QV4::Value*, int) () at /usr/lib/libQt5Qml.so.5
#15 0x00007f007ebe90d2 in  ()
#16 0x0000000000000000 in  ()
Comment 7 Tom B 2020-04-22 13:10:37 UTC
Is there anything else I can provide to help this become resolved? This is a very annoying issue that happens several times a day and on wayland halts the entire session.
Comment 8 David Edmundson 2020-04-22 13:36:10 UTC
>Is there anything else I can provide to help this become resolved? 

Super explicit steps to reliably reproduce.
Comment 9 Christoph Cullmann 2020-04-26 15:08:33 UTC
This crash in the QV4 engine is prominent in all tools using it :/
Though I have no idea how to reproduce it.
Comment 10 Christoph Cullmann 2020-04-26 15:08:56 UTC
*** Bug 393871 has been marked as a duplicate of this bug. ***
Comment 11 Christoph Cullmann 2020-04-26 15:09:11 UTC
*** Bug 419579 has been marked as a duplicate of this bug. ***
Comment 12 Christoph Cullmann 2020-04-26 15:10:42 UTC
*** Bug 419714 has been marked as a duplicate of this bug. ***
Comment 13 Christoph Cullmann 2020-04-26 15:12:02 UTC
I collected some more of the JS crashes in collect :/

Bug 419714 claims to be more reliably reproducible.
Comment 14 karl 2020-04-26 16:04:02 UTC
(In reply to David Edmundson from comment #8)
> >Is there anything else I can provide to help this become resolved? 
> 
> Super explicit steps to reliably reproduce.

Assuming this is the same bug I’m hitting in 416826, the repro in that bug is: enable kwin-tiling, open 2+ windows, then use the mouse to resize a window rapidly without letting go of the left mouse button. I find it reproduces most reliably when resizing vertically (i.e. dragging the bottom border up and down).

I’ll see if I can add a short video clip tomorrow.
Comment 15 karl 2020-04-27 08:19:07 UTC
Created attachment 127913 [details]
Reproduction Video (Under X11)

As mentioned above, a simple reproduction video, under X11. My Wayland session is currently having issues and I've not yet had the time to investigate why, apologies.
Comment 16 Igor Kushnir 2020-05-07 19:31:51 UTC
*** Bug 421120 has been marked as a duplicate of this bug. ***
Comment 18 Christoph Cullmann 2020-05-23 13:07:32 UTC
This looks like

https://bugreports.qt.io/browse/QTBUG-84363

=> Qt 5.14 is actually not usable for us :/ I haven't tried if the fix there really solves the issues for the not yet released 5.15.1.
Comment 19 karl 2020-05-25 14:47:41 UTC
(In reply to Christoph Cullmann from comment #18)
> This looks like
> 
> https://bugreports.qt.io/browse/QTBUG-84363
> 
> => Qt 5.14 is actually not usable for us :/ I haven't tried if the fix there
> really solves the issues for the not yet released 5.15.1.

FWIW the workaround does seem to have resolved the crashing for me. I have not had the opportunity to pull in and try the fix, yet.
Comment 20 Milian Wolff 2020-07-30 08:47:47 UTC
*** Bug 424762 has been marked as a duplicate of this bug. ***
Comment 21 David Edmundson 2023-01-03 11:50:51 UTC
Crash is in Qt code and is quite old. Closing