Bug 390314 - kwin_wayland SIGSEGV in KWin::WaylandCursorTheme::loadTheme
Summary: kwin_wayland SIGSEGV in KWin::WaylandCursorTheme::loadTheme
Status: RESOLVED FIXED
Alias: None
Product: kwin
Classification: Plasma
Component: wayland-generic (show other bugs)
Version: 5.12.0
Platform: openSUSE Linux
: NOR normal
Target Milestone: ---
Assignee: KWin default assignee
URL: https://phabricator.kde.org/D10549
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-12 13:55 UTC by schrott3000
Modified: 2018-03-04 08:43 UTC (History)
1 user (show)

See Also:
Latest Commit:
Version Fixed In: 5.12.3
schrott3000: Wayland+
schrott3000: X11-
mgraesslin: ReviewRequest+


Attachments
Detailed gdb backtrace (41.32 KB, text/plain)
2018-02-12 13:55 UTC, schrott3000
Details
Working startupconfig (385 bytes, application/x-shellscript)
2018-02-13 08:06 UTC, schrott3000
Details
Not working startupconfig (384 bytes, application/x-shellscript)
2018-02-13 08:11 UTC, schrott3000
Details
Support information (5.59 KB, text/plain)
2018-02-13 08:17 UTC, schrott3000
Details
Patch proposal for the recursion problem (533 bytes, patch)
2018-02-14 09:07 UTC, schrott3000
Details

Note You need to log in before you can comment on or make changes to this bug.
Description schrott3000 2018-02-12 13:55:23 UTC
Created attachment 110557 [details]
Detailed gdb backtrace

Symptoms:
kwin_wayland session crashes when moving the cursor e.g. over another window.

Cause:
There is an endless recursion when calling KWin::CursorImage::loadThemeCursor.
I will describe the scheme in short for details see the attached backtrace:
step1: Kwin calls KWin::CursorImage::loadThemeCursor
step2: Which then calls KWin::WaylandCursorTheme::get which call KWin::WaylandCursorTheme::loadTheme
step3: KWin::WaylandCursorTheme::loadTheme emits the event themeChanged() (wayland_cursor_theme.cpp, line:70)
step4: This is handled by the lambda expession at pointer_input.cpp line 908, which calls KWin::CursorImage::loadThemeCursor in line 911 which starts the whole thing again (=> goto step1)

This cycle breaks in the moment when malloc fails to allocate and kwin crashes with a segmentation fault:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f35190c0919 in malloc () from /lib64/libc.so.6

#0  0x00007f35190c0919 in malloc () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f351948d718 in operator new(unsigned long) () from /usr/lib64/libstdc++.so.6
No symbol table info available.
#2  0x00007f3519c63f0d in QObjectPrivate::connectImpl(QObject const*, int, QObject const*, void**, QtPrivate::QSlotObjectBase*, Qt::ConnectionType, int const*, QMetaObject const*) () from /usr/lib64/libQt5Core.so.5
No symbol table info available.
#3  0x00007f3519c64295 in QObject::connectImpl(QObject const*, void**, QObject const*, void**, QtPrivate::QSlotObjectBase*, Qt::ConnectionType, int const*, QMetaObject const*) () from /usr/lib64/libQt5Core.so.5
No symbol table info available.
#4  0x00007f351bfcbded in QObject::connect<void (KWin::Cursor::*)(), void (KWin::WaylandCursorTheme::*)()> (type=Qt::AutoConnection, slot=(void (KWin::WaylandCursorTheme::*)(KWin::WaylandCursorTheme * const)) 0x7f351bfcbc70 <KWin::WaylandCursorTheme::loadTheme()>, receiver=0x563d373dfee0, signal=(void (KWin::Cursor::*)(KWin::Cursor * const)) 0x7f351c015600 <KWin::Cursor::themeChanged()>, sender=0x563d36a12350) at /usr/include/qt5/QtCore/qobject.h:259
        types = 0x0
#5  KWin::WaylandCursorTheme::loadTheme (this=0x563d373dfee0) at /usr/src/debug/kwin5-5.12.0-1.1.x86_64/wayland_cursor_theme.cpp:55
        size = <optimized out>
        this = 0x563d373dfee0
#6  0x00007f351bfcc0d0 in KWin::WaylandCursorTheme::get (this=this@entry=0x563d373dfee0, name=...) at /usr/src/debug/kwin5-5.12.0-1.1.x86_64/wayland_cursor_theme.cpp:90
        c = <optimized out>
#7  0x00007f351bfcc13d in KWin::WaylandCursorTheme::get (this=0x563d373dfee0, shape=<optimized out>) at /usr/src/debug/kwin5-5.12.0-1.1.x86_64/wayland_cursor_theme.cpp:84
No locals.
#8  0x00007f351bedbd7c in KWin::CursorImage::loadThemeCursor<Qt::CursorShape> (this=0x563d37581b20, shape=shape@entry=@0x7ffd4e791354: Qt::ArrowCursor, cursors=..., image=0x563d37581bd0) at /usr/src/debug/kwin5-5.12.0-1.1.x86_64/pointer_input.cpp:1194
        cursor = <optimized out>
        b = <optimized out>
        buffer = <optimized out>
        it = {i = 0x7f3519cdc360 <QHashData::shared_null>}
#9  0x00007f351beda40d in KWin::CursorImage::loadThemeCursor (image=<optimized out>, shape=<optimized out>, this=<optimized out>) at /usr/src/debug/kwin5-5.12.0-1.1.x86_64/pointer_input.cpp:1175
No locals.
#10 KWin::CursorImage::<lambda()>::operator() (__closure=0x563d374e45c0) at /usr/src/debug/kwin5-5.12.0-1.1.x86_64/pointer_input.cpp:911
        this = 0x563d37581b20
#11 QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, KWin::CursorImage::CursorImage(KWin::PointerInputRedirection*)::<lambda()> >::call (arg=<optimized out>, f=...) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:130
No locals.
#12 QtPrivate::Functor<KWin::CursorImage::CursorImage(KWin::PointerInputRedirection*)::<lambda()>, 0>::call<QtPrivate::List<>, void> (arg=<optimized out>, f=...) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:240
No locals.
#13 QtPrivate::QFunctorSlotObject<KWin::CursorImage::CursorImage(KWin::PointerInputRedirection*)::<lambda()>, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void **, bool *) (which=<optimized out>, this_=0x563d374e45b0, r=<optimized out>, a=<optimized out>, ret=<optimized out>) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:423
No locals.
#14 0x00007f3519c600cc in QMetaObject::activate(QObject*, int, int, void**) () from /usr/lib64/libQt5Core.so.5
No symbol table info available.
#15 0x00007f351bfcbd5f in KWin::WaylandCursorTheme::loadTheme (this=0x563d373dfee0) at /usr/src/debug/kwin5-5.12.0-1.1.x86_64/wayland_cursor_theme.cpp:70
        size = -2147483648
        this = 0x563d373dfee0
---- stripped here see attached log for detailed backtrace -----


Workaround:
Setting kcminputrc_mouse_cursorsize='' in startupconfig prevents triggering this bug (NOTE: it was originally set to 0 in my account). 

Please note that the attached backtrace does not show the full backtrace, hte parts in the middle are just the same over and over again.
Comment 1 Martin Flöser 2018-02-12 15:56:41 UTC
Do you know why it was 0? Did you modify that manually?
Comment 2 schrott3000 2018-02-12 17:49:28 UTC
I did not change it manually. But I can reproduce this by changing the cursor theme in kcmshell5, on relogin it sets it to 0.

Some background info:
I have tried kwin_wayland to see how stable it got. And I always got crashes with my normal account. Because I've heard that it should be pretty table by now I was wondering and checked with a newly created user account where everything  worked fine. This is why I started to investigate. I've attached the debugger and found that it's an issue with the kwin_wayland cursor theme implementation. So I've checked for differences in the relevant config file between my main account and this newly created account and this is how I came up with this workaround for it.
Comment 3 Martin Flöser 2018-02-12 20:23:27 UTC
Thanks for the investigation. The cursor theme kcm changed recently I guess the value stored changed as well. 0 doesn't make much sense. Should be easy to guard KWin against it.
Comment 4 Martin Flöser 2018-02-12 20:46:09 UTC
now that I have the code open: 0 is a valid value.
Comment 5 Martin Flöser 2018-02-12 20:53:32 UTC
To understand what's going on I need some more information:
* what's your cursor theme?
* can you please add the output of:
qdbus org.kde.KWin /KWin supportInformation

What I need to understand is what the code does in your system and why your system goes into that bad path.
Comment 6 schrott3000 2018-02-13 07:59:31 UTC
Yeah, 0 should be display dependent cursor size that would match the setting in kcmshell were it shows "Auflösungsabhängig" and the calculation in wayland_cursor_theme.cpp.

The cursor theme is 'Breeze' but it's not dependent on that it behaves the same way if it is set to 'DMZ'.

I will attach the support info with the working settings now and then try to get one with the not working settings. An additional hint here: I have noticed that it does not fail immediately, but when the cursor theme needs to be changed the first time due to e.g. moving it over a window bar.
Comment 7 schrott3000 2018-02-13 08:06:08 UTC
Created attachment 110599 [details]
Working startupconfig
Comment 8 schrott3000 2018-02-13 08:11:49 UTC
Created attachment 110600 [details]
Not working startupconfig
Comment 9 schrott3000 2018-02-13 08:17:34 UTC
Created attachment 110601 [details]
Support information

Please note that the support information output is identical for the working and the non working configuration.
Comment 10 schrott3000 2018-02-13 09:03:08 UTC
I have debugged a little bit further:
In normal operation the loop does not occur because after the first loop  m_theme in wayland_cursor_theme.cpp is loaded and the conditions cause that a different path is taken in the next loop through this functions. 
If size==0 something else happens:
An automatic size gets calculated with in my case is a negative number. This is passed to wl_cursor_theme_load with fails and return NULL. So m_theme remains set to NULL and it tries to load this theme with the wrong size in an infinite loop.
Comment 11 schrott3000 2018-02-13 09:42:59 UTC
Okay, I have taken a look at the size calculation at assembler level and it does some correct but likely not expected things:
At first for my pc:
=> output->pixelSize().height() returns 1080 (which is my screen height)
=> output->physicalSize().height() returns 0 (which is obviously a problem)

What then happens is that the following is calculated for floating points in SIMD:
qreal(1080) / (qreal(0) * 0.0393701) * 16.0 / 72.0
= qreal(1080) / 0.0 * 16.0 / 72.0 (Please not that div by zero is NaN in floating point and does not raise an exception)
= NaN * 16.0 / 72.0
= NaN

This NaN is than converted to int size in an implicit manner which results in:
int size = -2147483648 (which actually is the NaN value 0x80000000 as int)

This than causes wl_cursor_theme_load to fail and return NULL as it cannot cope with negative values.
Comment 12 schrott3000 2018-02-13 09:57:36 UTC
By the way I consider this recursive looping a very strange way of programming this thing. Is it understood in this code that emit themeChanged(); in WaylandCursorTheme::loadTheme() does not put the event in the Qt event loop but instead will call the connected handlers (like pointer_input.cpp, line 909) directly as normal functions as we don't have a queued signal connection? (referring to https://doc.qt.io/qt-5.10/signalsandslots.html#signals)
Comment 13 Martin Flöser 2018-02-13 16:06:19 UTC
thanks for the update.

From what I understand we have two problems here:
* we don't handle physical size being 0
* we emit the signal even if null is returned and it actually doesn't change

Which brings us to the recursion: there should not be a recursion. It only is a recursion because the theme is null even after calling the method.
Comment 14 schrott3000 2018-02-13 16:23:23 UTC
Yeah that's correct. If m_them gets correctly set there will be no recursion because the get function in wayland_cursor_theme.cpp would not call loadTheme again.
Writing a fix for this two problems should do it.
Comment 15 schrott3000 2018-02-14 09:07:23 UTC
Created attachment 110648 [details]
Patch proposal for the recursion problem

I have tested what will happen if the recursion is solved by not emitting the signal if wl_cursor_theme_load returns null. The result is that the session does not crash anymore but the cursor will disappear when loadTheme fails (which should be fixed if the physical monitor == 0 situation is fixed).
In my opinion this is an acceptable behaviour because if we don't pass a negative size to it wl_cursor_theme_load should only fail if it can't allocate the required memory (which should actually never happen). But I will leave it on you to decide that.
Comment 16 Martin Flöser 2018-02-14 15:54:52 UTC
Yeah, that's what I had in mind as well.
Comment 17 Martin Flöser 2018-02-15 17:09:06 UTC
Full solution for both problems including a unit test exposing the problem at https://phabricator.kde.org/D10549
Comment 18 schrott3000 2018-02-16 10:07:08 UTC
Works well for me. The default size matches the ones in X11 windows which I think is a good default. 
If you have further questions please ask, otherwise feel free to change this bug to RESOLVED.
Comment 19 Martin Flöser 2018-03-04 08:43:32 UTC
Git commit 2ea5153e1c979760c56bd81f3d47f2fb373130c9 by Martin Flöser.
Committed on 04/03/2018 at 08:42.
Pushed by graesslin into branch 'Plasma/5.12'.

Don't crash if the cursor theme fails to create

Summary:
If the cursor theme failed to create KWin crashed due to an endless
recursion. There are two reasons for this fault:
1) When the physical size does not exist we perform a division by 0
which results in an invalid size going into wl_cursor_theme_load
2) We emit the signal that the cursor theme changed even if it didn't
change thus creating an endless recursion

This change addresses both problems: it checks that the size is not 0
and changes the handling for theme update to only destroy the previous
theme if the new theme could be created and only emits the signal if
things change.
FIXED-IN: 5.12.3

Test Plan: Added a new test case which crashed with old code

Reviewers: #kwin, #plasma

Subscribers: plasma-devel, kwin

Tags: #plasma

Differential Revision: https://phabricator.kde.org/D10549

M  +1    -0    autotests/integration/CMakeLists.txt
A  +124  -0    autotests/integration/dont_crash_cursor_physical_size_empty.cpp     [License: GPL (v2)]
M  +17   -9    wayland_cursor_theme.cpp

https://commits.kde.org/kwin/2ea5153e1c979760c56bd81f3d47f2fb373130c9