Bug 466229

Summary: Spider segfaulting, new solver (417bdc2ec) bug?
Product: [Applications] kpat
Reporter: Duncan <1i5t5.duncan>
Component: solver
Assignee: Stephan Kulow <coolo>
Status: RESOLVED FIXED    
Severity: crash
CC: kde-games-bugs-null
Priority: NOR    
Version First Reported In: unspecified   
Target Milestone: ---   
Platform: Gentoo Packages   
OS: Linux   
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description Duncan 2023-02-22 05:17:29 UTC
Live-git version updated yesterday (Feb 20, 2023), using the gentoo/kde project overlay's live-git packages, reported as 23.03.70 (both unlisted, so I can't set that version above), with live-git frameworks-5 as well.  Qt is 5.15.8 (current as of yesterday with gentoo's regular sub-version update patchsets).

Spider has started segfaulting on me recently (the other kpat games I play regularly, klondike and freecell, are fine), with several weeks of further updates not fixing it yet.  The timing and spider-specificness suggest it's due to a race or null-deref in the new solver, tho I'm filing this with the info I have before bisecting to confirm that.

Maybe happenstance, but at first it /seemed/ to happen nearly immediately after starting a spider round, often on (before?) the first move.  Something (maybe just another rebuild, or an update of libkdegames, some framework, or qt5, as I see no further kpat code changes gitlogged, only l10n/appstream) seems to have decreased the frequency of the issue since, and I can often get in a number of moves now before the segfault.  Once I was even able to finish a round and I hoped for a moment the problem was fixed, but it segfaulted on the second round.

The output when started from konsole is unhelpful; an initial complaint from QFont::setPixelSize that appears unrelated as it happens well before the segfault, then nothing as I play until the segfault:

QFont::setPixelSize: Pixel size <= 0 (0)
Segmentation fault


For reference here's the suspect commit (email addresses despammed for posting):

* commit 417bdc2ec
| Author:     Stephan Kulow <stephan@despammed>
| AuthorDate: Fri Jan 20 10:51:07 2023 +0100
| Commit:     Albert Astals Cid <aacid@despammed>
| CommitDate: Sun Jan 22 22:33:51 2023 +0000
|
|     Replace spider solver with a self serving solution
|
|     This is something I had lying around (don't ask). It's generally
|     slower (as it tries harder to find an optimal solution), but finds
|     solutions more reliably - and is possibly easier to debug for
|     other people :)
Comment 1 Duncan 2023-02-22 05:44:38 UTC
Bisecting confirms it's the new solver.  417bdc2ec (the new solver) bad.  3c581787e (the commit previous to that) seems to be fine. (Well, at least it played a full round without segfaulting, tho as I mentioned above that did happen -- once -- with the bad code, too.)
Comment 2 Stephan Kulow 2023-02-22 06:19:21 UTC
Hi, can you export KDE_DEBUG=1 before starting it from console? Then something useful should appear in coredumpctl (typing from memory as I'm not having a computer around).
Comment 3 Duncan 2023-02-24 03:50:25 UTC
(In reply to Stephan Kulow from comment #2)
> Hi, can you export KDE_DEBUG=1 before starting it from console? Then
> something useful should appear in coredumpctl (typing from memory as I'm not
> having a computer around).

Kernel CONFIG_COREDUMP=n, so coredumpctl never gets the cores from the kernel.  I could change that if necessary (tho would rather not deal with figuring out all that config), but what about running via gdb instead?

Meanwhile, dmesg does say SolverThread .... SIGSEGV ... error 6 in kpat ...
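
For reference, the "error 6" in that dmesg line can be read as an x86 page-fault error code. Below is a minimal C++ sketch of decoding its low three bits. The struct and function names are made up for illustration, and it assumes the reporter's kernel is x86 or x86-64, which the thread suggests but never states.

```cpp
// Low three bits of the x86 page-fault error code, as printed by the
// kernel in "segfault at ... error N" lines. Names here are illustrative.
struct PageFault {
    bool protection;  // bit 0: set = protection violation, clear = page not present
    bool write;       // bit 1: set = write access, clear = read access
    bool user;        // bit 2: set = access made in user mode
};

constexpr PageFault decodePageFault(unsigned code) {
    return PageFault{(code & 1u) != 0, (code & 2u) != 0, (code & 4u) != 0};
}

// "error 6" is binary 110: not-present page, write access, user mode.
static_assert(!decodePageFault(6).protection, "page was not present");
static_assert(decodePageFault(6).write, "access was a write");
static_assert(decodePageFault(6).user, "fault came from user mode");
```

Read this way, error 6 is a user-mode write to a page that is not mapped, which would be consistent with a bad destination pointer rather than a read through a null pointer.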

Currently the gdb likely isn't much help due to stripped binaries, but here's what I get ATM...

A bunch of New Thread ... Thread exited pairs, apparently one per move, I guess for the solver threads.  Then at the SIGSEGV...

[New Thread 0x7fffd99fc6c0 (LWP 13533)]

Thread 381 "SolverThread" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd99fc6c0 (LWP 13533)]
0x00005555555d762e in ?? ()
(gdb) bt
#0  0x00005555555d762e in ?? ()
#1  0x00005555555d8546 in ?? ()
#2  0x00005555555d9726 in ?? ()
#3  0x00005555555da057 in ?? ()
#4  0x000055555558eab7 in ?? ()
#5  0x00007ffff64cb05f in ?? () from /usr/lib64/libQt5Core.so.5
#6  0x00007ffff5ee042d in ?? () from /usr/lib64/libc.so.6
#7  0x00007ffff5f5943c in ?? () from /usr/lib64/libc.so.6
(gdb)

I'll have to look up again how to build unstripped, with debugging enabled (IIRC gentoo/kde has a guide I'll need to reread; I've done it once before for something and it wasn't difficult), to fill in the ??s.  Hopefully this weekend...
Comment 4 Duncan 2023-02-27 02:27:22 UTC
(In reply to Duncan from comment #3)
> I'll have to look up again how to build unstripped, with debugging enabled 

This is more like it.  (The -9999 bit is gentoo's normal live-git package version-numbering.)

Again, a bunch of new-thread/thread-exited per-move as the solver-thread starts and exits, then...

[New Thread 0x7fffd9da96c0 (LWP 74954)]

Thread 159 "SolverThread" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd9da96c0 (LWP 74954)]
0x00007ffff5ef3c0b in memmove () from /usr/lib64/libc.so.6
(gdb) bt
#0  0x00007ffff5ef3c0b in memmove () from /usr/lib64/libc.so.6
#1  0x00005555555c060e in memcpy (__len=<optimized out>, __src=0x7fffb80ce700, __dest=0x7fffa4020ea8) at /include/bits/string_fortified.h:29
#2  Deck::update (this=this@entry=0x7fffa4020e28, other=other@entry=0x7fffb80ce680) at ../kpat-9999/src/patsolve/spidersolver2.cpp:656
#3  0x00005555555c169e in Deck::applyMove (this=this@entry=0x7fffb80ce680, m=..., newdeck=...) at ../kpat-9999/src/patsolve/spidersolver2.cpp:677
#4  0x00005555555c1bfd in Deck::shortestPath (this=<optimized out>, cap=cap@entry=150) at ../kpat-9999/src/patsolve/spidersolver2.cpp:802
#5  0x00005555555c1eaa in SpiderSolver2::patsolve (this=0x555556e473f0, max_positions=-1) at ../kpat-9999/src/patsolve/spidersolver2.cpp:941
#6  0x0000555555586979 in SolverThread::run (this=0x555555dddfe0) at ../kpat-9999/src/dealer.cpp:157
#7  0x00007ffff64cb05f in ?? () from /usr/lib64/libQt5Core.so.5
#8  0x00007ffff5ee042d in ?? () from /usr/lib64/libc.so.6
#9  0x00007ffff5f5943c in ?? () from /usr/lib64/libc.so.6
(gdb)

glibc-2.36-r7 (r7 being the gentoo package revision), gcc-12.2.1_p20230121-r1, qtcore-5.15.8-r3.

For the debug I built kpat with C(XX)FLAGS="-ggdb -Og", which I'll leave in place for the moment in case you need something beyond the simple bt.
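
One detail can be read straight off frames #1 and #2 of the backtrace: how far the memcpy pointers sit from the start of their Deck objects. The following is a small sketch of that arithmetic only. The constant and function names are illustrative, and the actual Deck layout is not shown anywhere in this report.

```cpp
#include <cstdint>

// Addresses copied from frames #1 and #2 of the backtrace above.
constexpr std::uint64_t kThis  = 0x7fffa4020e28;  // Deck::update `this`
constexpr std::uint64_t kDest  = 0x7fffa4020ea8;  // memcpy __dest
constexpr std::uint64_t kOther = 0x7fffb80ce680;  // Deck::update `other`
constexpr std::uint64_t kSrc   = 0x7fffb80ce700;  // memcpy __src

// Byte offset of a member address from the start of its object.
constexpr std::uint64_t offsetWithin(std::uint64_t object, std::uint64_t member) {
    return member - object;
}

static_assert(offsetWithin(kThis, kDest) == 0x80, "dest is 128 bytes into *this");
static_assert(offsetWithin(kOther, kSrc) == 0x80, "src is 128 bytes into *other");
```

Both offsets come out as 0x80 (128 bytes), so the copy at spidersolver2.cpp:656 appears to move the same member region from `other` into `this`. Given the "error 6" from dmesg, `this` is presumably the side being written when the fault hits.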
Comment 5 Stephan Kulow 2023-02-27 07:26:08 UTC
I will have to build myself with fortified to see what you see, but it's likely visible in valgrind as well - which is unfortunately too noisy for me at the moment (complaining about glibc and X11 even before kpat code runs).
Comment 6 Stephan Kulow 2023-02-27 08:54:02 UTC
FORTIFY won't make a difference - and valgrind is silent.

Can you please run kpat --solve 15 --end 1000 (this is the two-suit variant)? What variant are you playing anyway?
Comment 7 Stephan Kulow 2023-02-27 09:02:58 UTC
I managed to trigger it. I had to make my computer busy (yours is recompiling the world, right? :), play the one-suit variant, and click like a maniac.

==2602== Thread 9 SolverThread:
==2602== Invalid write of size 8
==2602==    at 0x484E41B: memmove (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2602==    by 0x46C539: UnknownInlinedFun (string_fortified.h:29)
==2602==    by 0x46C539: Deck::update(Deck const*) (spidersolver2.cpp:656)
==2602==    by 0x46D526: UnknownInlinedFun (spidersolver2.cpp:677)
==2602==    by 0x46D526: UnknownInlinedFun (spidersolver2.cpp:802)
==2602==    by 0x46D526: SpiderSolver2::patsolve(int) (spidersolver2.cpp:941)
==2602==    by 0x423B86: SolverThread::run() (dealer.cpp:157)
==2602==    by 0x5EE5E3C: QThreadPrivate::start(void*) (qthread_unix.cpp:330)
==2602==    by 0x675698C: start_thread (in /usr/lib64/libc.so.6)
==2602==    by 0x67DC343: clone (in /usr/lib64/libc.so.6)
==2602==  Address 0x52dbf480 is 34,776,128 bytes inside a block of size 34,779,040 in arena "client"
==2602==
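
The numbers in the last valgrind line can be checked with a little arithmetic. This is only a sketch using the figures printed above. The helper names are illustrative, and nothing here is taken from kpat's source.

```cpp
#include <cstdint>

// Figures copied from the valgrind report above.
constexpr std::uint64_t kOffset    = 34776128;  // "bytes inside a block"
constexpr std::uint64_t kBlockSize = 34779040;  // "block of size"
constexpr std::uint64_t kWriteSize = 8;         // "Invalid write of size 8"

// Bytes remaining between the flagged address and the end of the block.
constexpr std::uint64_t bytesToEnd(std::uint64_t offset, std::uint64_t size) {
    return size - offset;
}

// True if an access of `len` bytes at `offset` stays within `size` bytes.
constexpr bool fitsInside(std::uint64_t offset, std::uint64_t len, std::uint64_t size) {
    return offset <= size && len <= size - offset;
}

static_assert(bytesToEnd(kOffset, kBlockSize) == 2912, "2,912 bytes before the end");
static_assert(fitsInside(kOffset, kWriteSize, kBlockSize), "the 8-byte write is in range");
```

So the reported 8-byte write starts 2,912 bytes before the end of a roughly 33 MiB block. That puts it within the block's address range rather than past its end. What that implies about the root cause is not established in this report.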
Comment 8 Stephan Kulow 2023-02-27 09:07:25 UTC
./bin/kpat --solve 14 --start 5 will reproduce the crash - I just had bad luck before.
Comment 10 Stephan Kulow 2023-03-09 05:18:59 UTC
merged. Please give git a test.