476479 – baloo_file (baloo_file), signal: Aborted

Status:	RESOLVED FIXED

Alias:	None

Product:	frameworks-baloo
Classification:	Frameworks and Libraries
Component:	Baloo File Daemon (other bugs)
Version First Reported In:	5.111.0
Platform:	Gentoo Packages Linux

Importance:	NOR crash
Target Milestone:	---
Assignee:	baloo-bugs-null

URL:
Keywords:

Depends on:
Blocks:

Reported:	2023-11-02 19:46 UTC by David Kredba
Modified:	2025-01-06 08:50 UTC (History)
CC List:	2 users (show)

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description David Kredba 2023-11-02 19:46:24 UTC

SUMMARY
I am indexing from scratch and it is taking ages: over one month already, 14 hours a day. The files being indexed are on EXT4 on LUKS2 on MDRAID on rotating HDDs, and the index file is stored on EXT4 on LUKS2 on an SSD. I came home to see a KCrash handler icon.
There is still 50 GiB of free space on SSD storing the index file.
The 'balooctl status' command returns:
Baloo File Indexer is not running
Total files indexed: 716,213
Files waiting for content indexing: 151,027
Files failed to index: 0
Current size of index is 67.41 GiB


OBSERVED RESULT
#6  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#7  0x00007f29a60b3b6f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#8  0x00007f29a6063a02 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#9  0x00007f29a604c22d in __GI_abort () at abort.c:79
#10 0x00007f29a604d29c in __libc_message (fmt=fmt@entry=0x7f29a619d0da "%s\n") at ../sysdeps/posix/libc_fatal.c:150
#11 0x00007f29a60bd975 in malloc_printerr (str=str@entry=0x7f29a61a07f0 "mremap_chunk(): invalid pointer") at malloc.c:5765
#12 0x00007f29a60c2cec in mremap_chunk (new_size=48, p=0x7f29a68663d0 <prime_deltas+16>) at malloc.c:3063
#13 __GI___libc_realloc (oldmem=0x7f29a68663e0 <QListData::shared_null>, bytes=32) at malloc.c:3473
#14 0x00007f29a660c463 in QListData::realloc_grow (this=this@entry=0x55da3b0a3a80, growth=growth@entry=1) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/tools/qlist.cpp:170
#15 0x00007f29a660c50a in QListData::append (this=0x55da3b0a3a80, n=n@entry=1) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/tools/qlist.cpp:196
#16 0x00007f29a660c53a in QListData::append (this=<optimized out>) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/tools/qlist.cpp:206
#17 0x000055da39d34781 in QList<QString>::append (t=..., this=<optimized out>) at /usr/include/qt5/QtCore/qlist.h:643




SOFTWARE/OS VERSIONS
KDE Plasma Version: 5.27.9
KDE Frameworks Version: 5.111.0
Qt Version: 5.15.11
Kernel Version: 6.6.0-gentoo (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 3800X 8-Core Processor
Memory: 62.7 GiB of RAM
Graphics Processor: AMD Radeon RX 580 Series
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: X570 AORUS ELITE
System Version: -CF

Comment 1 David Kredba 2023-11-02 19:47:52 UTC

I am sorry, the complete backtrace follows:

Application: baloo_file (baloo_file), signal: Aborted
Content of s_kcrashErrorMessage: std::unique_ptr<char []> = {get() = 0x0}
[KCrash Handler]
#6  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#7  0x00007f29a60b3b6f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#8  0x00007f29a6063a02 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#9  0x00007f29a604c22d in __GI_abort () at abort.c:79
#10 0x00007f29a604d29c in __libc_message (fmt=fmt@entry=0x7f29a619d0da "%s\n") at ../sysdeps/posix/libc_fatal.c:150
#11 0x00007f29a60bd975 in malloc_printerr (str=str@entry=0x7f29a61a07f0 "mremap_chunk(): invalid pointer") at malloc.c:5765
#12 0x00007f29a60c2cec in mremap_chunk (new_size=48, p=0x7f29a68663d0 <prime_deltas+16>) at malloc.c:3063
#13 __GI___libc_realloc (oldmem=0x7f29a68663e0 <QListData::shared_null>, bytes=32) at malloc.c:3473
#14 0x00007f29a660c463 in QListData::realloc_grow (this=this@entry=0x55da3b0a3a80, growth=growth@entry=1) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/tools/qlist.cpp:170
#15 0x00007f29a660c50a in QListData::append (this=0x55da3b0a3a80, n=n@entry=1) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/tools/qlist.cpp:196
#16 0x00007f29a660c53a in QListData::append (this=<optimized out>) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/tools/qlist.cpp:206
#17 0x000055da39d34781 in QList<QString>::append (t=..., this=<optimized out>) at /usr/include/qt5/QtCore/qlist.h:643
#18 QList<QString>::append (this=<optimized out>, t=...) at /usr/include/qt5/QtCore/qlist.h:620
#19 0x000055da39d43ed9 in Baloo::FileContentIndexer::slotFinishedIndexingFile (this=0x55da3b0a3a40, filePath=..., fileUpdated=<optimized out>) at /var/tmp/portage/kde-frameworks/baloo-5.111.0/work/baloo-5.111.0/src/file/filecontentindexer.cpp:125
#20 0x00007f29a67b4024 in QObject::event (this=0x55da3b0a3a40, e=0x7ee99432ba20) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/kernel/qobject.cpp:1347
#21 0x00007f29a6788f25 in doNotify (event=0x7ee99432ba20, receiver=0x55da3b0a3a40) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/kernel/qcoreapplication.cpp:1154
#22 QCoreApplication::notify (event=<optimized out>, receiver=<optimized out>, this=<optimized out>) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/kernel/qcoreapplication.cpp:1140
#23 QCoreApplication::notifyInternal2 (receiver=0x55da3b0a3a40, event=0x7ee99432ba20) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/kernel/qcoreapplication.cpp:1064
#24 0x00007f29a678914e in QCoreApplication::sendEvent (receiver=<optimized out>, event=<optimized out>) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/kernel/qcoreapplication.cpp:1462
#25 0x00007f29a678c4c3 in QCoreApplicationPrivate::sendPostedEvents (receiver=0x0, event_type=0, data=0x55da3b083b70) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/kernel/qcoreapplication.cpp:1821
#26 0x00007f29a678c778 in QCoreApplication::sendPostedEvents (receiver=<optimized out>, event_type=<optimized out>) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/kernel/qcoreapplication.cpp:1680
#27 0x00007f29a67db013 in postEventSourceDispatch (s=0x55da3b0853f0) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/kernel/qeventdispatcher_glib.cpp:277
#28 0x00007f29a4f73d52 in g_main_dispatch (context=context@entry=0x55da3b085180) at ../glib-2.78.1/glib/gmain.c:3476
#29 0x00007f29a4f76f07 in g_main_context_dispatch_unlocked (context=0x55da3b085180) at ../glib-2.78.1/glib/gmain.c:4284
#30 g_main_context_iterate_unlocked (context=context@entry=0x55da3b085180, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib-2.78.1/glib/gmain.c:4349
#31 0x00007f29a4f7752c in g_main_context_iteration (context=0x55da3b085180, may_block=1) at ../glib-2.78.1/glib/gmain.c:4414
#32 0x00007f29a67dab16 in QEventDispatcherGlib::processEvents (this=0x55da3b085080, flags=...) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/src/corelib/kernel/qeventdispatcher_glib.cpp:423
#33 0x00007f29a678797b in QEventLoop::exec (this=this@entry=0x7ffcd9102b80, flags=..., flags@entry=...) at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/include/QtCore/../../src/corelib/global/qflags.h:69
#34 0x00007f29a678fc7d in QCoreApplication::exec () at /var/tmp/portage/dev-qt/qtcore-5.15.11-r1/work/qtbase-everywhere-src-5.15.11/include/QtCore/../../src/corelib/global/qflags.h:121
#35 0x000055da39d33915 in main (argc=<optimized out>, argv=<optimized out>) at /var/tmp/portage/kde-frameworks/baloo-5.111.0/work/baloo-5.111.0/src/file/main.cpp:78
[Inferior 1 (process 5469) detached]

Comment 2 tagwerk19 2023-11-02 22:05:47 UTC

There's been a change to limit the amount of RAM baloo can use to 512M (assuming you are on a system with systemd). See:
    https://bugs.kde.org/show_bug.cgi?id=446071#c9
Without a cap on memory, baloo can expand and slug the system performance. With the cap, you might find baloo starting to use swap when doing large write transactions. Also not good.

I've been setting MemoryHigh=50% and MemorySwapMax=0B to find a "middle way". Your mileage may vary
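
A minimal sketch of how those two settings could be applied on a systemd system, as a user drop-in. The unit name kde-baloo.service and the drop-in directory are assumptions (check what your session actually runs with "systemctl --user status"), and write_baloo_override is a hypothetical helper, not part of Baloo.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in holding the two memory
# settings from this comment into the directory given as $1.
write_baloo_override() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
MemoryHigh=50%
MemorySwapMax=0B
EOF
}

# Assumed usage (the unit name kde-baloo.service is an assumption):
#   write_baloo_override "$HOME/.config/systemd/user/kde-baloo.service.d"
#   systemctl --user daemon-reload
#   systemctl --user restart kde-baloo.service
```

The helper only writes the file; reloading and restarting are left as manual steps so the change can be inspected first.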

You can watch the files being indexed with "balooctl monitor", you should see them indexed in batches of 40.

You say "KDE Frameworks Version: 5.111.0", did you update the system halfway through the indexing? You also say "Current size of index is 67.41 GiB" which doesn't sound healthy.

Comment 3 David Kredba 2023-11-03 08:42:50 UTC

My system uses OpenRC.
I started the indexing on KF 5.110; it was paused during the compilation.
I am definitely not going to really use it (though I would like to!) if I am told that it will need re-indexing after each KF5/6 upgrade that touches Baloo. That would be a terrible waste of energy and time. It should be programmed so that it makes the needed internal changes to the existing index file after each incompatible upgrade of Baloo internals.
I have plenty of literature in PDF and EPUB formats and wrongly thought the index might be about the right size, but the output of the 'balooctl indexSize' command suggests it really went crazy again:
File Size: 67,41 GiB
Used:      2,95 GiB

PostingDB:            3,47 GiB   117.860 %
PositionDB:           1,76 GiB    59.766 %
DocTerms:             1,51 GiB    51.211 %
DocFilenameTerms:    48,49 MiB     1.606 %
DocXattrTerms:        4,00 KiB     0.000 %
IdTree:               9,99 MiB     0.331 %
IdFileName:          52,75 MiB     1.747 %
DocTime:             28,57 MiB     0.946 %
DocData:             49,64 MiB     1.645 %
ContentIndexingDB:    4,21 MiB     0.140 %
FailedIdsDB:              0 B      0.000 %
MTimeDB:             13,07 MiB     0.433 %

I will try to reduce its size using the command 'mdb_copy -n -c index index.new'.

Yes, it does it in 40 pcs batches.

Thank you.

Comment 4 David Kredba 2023-11-03 09:08:23 UTC

mdb_copy -n downsized it to 31 GiB.
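
For anyone repeating this, a rough sketch of the compaction step. It assumes baloo_file is stopped first and that the index lives in ~/.local/share/baloo (an assumption, adjust to your setup). compact_index and saved_percent are hypothetical helper names; only the mdb_copy invocation itself comes from comment 3.

```shell
#!/bin/sh
# Compact an LMDB file with the mdb_copy call from comment 3,
# refusing to run if the source is missing or the target already exists.
compact_index() {
    src=$1
    dst=$2
    if [ ! -f "$src" ]; then
        echo "no such index file: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "refusing to overwrite: $dst" >&2
        return 1
    fi
    mdb_copy -n -c "$src" "$dst"
}

# Percentage saved, given the old and new size in the same unit.
saved_percent() {
    awk -v o="$1" -v n="$2" 'BEGIN { printf "%.0f\n", (o - n) * 100 / o }'
}

# Assumed usage, with baloo_file stopped:
#   cd ~/.local/share/baloo        # assumed index location
#   compact_index index index.new && mv index index.old && mv index.new index

# Sizes in GiB reported in this bug (67.41 before, 31 after):
saved_percent 67.41 31    # prints 54
```

Keeping the old file as index.old until Baloo has started cleanly on the compacted copy gives a way back if something goes wrong.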

Comment 5 tagwerk19 2023-11-03 15:21:10 UTC

> My system uses OpenRC. 
Don't know whether OpenRC gives you a way of limiting the memory use (with cgroups?). I only know the systemd unit files. Putting some sort of cap on the memory use is sensible. 

> ... told that after each KF5/6 upgrade touching Baloo it will need re-indexing ...
Probably more complicated. Previously it could be that when you mount disks on a reboot, they get a different device number each time. This was a clear issue with BTRFS if you have multiple subvolumes: there was a race and disks came up with different minor device numbers. OK, "previously" applies to Baloo. Baloo used to rely on the device number (device number and inode) to build an internal DocID for each file it indexed. If the device number changed on a reboot then Baloo thought it had a whole set of new files and indexed them all again. Bad.

This may also be happening with your Ext4/LUKS2 setup. I'm afraid I don't know how this presents itself to the system.

With Frameworks 5.111 there's been a patch to use an "invariant" filesystem ID (rather than the minor device number). This means there will be "one more" reindexing and then the index should be stable. It shouldn't be every KF5/KF6 change, it should be more stable after this one...

    https://invent.kde.org/frameworks/baloo/-/merge_requests/131
    https://discuss.kde.org/t/baloo-and-frameworks-5-111/6348

You can keep watch on the device number / inode on disk with "stat filename", see how Baloo has indexed it with "balooshow -x filename" and also check for "multiple hits" for the same file if you do a "baloosearch -i filename".
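
A small sketch of that check, assuming GNU coreutils stat. dev_inode and pack_id are hypothetical helpers; in particular the packing in pack_id is only an illustration of why a changed device number leads to a "new" file, and Baloo's real DocID layout may differ.

```shell
#!/bin/sh
# Print "device inode" (both decimal) for a file; GNU stat assumed.
dev_inode() {
    stat -c '%d %i' "$1"
}

# Hypothetical 64-bit ID built from device number ($1) and inode ($2).
# This is NOT claimed to be Baloo's actual encoding.
pack_id() {
    echo $(( ($1 << 32) | $2 ))
}

# Same inode, different device number: the two IDs differ, so an
# indexer keyed on such an ID would treat the file as a new one.
pack_id 2049 1234
pack_id 2050 1234

# Assumed usage: note the output before and after a reboot and compare.
#   dev_inode ~/Documents/some-file.pdf
#   balooshow -x ~/Documents/some-file.pdf
#   baloosearch -i some-file.pdf
```

If dev_inode prints the same pair across reboots, the device number is stable for that filesystem.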

There's also a possible "gotcha" that happens if you are worried about how the indexing is going and watch with "balooctl status". This counts the files waiting to be indexed - and holds the index "read only" when it's doing it. If baloo_file/baloo_file_extractor wants to write at that moment, the write is an append. Suddenly the index is bigger (Bug 437754).

> ... It should be programed the way that it will make needed internal changes of existing index file after each incompatible upgrade of Baloo internals ...
Not sure there's a watertight way of doing this - beyond keeping a hash of the files and comparing.

> ... I am having plenty of literature in pdf and epub formats ...
These can sometimes be slow to index; each file needs to be read as a stream of text. PDFs can be compressed and things like graphs can take a *load* of CPU to render....
        
Not sure whether this all helps.

Probably the thing to do is to check what "stat" says for your files; change the indexing "includes" so you can see what happens with a small set of folders; then pkill baloo_file and purge the index. Sorry.

Comment 6 tagwerk19 2023-11-04 07:38:57 UTC

> ... Crash ...
I think you are off in the wildlands with:
> Current size of index is 67.41 GiB
and
> Memory: 62.7 GiB of RAM

I'd be 95% sure that the root cause is reindexing (possibly historically, anyway with 5.111)

Comment 7 David Kredba 2023-11-04 20:52:25 UTC

Both the mdb_copy and mdb_stat commands understand the index file structure, and Baloo continues to index the content of my files
without complaining, so I think its structure was not damaged by the crash. I will let it go.
Once it (maybe) finishes, I can run some searches to see whether each document is found only once.
Can I dump/export it to a text file? There are mdb_dump and mdb_load binaries. I will try.
Are there sanity/health checks for the index file BTW?

Comment 8 David Kredba 2023-11-05 08:57:43 UTC

(In reply to tagwerk19 from comment #6)

You were right, it is doubled. I do have 363696 regular files in home folder.
(cd &&  find . \( ! -regex '.*/\..*' \) -type f | wc -l)
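
The doubling can be put into numbers with a short sketch. count_visible_files wraps the find expression above and index_ratio is plain arithmetic; both are hypothetical helper names. The indexed total from "balooctl status" may also include folders, so treat the ratio as a rough check only.

```shell
#!/bin/sh
# Count regular files below directory $1, skipping every path that has
# a component starting with a dot (same find expression as above).
count_visible_files() {
    ( cd "$1" && find . \( ! -regex '.*/\..*' \) -type f | wc -l | tr -d ' ' )
}

# Ratio of indexed files ($1) to files on disk ($2), two decimals.
index_ratio() {
    awk -v i="$1" -v d="$2" 'BEGIN { printf "%.2f\n", i / d }'
}

# Figures from this bug: 716,213 indexed vs 363,696 regular files.
index_ratio 716213 363696   # prints 1.97

# Assumed usage against a live system:
#   count_visible_files "$HOME"
```

A ratio close to 2 is consistent with every file having been indexed twice.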

Stopped, purged, logged off/on, started, and was again surprised that it asks for Computer restart after enabling it (at least in GUI of Settings).

Comment 9 tagwerk19 2023-11-05 12:40:36 UTC

(In reply to David Kredba from comment #8)
> ... it asks for Computer restart after enabling it (at least in GUI of Settings) ...
No, a reboot is not needed; a restart of the Baloo process is sufficient. Sometimes though Baloo isn't listening when you "ask it to restart itself". Bug 467531.
    
> You were right, it is doubled. I do have 363696 regular files in home folder.
> (cd &&  find . \( ! -regex '.*/\..*' \) -type f | wc -l)
Thank you, it's something of a relief that Ext4 over LUKS2 did give a stable device number. I think now a question of patience, watching with "balooctl monitor" (and I've found iotop gives a nice view into the indexing behaviour). Avoid "balooctl status" if you can...

Comment 10 tagwerk19 2023-11-05 21:57:55 UTC

(In reply to David Kredba from comment #7)
> Are there sanity/health checks for the index file BTW?
Sorry, I missed that one. Yes. Igor Poboiko has a
    
    baloo-checkdb.py

script here:

    https://invent.kde.org/frameworks/baloo/uploads/bdc9f5f17fc96490b7bd4a22ac664843/baloo-checkdb.py

With a quick description:

    https://invent.kde.org/frameworks/baloo/-/merge_requests/87#note_535270

It works "one level up"; the logical DB structure rather than the physical one. Heads up about the need to load the whole database into memory!  I've used it for small indexes.
        
It probably needs a new release though, see the report here
    
    https://bugs.kde.org/show_bug.cgi?id=474973#c29

Comment 11 tagwerk19 2024-07-07 15:20:25 UTC

(In reply to tagwerk19 from comment #9)
> ... it's something of a relief that Ext4 over LUKS2 did give a stable
> device number. I think now a question of patience, watching with "balooctl
> monitor" (and I've found iotop gives a nice view into the indexing
> behaviour). Avoid "balooctl status" if you can...
Are you OK now?

I know there have been some fixes to Poppler that resolved some extreme PDF issues
    https://bugs.kde.org/show_bug.cgi?id=380456#c22

Comment 12 David Kredba 2025-01-06 08:50:02 UTC

Baloo seems to be fine after I excluded the folder holding my many, many EPUB books.
PDF books in a different Calibre library are indexed by Baloo fine.
(Calibre's full-text index file of that EPUB library is over 20 GiB in size.)