Bug 364475

Summary: baloo_file_extractor crashes whenever the index exceeds 5G in size
Product: [Frameworks and Libraries] frameworks-baloo
Reporter: Hao Zhang <theivorytower>
Component: Baloo File Daemon
Assignee: Christoph Cullmann <christoph>
Status: RESOLVED FIXED    
Severity: crash
CC: christoph, don.waterloo+kde, kredba, marvin24, pinak.ahuja, vindrg
Priority: NOR    
Version: 5.23.0   
Target Milestone: ---   
Platform: Arch Linux   
OS: Linux   
Latest Commit:
Version Fixed In: 5.27
Sentry Crash Report:
Attachments: Patch to fix this bug

Description Hao Zhang 2016-06-18 23:00:04 UTC
While indexing files, once .local/share/baloo/index exceeds 5G in size, baloo_file_extractor always crashes. The indexer can no longer index any new files after that point. I have reproduced the same crash with many different sets of files.

Reproducible: Always

Steps to Reproduce:
1. make sure baloo_file and baloo_file_extractor are not running
2. delete .config/baloo* and .local/share/baloo/ to begin with a clean state
3. in System Settings, exclude all directories from being indexed except the one directory to be indexed
4. run balooctl start and balooctl check
5. wait for the indexing to finish

Actual Results:  
baloo_file_extractor always crashes when the index reaches ~5G in size. I have reproduced the same crash with many different sets of files (all PDFs and EPUBs, with no overlap between the sets), so I'm sure the crash is not caused by any particular file, only by the size of the index. A typical backtrace is as follows:
Process 1934 (baloo_file_extr) of user 1000 dumped core.
                                               
                                               Stack trace of thread 1934:
                                               #0  0x00007f93382b989d n/a (libKF5BalooEngine.so.5)
                                               #1  0x00007f93382b86dd n/a (libKF5BalooEngine.so.5)
                                               #2  0x00007f933829c074 _ZN5Baloo10PositionDB3getERK10QByteArray (libKF5BalooEngine.so.5)
                                               #3  0x00007f93382b5fcf _ZN5Baloo16WriteTransaction6commitEv (libKF5BalooEngine.so.5)
                                               #4  0x00007f93382aacb2 _ZN5Baloo11Transaction6commitEv (libKF5BalooEngine.so.5)
                                               #5  0x0000000000408c6c n/a (baloo_file_extractor)
                                               #6  0x00007f9336725ce6 n/a (libQt5Core.so.5)
                                               #7  0x00007f9336719ce3 _ZN7QObject5eventEP6QEvent (libQt5Core.so.5)
                                               #8  0x00007f933741906c _ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent (libQt5Widgets.so.5)
                                               #9  0x00007f933741e4ff _ZN12QApplication6notifyEP7QObjectP6QEvent (libQt5Widgets.so.5)
                                               #10 0x00007f93366ebe70 _ZN16QCoreApplication15notifyInternal2EP7QObjectP6QEvent (libQt5Core.so.5)
                                               #11 0x00007f93367411ee _ZN14QTimerInfoList14activateTimersEv (libQt5Core.so.5)
                                               #12 0x00007f9336741749 n/a (libQt5Core.so.5)
                                               #13 0x00007f9334646dd7 g_main_context_dispatch (libglib-2.0.so.0)
                                               #14 0x00007f9334647040 n/a (libglib-2.0.so.0)
                                               #15 0x00007f93346470ec g_main_context_iteration (libglib-2.0.so.0)
                                               #16 0x00007f93367422cf _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5)
                                               #17 0x00007f93366ea02a _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt5Core.so.5)
                                               #18 0x00007f93366f25bc _ZN16QCoreApplication4execEv (libQt5Core.so.5)
                                               #19 0x0000000000407acf n/a (baloo_file_extractor)
                                               #20 0x00007f9335d61741 __libc_start_main (libc.so.6)
                                               #21 0x0000000000407ba9 _start (baloo_file_extractor)
                                               
                                               Stack trace of thread 1935:
                                               #0  0x00007f9335e1f6cd poll (libc.so.6)
                                               #1  0x00007f93322418e0 n/a (libxcb.so.1)
                                               #2  0x00007f9332243679 xcb_wait_for_event (libxcb.so.1)
                                               #3  0x00007f932c618529 n/a (libQt5XcbQpa.so.5)
                                               #4  0x00007f9336514b38 n/a (libQt5Core.so.5)
                                               #5  0x00007f93354d7484 start_thread (libpthread.so.0)
                                               #6  0x00007f9335e286dd __clone (libc.so.6)
                                               
                                               Stack trace of thread 1936:
                                               #0  0x00007f9335e1f6cd poll (libc.so.6)
                                               #1  0x00007f9334646fd6 n/a (libglib-2.0.so.0)
                                               #2  0x00007f93346470ec g_main_context_iteration (libglib-2.0.so.0)
                                               #3  0x00007f93367422eb _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt5Core.so.5)
                                               #4  0x00007f93366ea02a _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt5Core.so.5)
                                               #5  0x00007f933650fc33 _ZN7QThread4execEv (libQt5Core.so.5)
                                               #6  0x00007f9337df4bf5 n/a (libQt5DBus.so.5)
                                               #7  0x00007f9336514b38 n/a (libQt5Core.so.5)
                                               #8  0x00007f93354d7484 start_thread (libpthread.so.0)
                                               #9  0x00007f9335e286dd __clone (libc.so.6)
hao /proc/1675/fd  $  ls


Expected Results:  
baloo_file_extractor should be able to index all the files I specify without crashing.

The crash always happens with both KDE frameworks 5.22 and 5.23. I'm on Qt 5.6.1.
Comment 1 Hao Zhang 2016-06-18 23:04:12 UTC
Could this bug be related to Bug 359038, since info.me_last_pgno got reused? When the size of the index is 5G, "balooctl status" only shows a size of 1G.
Comment 2 Hao Zhang 2016-06-19 00:37:04 UTC
Please ignore my previous comment, since info.me_last_pgno was not really reused. I did solve Bug 359038, though; please see my comment there.
Comment 3 Hao Zhang 2016-06-19 01:05:47 UTC
When I compiled baloo in debug mode, I got the following error before baloo_file_extractor crashed:

ASSERT failure in PositionDB::put:  "MDB_MAP_FULL: Environment mapsize limit reached", file /home/hao/build/baloo/src/baloo-5.23.0/src/engine/positiondb.cpp, line 80
Comment 4 Hao Zhang 2016-06-19 15:22:15 UTC
The relevant code is in line 97 of the file src/engine/database.cpp

mdb_env_set_mapsize(m_env, static_cast<size_t>(1024) * 1024 * 1024 * 5); // 5 gb

The size of the database is HARDCODED to be 5gb!! Is there any specific reason this size cannot be larger? 5gb is far from enough if there are lots of files to index.
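
For reference, the value of the expression quoted above depends on how wide size_t is. The sketch below is only an illustration, not baloo code: it evaluates the same product in std::uint64_t and std::uint32_t as stand-ins for a 64-bit and a 32-bit size_t, and the helper name fiveGiB is made up for this example.

```cpp
#include <cstdint>

// Evaluate 1024 * 1024 * 1024 * 5 in an unsigned type of a given width.
// Unsigned arithmetic wraps modulo 2^N, which is also how size_t behaves.
// (fiveGiB is a hypothetical helper, not part of baloo or LMDB.)
template <typename SizeT>
constexpr SizeT fiveGiB()
{
    return static_cast<SizeT>(1024) * 1024 * 1024 * 5;
}

// Stand-in for a 64-bit size_t: the product fits.
constexpr std::uint64_t on64 = fiveGiB<std::uint64_t>();
// Stand-in for a 32-bit size_t: the product wraps modulo 2^32.
constexpr std::uint32_t on32 = fiveGiB<std::uint32_t>();

static_assert(on64 == 5368709120ULL, "5 GiB fits in 64 bits");
static_assert(on32 == 1073741824U, "5 GiB wraps around to 1 GiB in 32 bits");
```

So with a 64-bit size_t the map size really is 5 GiB, while with a 32-bit size_t the same line silently yields 1 GiB.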
Comment 5 Hao Zhang 2016-06-21 01:31:11 UTC
Created attachment 99637 [details]
Patch to fix this bug

This bug is fixed with the attached patch.
Comment 6 Christoph Cullmann 2016-09-11 10:58:33 UTC
Thanks for the patch!
Cleaned it up a bit (e.g. no overflow in computation on 32-bit and usage of size_t)

https://git.reviewboard.kde.org/r/128885/

Feel free to comment there!
Comment 7 Christoph Cullmann 2016-09-11 11:20:52 UTC
*** Bug 364133 has been marked as a duplicate of this bug. ***
Comment 8 Christoph Cullmann 2016-09-11 13:29:47 UTC
*** Bug 352260 has been marked as a duplicate of this bug. ***
Comment 9 Christoph Cullmann 2016-09-11 13:53:51 UTC
*** Bug 359038 has been marked as a duplicate of this bug. ***
Comment 10 Christoph Cullmann 2016-09-11 16:58:32 UTC
Git commit b0890aca71aa4f0fdabe65ee7b7fbd0bc844d8b8 by Christoph Cullmann.
Committed on 11/09/2016 at 16:54.
Pushed by cullmann into branch 'master'.

Increase size limit of baloo index for 64-bit machines

CHANGELOG: On 64-bit systems baloo allows now > 5 GB index storage.

Increase size limit of baloo index for 64-bit machines to avoid crashes after > 5GB of index size.
(Better would be additional out-of-space handling, but ATM baloo has zero checks for that)

The size limit for 32-bit is still 1GB, like before (there was a silent overflow from 5GB to 1GB in the computation), people with large homes will still get random segfaults on 32-bit.

Patch based on patch from Hao Zhang, Bug 364475

REVIEW: 128885

M  +11   -1    src/engine/database.cpp
M  +14   -14   src/engine/databasesize.h
M  +1    -1    src/engine/transaction.cpp
M  +2    -2    src/tools/balooctl/statuscommand.cpp

http://commits.kde.org/baloo/b0890aca71aa4f0fdabe65ee7b7fbd0bc844d8b8
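
The idea described in the commit message, a larger limit on 64-bit and 1GB on 32-bit computed without overflow, could look roughly like the sketch below. This is not the code from the commit: the helper name mapSizeFor is hypothetical, and the 256 GiB value for 64-bit is an assumption picked only for illustration. Only the 1GB limit for 32-bit comes from the commit message.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not taken from the actual commit.
// pointerBytes is a parameter so both cases can be checked on one machine;
// real code would pass sizeof(size_t).
constexpr std::uint64_t mapSizeFor(std::size_t pointerBytes)
{
    // Computed in 64 bits, so the constant itself cannot wrap.
    constexpr std::uint64_t GiB = 1024ULL * 1024 * 1024;
    // 1 GiB on 32-bit matches the commit message; 256 GiB on 64-bit
    // is an assumed value for this example only.
    return pointerBytes >= 8 ? 256 * GiB : 1 * GiB;
}

static_assert(mapSizeFor(4) == 1073741824ULL, "32-bit: 1 GiB, fits in size_t");
static_assert(mapSizeFor(8) == 274877906944ULL, "64-bit: 256 GiB (assumed)");
```

The result of mapSizeFor(sizeof(std::size_t)) fits in size_t on either platform, so it could be cast to size_t and handed to mdb_env_set_mapsize in place of the hardcoded expression.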
Comment 11 Christoph Cullmann 2016-09-11 17:00:03 UTC
Thanks to Hao Zhang for the very useful error detection and patch proposal!
Comment 12 Vincas Dargis 2016-10-28 15:50:21 UTC
Will this fix land in Kubuntu 16.04?

Or should I create a bug report at https://bugs.launchpad.net/ubuntu/+source/baloo for the Kubuntu maintainers?

Thanks.
Comment 13 marvin24 2016-10-28 16:27:34 UTC
It is fixed in 5.27. If that version is not in 16.04, Ubuntu will have to backport it.

For some reason, it now crashes even a bit earlier here.
Comment 14 Vincas Dargis 2016-10-28 18:36:54 UTC
I've created a bug report asking for a backport. Posting it here for reference:
https://bugs.launchpad.net/ubuntu/+source/baloo/+bug/1637610