Bug 364748 - baloo_file_extractor segfault at startup of baloo_file (odfextractor segfault)
Status: RESOLVED FIXED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 5.23.0
Platform: Debian unstable Linux
Importance: NOR crash
Target Milestone: ---
Assignee: Christoph Cullmann
Reported: 2016-06-25 13:12 UTC by kdeuser56
Modified: 2016-09-11 13:12 UTC

Description kdeuser56 2016-06-25 13:12:45 UTC
I noticed that for a long time baloo has indexed only half of my home directory:

user@debian:~/baloo$ balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Indexed 7832 / 15848 files
Current size of index is 122.47 MiB

Resetting baloo (logging out, deleting all baloo-related files) makes it start indexing from scratch, but it stops at around the same file count and does not index any further.
Stopping, restarting, or resuming the file indexer does not help at all.
Every time baloo_file is started, baloo_file_extractor immediately segfaults, so I tried to debug the problem:


user@debian:~/baloo$ gdb /usr/bin/baloo_file
GNU gdb (Debian 7.11.1-2) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/baloo_file...Reading symbols from /usr/lib/debug/.build-id/cc/6382b1b4146efbd90e5836d11546a4568d809d.debug...done.
done.
(gdb) set follow-fork-mode child
(gdb) run
Starting program: /usr/bin/baloo_file 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffec961700 (LWP 3073)]
org.kde.baloo: "/home/user"
[New Thread 0x7ffea7dfe700 (LWP 3074)]
Power state changed
[New process 3075]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 3075 is executing new program: /usr/bin/baloo_file_extractor
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffead9c700 (LWP 3076)]
[New Thread 0x7fffd79f0700 (LWP 3077)]

Thread 2.1 "baloo_file_extr" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7e528c0 (LWP 3075)]
KFileMetaData::OdfExtractor::extract (this=<optimized out>, result=0x7fffffffd870) at /build/kfilemetadata-kf5-l1YepB/kfilemetadata-kf5-5.23.0/src/extractors/odfextractor.cpp:133
133     /build/kfilemetadata-kf5-l1YepB/kfilemetadata-kf5-5.23.0/src/extractors/odfextractor.cpp: No such file or directory.
(gdb) thread apply all bt

Thread 2.3 (Thread 0x7fffd79f0700 (LWP 3077)):
#0  0x00007ffff5625dcd in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff428e39c in g_main_context_poll (priority=2147483647, n_fds=3, fds=0x7fffd0003020, timeout=<optimized out>, context=0x7fffd0000990) at /build/glib2.0-wnDt2X/glib2.0-2.48.1/./glib/gmain.c:4135
#2  g_main_context_iterate (context=context@entry=0x7fffd0000990, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at /build/glib2.0-wnDt2X/glib2.0-2.48.1/./glib/gmain.c:3835
#3  0x00007ffff428e4ac in g_main_context_iteration (context=0x7fffd0000990, may_block=may_block@entry=1) at /build/glib2.0-wnDt2X/glib2.0-2.48.1/./glib/gmain.c:3901
#4  0x00007ffff5f411af in QEventDispatcherGlib::processEvents (this=0x7fffd00008c0, flags=...) at kernel/qeventdispatcher_glib.cpp:417
#5  0x00007ffff5ee9e4a in QEventLoop::exec (this=this@entry=0x7fffd79efcd0, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
#6  0x00007ffff5d129e4 in QThread::exec (this=this@entry=0x7ffff7fd8d40 <(anonymous namespace)::Q_QGS__q_manager::innerFunction()::holder>) at thread/qthread.cpp:500
#7  0x00007ffff7f65515 in QDBusConnectionManager::run (this=0x7ffff7fd8d40 <(anonymous namespace)::Q_QGS__q_manager::innerFunction()::holder>) at qdbusconnection.cpp:189
#8  0x00007ffff5d17808 in QThreadPrivate::start (arg=0x7ffff7fd8d40 <(anonymous namespace)::Q_QGS__q_manager::innerFunction()::holder>) at thread/qthread_unix.cpp:341
#9  0x00007ffff511b464 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007ffff562ee5d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2.2 (Thread 0x7fffead9c700 (LWP 3076)):
#0  0x00007ffff5625dcd in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff20c9382 in ?? () from /usr/lib/x86_64-linux-gnu/libxcb.so.1
#2  0x00007ffff20caff7 in xcb_wait_for_event () from /usr/lib/x86_64-linux-gnu/libxcb.so.1
#3  0x00007fffeccd4a89 in QXcbEventReader::run (this=0x664600) at qxcbconnection.cpp:1325
#4  0x00007ffff5d17808 in QThreadPrivate::start (arg=0x664600) at thread/qthread_unix.cpp:341
#5  0x00007ffff511b464 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6  0x00007ffff562ee5d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2.1 (Thread 0x7ffff7e528c0 (LWP 3075)):
#0  KFileMetaData::OdfExtractor::extract (this=<optimized out>, result=0x7fffffffd870) at /build/kfilemetadata-kf5-l1YepB/kfilemetadata-kf5-5.23.0/src/extractors/odfextractor.cpp:133
#1  0x0000000000408750 in Baloo::App::index (this=this@entry=0x7fffffffdfb0, tr=0x6e8030, url=..., id=id@entry=306226873237544) at /build/baloo-kf5-sHKQba/baloo-kf5-5.23.0/src/file/extractor/app.cpp:163
#2  0x0000000000408bfe in Baloo::App::processNextFile (this=0x7fffffffdfb0) at /build/baloo-kf5-sHKQba/baloo-kf5-5.23.0/src/file/extractor/app.cpp:93
#3  0x00007ffff5f24ca6 in QtPrivate::QSlotObjectBase::call (a=0x7fffffffd9e0, r=<optimized out>, this=<optimized out>) at ../../include/QtCore/../../src/corelib/kernel/qobject_impl.h:124
#4  QSingleShotTimer::timerEvent (this=0x79cf20) at kernel/qtimer.cpp:310
#5  0x00007ffff5f19523 in QObject::event (this=0x79cf20, e=<optimized out>) at kernel/qobject.cpp:1278
#6  0x00007ffff6823afc in QApplicationPrivate::notify_helper (this=<optimized out>, receiver=0x79cf20, e=0x7fffffffdca0) at kernel/qapplication.cpp:3804
#7  0x00007ffff6829036 in QApplication::notify (this=0x7fffffffdf80, receiver=0x79cf20, e=0x7fffffffdca0) at kernel/qapplication.cpp:3561
#8  0x00007ffff5eec0f8 in QCoreApplication::notifyInternal2 (receiver=0x79cf20, event=event@entry=0x7fffffffdca0) at kernel/qcoreapplication.cpp:1015
#9  0x00007ffff5f4009e in QCoreApplication::sendEvent (event=0x7fffffffdca0, receiver=<optimized out>) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:225
#10 QTimerInfoList::activateTimers (this=0x683e00) at kernel/qtimerinfo_unix.cpp:637
#11 0x00007ffff5f40609 in timerSourceDispatch (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:176
#12 idleTimerSourceDispatch (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:223
#13 0x00007ffff428e1a7 in g_main_dispatch (context=0x7fffe40016f0) at /build/glib2.0-wnDt2X/glib2.0-2.48.1/./glib/gmain.c:3154
#14 g_main_context_dispatch (context=context@entry=0x7fffe40016f0) at /build/glib2.0-wnDt2X/glib2.0-2.48.1/./glib/gmain.c:3769
#15 0x00007ffff428e400 in g_main_context_iterate (context=context@entry=0x7fffe40016f0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at /build/glib2.0-wnDt2X/glib2.0-2.48.1/./glib/gmain.c:3840
#16 0x00007ffff428e4ac in g_main_context_iteration (context=0x7fffe40016f0, may_block=may_block@entry=1) at /build/glib2.0-wnDt2X/glib2.0-2.48.1/./glib/gmain.c:3901
#17 0x00007ffff5f411af in QEventDispatcherGlib::processEvents (this=0x688c70, flags=...) at kernel/qeventdispatcher_glib.cpp:417
#18 0x00007ffff5ee9e4a in QEventLoop::exec (this=this@entry=0x7fffffffdef0, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
#19 0x00007ffff5ef250c in QCoreApplication::exec () at kernel/qcoreapplication.cpp:1285
#20 0x00007ffff623381c in QGuiApplication::exec () at kernel/qguiapplication.cpp:1607
#21 0x00007ffff6820ac5 in QApplication::exec () at kernel/qapplication.cpp:2979
#22 0x0000000000407bc5 in main (argc=1, argv=0x7fffffffe1a8) at /build/baloo-kf5-sHKQba/baloo-kf5-5.23.0/src/file/extractor/main.cpp:57


It seems that baloo_file gets stuck on one particular file in my home directory, but I did not manage to find out which one. I do not have any ODF files, at least to my knowledge.

Is there some way to find out which file is the problematic one?

I think baloo_file should be implemented in such a way that a single file cannot stop the whole indexing. Is this a direction planned for the future?
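
To illustrate the kind of isolation asked about here, the following is a minimal sketch, not a description of how Baloo actually works. It runs a separate child process per file and records a file as failed when the child is killed by a signal, so the loop carries on with the next file. FAKE_EXTRACTOR and index_all are hypothetical names made up for this example; the fake extractor deliberately kills itself with SIGSEGV for any path ending in ".ods" to stand in for a crashing extractor. POSIX only.

```python
import subprocess
import sys

# Hypothetical stand-in for a real extractor: it kills itself with SIGSEGV
# for any path ending in ".ods" and exits normally for everything else.
FAKE_EXTRACTOR = (
    "import os, signal, sys\n"
    "if sys.argv[1].endswith('.ods'):\n"
    "    os.kill(os.getpid(), signal.SIGSEGV)\n"
)


def index_all(paths):
    """Process each path in its own child process.

    A child that exits with a non-zero status (including death by a
    signal, which subprocess reports as a negative return code) only
    marks its own file as failed; the remaining files are still handled.
    """
    indexed, failed = [], []
    for path in paths:
        proc = subprocess.run([sys.executable, "-c", FAKE_EXTRACTOR, path])
        if proc.returncode == 0:
            indexed.append(path)
        else:
            failed.append(path)
    return indexed, failed


indexed, failed = index_all(["notes.txt", "template.ods", "report.txt"])
print("indexed:", indexed)
print("failed:", failed)
```

One process per file is the simplest form of this idea and is expensive; a batch-based variant would need extra logic to narrow a crashing batch down to the one file responsible.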

Reproducible: Always

Steps to Reproduce:
1. run baloo_file
2. notice crash in dmesg: baloo_file_extr[3240]: segfault at 0 ip 00007ff512bb73d6 sp 00007ffe5cc07de0 error 4 in kfilemetadata_odfextractor.so[7ff512bb4000+5000]
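
The kernel line in step 2 can be decoded a little further. The sketch below parses it and computes the instruction pointer's offset relative to the start of the mapping the kernel reports for kfilemetadata_odfextractor.so. It assumes the x86 message layout shown above, in which the error code is printed in hexadecimal with bit 0 meaning "page was present", bit 1 "write access" and bit 2 "user mode". The function name decode_segfault is made up for this example.

```python
import re

# Kernel message copied from step 2 of the report.
LINE = (
    "baloo_file_extr[3240]: segfault at 0 ip 00007ff512bb73d6 "
    "sp 00007ffe5cc07de0 error 4 in "
    "kfilemetadata_odfextractor.so[7ff512bb4000+5000]"
)

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split an x86 'segfault at ...' kernel message into its fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m.group("err"), 16)
    return {
        "fault_address": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # Offset of the faulting instruction from the start of the mapping.
        "offset_in_mapping": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = decode_segfault(LINE)
print(info["library"], hex(info["offset_in_mapping"]))
print("fault address:", hex(info["fault_address"]))
```

For this line the fault address is 0 and the error code describes a user-mode read of an unmapped page, which is what a NULL pointer dereference inside the ODF extractor plugin would look like.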
Comment 1 kdeuser56 2016-06-25 20:16:31 UTC
Last few lines in the "strace -f -e open baloo_file" output:

[pid  5606] open("/home/user/maple/maple2015/data/xml/dtd/mathml2/mathml2-qname-1.mod", O_RDONLY|O_CLOEXEC) = 21
[pid  5606] open("/home/user/maple/maple2015/data/eBookTools/Preface.mw", O_RDONLY|O_CLOEXEC) = 21
[pid  5606] open("/home/user/maple/maple2015/data/eBookTools/Legal.mw", O_RDONLY|O_CLOEXEC) = 21
[pid  5606] open("/home/user/maple/maple2015/data/help/Optimization/afiro.mpl", O_RDONLY|O_CLOEXEC) = 21
[pid  5606] open("/home/user/maple/maple2015/data/help/ImportData/recipe.mps", O_RDONLY|O_CLOEXEC) = 21
[pid  5606] open("/home/user/maple/maple2015/data/xml/template/template.ods", O_RDONLY|O_CLOEXEC) = 21
[pid  5606] open("/home/user/maple/maple2015/data/xml/template/template.ods", O_RDONLY|O_CLOEXEC) = 21
[pid  5606] open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 23
[pid  5606] open("/etc/group", O_RDONLY|O_CLOEXEC) = 23
[pid  5606] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---
[pid  5607] +++ killed by SIGSEGV +++
[pid  5608] +++ killed by SIGSEGV +++
[pid  5606] +++ killed by SIGSEGV +++
[pid  5605] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=5606, si_uid=1000, si_status=SIGSEGV, si_utime=4, si_stime=5} ---

So I guess the offending file may be "maple/maple2015/data/xml/template/template.ods".

How to verify that? How can I invoke baloo_file_extractor manually?
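
The manual reading of the strace output above can be automated. The following is a rough sketch that scans "strace -f -e open" output and reports the last file under a given prefix that the crashing process opened before it received SIGSEGV. The sample lines are taken from the output above; the function name and the "/home/" prefix filter are assumptions made for this example. The result is only a suspect, not proof, and still has to be verified separately.

```python
import re

# A few lines copied from the strace output above.
STRACE_TAIL = """
[pid  5606] open("/home/user/maple/maple2015/data/help/ImportData/recipe.mps", O_RDONLY|O_CLOEXEC) = 21
[pid  5606] open("/home/user/maple/maple2015/data/xml/template/template.ods", O_RDONLY|O_CLOEXEC) = 21
[pid  5606] open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 23
[pid  5606] open("/etc/group", O_RDONLY|O_CLOEXEC) = 23
[pid  5606] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---
[pid  5606] +++ killed by SIGSEGV +++
""".strip().splitlines()

OPEN_RE = re.compile(r'^\[pid\s+(\d+)\] open\("([^"]+)"')
SEGV_RE = re.compile(r'^\[pid\s+(\d+)\] --- SIGSEGV')


def last_open_before_crash(lines, prefix="/home/"):
    """Return the last path under `prefix` opened by the pid that got SIGSEGV."""
    last_by_pid = {}
    for line in lines:
        opened = OPEN_RE.match(line)
        if opened:
            pid, path = opened.groups()
            if path.startswith(prefix):
                last_by_pid[pid] = path
            continue
        crashed = SEGV_RE.match(line)
        if crashed:
            return last_by_pid.get(crashed.group(1))
    return None


suspect = last_open_before_crash(STRACE_TAIL)
print(suspect)
```

Filtering on the prefix skips system files such as /etc/passwd that happen to be opened between the document and the crash.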
Comment 2 kdeuser56 2016-06-25 20:37:44 UTC
Confirmed the offending file by creating a new user and copying that file into the new user's home directory. Once that happened, baloo_file_extractor segfaulted there too.

Since this document is part of a copyrighted proprietary application I can't share that file publicly.
Comment 3 kdeuser56 2016-06-25 21:55:30 UTC
By adding "*.ods" to the excludes in the config I finally managed to reach the following status:

Baloo File Indexer is running
Indexer state: Indexing file content
Indexed 15835 / 15835 files
Current size of index is 264.27 MiB

:-)), so at least in my case no other crash happened.
Comment 4 Christoph Cullmann 2016-09-11 11:19:10 UTC
The ODF indexer has problems if some files are missing from the zip; perhaps that is your issue, see:

https://git.reviewboard.kde.org/r/128886/
Comment 5 Christoph Cullmann 2016-09-11 13:12:52 UTC
Fixed

https://quickgit.kde.org/?p=kfilemetadata.git&a=commit&h=40730d75397aefb92145f86fc6abc9b303c56cfe

Make odf indexer more error-proof, check if the files are there (and are files at all) (meta.xml + content.xml)

REVIEW: 128886 
BUG 364748 

=> if you download these odt files to indexed directories, your baloo will die on each index, be careful
autotests/odfextractortest.cpp
autotests/odfextractortest.h
autotests/samplefiles/test_missing_content.odt [new file with mode 0644]
autotests/samplefiles/test_missing_meta.odt [new file with mode 0644]
src/extractors/odfextractor.cpp
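
The check described in the commit message (are meta.xml and content.xml present in the archive, and are they regular files?) can be sketched as follows. This is not the actual patch, which is C++ code in src/extractors/odfextractor.cpp; it is a small Python analogue using the standard zipfile module, and the names missing_odf_members and make_zip are made up for this example. It could be used to look for documents like the one reported here before an indexer touches them.

```python
import io
import zipfile

# Entries the commit message says the extractor now checks for.
REQUIRED = ("meta.xml", "content.xml")


def missing_odf_members(source):
    """Return the required entries that are absent or not regular files.

    `source` is a path or a file-like object containing a zip archive.
    """
    with zipfile.ZipFile(source) as archive:
        files = {
            info.filename for info in archive.infolist() if not info.is_dir()
        }
    return [name for name in REQUIRED if name not in files]


def make_zip(members):
    """Build an in-memory zip archive from a {name: text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    buffer.seek(0)
    return buffer


# Sample archive that lacks meta.xml, similar in spirit to the new test file
# test_missing_meta.odt (its real contents are not reproduced here).
broken = make_zip({"content.xml": "<office:document-content/>"})
print(missing_odf_members(broken))
```

An empty result means both entries exist as files; it says nothing about whether their XML is well-formed.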