Bug 403902

Summary: Baloo_file_extractor Crashes in KFileMetaData::TagLibExtractor::extract() on XML files with the .spx extension
Product: [Frameworks and Libraries] frameworks-kfilemetadata Reporter: Laura David Hurka <laura.stern>
Component: general Assignee: Pinak Ahuja <pinak.ahuja>
Status: RESOLVED FIXED    
Severity: crash CC: a.stippich, armandogarciasf, asturm, bruno, lagerimsi, laura.stern, nate, stefan.bruens, stream009
Priority: VHI Keywords: drkonqi
Version: 5.54.0   
Target Milestone: ---   
Platform: Ubuntu   
OS: Linux   
Latest Commit: Version Fixed In: 5.57
Sentry Crash Report:
Attachments: baloo_file_extractor-20190429-184351.kcrash.txt

Description Laura David Hurka 2019-02-03 22:06:50 UTC
Application: baloo_file_extractor (5.54.0)

Qt Version: 5.11.2
Frameworks Version: 5.54.0
Operating System: Linux 4.15.0-45-generic x86_64
Distribution: KDE neon User Edition 5.14

-- Information about the crash:
I saved an XML file with .spx somewhere under ~, after I created it with Kate. It was definitely not similar to speex audio (.spx).

Immediately at clicking Save the "Baloo Closed Unexpectedly" window popped up. Since then, it pops up after every login, although I have deleted all .spx files in the meantime.

-- Backtrace:
Application: Baloo File Extractor (baloo_file_extractor), signal: Segmentation fault
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Current thread is 1 (Thread 0x7fc9b4a42c80 (LWP 5386))]

Thread 3 (Thread 0x7fc98d04d700 (LWP 5402)):
#0  0x00007fc9b15ecbf9 in __GI___poll (fds=0x7fc988004db0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fc9adcb9539 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007fc9adcb964c in g_main_context_iteration () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007fc9b1f2704b in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#4  0x00007fc9b1ecb30a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5  0x00007fc9b1cf6bba in QThread::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#6  0x00007fc9b384de45 in  () at /usr/lib/x86_64-linux-gnu/libQt5DBus.so.5
#7  0x00007fc9b1d01adb in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#8  0x00007fc9b01f36db in start_thread (arg=0x7fc98d04d700) at pthread_create.c:463
#9  0x00007fc9b15f988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fc9a6479700 (LWP 5393)):
#0  0x00007fc9b15ecbf9 in __GI___poll (fds=0x7fc9a6478ca8, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fc9ad848747 in  () at /usr/lib/x86_64-linux-gnu/libxcb.so.1
#2  0x00007fc9ad84a36a in xcb_wait_for_event () at /usr/lib/x86_64-linux-gnu/libxcb.so.1
#3  0x00007fc9a9439ed9 in  () at /usr/lib/x86_64-linux-gnu/libQt5XcbQpa.so.5
#4  0x00007fc9b1d01adb in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5  0x00007fc9b01f36db in start_thread (arg=0x7fc9a6479700) at pthread_create.c:463
#6  0x00007fc9b15f988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fc9b4a42c80 (LWP 5386)):
[KCrash Handler]
#6  0x00007fc98d4660ab in  () at /usr/lib/x86_64-linux-gnu/qt5/plugins/kf5/kfilemetadata/kfilemetadata_taglibextractor.so
#7  0x000055e8b1b8e22b in Baloo::App::index(Baloo::Transaction*, QString const&, unsigned long long) (this=this@entry=0x7ffcba0c07f0, tr=0x55e8b2f8f490, url=..., id=id@entry=2280361346205702) at ./src/file/extractor/app.cpp:191
#8  0x000055e8b1b8eb6e in Baloo::App::processNextFile() (this=0x7ffcba0c07f0) at ./src/file/extractor/app.cpp:111
#9  0x00007fc9b1f08ef4 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#10 0x00007fc9b1efcb9b in QObject::event(QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#11 0x00007fc9b2c59e1c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#12 0x00007fc9b2c613ef in QApplication::notify(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#13 0x00007fc9b1eccfe8 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#14 0x00007fc9b1f264be in QTimerInfoList::activateTimers() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#15 0x00007fc9b1f26c81 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007fc9adcb9387 in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007fc9adcb95c0 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x00007fc9adcb964c in g_main_context_iteration () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x00007fc9b1f2702f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#20 0x00007fc9a94c5761 in  () at /usr/lib/x86_64-linux-gnu/libQt5XcbQpa.so.5
#21 0x00007fc9b1ecb30a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x00007fc9b1ed44d0 in QCoreApplication::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#23 0x000055e8b1b8d204 in main(int, char**) (argc=<optimized out>, argv=0x7ffcba0c0a18) at ./src/file/extractor/main.cpp:60

Reported using DrKonqi
Comment 1 Alexander Stippich 2019-02-05 21:23:50 UTC
I can reproduce the crash in the KFileMetaData extractor. I think I tracked it down to a taglib bug https://github.com/taglib/taglib/issues/836.
Luckily, I am planning to port away from the buggy function anyway, which should eventually make the crash go away.
Comment 2 Nate Graham 2019-02-06 19:06:38 UTC
*** Bug 403710 has been marked as a duplicate of this bug. ***
Comment 3 Nate Graham 2019-02-06 19:07:45 UTC
Got another report in Bug 403710. Looks like XML files that have the .spx extension are a reproducible cause of this crash.
Comment 4 Laura David Hurka 2019-02-06 23:49:31 UTC
I can now confirm that the .spx file causes the crash. In the meantime I got other .spx files, and since I ran the following command, Baloo has not crashed anymore.

balooctl config add excludeFilters *.spx
Comment 5 Nate Graham 2019-02-08 20:35:44 UTC
*** Bug 404095 has been marked as a duplicate of this bug. ***
Comment 6 Nate Graham 2019-02-09 14:11:48 UTC
*** Bug 404095 has been marked as a duplicate of this bug. ***
Comment 7 Alexander Stippich 2019-02-09 14:48:21 UTC
Git commit 7415aa60d9f65c2eae10094fd3fff8327f6f11ce by Alexander Stippich.
Committed on 09/02/2019 at 14:48.
Pushed by astippich into branch 'master'.

Use content to determine mime type

Summary:
Determine the mime type for the
extractors based on the content, not on the file
extension. This avoids feeding files with a wrong
or the same file extension into the wrong extractor.

Reviewers: ngraham, bruns

Reviewed By: ngraham

Subscribers: kde-frameworks-devel, #baloo

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D18819

M  +1    -1    src/file/extractor/app.cpp

https://commits.kde.org/baloo/7415aa60d9f65c2eae10094fd3fff8327f6f11ce
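
The idea behind this commit can be illustrated with a minimal sketch. This is not the Baloo code, which relies on Qt's QMimeDatabase; it only shows the type being decided from the leading bytes rather than from the file name. The function name `sniff_type` and the returned labels are hypothetical, and the byte signatures rest on the assumption that an Ogg stream starts with `OggS` and that the Speex header magic `Speex   ` sits at offset 28.

```python
def sniff_type(name: str, data: bytes) -> str:
    """Classify by content; the file name is deliberately ignored."""
    # Assumption: Ogg capture pattern at offset 0, Speex header magic at offset 28.
    if data[:4] == b"OggS" and data[28:36] == b"Speex   ":
        return "audio/x-speex+ogg"
    if data.lstrip()[:5] == b"<?xml":
        return "application/xml"
    return "application/octet-stream"


# An XML document saved as "notes.spx" is no longer treated as Speex audio.
xml_as_spx = sniff_type("notes.spx", b'<?xml version="1.0"?><root/>')
# A buffer that carries the assumed Speex signatures is still recognized.
fake_speex = sniff_type("voice.spx", b"OggS" + bytes(24) + b"Speex   " + bytes(72))
```

With content-based detection, the XML file from the original report would be routed away from the TagLib extractor regardless of its extension.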
Comment 8 Nate Graham 2019-02-09 23:29:41 UTC
The above commit should make Baloo stop crashing because it removes these files from indexing consideration. The bug is still open because this doesn't actually fix the crash itself, it just avoids it. But from a user perspective, there shouldn't be any more crashes on .spx files starting in KDE Frameworks 5.56.
Comment 9 Nate Graham 2019-02-10 19:30:09 UTC
*** Bug 404077 has been marked as a duplicate of this bug. ***
Comment 10 Laura David Hurka 2019-02-11 10:13:41 UTC
I have created sum[i=1; 4](26^i) = 475254 files with the same content as the problematic .spx file. The file indexer crashes only at .spx (although *.spx is in the excludeFilter).
$ balooctl index Test/*
[...]
[...].spx
Segmentation fault

This was not really useful, but now I can say that files other than .spx with fewer than five letters in the extension do not cause crashes.
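
The file count above can be double-checked with a short sketch. It assumes the extensions were all lowercase-letter strings of length 1 to 4, which the comment implies but does not state; `all_extensions` is a hypothetical helper and no files are actually written.

```python
from itertools import product
from string import ascii_lowercase


def all_extensions(max_len: int = 4):
    """Yield every lowercase extension of length 1..max_len."""
    for length in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


extensions = list(all_extensions())
count = len(extensions)  # 26 + 26**2 + 26**3 + 26**4
has_spx = "spx" in extensions
```

Under that assumption, `count` matches the 475254 figure and `spx` is among the generated extensions.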

Qt Version: 5.11.2
Frameworks Version: 5.54.0
Operating System: Linux 4.15.0-45-generic x86_64
Distribution: KDE neon User Edition 5.14
Comment 11 Nate Graham 2019-02-23 14:38:52 UTC
*** Bug 404420 has been marked as a duplicate of this bug. ***
Comment 12 Alexander Stippich 2019-03-10 15:02:50 UTC
Git commit 649555ee31820af01869c7bfe8c1e96e5a9abb37 by Alexander Stippich.
Committed on 10/03/2019 at 15:02.
Pushed by astippich into branch 'master'.

Rewrite the taglib extractor to use the generic PropertyMap interface

Summary:
Rewrite the taglib extractor to use taglib's
PropertyMap. Since this largely unifies the handling of the
different tag formats, but not quite, a lot of code is removed.
The resulting code is also faster. Additionally, this avoids the
usage of a FileRef object, which fixes a potential crash due to
a known bug in taglib.

Test Plan: all tests pass

Reviewers: ngraham, bruns, mgallien

Reviewed By: bruns

Subscribers: smithjd, kde-frameworks-devel, #baloo

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D18826

M  +0    -2    autotests/taglibextractortest.cpp
M  +273  -880  src/extractors/taglibextractor.cpp
M  +6    -40   src/extractors/taglibextractor.h

https://commits.kde.org/kfilemetadata/649555ee31820af01869c7bfe8c1e96e5a9abb37
Comment 13 stream9 2019-03-17 07:01:52 UTC
It seems the fix for this problem has caused a regression.

Based on the file extension, Matroska containers (mkv, mka, etc.) are recognized as video/x-matroska, audio/x-matroska, and so on.

Based on the content signature, however, they are all recognized as the parent type application/x-matroska.

Because the mime type is now determined from the content, BasicIndexingJob can't determine the correct file type (typesForMimeType() in file/basicindexingjob.cpp).

As a consequence, it doesn't index them at all.
Comment 14 Nate Graham 2019-03-17 08:59:42 UTC
Please report as a new bug. :)
Comment 15 Stefan Brüns 2019-03-19 00:03:09 UTC
Git commit 50a91ff610379c471cea7a8f2aa4d2ea42fa5494 by Stefan Brüns.
Committed on 19/03/2019 at 00:03.
Pushed by bruns into branch 'master'.

[ffmpegextractor] Add Matroska Video test case

Summary:
The test file was generated by converting the webm video file, using:
$> ffmpeg -i test.webm -acodec copy -vcodec copy test.mkv

Depends on D19845

Test Plan: ctest

Reviewers: #baloo, #frameworks, astippich, mgallien, ngraham

Reviewed By: #baloo, ngraham

Subscribers: kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D19846

M  +5    -1    autotests/ffmpegextractortest.cpp
A  +-    --    autotests/samplefiles/test.mkv

https://commits.kde.org/kfilemetadata/50a91ff610379c471cea7a8f2aa4d2ea42fa5494
Comment 16 Stefan Brüns 2019-03-27 01:48:45 UTC
Git commit 69c25514cf6a08ceaaacbc4092cc02ff40853228 by Stefan Brüns.
Committed on 27/03/2019 at 01:48.
Pushed by bruns into branch 'master'.

Add helper function to determine mime type based on content and extension

Summary:
The QMimeDatabase::MatchDefault only falls back to content matching
if the extension is not known. This fails for e.g. Matroska files, where
the content allows to distinguish between audio and video files.

Reviewers: #baloo, #frameworks, astippich, ngraham, poboiko

Reviewed By: #baloo, astippich, ngraham

Subscribers: kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D20045

M  +1    -0    src/CMakeLists.txt
A  +50   -0    src/mimeutils.cpp     [License: LGPL (v2.1+)]
A  +55   -0    src/mimeutils.h     [License: LGPL (v2.1+)]

https://commits.kde.org/kfilemetadata/69c25514cf6a08ceaaacbc4092cc02ff40853228
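
One way to combine both sources of information, as the summary describes, is sketched below. This is not the code from mimeutils.cpp; it is a guess at a strategy consistent with the commit message, written against toy lookup tables instead of QMimeDatabase. The names `BY_EXTENSION`, `PARENT`, `inherits` and `pick_mime` are all hypothetical.

```python
# Toy stand-ins for the MIME database; all entries are illustrative only.
BY_EXTENSION = {
    "mkv": ["video/x-matroska"],
    "mka": ["audio/x-matroska"],
    "spx": ["audio/x-speex+ogg"],
}
# Child type -> parent type it inherits from.
PARENT = {
    "video/x-matroska": "application/x-matroska",
    "audio/x-matroska": "application/x-matroska",
}


def inherits(mime, ancestor: str) -> bool:
    """True if mime equals ancestor or derives from it in PARENT."""
    while mime is not None:
        if mime == ancestor:
            return True
        mime = PARENT.get(mime)
    return False


def pick_mime(extension: str, content_mime: str) -> str:
    """Combine the extension-based candidates with the content-based result."""
    candidates = BY_EXTENSION.get(extension, [])
    if content_mime in candidates:
        return content_mime
    for candidate in candidates:
        # Content is generic, extension names a more specific subtype: keep the subtype.
        if inherits(candidate, content_mime):
            return candidate
    # Extension and content disagree: trust the content.
    return content_mime


mkv_result = pick_mime("mkv", "application/x-matroska")
xml_spx_result = pick_mime("spx", "application/xml")
```

In this sketch a Matroska file keeps its specific video or audio type, while an XML file with a .spx extension is still classified by its content.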
Comment 17 Alexander Stippich 2019-03-30 08:28:41 UTC
Git commit a256687a1d1150341b82cfa17218b12a944cda50 by Alexander Stippich.
Committed on 30/03/2019 at 08:28.
Pushed by astippich into branch 'master'.

Be more precise with mimetype detection

Summary:
Use the new mime type helper from KFileMetaData

Reviewers: #baloo, bruns

Reviewed By: #baloo, bruns

Subscribers: kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D20011

M  +2    -1    src/file/extractor/app.cpp

https://commits.kde.org/baloo/a256687a1d1150341b82cfa17218b12a944cda50
Comment 18 Andreas Sturmlechner 2019-04-29 19:26:46 UTC
Unfortunately not fixed for me with KF 5.57.0.
Comment 19 Stefan Brüns 2019-04-29 19:39:49 UTC
(In reply to andreas.sturmlechner from comment #18)
> Unfortunately not fixed for me with KF 5.57.0.

*What* is not fixed for you?
Comment 20 Stefan Brüns 2019-04-29 19:40:29 UTC
x
Comment 21 Nate Graham 2019-04-29 20:41:54 UTC
If you're using 5.57 and still see crashes, please include a new backtrace and attach the guilty .spx file.
Comment 22 Andreas Sturmlechner 2019-04-29 22:50:51 UTC
Created attachment 119733 [details]
baloo_file_extractor-20190429-184351.kcrash.txt
Comment 23 Andreas Sturmlechner 2019-04-29 22:55:32 UTC
Crash happens with any of the .spx files from here: https://github.com/qgis/QGIS/tree/master/tests/testdata/test_gdb.gdb
Comment 24 Stefan Brüns 2019-04-30 02:22:57 UTC
Please file a new bug report - this is a binary file, not an XML file.
Comment 25 Nate Graham 2019-04-30 04:17:39 UTC
Interesting, a new way for .spx files to crash Baloo. :p
Comment 26 Stefan Brüns 2019-04-30 17:24:17 UTC
Git commit 61b1916c3e87c3b8f4fc3d1f1d19bf427b9247da by Stefan Brüns.
Committed on 30/04/2019 at 17:24.
Pushed by bruns into branch 'master'.

[TagLibExtractor] Fix crash on invalid Speex files

Summary:
TagLib::Ogg::Speex::File::isValid() returns true even for invalid files,
but tag() only returns a valid XiphComment when the file is valid.

Other TagLib::Ogg::* classes properly clear the valid flag when
encountering invalid files.

See https://github.com/taglib/taglib/issues/902

Reviewers: #baloo, #frameworks, ngraham, astippich

Reviewed By: #baloo, ngraham, astippich

Subscribers: kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D20913

M  +3    -1    src/extractors/taglibextractor.cpp

https://commits.kde.org/kfilemetadata/61b1916c3e87c3b8f4fc3d1f1d19bf427b9247da
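
The failure mode described in this commit can be modeled roughly as follows. This is a sketch with a hypothetical stand-in class, not TagLib's API and not the actual three-line patch; it only illustrates guarding on the returned tag instead of trusting the validity flag alone. The rule used to decide whether the stand-in has a comment is invented purely to drive the example.

```python
class FakeSpeexFile:
    """Stand-in that mimics the reported behavior: is_valid() is always True."""

    def __init__(self, data: bytes):
        # Hypothetical validity rule, used only to drive the example.
        self._comment = {"TITLE": ["demo"]} if data[:4] == b"OggS" else None

    def is_valid(self) -> bool:
        return True  # mirrors the reported bug: true even for invalid files

    def tag(self):
        return self._comment  # None when the file is not really Speex


def extract(file: FakeSpeexFile) -> dict:
    """Return tag properties, or an empty dict when no usable tag exists."""
    comment = file.tag()
    # Guard on the tag itself instead of trusting is_valid() alone.
    if not file.is_valid() or comment is None:
        return {}
    return dict(comment)


bad = extract(FakeSpeexFile(b'<?xml version="1.0"?>'))
good = extract(FakeSpeexFile(b"OggS" + bytes(60)))
```

The invalid input yields an empty result instead of dereferencing a missing tag, which is the kind of access that would crash in the C++ extractor.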