Bug 375131 - baloo hangs forever when it reaches a corrupted file.
Summary: baloo hangs forever when it reaches a corrupted file.
Status: RESOLVED FIXED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 5.29.0
Platform: Arch Linux Linux
Importance: NOR major
Target Milestone: ---
Assignee: Pinak Ahuja
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-16 09:18 UTC by Aqa-Ib
Modified: 2018-12-14 02:33 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
A file that causes baloo to hang forever (125.73 KB, image/jpeg)
2017-08-20 17:37 UTC, Kieran Ramos

Description Aqa-Ib 2017-01-16 09:18:07 UTC
When baloo_file_extractor reaches a corrupted file, it tries to extract its contents forever: it hangs on that file, fails to index or extract any more files, and consumes a lot of CPU.

It would be desirable for baloo to have a timeout: if it cannot index the contents of a file after X attempts, it should skip that file and move on to the next one.
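The skip-after-X-attempts idea can be sketched as follows. This is a minimal, hypothetical Python illustration, not baloo's code: `index_all` and the `extract` callback are made-up names, and `extract` returning False stands in for a watchdog reporting that an attempt timed out or failed. In practice a truly hung extraction can generally only be abandoned reliably if it runs in a separate process that can be killed.

```python
from collections import Counter
from typing import Callable, Iterable, List


def index_all(files: Iterable[str],
              extract: Callable[[str], bool],
              max_attempts: int = 3) -> List[str]:
    """Try to extract each file, skipping any file that fails max_attempts times.

    Hypothetical sketch: a False return from `extract` represents one failed or
    timed-out attempt. Returns the list of skipped files.
    """
    failures: Counter = Counter()
    skipped: List[str] = []
    for path in files:
        while not extract(path):
            failures[path] += 1
            if failures[path] >= max_attempts:
                skipped.append(path)  # give up on this file and move on
                break
    return skipped


# Usage: "corrupt.jpg" always fails, every other file succeeds.
print(index_all(["a.txt", "corrupt.jpg", "b.pdf"], lambda p: p != "corrupt.jpg"))
# → ['corrupt.jpg']
```

The attempt limit keeps one bad file from blocking the queue, while a file that fails only occasionally still gets indexed on a later attempt.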
Comment 1 kafei 2017-01-20 23:13:03 UTC
I believe I have also been hit by this bug. Today baloo_file_extractor suddenly started using 100% CPU and didn't stop.

Running `balooctl status` gave the following output:

    Baloo File Indexer is running
    Indexer state: Idle
    Indexed 11547 / 11548 files
    Current size of index is 947.69 MiB

This looks to me like it's hung on a particular file, but I'm not sure how to figure out which file that is. The contents of /proc/baloo_file_extractor_pid/fd were as follows, and didn't seem to point to the offending file:

    0 -> pipe:[17073]
    1 -> pipe:[17074]
    10 -> anon_inode:inotify
    11 -> /home/matt/.local/share/baloo/index
    12 -> /home/matt/.local/share/baloo/index
    13 -> /proc/1053/mounts
    14 -> socket:[17086]
    15 -> socket:[18879]
    16 -> /home/matt/.local/share/baloo/index-lock
    17 -> /home/matt/.local/share/baloo/index
    18 -> /home/matt/.local/share/baloo/index
    2 -> pipe:[17075]
    3 -> socket:[17080]
    4 -> anon_inode:[eventfd]
    5 -> anon_inode:[eventfd]
    6 -> socket:[17082]
    7 -> socket:[17894]
    8 -> anon_inode:[eventfd]
    9 -> socket:[18877]

Even after restarting, baloo_file_extractor immediately starts and hangs in the same way. The only way to stop it is to kill the process.
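The /proc check above can be scripted. This is a minimal Python sketch, assuming Linux, where /proc/<pid>/fd holds one symlink per open descriptor; `open_files` and `path_targets` are hypothetical helper names. As the listing above shows, the offending file does not necessarily appear among the open descriptors.

```python
import os
from typing import Dict


def open_files(fd_dir: str) -> Dict[str, str]:
    """Map each descriptor in a /proc/<pid>/fd-style directory to its target."""
    targets: Dict[str, str] = {}
    for name in os.listdir(fd_dir):
        try:
            targets[name] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor may have been closed between listdir and readlink.
            continue
    return targets


def path_targets(fd_dir: str) -> Dict[str, str]:
    """Keep only targets that look like filesystem paths.

    Pipes, sockets and anonymous inodes show up as strings such as
    "pipe:[17073]" or "anon_inode:inotify" rather than absolute paths.
    """
    return {fd: t for fd, t in open_files(fd_dir).items() if t.startswith("/")}
```

For example, `path_targets(f"/proc/{pid}/fd")`, with `pid` taken from `pidof baloo_file_extractor`, would list the index files and any document the extractor currently holds open, without the pipes and sockets.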
Comment 2 Kieran Ramos 2017-03-13 19:16:07 UTC
I have also experienced this behavior where baloo hangs forever on the same file. The index remains at the same file and doesn't finish indexing. The file which baloo is stuck on can be found in kinfocenter under File Index Monitor. Unlike Aqa-Ib, though, when baloo was stuck it didn't consume any CPU resources. Baloo would get stuck on the same file even after deleting the index by issuing `balooctl disable` and `balooctl enable`.

As a workaround, I have had to delete the offending file and then stop and restart baloo, usually with `balooctl stop`, followed by `killall baloo_file`, `killall baloo_file_extractor` and `balooctl start`.

I have had baloo get stuck on a PDF and on two different JPGs.
Comment 3 skierpage 2017-05-26 21:12:12 UTC
(In reply to Kieran Ramos from comment #2)
> I have also experienced this behavior where baloo hangs forever on the same
> file. The index remains at the same file and doesn't finish indexing. The file
> which baloo is stuck on can be found in kinfocenter under File Index
> Monitor.

My baloo_file_extractor is hung, consuming 100% of a core, but I don't see any file in KInfocenter > File Index Monitor; mine says "Indexer State: Idle". Like kafei, I can't figure out which file, if any, baloo_file_extractor is attempting to index.
 
> Unlike Aqa-Ib, though, when baloo was stuck it didn't consume any CPU
> resources. Baloo would get stuck on the same file even after deleting the
> index by issuing `balooctl disable` and `balooctl enable`.

Perhaps kafei and I have a different bug. In my case, baloo_file_extractor uses 250+ GB (0.25+ terabytes) of virtual memory according to top and htop. Please check the size of your baloo_file_extractor process.
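The virtual size can also be read without top: on Linux, the `VmSize` line of /proc/<pid>/status reports the virtual memory size (the figure top shows as VIRT), in kB. This is a minimal Python sketch; `vm_size_kb` is a hypothetical helper name.

```python
def vm_size_kb(status_text: str) -> int:
    """Return the VmSize value (in kB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            # The line looks like "VmSize:\t  262144000 kB".
            return int(line.split()[1])
    raise ValueError("no VmSize line found")
```

For example, `with open(f"/proc/{pid}/status") as f: print(vm_size_kb(f.read()) / 1024**2, "GiB")` prints the extractor's virtual size in GiB.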
Comment 4 Kieran Ramos 2017-08-20 17:37:51 UTC
Created attachment 107413
A file that causes baloo to hang forever

I've attached a file that causes baloo to hang forever. This bug is still present in baloo version 5.36.0.
Comment 5 Stefan Brüns 2018-10-14 21:32:12 UTC
@skierpage - your problem has probably been fixed for some time, i.e. since Frameworks 5.46, see https://phabricator.kde.org/D12335

@Kieran - thanks for providing the problematic file, a bugfix is on its way.
Comment 6 Igor Poboiko 2018-10-16 13:56:15 UTC
Git commit 5eee9ac75b7d6bb19795c2d3b964fe05fd8fc47c by Igor Poboiko.
Committed on 16/10/2018 at 13:56.
Pushed by poboiko into branch 'master'.

Don't crash on invalid exiv2 data

Summary:
The file from bug 375131 crashes `baloo_file_extractor`.
The problem is that its EXIF data contains a key `Exif.Photo.FocalLength`,
whose type is `Exiv2::unsignedRational`, and whose value is empty.
On the other hand, the `Exiv2::Value::toFloat()` call relies on the value having at least one component,
causing undefined behavior (i.e. a crash) if there is none.

This is a simple workaround: if we get a property with no value, just return an empty QVariant().
(Unfortunately, I didn't manage to reproduce the hang originally reported in the bug.)
Related: bug 352856, bug 353848, bug 361259

Test Plan: `baloo_file_extractor` no longer crashes on the file, it processes the file and extracts all the necessary data

Reviewers: #baloo, #frameworks, astippich

Reviewed By: astippich

Subscribers: bruns, astippich, kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D16165

M  +3    -0    src/extractors/exiv2extractor.cpp

https://commits.kde.org/kfilemetadata/5eee9ac75b7d6bb19795c2d3b964fe05fd8fc47c
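The guard described in this commit message can be illustrated with a small stand-in. This is a hypothetical Python sketch, not the actual patch to exiv2extractor.cpp: `RationalValue` mimics an Exiv2 unsignedRational value whose component list may be empty, and `None` plays the role of the empty QVariant(). Unlike C++, where reading a missing component is undefined behavior, Python raises IndexError, which is why the sketch checks `count()` before converting.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RationalValue:
    """Hypothetical stand-in for an Exiv2 unsignedRational value.

    Holds (numerator, denominator) components; the list may be empty, as with
    the Exif.Photo.FocalLength key in the attached file.
    """
    components: List[Tuple[int, int]] = field(default_factory=list)

    def count(self) -> int:
        return len(self.components)

    def to_float(self, n: int = 0) -> float:
        numerator, denominator = self.components[n]  # fails if there is no n-th component
        return numerator / denominator


def focal_length(value: RationalValue) -> Optional[float]:
    # Check for an empty value first; None stands in for an empty QVariant().
    if value.count() == 0:
        return None
    return value.to_float()
```

With this guard, `focal_length(RationalValue())` returns None instead of failing, while `focal_length(RationalValue([(35, 1)]))` returns 35.0.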
Comment 7 Stefan Brüns 2018-10-25 14:18:17 UTC
Git commit 1509ca51c5ed5b78d56a794e466eb4b9d0bd3f3b by Stefan Brüns.
Committed on 25/10/2018 at 14:17.
Pushed by bruns into branch 'master'.

[Extractor] Make extractor crash resilient

Summary:
Connect to QProcess::finished to detect the exit status. In case the
process has crashed, signal the indexer.

On a crash, restart the process and feed it a smaller batch. If the
crashing batch contains only a single file, mark the file as failed, i.e.
add it to the "failedid" db and remove it from the content indexing db
to avoid further indexing attempts.

Test Plan:
start `balooctl monitor`
add a file known to crash the extractor to an indexable path
touch an unproblematic file
-> indexer crashes on first file and continues with the second

Reviewers: #baloo, #frameworks, poboiko, ngraham

Reviewed By: #baloo, ngraham

Subscribers: broulik, apol, kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D16266

M  +17   -3    src/file/extractorprocess.cpp
M  +2    -0    src/file/extractorprocess.h
M  +17   -1    src/file/filecontentindexer.cpp
M  +10   -0    src/file/filecontentindexerprovider.cpp
M  +1    -0    src/file/filecontentindexerprovider.h

https://commits.kde.org/baloo/1509ca51c5ed5b78d56a794e466eb4b9d0bd3f3b
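The crash-recovery strategy from this commit message (restart with a smaller batch; a crashing single-file batch marks that file as failed) can be sketched as below. This is a hypothetical Python simulation, not the code in filecontentindexer.cpp: `run_batches` and `extract_batch` are made-up names, `extract_batch` returning False stands in for QProcess::finished reporting a crash, halving is an assumed way to shrink the batch, resetting to the full size after a success is also an assumption, and the default batch size of 10 is arbitrary.

```python
from typing import Callable, List, Sequence, Set


def run_batches(files: Sequence[str],
                extract_batch: Callable[[Sequence[str]], bool],
                batch_size: int = 10) -> Set[str]:
    """Index files in batches, shrinking the batch after each simulated crash.

    Returns the set of files marked as failed (the analogue of adding them to
    the "failedid" db and dropping them from the content indexing db).
    """
    failed: Set[str] = set()
    pending: List[str] = list(files)
    size = batch_size
    while pending:
        batch = pending[:size]
        if extract_batch(batch):
            del pending[:len(batch)]   # whole batch indexed
            size = batch_size          # assumption: return to the full batch size
        elif len(batch) == 1:
            failed.add(batch[0])       # lone crashing file: never retry it
            del pending[0]
        else:
            size = len(batch) // 2     # assumption: halve and retry (always >= 1 here)
    return failed
```

Every crash either shrinks the batch or removes one failed file, so the loop terminates, and a single bad file no longer blocks the rest of the queue.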