Bug 447681 - System ran out of ram while baloo_file_extractor is indexing a huge mbox file
Status: RESOLVED DUPLICATE of bug 443547
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2021-12-30 08:27 UTC by Alex Fiestas
Modified: 2025-04-19 18:48 UTC

Description Alex Fiestas 2021-12-30 08:27:38 UTC
SUMMARY
I have an mbox file containing 10 GB of email. baloo_file_extractor seems to load the entire file into memory, which fails because, well... I don't have that much memory.

I tried to debug this using strace and I could see the process was basically copying the entire file into memory, but I could be wrong.

STEPS TO REPRODUCE
1.  Get an mbox file that is large
2.  Put it in a folder indexed by Baloo
3.  Observe baloo_file_extractor increase in memory usage

OBSERVED RESULT
baloo_file_extractor requires a huge amount of RAM to parse large mbox files.

EXPECTED RESULT
The file should be parsed in a way that does not require a huge amount of memory.
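
To illustrate the kind of parsing meant here, below is a minimal Python sketch of streaming an mbox file one message at a time. It is not Baloo's or KFileMetaData's code; the function name `iter_mbox_messages` and the `max_message_bytes` and `max_line_bytes` parameters are hypothetical. It assumes the common mbox convention that each message starts with a line beginning with `From ` and that body lines starting with `From ` have been escaped by the writer (for example as `>From `).

```python
import io


def iter_mbox_messages(stream, max_message_bytes=1 << 20, max_line_bytes=1 << 16):
    """Yield messages from a binary mbox stream, one at a time.

    Memory use is bounded by max_message_bytes plus one read of at most
    max_line_bytes, regardless of the size of the file. Both limits are
    illustrative values, not anything Baloo defines. max_line_bytes must
    be at least 5 so that a "From " prefix fits in a single read.
    """
    chunks = []
    size = 0
    started = False        # True once the first "From " separator was seen
    at_line_start = True   # True when the next read begins a new line
    in_separator = False   # True while reading (pieces of) a "From " line
    while True:
        piece = stream.readline(max_line_bytes)
        if not piece:
            break
        if at_line_start:
            in_separator = piece.startswith(b"From ")
            if in_separator:
                if started:
                    yield b"".join(chunks)
                chunks, size, started = [], 0, True
        at_line_start = piece.endswith(b"\n")
        if in_separator or not started:
            continue  # skip the separator line and any leading junk
        room = max_message_bytes - size
        if room > 0:
            kept = piece[:room]  # truncate oversized messages
            chunks.append(kept)
            size += len(kept)
    if started:
        yield b"".join(chunks)


if __name__ == "__main__":
    sample = b"From a@x Thu\nSubject: one\n\nbody1\nFrom b@y Fri\nSubject: two\n\nbody2\n"
    for message in iter_mbox_messages(io.BytesIO(sample)):
        print(len(message), "bytes")
```

Each yielded message could then be handed to an indexer and discarded, so the peak memory stays near the per-message cap rather than the file size.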

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:  from Git
Qt Version:  5.15 + KDE Patches
Comment 1 tagwerk19 2021-12-30 12:38:21 UTC
Feels like 443547, see:
    https://bugs.kde.org/show_bug.cgi?id=443547#c1
I don't think there's a trick to get the individual messages indexed separately.
Comment 2 tagwerk19 2022-01-14 08:42:39 UTC
(In reply to tagwerk19 from comment #1)
> Feels like 443547
I'll flag as a duplicate.

I know baloo_file_extractor skips text/html files over 10MB:
    https://bugs.kde.org/show_bug.cgi?id=410680#c7
There's an argument that it should skip any file over 10MB (or, maybe friendlier, index the first 10MB and then stop).
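
As a rough sketch of the "index the first 10MB and then stop" idea, the Python snippet below reads a file in fixed-size chunks and stops once a byte limit is reached. The names `read_capped` and `INDEX_LIMIT_BYTES` and the chunk size are hypothetical, chosen for illustration; this is not how baloo_file_extractor is implemented.

```python
import io

# Hypothetical cap, taken from the 10MB figure mentioned above.
INDEX_LIMIT_BYTES = 10 * 1024 * 1024


def read_capped(stream, limit=INDEX_LIMIT_BYTES, chunk_size=64 * 1024):
    """Yield chunks from a binary stream until `limit` bytes were produced.

    Only one chunk is held in memory at a time, so memory use does not
    depend on the total size of the file.
    """
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # end of file reached before the limit
        remaining -= len(chunk)
        yield chunk


if __name__ == "__main__":
    big = io.BytesIO(b"x" * 1000)
    total = sum(len(c) for c in read_capped(big, limit=300, chunk_size=64))
    print(total)
```

The trade-off is that content past the limit is never indexed, which is the "friendlier" alternative to skipping the file entirely.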

*** This bug has been marked as a duplicate of bug 443547 ***
Comment 3 Bug Janitor Service 2025-04-19 18:48:09 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/231