SUMMARY
I have an mbox file containing 10GB of email. baloo_file_extractor seems to load the entire file into memory, which fails because, well... I don't have that much memory. I tried to debug this with strace and could see the process basically copying the entire file into memory, but I could be wrong.

STEPS TO REPRODUCE
1. Get a large mbox file
2. Put it in a folder indexed by Baloo
3. Observe the memory usage of baloo_file_extractor increase

OBSERVED RESULT
baloo_file_extractor requires a huge amount of RAM to parse large mbox files.

EXPECTED RESULT
The file should be parsed in a way that does not require a huge amount of memory.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: from Git
Qt Version: 5.15 + KDE Patches
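To illustrate the expected result, here is a minimal sketch of streaming an mbox file one message at a time. It is not Baloo's extractor code (which is C++); the function name `iter_mbox_messages` is hypothetical, and the sketch assumes messages are separated by lines starting with `From ` and that such lines inside message bodies have been escaped, as the common mbox variants do. Memory use is then bounded by the largest single message rather than by the whole file.

```python
import io
from typing import BinaryIO, Iterator


def iter_mbox_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Yield raw messages one at a time from an mbox stream.

    Assumption: every message starts with a "From " separator line and
    body lines starting with "From " are escaped (e.g. ">From ").
    """
    current: list[bytes] = []
    for line in stream:  # reads line by line, never the whole file
        if line.startswith(b"From ") and current:
            # A new separator closes the message collected so far.
            yield b"".join(current)
            current = []
        current.append(line)
    if current:
        # Emit the final message, which has no separator after it.
        yield b"".join(current)


if __name__ == "__main__":
    sample = (
        b"From a@example.com Mon Jan  1 00:00:00 2024\n"
        b"Subject: one\n\nbody one\n\n"
        b"From b@example.com Mon Jan  1 00:00:01 2024\n"
        b"Subject: two\n\nbody two\n"
    )
    for raw in iter_mbox_messages(io.BytesIO(sample)):
        print(len(raw), "bytes")
```

For a real file, the stream would come from `open(path, "rb")`, so only the current message is held in memory at any time.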
Feels like 443547, see:
    https://bugs.kde.org/show_bug.cgi?id=443547#c1

I don't think there's a trick to get the individual messages indexed separately.
(In reply to tagwerk19 from comment #1)
> Feels like 443547

I'll flag as a duplicate.

I know baloo_file_extractor skips text/html files over 10MB:
    https://bugs.kde.org/show_bug.cgi?id=410680#c7

There's an argument that it should skip any file over 10MB (or, maybe friendlier, index the first 10MB and then stop).

*** This bug has been marked as a duplicate of bug 443547 ***
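The two policies mentioned above (skip anything over the limit, or index only the first 10MB) could look roughly like this. This is a sketch only: `read_for_indexing`, `SIZE_LIMIT`, and the `truncate` flag are hypothetical names, not anything in Baloo, and the 10MB figure is simply the one quoted in the comment.

```python
import os
from typing import Optional

# Assumed cap, taken from the 10MB figure quoted in the comment above.
SIZE_LIMIT = 10 * 1024 * 1024


def read_for_indexing(path: str, limit: int = SIZE_LIMIT,
                      truncate: bool = True) -> Optional[bytes]:
    """Return the bytes that would be handed to an indexer.

    truncate=False: skip files larger than `limit` (returns None).
    truncate=True:  return only the first `limit` bytes of the file.
    """
    if not truncate and os.path.getsize(path) > limit:
        return None  # "skip any file over the limit" policy
    with open(path, "rb") as f:
        # read(limit) returns at most `limit` bytes, so memory use is
        # capped no matter how large the file is.
        return f.read(limit)
```

Either way, the amount read is capped by `limit`; truncation has the advantage that at least the start of a large file remains searchable.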
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/231