Bug 386791 - baloo_file_extractor memory leak when indexing GPG encrypted files
Summary: baloo_file_extractor memory leak when indexing GPG encrypted files
Status: RESOLVED WAITINGFORINFO
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon (other bugs)
Version First Reported In: 5.39.0
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Pinak Ahuja
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-12 20:18 UTC by Victor Bouvier-Deleau
Modified: 2019-02-08 14:21 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
Indexing of GPG files (357.57 KB, image/png)
2017-11-12 20:18 UTC, Victor Bouvier-Deleau

Description Victor Bouvier-Deleau 2017-11-12 20:18:47 UTC
Created attachment 108810
Indexing of GPG files

Software:
Arch Linux (fully up to date on 2017-12-11 21:12)
KDE 5.39.0 / Plasma 5.11.3
baloo 5.39.0

Hardware:
i5 3570k
MSI Z77A-G45
8GB DDR3-1600MHz
GTX 1080Ti

Overview:

On the left of the screenshot is the output of "balooctl monitor"; as you can see, it had just started indexing some GPG-encrypted files (Android APKs backed up with oandbackup and encrypted using OpenKeychain).
Before that, the process was using very little memory. The second "balooctl status" output, in the top right, was taken just before the GPG-encrypted files were indexed.

But if you look at the amount of memory used by the "baloo_file_extractor" process in the bottom right, you can see that there is a problem. I reran "balooctl status" afterwards (not shown in the screenshot) and it only showed "Current size of index is 292,50 MiB", even though KSysGuard was showing 3.8GiB.

If I let baloo go on and index the rest (I tried this a couple of times), the "baloo_file_extractor" process eventually takes all of my RAM, my PC then freezes while swapping out 6 or so GiB, and baloo goes on to eat the 6 or so GiB that was just freed.
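
A minimal sketch for logging the memory growth described above as text, so the figures do not depend on a screenshot. It assumes Linux with /proc mounted; rss_kib and watch_rss are hypothetical helper names, not part of baloo or balooctl.

```shell
# Print the resident set size (RSS) of process PID in KiB.
# Assumption: Linux, where /proc/PID/status contains a "VmRSS:" line.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print a timestamped RSS sample every INTERVAL seconds (default 5)
# until the process exits.
watch_rss() {
    pid=$1
    interval=${2:-5}
    while [ -r "/proc/$pid/status" ]; do
        printf '%s %s KiB\n' "$(date +%T)" "$(rss_kib "$pid")"
        sleep "$interval"
    done
}
```

Possible usage: watch_rss "$(pgrep -f baloo_file_extractor | head -n 1)" 5. Matching with pgrep -f is suggested because the kernel truncates process names to 15 characters, so a plain name match on the 20-character "baloo_file_extractor" would likely find nothing.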
Comment 1 Victor Bouvier-Deleau 2017-11-12 20:32:27 UTC
I don't see a way to edit comments, and I accidentally pressed Enter while editing the previous one...

I'm able to reproduce the leak every time. If I restart my computer and don't manually stop the baloo daemon with "balooctl stop" after rebooting, my PC ends up freezing as described in my previous comment.

The encrypted files in question are a full backup of all my Android applications along with their data, encrypted with my GPG public key.
oandbackup is the Android application used to make the backup, and OpenKeychain is used to encrypt the files. The result is as follows:
A folder named after the application, for example "com.spotify.music", containing:
- The apk encrypted with GPG
- The application data, compressed and then encrypted with GPG
- A log containing information for oandbackup

I could provide a compressed archive of all those files if you want, but it's rather big (1.4GiB), so I would need to upload it somewhere first.
Comment 2 Nate Graham 2017-11-12 20:34:57 UTC
Go ahead and just add another comment. I don't think we need your encrypted file. :)

Nate

Comment 3 Victor Bouvier-Deleau 2017-11-12 20:49:36 UTC
OK, so the steps to reproduce the bug without access to my files would be:
1. Create a lot of GPG-encrypted files amounting to 1GiB or so
2. Place them somewhere in the /home/<user> directory
3. Restart the computer
4. Run htop or open KSysGuard to monitor the memory usage of the baloo_file_extractor process
5. Run "balooctl monitor" to watch the indexing process and see when it begins indexing the encrypted files
6. Watch the memory consumption go through the roof as soon as baloo starts indexing the encrypted files, while "balooctl status" doesn't report that amount of memory as used by the index
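
Steps 1 and 2 can be sketched as a small shell function. This is only a sketch under stated assumptions: make_gpg_files, the sample_N.gpg file names, and the parameters are hypothetical, and gpg2 needs a usable public key for the chosen recipient. Random input is used because gpg compresses data by default, so highly repetitive input would produce much smaller files than intended.

```shell
# Create COUNT ASCII-armored GPG files in DIR, each encrypted from
# SIZE_KIB KiB of random data for RECIPIENT.
# Hypothetical helper; assumes gpg2 is installed and has RECIPIENT's key.
make_gpg_files() {
    dir=$1
    count=$2
    size_kib=$3
    recipient=$4
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # Random plaintext piped straight into gpg2, so nothing
        # unencrypted is written to disk.
        dd if=/dev/urandom bs=1k count="$size_kib" 2>/dev/null |
            gpg2 -a -e -o "$dir/sample_$i.gpg" -r "$recipient" || return 1
        i=$((i + 1))
    done
}
```

For example, make_gpg_files "$HOME/gpg-test" 32 32768 <myself> would encrypt 32 inputs of 32MiB each, about 1GiB of plaintext in total; the armored output is somewhat larger.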

If you want me to do more diagnostics on my end, ask away!
Comment 4 Stefan Brüns 2018-10-15 18:40:10 UTC
(In reply to Victor Bouvier-Deleau from comment #3)
> OK, so the steps to reproduce the bug without access to my files would be:
> 1. Create a lot of GPG-encrypted files amounting to 1GiB or so
> 2. Place them somewhere in the /home/<user> directory
> 3. Restart the computer
> 4. Run htop or open KSysGuard to monitor the memory usage of the
> baloo_file_extractor process
> 5. Run "balooctl monitor" to watch the indexing process and see when it
> begins indexing the encrypted files
> 6. Watch the memory consumption go through the roof as soon as baloo starts
> indexing the encrypted files, while "balooctl status" doesn't report that
> amount of memory as used by the index
> 
> If you want me to do more diagnostics on my end, ask away!

The file is equivalent to e.g. the output of the following?

$> dd if=/dev/zero bs=1k count=1000 | gpg2 -a -e -o foo.gpg -r <myself>

Although the file is "text", as it is plain ASCII, baloo obviously should skip it when doing the content-indexing step.
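
To illustrate the point about ASCII armor: such a file looks like plain text to a generic detector, yet its first line already identifies it as OpenPGP data. A minimal sketch follows, with is_pgp_armored as a hypothetical helper; it is not baloo's actual logic, and it does not recognize binary (non-armored) GPG files.

```shell
# Return 0 if FILE begins with the ASCII-armor header of an OpenPGP message.
# Hypothetical helper for illustration only.
is_pgp_armored() {
    head -n 1 "$1" | grep -q -e '^-----BEGIN PGP MESSAGE-----'
}

# Example (commented out): list armored files under a directory.
# find "$HOME/gpg-test" -type f | while IFS= read -r f; do
#     is_pgp_armored "$f" && printf '%s\n' "$f"
# done
```

A check like this only shows that the content is recognizable cheaply; an indexer would more plausibly decide by MIME type than by reading the header itself.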
Comment 5 Victor Bouvier-Deleau 2018-10-29 08:45:02 UTC
(In reply to Stefan Brüns from comment #4)
> The file is equivalent to e.g. the output of the following?
> 
> $> dd if=/dev/zero bs=1k count=1000 | gpg2 -a -e -o foo.gpg -r <myself>
> 
> Although the file is "text", as it is plain ASCII, baloo obviously should
> skip it when doing the content-indexing step.

A lot of the files are bigger than that. In my case the .apk files themselves are encrypted using GPG, so the file sizes vary from 1MiB to 80MiB. Bear in mind, though, that I'm not using KDE anymore, so I don't know whether this problem still occurs, and I can't test it.
Comment 6 Stefan Brüns 2018-11-01 19:33:00 UTC
Reporter not able to provide required info.
Comment 7 Stefan Brüns 2019-02-08 14:21:31 UTC
Git commit 3aa911d4a0ac88a0a8adf3076f579a8ba4f73ed5 by Stefan Brüns.
Committed on 08/02/2019 at 14:21.
Pushed by bruns into branch 'master'.

[Extractor] Exclude GPG encrypted data from being indexed

Summary:
application/pgp-encrypted may be encoded as base64 and thus inherits from
text/plain, but contains no extractable plaintext at all.

Reviewers: #baloo, #frameworks, ngraham, poboiko

Reviewed By: #baloo, ngraham

Subscribers: kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D18851

M  +3    -0    src/file/fileexcludefilters.cpp

https://commits.kde.org/baloo/3aa911d4a0ac88a0a8adf3076f579a8ba4f73ed5