Bug 392301 - baloo_file_extractor logspam, invalid encoding
Status: RESOLVED NOT A BUG
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon (other bugs)
Version First Reported In: 5.44.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2018-03-25 10:49 UTC by Lukas Ba.
Modified: 2019-01-07 20:21 UTC (History)

Attachments
histfile_reduced (1.05 KB, text/plain)
2018-04-26 23:04 UTC, Lukas Ba.

Description Lukas Ba. 2018-03-25 10:49:45 UTC
The following message is repeated every second in the logs.

>Mär 25 12:40:56 linux baloo_file_extractor[2094]: Invalid encoding. Ignoring "/home/user/.histfile"
>Mär 25 12:40:56 linux kdeinit5[1063]: ()
>Mär 25 12:40:56 linux kdeinit5[1063]: ("/home/user/.histfile")
>Mär 25 12:40:56 linux kdeinit5[1063]: ()

The file in question is the zsh history file, and it can be read by a text editor.
Comment 1 Stefan Brüns 2018-04-19 21:16:36 UTC
Happens if the file encoding is invalid according to your current locale. Please try:

$> file ~/.histfile

Should tell you if there are any invalid characters.

You can also try
$> sed -e 's/[-+^"!%#&$\\@_=:;.,/<>?* (){}a-zA-Z0-9]\?\[\?\]\?//g '  < ~/.histfile > ~/.histfile_reduced

and then check, e.g. with hexdump -C or okteta, which characters are still left.
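
As a complementary check to the sed filter above, here is a minimal sketch that assumes the locale in question is UTF-8 and that iconv is installed. It only tests whether the file decodes as UTF-8; it is not necessarily the same check the extractor performs. The function name check_utf8 is hypothetical.

```shell
# Sketch: test whether a file decodes cleanly as UTF-8.
# check_utf8 is a hypothetical helper name, not part of baloo or zsh.
check_utf8() {
    # Converting UTF-8 to UTF-8 is a no-op for valid input; iconv exits
    # non-zero as soon as it meets a byte sequence it cannot decode.
    # Its own diagnostic on stderr typically names the offending position.
    if iconv -f UTF-8 -t UTF-8 "$1" > /dev/null; then
        echo "valid UTF-8: $1"
    else
        echo "invalid UTF-8: $1"
        return 1
    fi
}
```

Calling check_utf8 ~/.histfile after defining the function reports whether the history file contains byte sequences that are invalid in UTF-8.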
Comment 2 Lukas Ba. 2018-04-26 22:06:04 UTC
The problem is not that a file has an invalid encoding, but that baloo_file_extractor complains about it. For some files, invalid encoding is normal.
Comment 3 Christoph Feck 2018-04-26 22:31:06 UTC
Then fix these filenames using convmv. If they are invalid, the extractor has a reason to complain.
Comment 4 Lukas Ba. 2018-04-26 22:33:15 UTC
(In reply to Christoph Feck from comment #3)
> Then fix these filenames using convmv. If they are invalid, the extractor
> has a reason to complain.

No. I'm sure .histfile is an O.K. filename.
Comment 5 Christoph Feck 2018-04-26 22:33:24 UTC
Reading more carefully, the issue is not the filenames, but the file contents, so please ignore comment #3.
Comment 6 Stefan Brüns 2018-04-26 22:53:11 UTC
You are contradicting yourself - "and can be read by text editor" - so it has some text encoding.
I wanted to find out which codepoints are in there which are *not* valid, but unfortunately you are not providing this info.
Comment 7 Lukas Ba. 2018-04-26 23:04:19 UTC
Created attachment 112277 [details]
histfile_reduced

Here I am providing the resulting histfile_reduced.

file ~/.histfile
/home/user/.histfile: Non-ISO extended-ASCII text
Comment 8 Stefan Brüns 2018-04-26 23:21:49 UTC
(In reply to Lukas Ba. from comment #7)
> Created attachment 112277 [details]
> histfile_reduced
> 
> Here i am providing the result histfile_reduced.
> 
> file ~/.histfile
> /home/user/.histfile: Non-ISO extended-ASCII text

It contains invalid codepoints near the end.
Comment 9 Lukas Ba. 2018-04-27 08:34:13 UTC
(In reply to Stefan Brüns from comment #8)
> (In reply to Lukas Ba. from comment #7)
> > Created attachment 112277 [details]
> > histfile_reduced
> > 
> > Here i am providing the result histfile_reduced.
> > 
> > file ~/.histfile
> > /home/user/.histfile: Non-ISO extended-ASCII text
> 
> It contains invalid codepoints near the end.

It looks like zsh likes to add those codepoints to its history file. I also get that result with the default zsh config on Kubuntu, so anyone installing zsh on Kubuntu will see this.
Comment 10 Stefan Brüns 2019-01-07 20:21:00 UTC
zsh has no default history location, so you (or your distribution) have set HISTFILE explicitly. You have also enabled hidden file/directory indexing.

There is no way baloo can guess this.

If you want it excluded, either disable hidden file indexing, or exclude the file manually:
$> balooctl config add excludeFilters .histfile
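
For the other option, disabling hidden file indexing, a hedged sketch follows. It assumes that the installed balooctl accepts "config set hidden" and "config show excludeFilters"; these forms are not confirmed in this report, so check balooctl's built-in help before relying on them.

```shell
# Assumption: this Baloo version's balooctl supports the "config set" and
# "config show" subcommands with the option names used below.
if command -v balooctl > /dev/null 2>&1; then
    # Option 1 from the comment above: stop indexing hidden files and directories.
    balooctl config set hidden false
    # Inspect the active exclude filters, e.g. after adding .histfile manually.
    balooctl config show excludeFilters
else
    echo "balooctl not found" >&2
fi
```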