Bug 409257

Summary: Baloo index gets corrupted, no results
Product: [Frameworks and Libraries] frameworks-baloo Reporter: S <sb56637>
Component: general Assignee: baloo-bugs-null
Status: RESOLVED FIXED    
Severity: major CC: gersonfjunior, igor.poboiko
Priority: NOR    
Version: 5.59.0   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:
Attachments: Krunner not returning any results (normally it should for this keyword)
App Menu search not returning any results (normally it should for this keyword)
My current ~/.local/share/baloo/ directory while no results are returned
baloosearch returns blank lines that match the number of expected results
correct baloosearch results for comparison

Description S 2019-06-27 16:43:27 UTC
Hi, I'm running an up-to-date installation of openSUSE Tumbleweed with the latest Plasma / KDE / KF5 packages.

Several times per week something is corrupting my Baloo index. It works well for a few days, then eventually starts returning no results as shown in my screenshots. To fix it, I run `balooctl stop`, delete `~/.local/share/baloo/`, and then `balooctl start`. But after a few more days it always breaks again. 

This has been going on for several months, but when I switched to Plasma about a year ago this wasn't a problem at first.

Thanks a lot!
Comment 1 S 2019-06-27 16:44:26 UTC
Created attachment 121189 [details]
Krunner not returning any results (normally it should for this keyword)
Comment 2 S 2019-06-27 16:45:07 UTC
Created attachment 121190 [details]
App Menu search not returning any results (normally it should for this keyword)
Comment 3 S 2019-06-27 16:45:48 UTC
Created attachment 121191 [details]
My current ~/.local/share/baloo/ directory while no results are returned
Comment 4 Gerson 2019-06-29 17:28:48 UTC
Exactly the same thing happens here for me on a recent Manjaro installation. Quite frustrating. For now I'm using AngrySearch as a workaround, but I prefer the baloo integration with KDE...

After I run 'balooctl disable' and 'balooctl enable', it rewrites the db and works again for a few days. But if I run disable + enable and then restart the computer... the db gets corrupted immediately every time.

Are there any error logs that I can check? Maybe I have a corrupted file somewhere that is being indexed and is breaking baloo somehow? How can I check this?
Comment 5 Gerson 2019-06-29 17:34:19 UTC
Created attachment 121221 [details]
baloosearch returns blank lines that match the number of expected results

Somehow the search kind of works, as every baloosearch I run in the terminal returns the correct number of entries, but they show up as blank lines.
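
A minimal sketch of how this symptom (the right number of results, all of them blank) could be detected from a script. It assumes that `baloosearch` prints one result per line and may end with a timing footer starting with "Elapsed:"; that footer prefix and the helper names below are assumptions for illustration, not taken from this report.

```python
import subprocess
from typing import Dict, Sequence


def summarize_results(output: str,
                      ignore_prefixes: Sequence[str] = ("Elapsed:",)) -> Dict[str, int]:
    """Count blank and non-blank result lines in baloosearch-style output."""
    # Assumption: one result per line; footer lines (e.g. a timing summary) are skipped.
    lines = [line for line in output.splitlines()
             if not line.startswith(tuple(ignore_prefixes))]
    blank = sum(1 for line in lines if not line.strip())
    return {"total": len(lines), "blank": blank, "resolved": len(lines) - blank}


def looks_corrupted(output: str) -> bool:
    """Heuristic: results came back, but none of them resolved to a path."""
    summary = summarize_results(output)
    return summary["total"] > 0 and summary["resolved"] == 0


def check_query(query: str) -> bool:
    """Run baloosearch for one query and apply the heuristic to its stdout."""
    proc = subprocess.run(["baloosearch", query], capture_output=True, text=True)
    return looks_corrupted(proc.stdout)
```

Calling `check_query("somekeyword")` with a keyword that normally matches files would return True in the broken state shown in the attachment and False when the paths print correctly.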
Comment 6 Gerson 2019-06-29 17:35:00 UTC
Created attachment 121222 [details]
correct baloosearch results for comparison

How it looks when the db is working properly.
Comment 7 Igor Poboiko 2019-06-30 12:03:49 UTC
Instead of rebuilding the whole index, try performing `balooctl clear ~/somefile && balooctl index ~/somefile` (on some existing file inside your home folder)
Comment 8 S 2019-06-30 13:32:51 UTC
(In reply to Igor Poboiko from comment #7)
> Instead of rebuilding the whole index, try performing `balooctl clear
> ~/somefile && balooctl index ~/somefile` (on some existing file inside your
> home folder)

Hi Igor, thanks, would I run that command on a file or a directory?
Comment 9 S 2019-06-30 14:13:56 UTC
> Hi Igor, thanks, would I run that command on a file or a directory?

I would assume directory. :-) Here's what happened:

==================
> balooctl clear ~/Downloads && balooctl index ~/Downloads
> Clearing /home/myself/Downloads
> Database has corrupted entries baloo may misbehave, please recreate the DB by running $ balooctl disable && balooctl enable
> Database has corrupted entries baloo may misbehave, please recreate the DB by running $ balooctl disable && balooctl enable
> Database has corrupted entries baloo may misbehave, please recreate the DB by running $ balooctl disable && balooctl enable
> Database has corrupted entries baloo may misbehave, please recreate the DB by running $ balooctl disable && balooctl enable
> Database has corrupted entries baloo may misbehave, please recreate the DB by running $ balooctl disable && balooctl enable
> File(s) cleared
> Indexing /home/myself/Downloads
> File(s) indexed
==================


Then I tried the suggestion from above:

> balooctl disable && balooctl enable
> Disabling the File Indexer
> Enabling the File Indexer


Then I get a large number of errors like this from a directory I have full of .rtf files:

> Invalid encoding. Ignoring "/home/myself/Documents/path/something1.rtf"


Later came this:

> QIODevice::read (QProcess): device not open
> [h264 @ 0x55bce695f500] A non-intra slice in an IDR NAL unit.
> [h264 @ 0x55bce695f500] decode_slice_header error
> QIODevice::read (QProcess): device not open
> QIODevice::read (QProcess): device not open
> [mjpeg @ 0x55bce83d7980] EOI missing, emulating
> QIODevice::read (QProcess): device not open

And the "QIODevice::read (QProcess): device not open" error was repeated dozens of times.


Also this happened:

> QIODevice::read (QProcess): device not open
> "Error: xref num 39 not found but needed, try to reconstruct<0a>"
> QIODevice::read (QProcess): device not open
> "Error: xref num 39 not found but needed, try to reconstruct<0a>"


Later came this:

> QIODevice::read (QProcess): device not open
> [mjpeg @ 0x55bce5a3d640] EOI missing, emulating
> [mjpeg @ 0x55bce5a3d640] EOI missing, emulating
> [mjpeg @ 0x55bce5a3d640] EOI missing, emulating
> QIODevice::read (QProcess): device not open
> QIODevice::read (QProcess): device not open
> QIODevice::read (QProcess): device not open
> QIODevice::read (QProcess): device not open
> "Error (27479): Invalid blend mode in ExtGState"


Also:

> [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55bce69bb2c0] Invalid SampleDelta -160 in STTS, at 1 st:0
> [mov,mp4,m4a,3gp,3g2,mj2 @ 0x55bce69bb2c0] Invalid SampleDelta -160 in STTS, at 1 st:0


Also:

> "Error: Missing language pack for 'Adobe-Japan1' mapping"
> "Error: Unknown font tag 'C0_0'"
> "Error (53847): No font in show"


Also:

> "Error: Unknown character collection 'PDFAUTOCAD-Indentity0'"
> "Error: Unknown character collection 'PDFAUTOCAD-Indentity0'"
> "Error: Unknown character collection 'PDFAUTOCAD-Indentity0'"
> "Error: Unknown character collection 'PDFAUTOCAD-Indentity0'"
> "Error: Unknown character collection 'PDFAUTOCAD-Indentity0'"
> "Error: Unknown character collection 'PDFAUTOCAD-Indentity0'"


Later:

> QIODevice::read (QProcess): device not open
> Error: XMP Toolkit error 203: Duplicate property or field node
> Warning: Failed to decode XMP metadata.
> Error: XMP Toolkit error 203: Duplicate property or field node
> Warning: Failed to decode XMP metadata.
> [mjpeg @ 0x55bce9918d00] EOI missing, emulating


Another one, repeated about 8 times:

> "Error: FoFiType1::parse a line has more than 255 characters, we don't support this"


Also this:

> Invalid document structure (docProps is missing)
> QIODevice::read (QProcess): device not open
> QIODevice::read (QProcess): device not open
> QIODevice::read (QProcess): device not open
> QIODevice::read (QProcess): device not open
> Error: Directory Olympus with 7168 entries considered invalid; not read.
> [mjpeg @ 0x55bce959f800] EOI missing, emulating
> [mjpeg @ 0x55bce959f800] EOI missing, emulating
> [avi @ 0x55bce7c94f80] non-interleaved AVI
> Invalid encoding. Ignoring "/home/myself/Documents/path/something1.txt"
> "Error: Expected the default config, but wasn't able to find it, or it isn't a Dictionary"
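
Since many of these messages repeat dozens of times, a small tally can make captured output easier to read. The sketch below is only an illustration: it assumes the terminal output was saved as text, and it replaces hex addresses and quoted absolute paths with placeholders so that otherwise identical messages are counted together. The function names are hypothetical.

```python
import re
from collections import Counter

HEX_ADDR = re.compile(r"0x[0-9a-fA-F]+")   # e.g. decoder context addresses
QUOTED_PATH = re.compile(r'"/[^"]*"')      # e.g. "/home/myself/..."


def normalize(line: str) -> str:
    """Strip details that differ between otherwise identical messages."""
    line = line.strip()
    line = HEX_ADDR.sub("0xADDR", line)
    line = QUOTED_PATH.sub('"<path>"', line)
    return line


def count_messages(log_text: str) -> Counter:
    """Return how often each normalized, non-empty message occurs."""
    return Counter(normalize(line) for line in log_text.splitlines() if line.strip())
```

`count_messages(text).most_common(10)` then lists the ten most frequent messages first.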
Comment 10 Gerson 2019-06-30 17:58:37 UTC
Where do you see the error messages? Directly on the terminal as outputs of the 'balooctl enable' command?

After I run 'balooctl enable', I get no error messages on the terminal, and the search starts working properly. It's only after a while, or after I reboot that the baloosearch breaks.

Is there a log file that I am missing?
Comment 11 S 2019-06-30 18:21:06 UTC
(In reply to Gerson from comment #10)
> Where do you see the error messages? Directly on the terminal as outputs of
> the 'balooctl enable' command?

Yes.
 
> After I run 'balooctl enable', I get no error messages on the terminal, and
> the search starts working properly. It's only after a while, or after I
> reboot that the baloosearch breaks.

This is the main issue: something is corrupting the index after a certain amount of time.
Comment 12 Gerson 2019-06-30 18:50:40 UTC
Ok. Thanks. So, in my case this must take a long time to occur, since I've never seen any error messages. I'll try running disable+enable again and leave the terminal open for a long time to check it.

But these error messages are really not written to any log files? Seems like a drastic failure of the implementation.
Comment 13 Igor Poboiko 2019-07-01 08:55:50 UTC
(In reply to S from comment #9)
> I would assume directory. :-) Here's what happened:
> 
> [...]

It doesn't really matter whether it's a file or a directory.
Your problem looks like corruption of IdTreeDB, where for some reason Baloo "forgets" about some of the top-level directories and then cannot resolve the paths of particular documents. Reindexing a single file might restore this information
(and it will also help us to know whether that is the problem).

> 
> Then I tried the suggestion from above:
> 
> > balooctl disable && balooctl enable
> > Disabling the File Indexer
> > Enabling the File Indexer

These commands will totally wipe your database and then rebuild it from scratch. 
Although it will most likely help, it's also a pretty time consuming operation. 

>
> Then I get a large number of errors like this from a directory I have full
> of .rtf files:
>[...]

Those are non-critical errors; they are more like debug messages reporting various problems encountered while extracting metadata from files on your computer.
Logging them would not be very useful. I don't even know why they are written to the terminal (it might depend on some of your distro's default settings, probably?)
Comment 14 S 2019-07-01 23:59:57 UTC
Thanks again Igor for the reply.

(In reply to Igor Poboiko from comment #13)
> Your problem looks like corruption of IdTreeDB, when for some reason Baloo
> "forgets" about some of the top-level directories, and then just cannot
> resolve path of particular documents. Reindexing of a single file might
> restore this information.
> (and it will also help us to know whether it is the problem)

The real issue in this bug report is that the index keeps getting corrupted. I have re-indexed it many times, and that does indeed fix it for a short time. But after a couple of days it gets corrupted again.
Comment 15 Gerson 2019-07-02 15:06:42 UTC
Exactly. I couldn't find any corrupted files, and the workaround of re-indexing a single file is not sufficient. I left my computer running for a few days with a terminal open after running balooctl disable + enable. There were no error messages. Baloosearch in the terminal was working fine, and in Dolphin it was OK as well. But the desktop search in the application menu still fails. After rebooting the system... it happens again. The database breaks somehow and baloosearch in the terminal shows blank lines.

So, here is a new test to try to find something:

1) baloosearch is currently broken. It returns blank lines, while the number of lines actually matches the number of expected search results.

2) I'll make a copy of the index file in ~/.local/share/baloo: cp index index.bak

3) re-indexing a folder: balooctl clear ~/Downloads && balooctl index ~/Downloads/. Note: the file I'm searching for is not in this folder.

4) baloosearch still broken.


5) Reindexing the Dropbox folder: 'balooctl clear ~/Dropbox' gives errors:

Database has corrupted entries baloo may misbehave, please recreate the DB by running $ balooctl disable && balooctl enable

6) balooctl index ~/Dropbox/ : no new error messages

7) baloosearch still broken. But 'balooctl clear ~/Dropbox' does not show errors anymore.

8) 'balooctl clear ~/inSync/' returns errors:

Database has corrupted entries baloo may misbehave, please recreate the DB by running $ balooctl disable && balooctl enable

9) 'balooctl index ~/inSync/' : no new error messages

10) baloosearch still broken, but 'balooctl clear ~/inSync' does not show errors anymore.

11) I've tried to clear/index other folders as well (~/tmp, ~/Documents, ~/Videos, ...). Most clear commands return the database corrupted error.
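
The per-folder clear and reindex steps above can be sketched as a small loop that also records which folders triggered the corruption warning. This is a hedged illustration only: it uses the `balooctl clear` and `balooctl index` commands quoted in this thread, matches on the warning text shown above, and reads both stdout and stderr because the report does not say which stream carries the warning. The function names and the injectable `run` parameter are hypothetical.

```python
import os
import subprocess
from typing import Callable, Dict, Iterable, List

# Start of the warning text quoted in this report.
CORRUPTION_MARKER = "Database has corrupted entries"


def default_runner(args: List[str]) -> str:
    """Run a command and return stdout and stderr combined."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.stdout + proc.stderr


def clear_and_reindex(folders: Iterable[str],
                      run: Callable[[List[str]], str] = default_runner) -> Dict[str, bool]:
    """Clear and reindex each folder; map folder -> whether 'clear' warned."""
    report = {}
    for folder in folders:
        path = os.path.expanduser(folder)  # subprocess does not expand "~"
        cleared_output = run(["balooctl", "clear", path])
        run(["balooctl", "index", path])
        report[folder] = CORRUPTION_MARKER in cleared_output
    return report
```

For example, `clear_and_reindex(["~/Downloads", "~/Dropbox"])` would return a dictionary showing which of the two folders produced the warning during `clear`.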

Simply trying to clear and reindex does not work. After all that, the md5sums and file sizes of the original index.bak and the "new" index after all the clear and index commands are:

md5                               file        size 
7205c24bc98368025847ddc2322637f8  index       44376064
602565b117986424cb4c0b39036889f2  index.bak   44376064

Even after I clear all folders, the file size doesn't change. The command 'balooctl indexSize' shows that the used space is much smaller. But it seems that something remains broken within the index file.
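
The comparison above (same size, different checksum) could be scripted roughly as follows. This is a sketch using only the Python standard library; the function names are made up for illustration, and the file names `index` and `index.bak` are the ones used in the steps above.

```python
import hashlib
import os
from typing import Dict, Tuple


def file_fingerprint(path: str, chunk_size: int = 1 << 20) -> Tuple[str, int]:
    """Return (md5 hex digest, size in bytes), reading the file in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


def compare_indexes(current: str, backup: str) -> Dict[str, bool]:
    """Report whether two index files match in size and in content."""
    cur_md5, cur_size = file_fingerprint(current)
    bak_md5, bak_size = file_fingerprint(backup)
    return {"same_size": cur_size == bak_size, "same_content": cur_md5 == bak_md5}
```

Run on the two files listed above, `compare_indexes("index", "index.bak")` would report the same size but different content, which only shows that the file was rewritten in place, not what changed inside it.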

I can only recover baloosearch by running disable + enable, which actually deletes the index file in the process.

Another curious thing. I'm indexing only the file names, and not the content. Nonetheless, the process feels a bit too fast. Disabling and re-enabling baloo, the re-indexing process is almost instantaneous. Is it actually running the file name indexing from scratch? Or using some backup of the index file stored in a cache somewhere?

BTW, my baloo version is 5.59.0-1 on Manjaro stable.
Comment 16 Gerson 2019-07-03 23:21:24 UTC
Today Manjaro had an update and it seems that the bug is fixed. However, baloo itself was not updated. It seems that the bug was somewhere else, in some baloo dependency...
Comment 17 Gerson 2019-07-03 23:49:09 UTC
sorry... my mistake. It took a while longer than usual, but the db is corrupted again. The problem persists.

Manjaro has a baloo-git package (version 5.60)... I might try it next weekend. But I cannot try it now on my working computer.
Comment 18 Igor Poboiko 2019-07-05 09:14:43 UTC
We would really appreciate it if someone who experiences this could send us a corrupted "~/.local/share/baloo/index" entry.

Just be aware that it will most likely contain a lot of personal information.
If you don't want it to be exposed, make sure to add folders which you don't want to be indexed to "Do not search in these locations", inside "SystemSettings -> Search -> File Search".
Alternatively, you can create a new user (with no personal information at all), and try to reproduce it.

Thanks!

> Even after I clear all folders, the file size won't change. 
> The command 'balooctl indexSize' shows that the used space is much smaller. 
> But it seems that it remains something broken within the index file.
It's one of the peculiarities of the underlying database, LMDB. It uses Copy-on-Write, and the database file doesn't actually shrink. But it can later reuse memory pages that have become free (so after clear & reindex, the db size might not change at all).
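
The effect described here (file size stays constant while the used space drops, and freed pages are reused later) can be illustrated with a toy model. This is deliberately not LMDB's actual implementation, which uses B-trees and copy-on-write; the class below is a hypothetical page store with a free list, meant only to show why the two numbers can differ.

```python
class ToyPageStore:
    """Toy page file: freed pages are reused, the file itself never shrinks."""

    def __init__(self, page_size: int = 4096):
        self.page_size = page_size
        self.pages = []       # index = page number, value = payload or None if free
        self.free_list = []   # page numbers available for reuse

    def allocate(self, payload) -> int:
        """Store a payload, preferring a previously freed page."""
        if self.free_list:
            page_no = self.free_list.pop()
            self.pages[page_no] = payload
        else:
            page_no = len(self.pages)   # grow the file by one page
            self.pages.append(payload)
        return page_no

    def free(self, page_no: int) -> None:
        """Mark a page as free; the file keeps its length."""
        self.pages[page_no] = None
        self.free_list.append(page_no)

    @property
    def file_size(self) -> int:
        """Size on disk: every page ever allocated."""
        return len(self.pages) * self.page_size

    @property
    def used_size(self) -> int:
        """Space occupied by live pages only."""
        return sum(1 for page in self.pages if page is not None) * self.page_size
```

In this model, `file_size` plays the role of the size of the index file on disk and `used_size` the smaller figure reported as used space; clearing everything lowers only the latter, and reindexing fills the freed pages before the file grows again.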


> Another curious thing. I'm indexing only the file names, and not the content. 
> Nonetheless, the process feels a bit too fast. Disabling and re-enabling baloo, the re-indexing process is almost instantaneous. 
> Is it actually running the file name indexing from scratch? 
> Or using some backup of the index file stored in a cache somewhere?

Filename indexing should be quite fast (I don't know how fast is "too fast", though - but I think it takes less than a second in my case, with ~13.5k documents in the index).
Baloo doesn't have any backups / caches for it; it can only redo it from scratch.
Comment 19 S 2019-07-05 17:08:56 UTC
(In reply to Igor Poboiko from comment #18)
> We would really appreciate if someone who experiences this could send us a
> corrupted "~/.local/share/baloo/index" entry.
> 
> Just be aware that it will most likely contain a lot of personal information.
> If you don't want it to be exposed, make sure to add folders which you don't
> want to be indexed to "Do not search in these locations", inside
> "SystemSettings -> Search -> File Search".
> Alternatively, you can create a new user (with no personal information at
> all), and try to reproduce it.

Thanks for the reply!

The problem is that apparently some personal file is what is leading to the corrupted index. I assume that's why more users aren't reporting this problem, because it's something specific to our personal files.
Comment 20 Gerson 2019-07-06 19:55:02 UTC
Indeed... baloo is working properly on my wife's computer, which runs Manjaro KDE as well. The problem is that since there's no error message... we have to dig manually for the corrupted file. But since it takes a while for the db to get corrupted, this search is probably quite cumbersome. In my case, actually, it seems that the db only corrupts after I restart the computer.

I'll attach here the index file before and after I restart the computer. I'll name them index.good (before restart) and index.bad (after restart). It contains only file names, so there's nothing personal or secret that I care about there.
Comment 21 Gerson 2019-07-06 20:06:52 UTC
since the upload limit is 4MB, I'll attach the file with an online system: 

https://ufile.io/60gkivz8

The file (index.tgz) will be available for 30 days starting now.
Comment 22 S 2019-07-07 00:01:49 UTC
(In reply to Gerson from comment #20)

> In my case, actually, it seems that the db only corrupts after I restart the
> computer.

I've noticed this too.
Comment 23 Igor Poboiko 2019-07-07 08:49:39 UTC
(In reply to Gerson from comment #21)
> since the upload limit is 4MB, I'll attach the file with an online system: 
> 
> https://ufile.io/60gkivz8
> 
> The file (index.tgz) will be available for 30 days starting now.

Your help is very much appreciated! I will investigate it.

> In my case, actually, it seems that the db only corrupts after I restart the computer.

Does it happen after each restart, or only sometimes?
Is it enough to just restart baloo to corrupt index? (i.e. run in the terminal "balooctl stop" and then "balooctl start")
Comment 24 Igor Poboiko 2019-07-08 13:10:50 UTC
(In reply to Gerson from comment #21)
> since the upload limit is 4MB, I'll attach the file with an online system: 
> 
> https://ufile.io/60gkivz8
> 
> The file (index.tgz) will be available for 30 days starting now.

OK, I've looked at it. 

Internally, Baloo stores a filesystem tree inside the index so that it can reconstruct the full path of a particular document id. The root of this tree should correspond to the root directory "/" (id = 0). In your case it does not: the tree is rooted at your home directory instead (this can be deduced from the non-corrupted index), which is why Baloo is not happy - it cannot resolve paths.

This patch might help: https://phabricator.kde.org/D21427.
Can anyone confirm?
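
The path-resolution failure described in this comment can be modelled with a simplified sketch. It assumes a plain mapping from each id to its parent id and name, with the root "/" at id 0 as stated above; this is not Baloo's actual on-disk layout, and the ids and names in the usage note are invented for illustration.

```python
from typing import Dict, Optional, Tuple

ROOT_ID = 0  # per the comment above, the tree root "/" has id 0


def resolve_path(doc_id: int, tree: Dict[int, Tuple[int, str]]) -> Optional[str]:
    """Walk parent links up to the root; return None if the chain is broken.

    `tree` maps id -> (parent_id, name) in this simplified model.
    """
    parts = []
    seen = set()
    current = doc_id
    while current != ROOT_ID:
        if current in seen or current not in tree:
            return None  # missing ancestor or cycle: the path cannot be rebuilt
        seen.add(current)
        parent_id, name = tree[current]
        parts.append(name)
        current = parent_id
    return "/" + "/".join(reversed(parts))
```

With a complete tree such as `{1: (0, "home"), 2: (1, "myself"), 3: (2, "notes.txt")}`, `resolve_path(3, tree)` returns "/home/myself/notes.txt". If the entries for the top-level directories are missing, the walk stops before reaching id 0 and returns None, which would be consistent with results that are found but cannot be printed as paths.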
Comment 25 S 2019-07-08 13:54:08 UTC
(In reply to Igor Poboiko from comment #24)

> This patch might help: https://phabricator.kde.org/D21427.
> Can anyone confirm?

Nice work Igor! I'm building baloo5 with that patch now and I'll test it throughout my workday today.
Comment 26 S 2019-07-09 14:05:31 UTC
(In reply to S from comment #25)
> (In reply to Igor Poboiko from comment #24)
> 
> > This patch might help: https://phabricator.kde.org/D21427.
> > Can anyone confirm?
> 
> Nice work Igor! I'm building baloo5 with that patch now and I'll test it
> throughout my workday today.

This is tentatively looking good! It survived a long day of work yesterday, including quite a few reboots (because I was diagnosing a different unrelated bug... :-/ ) I'd say push this change to master.
Comment 27 S 2019-07-10 14:40:14 UTC
(In reply to S from comment #26)
> (In reply to S from comment #25)
> > (In reply to Igor Poboiko from comment #24)
> > 
> > > This patch might help: https://phabricator.kde.org/D21427.
> > > Can anyone confirm?
> > 
> > Nice work Igor! I'm building baloo5 with that patch now and I'll test it
> > throughout my workday today.
> 
> This is tentatively looking good! It survived a long day of work yesterday,
> including quite a few reboots (because I was diagnosing a different
> unrelated bug... :-/ ) I'd say push this change to master.

I can confidently say that the patch fixes this bug. Is there anything preventing the integration of the patch into the official code?
Comment 28 Gerson 2019-07-10 15:02:29 UTC
Great! I didn't try the patch yet, but it is great to know that it seems to have fixed the problem. I'll try as soon as I can.
Comment 29 S 2019-07-11 16:05:29 UTC
Over here:
https://bugzilla.suse.com/show_bug.cgi?id=1141028
(Stefan Brüns)
> A fix is already included in KF5.60.
Comment 30 Igor Poboiko 2019-07-14 15:52:03 UTC
(In reply to S from comment #29)
> Over here:
> https://bugzilla.suse.com/show_bug.cgi?id=1141028
> (Stefan Brüns)
> > A fix is already included in KF5.60.

He probably refers to the following commit (which I kind of missed here - sorry about it) - https://cgit.kde.org/baloo.git/commit/?id=8e7b54c54764f3e6eae7d79f0e0ce3079d56a3a0.

It should also work (and if it does, please drop a comment here so we can close this issue)
Comment 31 Gerson 2019-07-23 13:10:16 UTC
yes, I confirm. After upgrading to 5.60 the bug is gone! Thanks!