Bug 397242 - Baloo gets confused when a file is renamed over existing one
Summary: Baloo gets confused when a file is renamed over existing one
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 5.48.0
Platform: Archlinux Linux
Importance: HI major
Target Milestone: ---
Assignee: baloo-bugs-null
Depends on:
Reported: 2018-08-07 15:16 UTC by Alexander Meshcheryakov
Modified: 2023-07-10 16:14 UTC

See Also:
Latest Commit:
Version Fixed In: 5.52


Description Alexander Meshcheryakov 2018-08-07 15:16:27 UTC
Here is an example script to reproduce the problem (it has to be run inside a Baloo-indexed directory):

NAME=$(pwgen -N1); CONTENT=$(pwgen -N1)
echo "$CONTENT" > "${NAME}"_1.txt
echo "$CONTENT" > "${NAME}"_2.txt
sleep 5
baloosearch "${CONTENT}"
mv -v "${NAME}"_1.txt "${NAME}"_2.txt
sleep 5
baloosearch "${CONTENT}"

The first baloosearch, as expected, finds the files "${NAME}"_1.txt and "${NAME}"_2.txt.

But the second invocation (after "${NAME}"_1.txt is renamed to "${NAME}"_2.txt) finds "${NAME}"_2.txt twice!

I believe this is the reason why I get tons of duplicates of Zim notes in search results. For robustness, the Zim desktop wiki saves each note to the file $NOTE.txt.zim-new~ and then moves this file over $NOTE.txt.
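
The save pattern described above can be sketched in shell. This is only an illustration of "write a temporary file, then rename it over the target"; Zim's actual implementation is not shown in this report, and the helper name atomic_save and the file note.txt are hypothetical.

```shell
# Hypothetical helper: write stdin to "<target>.zim-new~", then rename it
# over <target>, so the target is replaced in a single rename.
atomic_save() {
  target=$1
  tmp="${target}.zim-new~"
  cat > "$tmp" && mv -- "$tmp" "$target"
}

# Work in a scratch directory so nothing in the current directory is touched.
dir=$(mktemp -d)
printf 'first version\n'  | atomic_save "$dir/note.txt"
# The second save renames the temporary file over an existing note.txt,
# which is the same kind of rename the reproduction script performs with mv.
printf 'second version\n' | atomic_save "$dir/note.txt"
```

Every save after the first one is therefore a rename over an existing file.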

This also obviously leads to an unreasonably large index file, and so on.
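
The duplicate results can also be checked mechanically instead of by eye. A minimal sketch, assuming baloosearch prints one file path per result line; the helper name duplicate_results is made up for illustration, and the demonstration uses canned input rather than real baloosearch output.

```shell
# Print every line that occurs more than once on stdin.
# Assumption: each search result is a single line containing the file path.
duplicate_results() {
  sort | uniq -d
}

# Against a live index (needs Baloo running) this would be:
#   baloosearch "$CONTENT" | duplicate_results
# Demonstration with canned input that mimics the buggy second search:
printf '%s\n' /home/user/abc_2.txt /home/user/abc_2.txt | duplicate_results
```

An empty result means no path was reported twice.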
Comment 1 Nate Graham 2018-10-05 23:29:32 UTC
This appears to be working now with Baloo 5.51.
Comment 2 Nate Graham 2018-10-05 23:33:01 UTC
Whoops, it appears that this is actually fixed with https://phabricator.kde.org/D15944, which I forgot I was testing.
Comment 3 Igor Poboiko 2018-10-08 22:11:11 UTC
Git commit 7e5c005e6a3b563013e2ba8cb9c8f1b282e6f7b2 by Igor Poboiko.
Committed on 08/10/2018 at 22:10.
Pushed by poboiko into branch 'master'.

[balooctl] Fix "index" command with already indexed, but moved file

If I move a file (while baloo was not running) and perform `balooctl index` on new file
(so that document Id is not changed), it won't update the path of the file, keeping the
invalid entry in index.
Explicitly tell Baloo that we want to update everything concerning this file.
FIXED-IN: 5.52

Test Plan:
1) `echo "hello world" >~/file1`
2) `balooctl stop`
3) `mv ~/file1 ~/file2`
4) `balooctl start && balooctl index ~/file2`
5) `balooshow file2` and `balooshow -x <DOCUMENT_ID_OF_FILE>`
The first command should show the right path, and the second command shouldn't complain about bug in Baloo (invalid index entry)

Reviewers: #baloo, #frameworks, ngraham

Reviewed By: #baloo, ngraham

Subscribers: ngraham, kde-frameworks-devel

Tags: #frameworks, #baloo

Differential Revision: https://phabricator.kde.org/D15944

M  +1    -1    src/tools/balooctl/indexer.cpp

Comment 4 Alexander Meshcheryakov 2018-12-04 16:58:39 UTC
I just tested the sequence that I proposed in the bug description with Baloo 5.52. It still yields the same result: baloosearch finds the test file twice.

The issue that Igor fixed must have been something else. His test plan is quite different and involves stopping Baloo. The bug that I describe here is triggered while Baloo is running nonstop.

Please retest this with the test plan that I provided in the description https://bugs.kde.org/show_bug.cgi?id=397242#c0 exactly as it is.
Comment 5 Igor Poboiko 2019-03-19 08:02:59 UTC
That was supposed to be caused by the same issue. 

Sorry, I cannot reproduce it anymore. Your script (with NAME="testfile" and CONTENT="testtesttest", since I don't have pwgen) yields:

Elapsed: 0.299802 msecs

renamed 'testfile_1.txt' -> 'testfile_2.txt'
Elapsed: 0.277733 msecs
while monitor yields:
Indexing new files
Indexing file content
Indexing: /home/eol/testfile_1.txt: Ok
Indexing: /home/eol/testfile_2.txt: Ok
Indexing modified files
Indexing file content
Indexing: /home/eol/testfile_2.txt: Ok
so everything seems to be running smoothly...
Comment 6 Alexander Meshcheryakov 2019-03-26 13:12:34 UTC
I'm running a recent Arch Linux and am still able to reproduce this, even with a clean and fresh user. I have even recorded a screencast: https://asciinema.org/a/CTun0j5byw1FsYwJvcfRw89mx

I don't know why you can't reproduce it. I suggest testing with random strings, like this:

random_string() {
  head -c 100 /dev/urandom | md5sum | head -c 10
}
NAME=$(random_string); CONTENT=$(random_string)
echo "$CONTENT" > "${NAME}"_1.txt
echo "$CONTENT" > "${NAME}"_2.txt
sleep 5
baloosearch "${CONTENT}"
mv -v "${NAME}"_1.txt "${NAME}"_2.txt
sleep 5
baloosearch "${CONTENT}"

I'm going to retest this in a virtual machine with some public live CD to make sure I have a perfectly reproducible test rig.
Comment 7 Alexander Meshcheryakov 2019-03-29 14:37:08 UTC
I ran http://cdimage.ubuntu.com/kubuntu/daily-live/current/disco-desktop-amd64.iso in VirtualBox, and in this image Baloo handles such renames as expected; the bug is NOT reproducible inside the VM.

But it is STILL reproducible on my PC with Arch Linux.

Both systems have

$ balooctl --version
baloo 5.56.0

I'm confused. I can't figure out what is so special about my PC that keeps this issue alive.
Comment 8 Igor Poboiko 2019-03-30 10:04:03 UTC
It is indeed weird.

What filesystem do you use? How does "balooctl monitor" react to those files?
Comment 9 Alexander Meshcheryakov 2019-03-30 11:57:51 UTC
I use ext4 on a LUKS-encrypted volume on LVM2.

This issue gets weirder and weirder. When it is successfully reproduced, balooctl monitor shows this:

Indexing new files
Indexing file content
Indexing: /home/self/5fe45b3d45_1.txt: Ok
Indexing: /home/self/5fe45b3d45_2.txt: Ok

There is no reaction to the rename. But several attempts to reproduce it led to this instead:

Indexing new files
Indexing file content
Indexing: /home/self/e0f789d783_2.txt: Ok

And only the second file was present in the baloosearch "$CONTENT" output. The first one is ignored? Why?

So I tried echo "$CONTENT" | tee "${NAME}"_{01..10}.txt

And got once in balooctl monitor:
Press ctrl+c to stop monitoring
File indexer is running
Indexing new files
Indexing modified files
Indexing file content
Indexing: /home/self/e0f789d783_02.txt: Ok
Indexing: /home/self/e0f789d783_03.txt: Ok
Indexing: /home/self/e0f789d783_04.txt: Ok
Indexing: /home/self/e0f789d783_05.txt: Ok
Indexing: /home/self/e0f789d783_06.txt: Ok
Indexing: /home/self/e0f789d783_07.txt: Ok
Indexing: /home/self/e0f789d783_08.txt: Ok
Indexing: /home/self/e0f789d783_09.txt: Ok
Indexing: /home/self/e0f789d783_10.txt: Ok

The first file is not indexed! Again!
Comment 10 Igor Poboiko 2019-03-30 16:59:11 UTC
It might be intentional: IIRC, Baloo doesn't want to reindex the same file too often (so it won't kill your machine when it finds, e.g., logs being continuously written).
Comment 11 Stefan Brüns 2023-07-10 16:14:19 UTC
When Baloo does not notice renames, that's likely a bad inotify limit on Arch.
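
The inotify limits mentioned above can be inspected on Linux through /proc. A minimal sketch, assuming a Linux system; the helper name inotify_limit is made up for illustration, and whether a low limit is really the cause of this bug is not confirmed anywhere in this report.

```shell
# Hypothetical helper: print one kernel inotify limit, or nothing if the
# file does not exist (for example on a non-Linux system).
inotify_limit() {
  cat "/proc/sys/fs/inotify/$1" 2>/dev/null
}

# Show the three per-user inotify limits exposed by the kernel.
for name in max_user_watches max_user_instances max_queued_events; do
  printf '%s = %s\n' "$name" "$(inotify_limit "$name")"
done

# A limit can be raised until the next reboot with sysctl (needs root);
# the value here is only an example, not a recommendation from this report:
#   sysctl fs.inotify.max_user_watches=524288
```

Comparing these values between the Arch machine and the Kubuntu live image would show whether the two systems actually differ here.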