Bug 449102 - baloo is re-indexing all /home files whereas the files did not change.
Summary: baloo is re-indexing all /home files whereas the files did not change.
Status: RESOLVED NOT A BUG
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 5.88.0
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-24 21:42 UTC by iten
Modified: 2022-01-27 11:47 UTC
CC List: 1 user

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description iten 2022-01-24 21:42:32 UTC
SUMMARY
***
I'm not sure it's a bug. I rsynced my old /home to a new disk, and since then baloo has been re-indexing the same files again and again, with messages like this:
" id seems to have changed. Perhaps baloo was not running, and this file was deleted + re-created". 

But the files haven't changed since the last indexing. I'm wondering whether it's because rsync created files with time attributes that might confuse baloo's algorithm: the files created by rsync have modification times earlier than their creation times. An example:

# LANG=C stat 801\ 001.jpg 
  File: 801 001.jpg
  Size: 1923181         Blocks: 3760       IO Block: 4096   regular file
Device: 10305h/66309d   Inode: 110231652   Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1026/ mathieu)   Gid: (  100/   users)
Access: 2022-01-07 02:04:30.899118806 +0100
Modify: 2011-08-12 00:33:58.000000000 +0200
Change: 2022-01-07 02:04:30.918118365 +0100
 Birth: 2022-01-07 02:04:30.899118806 +0100

That file was last modified in 2011 and was created by rsync in 2022.
***
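The effect described above can be reproduced without rsync. Preserving times (rsync's -t, which is part of the -rogtplD set used later in this thread) means the copy is created now and then stamped with the source's old modification time, so mtime ends up earlier than ctime and the birth time. The Python sketch below imitates that step with os.utime; it compares mtime against ctime because Python's os.stat does not report the birth time on Linux. The helper name copy_preserving_mtime is hypothetical, and this is only an approximation of what rsync does.

```python
import os
import tempfile

# 2011-08-12 00:33:58 +0200, the Modify time in the stat output above
# (balooshow reports the same instant as Mtime 1313102038).
OLD_MTIME_NS = 1313102038 * 10**9


def copy_preserving_mtime(data: bytes, dest: str, mtime_ns: int) -> os.stat_result:
    """Create dest now, then stamp an old mtime on it (roughly what rsync -t does).

    The access time is left as-is; setting the mtime also updates ctime to "now".
    """
    with open(dest, "wb") as f:
        f.write(data)
    current = os.stat(dest)
    os.utime(dest, ns=(current.st_atime_ns, mtime_ns))
    return os.stat(dest)


with tempfile.TemporaryDirectory() as d:
    st = copy_preserving_mtime(b"jpeg bytes", os.path.join(d, "801 001.jpg"), OLD_MTIME_NS)
    print(int(st.st_mtime))           # 1313102038
    print(st.st_mtime < st.st_ctime)  # True: "modified" long before it was created
```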


STEPS TO REPRODUCE
1. rsync your /home directory to a new disk
2. use the new disk as your /home
3. delete baloo settings and baloo index
4. let baloo index all your content as new content

OBSERVED RESULT
The next time baloo starts, it reindexes everything, with "id seems to have changed" messages.

EXPECTED RESULT
baloo should index unmodified files only once.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: 5.15.16/5.88.0
KDE Plasma Version: 5.88.0
KDE Frameworks Version: 5.88.0
Qt Version: 5.15.2 

ADDITIONAL INFORMATION
ext4 filesystem
Comment 1 tagwerk19 2022-01-25 00:10:46 UTC
(In reply to iten from comment #0)
> ... " id seems to have changed. Perhaps baloo was not running, and this file was
> deleted + re-created" ...
> 
> ... The files created with rsync have modified time attributes
> that are prior to creation time. An example:
> 
> # LANG=C stat 801\ 001.jpg 
>   File: 801 001.jpg
>   Size: 1923181         Blocks: 3760       IO Block: 4096   regular file
>   Device: 10305h/66309d   Inode: 110231652   Links: 1
>    ...
The Device and Inode are the things to watch. If baloo says "id seems to have changed", it most likely means these.

See if these change with a run of rsync (would it really do that?).

You can use balooshow to see what "metadata" baloo has indexed for the file, and compare:
    balooshow -x  801\ 001.jpg
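The question of whether an rsync run changes the Device and Inode can be answered by recording the (st_dev, st_ino) pair of every file before and after the run and diffing the two snapshots. A minimal Python sketch; snapshot and changed_ids are hypothetical helper names, not baloo tools. The demo mimics rsync's default behaviour when it does transfer a file (write under a temporary name, then rename into place), which gives the file a new inode, while an untouched file keeps its pair.

```python
import os
import tempfile
from typing import Dict, List, Tuple

FileId = Tuple[int, int]  # (st_dev, st_ino)


def snapshot(root: str) -> Dict[str, FileId]:
    """Map each file under root (as a relative path) to its (device, inode)."""
    result: Dict[str, FileId] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path, follow_symlinks=False)
            result[os.path.relpath(path, root)] = (st.st_dev, st.st_ino)
    return result


def changed_ids(before: Dict[str, FileId], after: Dict[str, FileId]) -> List[str]:
    """Paths present in both snapshots whose (device, inode) pair differs."""
    return sorted(p for p in before.keys() & after.keys() if before[p] != after[p])


with tempfile.TemporaryDirectory() as d:
    for name in ("kept.jpg", "replaced.jpg"):
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"data")
    before = snapshot(d)
    # Rewrite one file via a temporary name + rename; the result gets a new inode
    # because the temporary file is created while the old one still exists.
    tmp = os.path.join(d, ".replaced.jpg.tmp")
    with open(tmp, "wb") as f:
        f.write(b"data")
    os.replace(tmp, os.path.join(d, "replaced.jpg"))
    after = snapshot(d)
    changed = changed_ids(before, after)
    print(changed)  # ['replaced.jpg']
```

On a real system, take the first snapshot of the source tree's copy before running rsync and the second one afterwards; an empty list would mean no file got a new id.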
Comment 2 iten 2022-01-25 07:38:48 UTC
(In reply to tagwerk19 from comment #1)
> (In reply to iten from comment #0)
> > ... " id seems to have changed. Perhaps baloo was not running, and this file was
> > deleted + re-created" ...
> > 
> > ... The files created with rsync have modified time attributes
> > that are prior to creation time. An example:
> > 
> > # LANG=C stat 801\ 001.jpg 
> >   File: 801 001.jpg
> >   Size: 1923181         Blocks: 3760       IO Block: 4096   regular file
> >   Device: 10305h/66309d   Inode: 110231652   Links: 1
> >    ...
> The Device and Inode are things to watch. If baloo says 'id seems to have
> changed", it's likely to be meaning these.
> 
> See if these change with a run of rsync (would it really do that?).
> 
> You can use balooshow to see what "metadata" baloo had indexed for the file
> and compare
>     balooshow -x  801\ 001.jpg

This is the balooshow -x output:
=============
LANG=C balooshow -x  801\ 001.jpg
692006400010305 66309 110231652 801 001.jpg [/home/mathieu/xxxxx/801 001.jpg]
        Mtime: 1313102038 2011-08-12T00:33:58
        Ctime: 1641517470 2022-01-07T02:04:30
        Cached properties:
                Width: 2493
                Height: 3280

Internal Info
Terms: Mimage Mjpeg T4 X26-2493 X27-3280 
File Name Terms: F001 F801 Fjpg 
XAttr Terms: 
width: 2493
height: 3280
=============
Inode, device, mtime and ctime look correct.
I will run balooshow -x and stat on that file the next time baloo reindexes it (that will take some time, since /home is 1.5 TB). Please note that I don't run rsync between two baloo reindexing runs, so the file should not have been modified.
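The first field balooshow -x prints, 692006400010305, appears to be the device and inode packed into one 64-bit number and written in hexadecimal: the inode (110231652 = 0x6920064) in the high 32 bits and the device (66309 = 0x10305, shown by stat as 10305h) in the low 32 bits. That layout is inferred from the numbers in this report, not taken from baloo's documentation. Under that assumption, a small decoder makes it easy to compare the indexed id with a fresh stat; both function names are hypothetical.

```python
def split_baloo_id(hex_id: str) -> tuple:
    """Split a balooshow id (assumed: hex of inode << 32 | device) into (device, inode)."""
    value = int(hex_id, 16)
    device = value & 0xFFFFFFFF  # low 32 bits
    inode = value >> 32          # high 32 bits
    return device, inode


def make_baloo_id(device: int, inode: int) -> str:
    """Inverse of split_baloo_id, formatted like balooshow's first column (assumed)."""
    return format((inode << 32) | device, "x")


device, inode = split_baloo_id("692006400010305")
print(device, inode)                    # 66309 110231652
print(make_baloo_id(66309, 110231652))  # 692006400010305
```

On a live system, feeding os.stat(path).st_dev and st_ino into make_baloo_id and comparing the result with balooshow's first column would show whether the file still maps to the id baloo stored, bearing in mind that the packing layout is an inference.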
Comment 3 iten 2022-01-25 17:52:19 UTC
To sum up the last reindexing:

======
journalctl --user|grep baloo|grep "801 001.jpg"
janv. 22 05:18:19 goldorak.actarus baloo_file_extractor[2826]: kf.baloo: "/home/mathieu/Documents/XXXXXX/801 001.jpg" id seems to have changed. Perhaps baloo was not running, and this file was deleted + re-created
janv. 24 22:28:54 goldorak.actarus baloo_file_extractor[690848]: kf.baloo: "/home/mathieu/Documents/XXXXXX/801 001.jpg" id seems to have changed. Perhaps baloo was not running, and this file was deleted + re-created
======
LANG=C balooshow -x  801\ 001.jpg
692006400010305 66309 110231652 801 001.jpg [/home/mathieu/XXXXXX/801 001.jpg]
        Mtime: 1313102038 2011-08-12T00:33:58
        Ctime: 1641517470 2022-01-07T02:04:30
        Cached properties:
                Width: 2493
                Height: 3280

Internal Info
Terms: Mimage Mjpeg T4 X26-2493 X27-3280 
File Name Terms: F001 F801 Fjpg 
XAttr Terms: 
width: 2493
height: 3280
==
692006400010305 66309 110231652 801 001.jpg [/home/mathieu/Documents/XXXXXX/801 001.jpg]
        Mtime: 1313102038 2011-08-12T00:33:58
        Ctime: 1641517470 2022-01-07T02:04:30
        Cached properties:
                Width: 2493
                Height: 3280

Internal Info
Terms: Mimage Mjpeg T4 X26-2493 X27-3280 
File Name Terms: F001 F801 Fjpg 
XAttr Terms: 
width: 2493
height: 3280
======

As you can see, baloo has indexed the same file twice even though the file has not changed. In fact, it has reindexed my whole /home directory (1.5 TB) twice. I will wait for the next iteration and, once the next automatic indexing is done, post the journalctl log plus the balooshow -x and stat output for this file to confirm this.
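With a 1.5 TB /home, counting how often each path hits the "id seems to have changed" message is a quicker way to spot repeated reindexing than grepping file by file. A Python sketch; the regular expression is written against the two kf.baloo journal lines quoted above and may need adjusting for other baloo versions, and count_id_changes is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Written against the kf.baloo lines quoted above; other versions may differ.
ID_CHANGED = re.compile(r'kf\.baloo: "(?P<path>[^"]+)" id seems to have changed')


def count_id_changes(lines: Iterable[str]) -> Counter:
    """Count 'id seems to have changed' messages per file path."""
    counts: Counter = Counter()
    for line in lines:
        match = ID_CHANGED.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


SAMPLE = [
    'janv. 22 05:18:19 goldorak.actarus baloo_file_extractor[2826]: kf.baloo: '
    '"/home/mathieu/Documents/XXXXXX/801 001.jpg" id seems to have changed. '
    'Perhaps baloo was not running, and this file was deleted + re-created',
    'janv. 24 22:28:54 goldorak.actarus baloo_file_extractor[690848]: kf.baloo: '
    '"/home/mathieu/Documents/XXXXXX/801 001.jpg" id seems to have changed. '
    'Perhaps baloo was not running, and this file was deleted + re-created',
]

counts = count_id_changes(SAMPLE)
for path, n in counts.most_common():
    print(n, path)  # 2 /home/mathieu/Documents/XXXXXX/801 001.jpg
```

For the full journal, the same function can be given sys.stdin while piping in the output of journalctl --user.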
Comment 4 iten 2022-01-26 20:26:13 UTC
The problem does not seem to have occurred again: baloo did not reindex my entire /home content. I think it was my mistake: after the first run I edited baloofilerc and added "first run=true". After the second run, having read the documentation properly, I switched it to "first run=false", and it seems to be OK now.
Sorry for the erroneous report. I will close the bug.
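A leftover "first run" entry can be checked by reading baloofilerc as an INI file. Python sketch; it assumes the key sits in the [General] group (as on typical Plasma 5 installs) and that the file lives at ~/.config/baloofilerc, so verify both on your system. The helper name first_run_setting is hypothetical, and the demo uses a throwaway file rather than a real config.

```python
import configparser
import os
import tempfile
from typing import Optional


def first_run_setting(path: str) -> Optional[str]:
    """Return the raw 'first run' value from a baloofilerc, or None if absent.

    Assumption: the key lives in the [General] group.
    """
    # KDE config values may contain '%' and repeated keys, so turn off
    # interpolation and strict duplicate checking.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names exactly as written
    parser.read(path, encoding="utf-8")  # a missing file is silently skipped
    return parser.get("General", "first run", fallback=None)


with tempfile.TemporaryDirectory() as d:
    rc = os.path.join(d, "baloofilerc")
    with open(rc, "w", encoding="utf-8") as f:
        f.write("[General]\nfirst run=true\n")
    leftover = first_run_setting(rc)
    missing = first_run_setting(os.path.join(d, "does-not-exist"))
    print(leftover, missing)  # true None
```

Pointing it at os.path.expanduser("~/.config/baloofilerc") would show whether a manually added entry is still present.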
Comment 5 tagwerk19 2022-01-27 08:43:49 UTC
(In reply to iten from comment #4)
> ... added "first run=true" ...
Good news that it has stopped reindexing.

I've never managed to get the "first run" option to behave as I think it "ought to", but here, setting it to true could explain the reindexing.

It's perhaps worth bearing in mind that baloo first does a "quick" scan to build a list of what it needs to content-index. The indexing itself then happens "slowly", as you see, 40 files per batch. With 1.5 TB, that could be a long job. It's possible you are seeing the "reindexing" still running some days after your rsync (and it does not matter if you shut down and restart in the middle: baloo keeps the list of what it still has to do in its index).
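The two-phase behaviour described here (a quick scan that queues work, then content indexing in batches that survives restarts because the queue is stored) can be modelled in a few lines. This is a hypothetical toy, not baloo's code: the JSON queue file stands in for the list baloo keeps in its index, BATCH_SIZE mirrors the 40 files per batch mentioned above, and the 131 file names echo the number of regular files in the rsync test below.

```python
import json
import os
import tempfile

BATCH_SIZE = 40  # files content-indexed per batch, per the comment above


def quick_scan(queue_path: str, paths: list) -> None:
    """Phase 1: record everything that needs content indexing."""
    with open(queue_path, "w") as f:
        json.dump(sorted(paths), f)


def index_batches(queue_path: str, max_batches: int) -> int:
    """Phase 2: index up to max_batches batches, persisting what is left.

    Returns the number of files still pending, so a later run (for example
    after a restart) picks up where this one stopped.
    """
    with open(queue_path) as f:
        pending = json.load(f)
    for _ in range(max_batches):
        if not pending:
            break
        batch, pending = pending[:BATCH_SIZE], pending[BATCH_SIZE:]
        # ... content extraction for each file in `batch` would happen here ...
        with open(queue_path, "w") as f:  # save progress after every batch
            json.dump(pending, f)
    return len(pending)


with tempfile.TemporaryDirectory() as d:
    queue = os.path.join(d, "pending.json")
    quick_scan(queue, [f"file{i:03}" for i in range(131)])
    left_after_shutdown = index_batches(queue, max_batches=2)
    left_after_restart = index_batches(queue, max_batches=10)
    print(left_after_shutdown, left_after_restart)  # 51 0
```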

If we want to dig further, we'd need to know whether rsync generates an inotify notification when it looks at a file, even if it does not overwrite it. My suspicion is that if baloo sees an inotify notification, it reindexes the file even if the modification time has not changed. I can do something like:
    touch -a -m -t 201108120033.58 "801 001.jpg"
which opens the file without changing it and then resets the access and modification times to your 2011 date; baloo reindexes the file after the "touch". It's a cautious approach...
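touch -t takes its stamp as [[CC]YY]MMDDhhmm[.ss] in local time. Converting 201108120033.58 with the +02:00 offset shown in the reporter's stat output gives exactly the Mtime balooshow stored, 1313102038, so this touch leaves the modification time as it was. A Python sketch that handles only the full CCYYMMDDhhmm.ss form and takes the offset as a parameter instead of reading the local time zone; touch_stamp_to_epoch is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def touch_stamp_to_epoch(stamp: str, utc_offset_hours: int) -> int:
    """Convert a full CCYYMMDDhhmm.ss touch -t stamp to a Unix timestamp."""
    local = datetime.strptime(stamp, "%Y%m%d%H%M.%S")
    aware = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return int(aware.timestamp())


epoch = touch_stamp_to_epoch("201108120033.58", utc_offset_hours=2)
print(epoch)                # 1313102038
print(epoch == 1313102038)  # True: same as balooshow's Mtime
```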
Comment 6 iten 2022-01-27 11:36:02 UTC
OK,
I have deleted the "first run" entry in baloofilerc. It seems to be OK; no full reindexing occurred.
I started balooctl monitor and ran:

touch -a -m -t 201108120033.58 "801 001.jpg"

And as you said, balooctl monitor reported that it was indexing it:
=================
LANG=C balooctl monitor
Press ctrl+c to stop monitoring
File indexer is running
Idle
Indexing Extended Attributes
Idle
=================
Next, I will create a directory into which I rsync some content twice, and see what happens.
Comment 7 iten 2022-01-27 11:42:03 UTC
Here is the result:

1st run:
mkdir balootest
rsync -rogtplD --stats --delete Documents/Themes/ balootest/
Number of files: 146 (reg: 131, dir: 15)
Number of created files: 145 (reg: 131, dir: 14)
Number of deleted files: 0
Number of regular files transferred: 131
Total file size: 30,990,711 bytes
Total transferred file size: 30,990,711 bytes
Literal data: 30,990,711 bytes
Matched data: 0 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 31,006,841
Total bytes received: 2,627

sent 31,006,841 bytes  received 2,627 bytes  20,672,978.67 bytes/sec
total size is 30,990,711  speedup is 1.00
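rsync's closing "speedup" figure is the total file size divided by the bytes actually sent plus received. Recomputing it from the statistics of this run and of the second run later in the thread shows why the first run reports 1.00 (everything was copied) while the second reports 9,144.50 (only the file list went over the wire). rsync_speedup is a hypothetical helper name.

```python
def rsync_speedup(total_size: int, sent: int, received: int) -> float:
    """rsync's 'speedup': total file size / (bytes sent + bytes received)."""
    return total_size / (sent + received)


first = rsync_speedup(30_990_711, 31_006_841, 2_627)
second = rsync_speedup(30_990_711, 3_356, 33)
print(f"{first:,.2f}")   # 1.00
print(f"{second:,.2f}")  # 9,144.50
```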
====
balooctl monitor output:
Idle
Indexing new files
Indexing new files
Indexing new files
Indexing Extended Attributes
Indexing file content
Indexing: /home/mathieu/balootest/Sound/785-sounds/minimize.mp3: Ok
Indexing: /home/mathieu/balootest/silent-1280x1024.jpg: Ok
Indexing: /home/mathieu/balootest/verbose-1280x1024.jpg: Ok
Indexing: /home/mathieu/balootest/Sound/Funky Night Startup - Premaster.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/Startup.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/190-starcraft_snd/README: Ok
Indexing: /home/mathieu/balootest/Sound/190-starcraft_snd/chatrequest.wav: Ok
Indexing: /home/mathieu/balootest/Sound/190-starcraft_snd/filetransfer.wav: Ok
Indexing: /home/mathieu/balootest/Sound/190-starcraft_snd/message.wav: Ok
Indexing: /home/mathieu/balootest/Sound/190-starcraft_snd/systemmsg.wav: Ok
Indexing: /home/mathieu/balootest/Sound/190-starcraft_snd/url.wav: Ok
Indexing: /home/mathieu/balootest/Sound/785-sounds/error.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/785-sounds/gotmail.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/785-sounds/logoff.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/785-sounds/maximize.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/785-sounds/rollup.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/785-sounds/select.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/785-sounds/welcome.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/785-sounds/xkill.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Click1.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Error1.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Error2.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Exit1_1.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Exit1_2.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/K3b_success.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Kmail.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Knock.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Kopete_notify.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Kopete_offline.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Kopete_send.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Kopete_status.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Maximize.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Minimize1.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Minimize4.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Popup.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Question.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Question_background.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/README: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Restore_down.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Restore_up.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Shade_down.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Shade_up.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Startup1_1.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Startup1_2.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Startup1_3.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Startup1_4.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/borealis.jpg: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/maximize.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/install.sh: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/uninstall.sh: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Config/k3b.eventsrc: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Config/kdevelop.eventsrc: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Config/kmail.eventsrc: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Config/knotify.eventsrc: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Config/konsole.eventsrc: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Config/kopete.eventsrc: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Config/ksysguard.eventsrc: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Config/kwin.eventsrc: Ok
Indexing: /home/mathieu/balootest/Sound/Borealis/Config/proxyscout.eventsrc: Ok
Indexing: /home/mathieu/balootest/Sound/Chords/chord-down.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Chords/chord-email.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Chords/chord-end.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Chords/chord-error.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Chords/chord-exclam.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Chords/chord-start.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Chords/chord-up.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Faux_Strings/Classic Startup.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/Faux_Strings/Dance Startup.mp3: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/IM_notification.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/beep.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/call.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/device_connect.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/device_disconnect.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/ending.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/fail.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/incoming_call.wav: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/error.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/loading.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/loading_2.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/minimize.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/new_mail_notification.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/opening.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/panic.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/popup.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Feather_sound_set_v1.0/warning.wav: Ok
Indexing: /home/mathieu/balootest/Sound/Nouveau dossier/error.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Nouveau dossier/messenger_received.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Nouveau dossier/messenger_send.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Nouveau dossier/notification.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Nouveau dossier/shutdown.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Nouveau dossier/startup.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/Nouveau dossier/user_online.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/amadeus/shutdown.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/amadeus/startup.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/angel_ogg/angel_close.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/angel_ogg/angel_close2.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/angel_ogg/angel_critical.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/angel_ogg/angel_notify.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/angel_ogg/angel_open.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/angel_ogg/angel_question.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/angel_ogg/angel_warning.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/close.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/error.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/info.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/minimize.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/question.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/readme: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/restore.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/shutdown.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/startup.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/crystallo_sound_theme/startup2.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/change_virtualDesktop.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/click.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/email.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/empty_bin.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/maximize.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/messenger_online.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/messenger_received.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/messenger_send.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/minimize.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/notification.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/serious_error.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/shutdown.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/startup.ogg: Ok
Indexing: /home/mathieu/balootest/Sound/kJazz/window_closed.ogg: Ok
Indexing: /home/mathieu/balootest/cursors/ComixCursors-0.7.tar.bz2: Ok
Indexing: /home/mathieu/balootest/cursors/ComixCursors-LH-0.7.tar.bz2: Ok
Indexing: /home/mathieu/balootest/cursors/ComixCursors-LH-Opaque-0.7.tar.bz2: Ok
Indexing: /home/mathieu/balootest/cursors/ComixCursors-Opaque-0.7.tar.bz2: Ok
Idle
Comment 8 iten 2022-01-27 11:43:26 UTC
2nd run of rsync:
rsync -rogtplD --stats --delete Documents/Themes/ balootest/

Number of files: 146 (reg: 131, dir: 15)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 0
Total file size: 30,990,711 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 3,356
Total bytes received: 33

sent 3,356 bytes  received 33 bytes  6,778.00 bytes/sec
total size is 30,990,711  speedup is 9,144.50
=======
balooctl monitor: no activity; baloo stays idle.
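The second run transferred nothing because of rsync's default "quick check": a file is skipped when the destination already has the same size and modification time, so no temporary file is written and nothing is renamed, which presumably is why baloo saw no events and stayed idle. A simplified Python model of that check; it compares whole-second mtimes and ignores rsync options such as --checksum or --modify-window, and needs_transfer is a hypothetical name.

```python
import os
import tempfile


def needs_transfer(src: str, dst: str) -> bool:
    """Simplified rsync quick check: transfer unless size and mtime match."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "minimize.mp3")
    dst = os.path.join(tmp, "copy.mp3")
    with open(src, "wb") as f:
        f.write(b"ID3 audio bytes")
    first_run = needs_transfer(src, dst)   # True: destination missing
    # Simulate the first copy with preserved times (like rsync -t).
    with open(dst, "wb") as f:
        f.write(b"ID3 audio bytes")
    st = os.stat(src)
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
    second_run = needs_transfer(src, dst)  # False: size and mtime match
    print(first_run, second_run)  # True False
```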
Comment 9 iten 2022-01-27 11:47:58 UTC
But this test differs from what happened in my case.

In my case, I did:

rsync /home to a new disk
then use the new disk as /home

So I ran rsync only once, but as I said, maybe I misused the "first run" entry in baloofilerc.