Bug 456108 - baloo reindexes files with date of 0 or prior to UNIX epoch on every restart
Status: CONFIRMED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.94.0
Platform: Fedora RPMs Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-06-29 03:42 UTC by skierpage
Modified: 2024-02-10 16:24 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description skierpage 2022-06-29 03:42:18 UTC
SUMMARY
Baloo reindexed 12 of my files every time I enabled/disabled it or ran `balooctl check`. All 12 files had weird modification times according to `ls -l` and `stat`, either:
> Modify: 1969-12-31 16:00:00.000000000 -0800
> Modify: 1903-08-23 02:36:30.000000000 -0800
`balooshow` reports these times as either:
>   Mtime: 0 1969-12-31T16:00:00
>   Mtime: 2200842286 2039-09-28T10:04:46

STEPS TO REPRODUCE
0. Enable baloo from System Settings > File Search
1. Run `balooctl monitor &`
2. Create a file with an mtime of 0 (the start of the Unix epoch), e.g. `touch -m --date=1970-01-01T00:00:00Z $HOME/file_from_epoch.txt`; then create a file with an mtime before that, e.g. `touch -m --date=19420102 $HOME/file_from_1942.txt`
3. Run `balooctl check`
4. Run `balooctl check` again
5. Run `balooshow $HOME/file_from_epoch.txt $HOME/file_from_1942.txt`

OBSERVED RESULT
`balooctl monitor` reports
> Started search for unindexed files
> Checking for unindexed files
> Indexing file content
> Indexing: /home/spage//file_from_epoch.txt: Ok
> Indexing: /home/spage//file_from_1942.txt: Ok
`balooshow` reports
> c8d800000028 40 51416 /home/spage/file_from_epoch.txt
>        Mtime: 0 1969-12-31T16:00:00
>        Ctime: 1656472422 2022-06-28T20:13:42
> c8ce00000028 40 51406 /home/spage/file_from_1942.txt
>        Mtime: 3411469696 2078-02-07T06:28:16
>        Ctime: 1656466455 2022-06-28T18:34:15

EXPECTED RESULT
These files have arguably weird modification times, but they aren't changing, so baloorunner/baloo_file should not reindex them.
Maybe baloo should handle file modification dates earlier than the start of the UNIX epoch, since my NTFS and BTRFS filesystems seem to allow them.

SOFTWARE/OS VERSIONS

Linux/KDE Plasma: 
KDE Plasma Version: 5.24.5
KDE Frameworks Version: 5.94.0
Qt Version: 5.13.3 on Wayland

ADDITIONAL INFORMATION
This sounds like bug 438074 but according to its reporter "mtime and ctime match in both files".

My first file's mtime is 0 seconds into the Unix epoch, i.e. January 1, 1970 UTC (shown in my timezone), and `balooshow` agrees with the file system. But for modification times before that, `balooshow` reports a date in the future.
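
The future dates that `balooshow` reports are consistent with a negative mtime being reinterpreted as an unsigned 32-bit number. This is an assumption drawn from the reported values, not something confirmed in baloo's source. A minimal Python sketch (the helper `wrap_u32` is hypothetical) reproduces the value shown for the 1942 file:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def wrap_u32(seconds: int) -> int:
    """Reinterpret a signed second count as unsigned 32-bit (assumed behaviour)."""
    return seconds & 0xFFFFFFFF

# 1942-01-02 00:00 at UTC-8 (the reporter's timezone) is 08:00 UTC the same day.
real_mtime = int((datetime(1942, 1, 2, 8, tzinfo=timezone.utc) - EPOCH).total_seconds())
wrapped = wrap_u32(real_mtime)
shown_utc = EPOCH + timedelta(seconds=wrapped)

print(real_mtime)             # -883497600
print(wrapped)                # 3411469696, the Mtime balooshow printed
print(shown_utc.isoformat())  # 2078-02-07T14:28:16+00:00, i.e. 06:28:16 at UTC-8
```

The same arithmetic maps the other reported value, 2200842286, back to -2094125010, which falls on 1903-08-23.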

Maybe baloorunner thinks all these odd modification times are in the future and so hands the file off for reindexing; and/or something in baloo_file_extractor or kfilemetadata thinks its representation of the modification time differs from the modification time it gets from the file system, so it keeps reindexing. Perhaps if the file information from baloo_file_extractor is the same as the entry already in baloo's DB, baloo should not perform any update.

Workaround: I used, for example, `touch -m --date=20210709 path/to/file` to give these files normal modification dates within the UNIX epoch, and after one final reindex baloo no longer reindexes them.
Comment 1 tagwerk19 2022-06-29 19:31:51 UTC
Can confirm

    Checked with F35 with BTRFS and an NTFS3 (the new Paragon code) mounted disc. The test files reindexed with each "balooctl check".
    Checked with Neon Unstable with ext4, ditto. The test files reindexed with each "balooctl check".

Good catch :-)
Comment 2 tagwerk19 2022-06-30 07:20:05 UTC
Just to make sure: baloo correctly handles a file modified at the year 2038 "rollover", when the signed 32-bit count of seconds since 1970 "goes negative":
    touch -m --date=2038-01-19T03:14:08Z $HOME/file_from_epochalypse.txt

Checked as above on ext4, BTRFS, NTFS3.
Comment 3 skierpage 2023-11-16 07:05:00 UTC
I enabled baloo logging and restarted baloo, and baloo reindexed a bunch of files that all have the same modification time on 1970-01-01 (1969 in my timezone).

The journal included lines like
       ... baloo_file[21819]: kf.baloo: "/home/spage/programs/skrooge_dev/skrooge/flatpak_build-dir/files/share/doc/HTML/en/skrooge/dashboard_report.png" mtime/ctime changed: 0 / 1675907011 -> 28800 / 1675907011

I think this is saying baloo thought the file's mtime was 0 and now thinks it's 28800, so it reindexed. My timezone is PST (-0800), which is 28,800 seconds behind GMT. baloo appears not to handle the timezone here, although `balooshow` does display the correct local time (with no timezone) for recently updated files.
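
The 28800 in the log line equals the PST offset in seconds. One way such a shift can arise is when a UTC wall-clock time is reinterpreted as local time. The sketch below is purely an arithmetic illustration, not a claim about baloo's code, and the fixed `PST` zone and variable names are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))  # fixed UTC-8 offset, assumed for illustration

stored_mtime = 0  # the old value in the log line

# Wall-clock reading of the stored value in UTC: 1970-01-01 00:00:00
utc_wall_clock = datetime.fromtimestamp(stored_mtime, tz=timezone.utc).replace(tzinfo=None)

# Reinterpreting the same wall-clock reading as PST shifts the timestamp.
reinterpreted = int(utc_wall_clock.replace(tzinfo=PST).timestamp())

print(reinterpreted)                 # 28800
print(reinterpreted - stored_mtime)  # 28800 == 8 * 3600
```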

If I manually call for reindex with `balooctl index /path/to/1970_file`, the command prints
    Skipping: /home/spage/programs/skrooge_dev/skrooge/flatpak_build-dir/files/share/doc/HTML/en/skrooge/dashboard_report.png Reason: Already indexed

I don't know if this skip happens when baloo thinks it needs to reindex these files.