Bug 479180 - baloo sometimes fails to index content of new files; 'Mzerosize'
Status: REPORTED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: 5.111.0
Platform: Fedora RPMs Linux
Importance: NOR major
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-30 03:48 UTC by skierpage
Modified: 2023-12-30 03:48 UTC

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description skierpage 2023-12-30 03:48:06 UTC
SUMMARY
In https://discuss.kde.org/t/how-do-i-troubleshoot-baloo/2830/3? , a user reported Baloo did not index a .flac file as a music file, commenting
> I found that balooshow -x <file> exists. When I run this for files that are
> shown in Elisa I get the line Property Terms: Maudio Mflac T2 whereas
> the files that aren’t showing in Elisa have the line Property Terms: Mapplication
> Moctet Mstream.

I have not experienced that, but I have noticed that baloo sometimes does not index the contents of text files. A few times after creating a text file (on both a btrfs file system and an NTFS partition mounted in Linux with the ntfs-3g FUSE file system), I noticed that its contents weren't indexed, and `balooshow -x <file>` showed
  XAttr Terms: 
  Plain Text Terms: 
  Property Terms: Mapplication Mx Mzerosize
i.e. no words in the file were indexed, and note the Mzerosize. The latter comes from baloo/src/file/basicindexingjob.cpp when filePathToStat() believes statBuf.st_size == 0. But the file is definitely non-zero length and baloo should have indexed its words.

These two cases may be unrelated, but in both it seems that baloo sometimes indexes a file when its contents aren't fully present. And in the second case, it seems `balooctl index <file>` fails to fix the problem and index the file contents; you have to `balooctl clear <file>` first.
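
One way a non-empty file could be seen as zero-length is the gap between creation and the first write. This is an assumption about the cause, not something confirmed from baloo's code or logs. With `command > file`, the shell creates (or truncates) the target before the command writes anything, so a watcher that stats the file right away sees size 0. A minimal sketch that separates the two moments, with `printf` standing in for `wl-paste`:

```shell
#!/bin/sh
# Sketch only: mimics, in two explicit steps, what `command > file` does.
tmpdir=$(mktemp -d)
f="$tmpdir/file.txt"

# 1. The redirection creates/truncates the target before any data is written.
: > "$f"
size_at_create=$(wc -c < "$f" | tr -d ' ')

# 2. The command's output arrives afterwards (stand-in for wl-paste).
printf 'flamablama\n' >> "$f"
size_after_write=$(wc -c < "$f" | tr -d ' ')

echo "size at create: $size_at_create, after write: $size_after_write"
rm -r "$tmpdir"
```

This prints `size at create: 0, after write: 11`. Whether baloo actually stats the file inside that window is not established here.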

STEPS TO REPRODUCE
0. Run `balooctl monitor` in a second terminal window
1. Somewhere in a GUI, enter a unique word like "flamablama", and copy it.
2. Paste it into a new file from a terminal. I used the Wayland command-line utility wl-paste: `wl-paste > /path/to/file.txt`
3. Run the terminal command `balooshow -x /path/to/file.txt`
4. Run the terminal command `baloosearch flamablama`
5. Run the terminal command `balooctl index /path/to/file.txt`
6. Repeat steps 3 and 4.
7. Repeat the steps but instead create the text file in a text editor like vim.
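
Steps 2 to 6 can be scripted roughly as follows. This is a sketch: `run`, `repro_once` and the `BALOO_WAIT` variable are hypothetical names introduced here, the pause length is arbitrary, and `printf` replaces `wl-paste`, so it may not trigger the problem as readily as the clipboard route. The target file must be inside a folder that baloo indexes.

```shell
#!/bin/sh
# Echo a command, then run it only if the tool is installed.
run() {
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "  (skipped: $1 not found)"
    fi
}

# Hypothetical helper: one pass over steps 2-6 for a file and a unique word.
repro_once() {
    file=$1
    word=$2
    # Step 2: printf stands in for `wl-paste > file`; both use shell redirection.
    printf '%s\n' "$word" > "$file"
    # Arbitrary pause so the daemon can react (BALOO_WAIT is an assumed knob).
    sleep "${BALOO_WAIT:-5}"
    run balooshow -x "$file"      # step 3
    run baloosearch "$word"       # step 4
    run balooctl index "$file"    # step 5
    run balooshow -x "$file"      # step 6, repeating step 3
    run baloosearch "$word"       # step 6, repeating step 4
}
```

Usage: `repro_once /path/to/file.txt flamablama`, with `balooctl monitor` running in a second terminal as in step 0.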

OBSERVED RESULT
Sometimes, balooctl monitor shows
  Indexing new files
  Idle
without displaying "Indexing: /path/to/file.txt: Ok", and balooshow output includes
  Internal Info
  File Name Terms: Ffile Ftxt 
  XAttr Terms: 
  Plain Text Terms: 
  Property Terms: Mapplication Mx Mzerosize
and baloosearch does not return the new file.

When indexing fails in this way, manually forcing indexing of the file with `balooctl index /path/to/file.txt` prints
  Indexing /path/to/file.txt
  File(s) indexed
but the file's contents remain unknown to baloo. The second `balooshow -x /path/to/file.txt` prints slightly different metadata:
  Plain Text Terms: 
  Property Terms: Mplain Mtext T5 T8
The meta attribute Mzerosize is gone and baloo detected the mime type correctly (now text/plain, not application/x-zerosize), but baloo still did not index the file contents.
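
The two failure states above can be told apart mechanically from `balooshow -x` output. The sketch below assumes the output has the line layout quoted in this report, which may differ between versions; `index_state` is a hypothetical helper name. An empty "Plain Text Terms:" line is only suspicious for files that should have text content.

```shell
#!/bin/sh
# Hypothetical helper: classify `balooshow -x` output read from stdin.
# Prints "zerosize", "no-content" or "ok".
index_state() {
    out=$(cat)
    if printf '%s\n' "$out" | grep -q 'Property Terms:.*Mzerosize'; then
        # File was indexed as if it were empty.
        echo zerosize
    elif printf '%s\n' "$out" | grep -Eq 'Plain Text Terms:[[:space:]]*$'; then
        # Mime type detected, but no words from the contents were indexed.
        echo no-content
    else
        # No Mzerosize term and no empty "Plain Text Terms:" line found.
        echo ok
    fi
}
```

Usage: `balooshow -x /path/to/file.txt | index_state`. For a text file, any result other than `ok` matches the symptoms described in this report.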

When I create new files in the `vim` file editor, baloo seems to reliably index their contents.

EXPECTED RESULT
Baloo should reliably index new files.
`balooctl index` should actually index a file's current contents, even if you don't clear it from the index first.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:
KDE Plasma Version: 5.27.0
KDE Frameworks Version: 5.111.0
Qt Version: 5.15.11 on Wayland

ADDITIONAL INFORMATION
_IF_ you notice this, the fix is to run `balooctl clear /path/to/file.txt` then `balooctl index /path/to/file.txt`.
I turned on kf.baloo and kf.filemetadata debug output and did not see anything useful in `journalctl` output.

I don't know if this is a file system issue; maybe Qt's filePathToStat() is caching file info.