Bug 434796

Summary: Can't find file with " (1)" in the filename
Product: [Frameworks and Libraries] frameworks-baloo
Reporter: waldi1985
Component: general
Assignee: Stefan Brüns <stefan.bruens>
Status: RESOLVED FIXED
Severity: normal
CC: kfm-devel, nate, tagwerk19
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: Other
OS: Linux

Description waldi1985 2021-03-22 21:17:24 UTC
SUMMARY
When I search for "Frag" or "Frag*" (without the quotes), no result is displayed.

STEPS TO REPRODUCE
1. Open Dolphin, press F4, and type: touch "Fragen-Excel (1).xlsx"
2. Press Ctrl+F
3. Search for one of the following (without the commas): Frag*, Frag, *.*, *.xlsx

OBSERVED RESULT
No result is displayed.

Check: find . -name 'Fra*'

EXPECTED RESULT
The file should be displayed.


SOFTWARE/OS VERSIONS
KDE Frameworks 5.68.0
Qt 5.12.8 (built against 5.12.8)
The xcb windowing system

ADDITIONAL INFORMATION
Comment 1 Nate Graham 2021-03-22 21:18:25 UTC
If you run `baloosearch frag` in a terminal window, what does that show?
Comment 2 waldi1985 2021-03-22 21:59:28 UTC
Everything but the file I searched for :D

When you try to reproduce the bug, you shouldn't find the file in the output. I think the problem is the " (1)" part of the filename.
Comment 3 Nate Graham 2021-03-23 03:22:34 UTC
Seems like the issue is in Baloo itself. Moving the bug over there.
Comment 4 tagwerk19 2021-03-23 18:57:48 UTC
(In reply to waldi1985 from comment #2)
> Everything but the file I searched for :D
I had a go and tried:

    cd Documents
    touch 'Fragen-Excel.xlsx'
    touch 'Fragen-Excel (1).xlsx' 

Checking with

    balooshow -x 'Fragen-Excel.xlsx'
    balooshow -x 'Fragen-Excel (1).xlsx' 

to confirm that the files "were noticed" by baloo, then a

    baloosearch Frag

and that seemed to work. I tried this on a set of different systems:

Neon Testing

    Plasma : 5.21.3
    Frameworks : 5.81.0 

Fedora 33

    Plasma : 5.20.5
    Frameworks : 5.79.0

OpenSuse 15.2

    Plasma : 5.18.6
    Frameworks : 5.71.0

All of these had English rather than German as the default language; I'm not sure what else might be different.

Maybe disable indexing, copy the .local/share/baloo/index file somewhere safe, and build a new index.
Comment 5 waldi1985 2021-03-23 19:40:41 UTC
OK, interesting, so this is an indexing problem? What conditions does a file have to meet for the indexer to index it?

balooshow -x 'Fragen-Excel (1).xlsx'
Fragen-Excel (1).xlsx: No index information found

balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 154.401
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 238,88 MiB
Comment 6 tagwerk19 2021-03-23 21:13:51 UTC
Things to think about...

It can take some seconds for baloo to see that a new file has appeared and then index it. If you run 'balooctl monitor' in a separate window you can see the files streaming past as they are indexed.

It's possible to kick baloo to look through 'all files' and catch up with any it has missed. Keep the monitor running and run a 'balooctl check' in a new window.

Baloo should see new files appear and files as they are updated. It uses something called 'inotify'. It then decides whether to index the file or not, depending on the folder (included or excluded directories) or file types. My guess though, if you can find 'Fragen-Excel.xlsx' you would (normally) be able to find a 'Fragen-Excel (1).xlsx' in the same folder.

You can explicitly tell baloo to index the file with
    balooctl index filename
If 'balooshow -x' doesn't see the file after that, then there's something different happening ...

... It may be worth trying the test with a real .xlsx file (rather than a 'touched' empty one). Baloo checks whether the .xlsx is 'zipped' and might be confused by an empty file.
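
The "index it explicitly, then look again" check above could be scripted roughly as follows. This is only a sketch: it assumes that a missing entry is always reported with the "No index information found" text seen in comment 5, and the has_index_entry helper is a made-up name, not part of baloo.

```shell
#!/bin/sh
# Sketch only: force-index one file, then check whether balooshow knows it.
# Assumption: a missing entry is reported as "No index information found".

# Read `balooshow -x` output on stdin; succeed unless it reports a missing entry.
has_index_entry() {
    ! grep -q 'No index information found'
}

file="${1:-}"
if [ -n "$file" ] && command -v balooctl >/dev/null 2>&1; then
    # Ask baloo to index the file explicitly, then inspect its index entry.
    balooctl index "$file"
    if balooshow -x "$file" | has_index_entry; then
        echo "indexed: $file"
    else
        echo "still not indexed: $file"
    fi
fi
```

Saved under a name of your choice and run with the file as its only argument, it prints one line saying whether the entry exists. Indexing can take a few seconds, so a "still not indexed" straight after creating the file is not conclusive.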
Comment 7 waldi1985 2021-03-24 13:08:39 UTC
After "balooctl check" the file is found:

balooshow -x 'Fragen-Excel (1).xlsx'
18025050028378118 2054 4196784 Fragen-Excel (1).xlsx [/home/xyz/Downloads/Fragen-Excel (1).xlsx]
        Mtime: 1616162623 2021-03-19T15:03:43
        Ctime: 1616162623 2021-03-19T15:03:43

Internal Info
Terms: Mapplication Mofficedocument Mopenxmlformats Msheet Mspreadsheetml Mvnd T5 T6
File Name Terms: 1 F1 Fexcel Ffragen Fxlsx excel fragen xlsx
XAttr Terms:

The file is not empty. You run into the same problem if you create a file with the same name through Excel / LibreOffice / OpenOffice. You can try it yourself. What prevents the indexer from indexing that file, when it works well with another filename such as Fragen-Excel.xlsx?
Comment 8 tagwerk19 2021-03-24 16:54:14 UTC
(In reply to waldi1985 from comment #7)
> After "balooctl check" the file is found
And it then turns up in a "baloosearch Frag"?

> ... What prevents the indexer from indexing that file, when it works well
> with another filename such as Fragen-Excel.xlsx? ...
I cannot think of a specific reason why a filename including a '(1)' would matter. It could be, however, that baloo stops 'watching' for new/changed files once it is watching a certain number of folders.

Try watching with 'balooctl monitor' and creating several different files in your working directory and see if they get listed.

Are you indexing a large number of folders? What number do you get if you run

    sysctl fs.inotify.max_user_watches

On some systems this is 8192 - and it can easily be that people have more folders than that.
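
A rough way to compare the two numbers is sketched below. It assumes one inotify watch per directory and simply counts every directory under the home folder, so it is an upper bound: baloo only watches the folders it is configured to index. The helper names count_dirs and fits_watch_limit are made up for this example.

```shell
#!/bin/sh
# Sketch only: count directories under a root and compare with the inotify limit.
# Assumption: one watch per directory; baloo may watch fewer folders than this.

# Print the number of directories at or below $1 (a name containing a newline
# would be counted more than once).
count_dirs() {
    find "$1" -xdev -type d 2>/dev/null | wc -l | tr -d ' '
}

# Succeed when a directory count ($1) fits within a watch limit ($2).
fits_watch_limit() {
    [ "$1" -le "$2" ]
}

root="${HOME:-.}"
limit="$(sysctl -n fs.inotify.max_user_watches 2>/dev/null || echo 0)"
dirs="$(count_dirs "$root")"

if fits_watch_limit "$dirs" "$limit"; then
    echo "$dirs directories under $root, limit $limit: within the limit"
else
    echo "$dirs directories under $root, limit $limit: more folders than watches"
fi
```

The limit is shared by everything the user runs that uses inotify, so even a count somewhat below the limit may not leave enough watches for baloo.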
Comment 9 tagwerk19 2021-03-24 20:20:16 UTC
(In reply to waldi1985 from comment #7)
> Internal Info
> ...
> File Name Terms: 1 F1 Fexcel Ffragen Fxlsx excel fragen xlsx
Not quite sure what to make of this...

When trying to do the same, I get "just":
    File Name Terms: F1 Fexcel Ffragen Fxlsx
That is, each of the "words" in the name given with an "F" prefix...
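
As a rough illustration of that "F" prefix scheme, the sketch below lowercases a name, splits it on anything that is not a letter or digit, and prefixes each piece with "F". This is an approximation for ASCII names only, not baloo's actual tokenizer, and filename_terms is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: approximate the "F"-prefixed file name terms shown by balooshow -x.
# Assumption: lowercase, split on non-alphanumeric characters, prefix with "F".
# This is not baloo's tokenizer and only handles ASCII names.
filename_terms() {
    printf '%s' "$1" \
        | LC_ALL=C tr '[:upper:]' '[:lower:]' \
        | LC_ALL=C tr -cs '[:alnum:]' '\n' \
        | sed -n 's/^./F&/p' \
        | LC_ALL=C sort -u \
        | paste -s -d ' ' -
}

filename_terms 'Fragen-Excel (1).xlsx'
# prints: F1 Fexcel Ffragen Fxlsx
```

The output matches the shorter term list above; the extra unprefixed terms in comment 7 ("1", "excel", "fragen", "xlsx") are not explained by this sketch.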
Comment 10 waldi1985 2021-03-25 08:39:23 UTC
OK, I've played around: after purging the index and running 'balooctl check', I can't reproduce the issue. However, the fs.inotify.max_user_watches hint seems plausible to me. In my case it's 8k too.

It is not part of this report, but when I delete the file with rm 'Fragen-Excel (1).xlsx' and then run balooctl check and baloosearch Fragen, the index entry still exists. I'm not sure whether it is critical, but I wanted to let you know.

Thanks for the help.
Comment 11 waldi1985 2021-03-25 08:41:43 UTC
What is the default interval for balooctl check? Is there a crontab entry to trigger it?
Comment 12 tagwerk19 2021-03-25 10:08:37 UTC
(In reply to waldi1985 from comment #10)
> ... However, the fs.inotify.max_user_watches hint seems
> plausible to me. In my case its 8k too ...
The default could easily be too low...

If you wait for a bit (for a 5.11 kernel) this problem should go away. However, there's a fix you can apply until then, see
    https://bugs.kde.org/show_bug.cgi?id=433204#c12

It could be that baloo "not seeing" files being deleted is also a watch limit issue...
Comment 13 tagwerk19 2021-03-25 10:24:23 UTC
(In reply to waldi1985 from comment #11)
> What is the default interval for balooctl check? Is there a crontab entry
> to trigger it?
In theory you shouldn't need to do a 'balooctl check' as baloo asks to be told about any changes. That's inotify.

In practice, there are times when baloo misses stuff. If a file is copied into your home directory while you are not logged on, baloo is not awake and won't see the change. If you have more folders than you have resources to watch (the user watches limit), ditto...

Other times? Yes, well, there may be still another bug or two :-)

It's still the case that a 'balooctl check' and a 'balooctl purge' are useful troubleshooting steps...
Comment 14 tagwerk19 2021-03-26 11:45:10 UTC
Let's say this is done... closing