Bug 457335 - File search is case sensitive
Summary: File search is case sensitive
Status: RESOLVED WORKSFORME
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general (other bugs)
Version First Reported In: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2022-07-31 09:53 UTC by Artur Rudenko
Modified: 2022-10-02 04:49 UTC
Description Artur Rudenko 2022-07-31 09:53:09 UTC
SUMMARY
When searching for files, typing a lowercase name in the search bar does not find matching files whose names contain uppercase letters.

STEPS TO REPRODUCE
1. Create SomeFileWithUppercase.txt
2. Create somefilewithuppercase.txt
3. Search for "somefilewithuppercase"

OBSERVED RESULT
It finds only the file with lowercase name

EXPECTED RESULT
It finds both SomeFileWithUppercase.txt and somefilewithuppercase.txt

Operating System: Arch Linux
KDE Plasma Version: 5.25.3
KDE Frameworks Version: 5.96.0
Qt Version: 5.15.5
Kernel Version: 5.18.15-arch1-1 (64-bit)
Graphics Platform: X11
Processors: 16 × Intel® Core™ i7-10700F CPU @ 2.90GHz
Memory: 31.3 GiB of RAM
Graphics Processor: AMD Radeon RX 470 Graphics
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: Z490 GAMING X AX
System Version: -CF
Comment 1 tagwerk19 2022-08-21 13:22:26 UTC
For me, if I create SomeFileWithUppercase.txt and somefilewithuppercase.txt, I see both:

    $ baloosearch -i somefilewithuppercase
    1404130000fc01 /home/test/somefilewithuppercase.txt
    1403880000fc01 /home/test/SomeFileWithUppercase.txt

and I see, with balooshow, that the index contains the two files:

    $ balooshow -x somefilewithuppercase.txt
    1404130000fc01 64513 1311763 somefilewithuppercase.txt [/home/test/somefilewithuppercase.txt]
            Mtime: 1660941653 2022-08-19T22:40:53
            Ctime: 1660941653 2022-08-19T22:40:53
            Cached properties:
                    Line Count: 1

    Internal Info
    Terms: Mplain Mtext T5 T8 X20-1 hello penguin
    File Name Terms: Fsomefilewithuppercase Ftxt
    XAttr Terms:
    lineCount: 1

    $ balooshow -x SomeFileWithUppercase.txt
    1403880000fc01 64513 1311624 SomeFileWithUppercase.txt [/home/test/SomeFileWithUppercase.txt]
            Mtime: 1660941644 2022-08-19T22:40:44
            Ctime: 1660941644 2022-08-19T22:40:44
            Cached properties:
                    Line Count: 1

    Internal Info
    Terms: Mplain Mtext T5 T8 X20-1 hello penguin
    File Name Terms: Fsomefilewithuppercase Ftxt
    XAttr Terms:
    lineCount: 1

The "File Name Terms" for both are held in lower case - baloo squashes terms down to lower case.
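
As an illustration of why lower-casing the terms makes the lookup case-insensitive, here is a minimal Python sketch. It is not baloo's code: the `filename_terms`, `build_index` and `search` helpers and the simplified term splitting are assumptions made for the example.

```python
import re
from collections import defaultdict


def filename_terms(name):
    """Split a file name into lower-cased terms (simplified, hypothetical tokenizer)."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", name) if t]


def build_index(paths):
    """Map each lower-cased file name term to the set of paths containing it."""
    index = defaultdict(set)
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        for term in filename_terms(name):
            index[term].add(path)
    return index


def search(index, query):
    """Look up a query after folding it to lower case, like the indexed terms."""
    return sorted(index.get(query.lower(), set()))


index = build_index([
    "/home/test/somefilewithuppercase.txt",
    "/home/test/SomeFileWithUppercase.txt",
])
print(search(index, "somefilewithuppercase"))
```

Because both the stored terms and the query are folded to lower case, the two file names map to the same term and both paths are returned.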

I did this test on Neon with an ext4 filesystem, which treats SomeFileWithUppercase.txt and somefilewithuppercase.txt as distinct.

The question is what might happen with other filesystems; on vfat, if you try to create SomeFileWithUppercase.txt and then somefilewithuppercase.txt, you end up with just the one file.

You are on an arch setup? What filesystem are you using? (I'm assuming your files are on a local disc)
Comment 2 Artur Rudenko 2022-08-21 13:38:04 UTC
(In reply to tagwerk19 from comment #1)

Yes, I'm on an up-to-date Arch setup and it's ext4 on a local disc, but after I reindexed everything it works (I searched using Dolphin with file search enabled in settings). This could be that newly created files remain unindexed some time after the creation. Maybe it would be better if dolphin did fallback to the classic search when baloo can't find needed files?
Comment 3 tagwerk19 2022-08-21 20:47:39 UTC
(In reply to Artur Rudenko from comment #2)
> This could be that newly created files remain unindexed
> some time after the creation ...
Yes, could be that baloo "got stuck". It relies on "iNotify" alerts to see when files are created, deleted or changed - and there are times when baloo misses something. 

A "balooctl check" brings things back up to date for new or changed files. Purging the index and reindexing is the "start again from scratch" solution.
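
Conceptually, a consistency check of this kind compares what is on disk with what the index has recorded. The Python sketch below shows only that idea; it is not how "balooctl check" is implemented, and the `indexed_mtimes` mapping and `find_stale` helper are hypothetical.

```python
import os


def find_stale(root, indexed_mtimes):
    """Return paths under root that are missing from the index or changed since indexing.

    indexed_mtimes is a hypothetical mapping of path -> mtime recorded at index time.
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            recorded = indexed_mtimes.get(path)
            # New file (never indexed) or modified after it was indexed.
            if recorded is None or mtime > recorded:
                stale.append(path)
    return sorted(stale)
```

Anything returned by such a scan would then be handed to the indexer, which is cheaper than purging and rebuilding the whole index.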

In general though baloo should be quick at noticing new files and changed file details. It knows "content indexing" is harder work and queues up "full text" indexing (something you can watch happening with a "balooctl monitor")

> ... Maybe it would be better if dolphin did
> fallback to the classic search when baloo can't find needed files?
I'm not sure how Dolphin would know that baloo has not indexed particular files.

It *does* check what folders baloo is indexing and whether baloo is enabled and "falls back" to its internal search if it thinks baloo hasn't indexed the folder. That can be messy - Bug 424871, 4th comment shows how messy.
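
A folder-level decision of that kind could look roughly like the Python sketch below. This is an assumption-laden illustration, not Dolphin's code: the `choose_search_backend` function, its parameters and the "baloo"/"internal" labels are all hypothetical.

```python
from pathlib import PurePosixPath


def choose_search_backend(folder, indexed_folders, baloo_enabled):
    """Pick 'baloo' only if indexing is on and the folder lies inside an indexed folder."""
    if not baloo_enabled:
        return "internal"
    target = PurePosixPath(folder)
    for indexed in indexed_folders:
        root = PurePosixPath(indexed)
        # The folder is covered if it is the indexed root or somewhere below it.
        if target == root or root in target.parents:
            return "baloo"
    return "internal"
```

A check at this level says nothing about whether an individual file inside the folder has been indexed yet, which is the limitation mentioned above.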
Comment 4 Artur Rudenko 2022-08-23 12:13:02 UTC
(In reply to tagwerk19 from comment #3)
> (In reply to Artur Rudenko from comment #2)
> > This could be that newly created files remain unindexed
> > some time after the creation ...
> Yes, could be that baloo "got stuck". It relies on "iNotify" alerts to see
> when files are created, deleted or changed - and there are times when baloo
> misses something. 

Why does it miss? Is it a baloo bug or an inotify problem?

> > ... Maybe it would be better if dolphin did
> > fallback to the classic search when baloo can't find needed files?
> I'm not sure how Dolphin would know that baloo has not indexed particular
> files.

Maybe when iNotify notifies about a file created (or modified), baloo should mark parent folder as un-indexed?


> In general though baloo should be quick at noticing new files and changed file details. It knows "content indexing" is harder work and queues up "full text" indexing (something you can watch happening with a "balooctl monitor")

But in real situations it is not. For example, try unpacking an archive and then searching in the unpacked folder. Baloo won't find anything, and I think in this case Dolphin should just use a general search in this folder (while adding it to the index queue). It doesn't, so the search feature ends up useless.
Comment 5 tagwerk19 2022-08-23 21:09:46 UTC
(In reply to Artur Rudenko from comment #4)
> Why does it miss? Is it a baloo bug or an inotify problem?
Maybe something of both...

A historical "solved" case first: some distributions used to set the maximum number of folders that could be watched (fs.inotify.max_user_watches) rather low, see https://bugs.kde.org/show_bug.cgi?id=433204#c1. That caused issues in the past but is now generally OK.

There is also, however, the size of the queue, fs.inotify.max_queued_events. If something creates more than this number of events faster than baloo can deal with them, the queue will overflow.

    $ sysctl fs.inotify

will show the values set in your system.
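
The same values can be read from a script. A minimal Python sketch, assuming a Linux system where the limits are exposed as files under /proc/sys/fs/inotify; the `read_inotify_limits` helper is a name made up for this example.

```python
from pathlib import Path


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return the integer inotify limits found under base, keyed by file name."""
    limits = {}
    base_path = Path(base)
    if not base_path.is_dir():
        return limits  # not Linux, or /proc is unavailable
    for entry in sorted(base_path.iterdir()):
        try:
            limits[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            continue  # skip unreadable or non-numeric entries
    return limits


if __name__ == "__main__":
    for name, value in read_inotify_limits().items():
        print(f"fs.inotify.{name} = {value}")
```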

In your example, if you unpack a zip containing *many* small files, you may hit the max_queued_events limit. Perhaps you are creating a folder structure as part of the unpacking and baloo cannot set up the watches on the new folders before files are created within them.

There are also the cases where baloo is busy (possibly when deleting a large bunch of files) and is not picking up the iNotify messages.
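
To make the overflow scenario concrete, here is a toy Python model of a bounded event queue. It is a simulation, not the kernel's or baloo's code: the `BoundedEventQueue` class is hypothetical and the limit of 4 is chosen only to keep the example small. The real kernel queue signals the condition with an `IN_Q_OVERFLOW` event; the model records it with a flag.

```python
from collections import deque


class BoundedEventQueue:
    """Toy model of an event queue that drops events once it is full."""

    def __init__(self, max_queued_events):
        self.max_queued_events = max_queued_events
        self.events = deque()
        self.overflowed = False
        self.dropped = 0

    def push(self, event):
        """Queue an event, or record an overflow if there is no room left."""
        if len(self.events) >= self.max_queued_events:
            self.overflowed = True
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def pop(self):
        """Hand the oldest queued event to the consumer, or None if empty."""
        return self.events.popleft() if self.events else None


# A producer (e.g. an archive extraction) creates 10 files before the
# consumer (the indexer) reads anything from a queue that holds only 4.
queue = BoundedEventQueue(max_queued_events=4)
for i in range(10):
    queue.push(f"created file{i}.txt")

print(queue.overflowed, queue.dropped, len(queue.events))
```

Once the flag is set, the consumer has no way of knowing which events were lost, so only a rescan of the affected folders can bring its view back in line with the disk.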

> ... Maybe when iNotify notifies about a file created (or modified), baloo should
> mark parent folder as un-indexed?
I think baloo could queue up a "balooctl check" if it gets an iNotify overflow failure. Possibly also if it suspects inconsistency, but that feels messy.

> ... But in real situations it is not. For example, try unpacking an archive and
> then searching in the unpacked folder. Baloo won't find anything, and I think in
> this case Dolphin should just use a general search in this folder (while
> adding it to the index queue). It doesn't, so the search feature ends up
> useless.
If you are extracting *many* files, then try again with a larger max_queued_events. You might hit the limits when deleting loads of files (Bug 353874) and you could dig a hole for yourself if you repeatedly delete and create large numbers of files. Maybe check whether you still see the issue with "content indexing" disabled.

If you need full text indexing, be aware that baloo will deliberately "pace itself" when indexing the extracted files; it indexes a block of 40 files, then the next and the next and so on. I've occasionally wondered if a widget showing baloo's "rate of indexing" would give useful feedback.
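
The block-by-block pacing can be pictured with a small Python sketch. Only the block size of 40 comes from the description above; the `batches` helper and the file names are made up for illustration, and this is not baloo's scheduler.

```python
from itertools import islice


def batches(items, size=40):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


files = [f"file{i}.txt" for i in range(100)]
print([len(b) for b in batches(files)])
```

With 100 extracted files this gives blocks of 40, 40 and 20, so files near the end of a large extraction become searchable by content later than the first ones.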
Comment 6 tagwerk19 2022-08-30 14:51:00 UTC
Does this cover all the questions? You've got baloo working again?
Comment 7 Artur Rudenko 2022-09-02 09:39:50 UTC
(In reply to tagwerk19 from comment #6)
> Does this cover all the questions? You've got baloo working again?

Yeah, it started to work properly. I think this bug's status should be changed to NOTABUG, because the problem was most likely that baloo indexing was slow or buggy, not that the search itself was case sensitive. If I have new questions about baloo, should I ask them in the KDE Matrix channel or here?
Comment 8 Bug Janitor Service 2022-09-17 04:35:55 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 9 Bug Janitor Service 2022-10-02 04:49:05 UTC
This bug has been in NEEDSINFO status with no change for at least
30 days. The bug is now closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

Thank you for helping us make KDE software even better for everyone!