Bug 434589 - Baloo gives strange search results if search string contains a dash
Summary: Baloo gives strange search results if search string contains a dash
Status: CONFIRMED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.79.0
Platform: Manjaro Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stefan Brüns
URL:
Keywords:
Duplicates (2): 459148 462787
Depends on:
Blocks:
Reported: 2021-03-18 15:57 UTC by Knut Hildebrandt
Modified: 2023-01-04 20:39 UTC
CC List: 4 users

See Also:
Latest Commit:
Version Fixed In:


Description Knut Hildebrandt 2021-03-18 15:57:32 UTC
I first noticed this in krunner and posted about it here (https://bugs.kde.org/show_bug.cgi?id=388857#c11). But I think it is a general baloo problem, since it appears not only in krunner and dolphin but also when using baloosearch. Each time a dash "-" appears in a search string, the results seem to disappear. On the command line, baloosearch behaves the same as dolphin. Search results are okay until I enter the first character after a dash. Then apparently no files are found. But after entering at least one more character, search results appear again. I say "at least one" because in one observed case it took one more character and in the other it took three.

I suppose that, due to this strange behavior, I can no longer find and open certain files in krunner.


SOFTWARE/OS VERSIONS
Operating System: Manjaro Linux
KDE Plasma Version: 5.21.2
KDE Frameworks Version: 5.79.0
Qt Version: 5.15.2
Comment 1 tagwerk19 2021-03-19 21:09:20 UTC
Can also see it...

   Neon Testing
   Baloo : 5.81.0
   Plasma : 5.21.3
   Frameworks : 5.81.0
Comment 2 Knut Hildebrandt 2021-03-19 23:21:37 UTC
Problem still exists after update to:
KDE Plasma Version: 5.21.3
KDE Frameworks Version: 5.80.0
Comment 3 tagwerk19 2021-03-20 12:43:53 UTC
Troubleshooting with guesswork...

STEPS TO REPRODUCE:

Create a test file in a folder that baloo is indexing

    echo "somestrangetext" > ~/Documents/one-two-three.txt

Confirm that baloo has seen it

    balooshow -x ~/Documents/one-two-three.txt
    Internal Info
    Terms: Mplain Mtext T5 T8 X20-1 somestrangetext
    File Name Terms: Fone Fthree Ftwo Ftxt
    XAttr Terms:
    lineCount: 1

Notice that the "File Name Terms" line holds the filename split on the hyphens (and on the dot before the extension)...
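To make the splitting concrete, here is a small Python toy model that turns a file name into terms of the same shape as the balooshow output above. The function name `filename_terms` and the splitting rule (every character that is not a letter or digit, including underscores, separates words; words are lowercased and get an "F" prefix) are assumptions chosen to reproduce this one example, not Baloo's real C++ tokenizer.

```python
import re


def filename_terms(filename: str) -> list[str]:
    """Toy model of the "File Name Terms" line printed by `balooshow -x`.

    Assumption: the name is split on every character that is not a letter
    or digit, each piece is lowercased and prefixed with "F". This is not
    Baloo's actual tokenizer, only a model of the output shown above.
    """
    words = [w.lower() for w in re.split(r"[\W_]+", filename) if w]
    # Sorted and de-duplicated, like the term list balooshow prints.
    return sorted({"F" + w for w in words})


print(filename_terms("one-two-three.txt"))
# ['Fone', 'Fthree', 'Ftwo', 'Ftxt']
```

Once the name is stored as separate words like this, a search has to work on words rather than on the raw string, which is what the searches below probe.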

Concentrate on the baloosearch results, check:

    baloosearch "o"
    baloosearch "one"
    baloosearch "one-tw"
    baloosearch "one-two"

and compare:

    baloosearch "o"
    baloosearch "one"
    baloosearch "one tw"
    baloosearch "one two"

OBSERVED RESULTS (WITH GUESSWORK)

Baloosearch doesn't return what might be a very long list of hits if you search for a single character: the search for "o" gives nothing. You need to give two or more characters...

Baloosearch seems to split the search string into parts, search for each, and combine the results with an implicit 'AND':

    baloosearch "one two"

is equivalent to

    baloosearch "one AND two"

and, as it is an AND, the order of the parts doesn't matter, so:

    baloosearch "two one"

also works.

Baloosearch also seems to handle truncated search terms:

    baloosearch "one AND tw"

but the 'implicit AND' means that a

    baloosearch "one AND t"

doesn't return anything, since the single-character "t" matches nothing and so the AND as a whole fails.

When baloosearch is given a hyphenated string, as in "one-two", it looks as if it does a further comparison, taking word order into account, to give a more exact match:

    baloosearch "one-two"

gives the same as

    baloosearch "one two"

but

    baloosearch "two-one"

doesn't find a match.

Finally,

    baloosearch "one-tw"

fails to find any matches, although "one tw" does.
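The observations above can be summarized as a toy model. This Python sketch is guesswork in the same spirit as the comment: it reproduces the observed results for `one-two-three.txt`, but the names (`split_name`, `term_matches`, `phrase_matches`, `matches`) and the rules they encode are hypothetical, inferred from baloosearch's behavior rather than taken from Baloo's source. The assumed rules are: space-separated parts are ANDed; a single character matches nothing; a longer plain term matches any filename word it is a prefix of; a hyphenated part must match whole words, in order and adjacent.

```python
import re


def split_name(filename: str) -> list[str]:
    # Hypothetical tokenizer: lowercase words, split on non-alphanumerics.
    return [w.lower() for w in re.split(r"[\W_]+", filename) if w]


def term_matches(term: str, words: list[str]) -> bool:
    # Observed: a single character matches nothing; longer terms match
    # any word in the filename that starts with them ("tw" -> "two").
    return len(term) >= 2 and any(w.startswith(term) for w in words)


def phrase_matches(parts: list[str], words: list[str]) -> bool:
    # Observed: "one-two" matches, "two-one" and "one-tw" do not. That fits
    # an exact, in-order, adjacent match on every component.
    n = len(parts)
    return any(words[i:i + n] == parts for i in range(len(words) - n + 1))


def matches(query: str, filename: str) -> bool:
    words = split_name(filename)
    for token in query.split():  # space-separated parts are ANDed
        if token == "AND":  # an explicit AND adds nothing to the implicit one
            continue
        parts = [p for p in re.split(r"[\W_]+", token.lower()) if p]
        if len(parts) > 1:
            ok = phrase_matches(parts, words)
        else:
            ok = bool(parts) and term_matches(parts[0], words)
        if not ok:
            return False
    return True


for q in ["o", "one", "one-tw", "one-two", "one tw", "two-one", "one AND t"]:
    print(f"{q!r}: {matches(q, 'one-two-three.txt')}")
```

Run against `one-two-three.txt`, this prints False for "o", "one-tw", "two-one" and "one AND t" and True for the rest, matching what baloosearch was seen to do. In this model the failure of "one-tw" comes from the hyphenated form requiring exact words, while "one tw" succeeds because plain terms are prefix-matched.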

EXPECTED RESULTS

1. Single-character search terms, such as the "t" in

        baloosearch "one t"

    are not dropped as "potentially returning too many results"

2. A search for "one-tw", as in:

        baloosearch "one-tw"

    handles the truncated final component (the "tw")
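For comparison, the two expected results can be expressed as a small change to the same kind of toy model. In this hypothetical Python sketch (names `split_name`, `phrase_prefix_matches` and `expected_matches` are made up for illustration), every component of a hyphenated part except the last must equal a whole filename word, in order and adjacent, while the last component only has to be a prefix of the next word; single characters are treated like any other prefix. Whether Baloo's index could support this efficiently is not established here; the sketch only pins down the intended semantics.

```python
import re


def split_name(filename: str) -> list[str]:
    # Hypothetical tokenizer: lowercase words, split on non-alphanumerics.
    return [w.lower() for w in re.split(r"[\W_]+", filename) if w]


def phrase_prefix_matches(parts: list[str], words: list[str]) -> bool:
    # Earlier components must equal whole words, in order and adjacent;
    # the final (possibly still being typed) component only has to be a
    # prefix of the next word. A single part is just a prefix match.
    n = len(parts)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if window[:-1] == parts[:-1] and window[-1].startswith(parts[-1]):
            return True
    return False


def expected_matches(query: str, filename: str) -> bool:
    words = split_name(filename)
    for token in query.split():  # implicit AND, as before
        if token == "AND":
            continue
        parts = [p for p in re.split(r"[\W_]+", token.lower()) if p]
        # Single characters are no longer special-cased as "no match".
        if not parts or not phrase_prefix_matches(parts, words):
            return False
    return True


for q in ["o", "one t", "one-tw", "one-two", "two-one"]:
    print(f"{q!r}: {expected_matches(q, 'one-two-three.txt')}")
```

Here "o", "one t", "one-tw" and "one-two" all match `one-two-three.txt`, while "two-one" still does not, so results keep narrowing while typing instead of vanishing, and word order still counts for hyphenated searches.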
Comment 4 Knut Hildebrandt 2021-03-20 16:02:41 UTC
Great research and interesting findings. But I wonder why baloo splits the filename. Dashes, hyphens, underscores and even spaces are commonly used in file and directory names, at least in Linux. One exception might be dots, which are often, but not exclusively, used to separate the extension. But baloo should be able to detect the file type without examining the extension. I hope so, at least.

Anyhow, I would expect that the search results narrow down to the file(s) I'm actually looking for, when I continue entering parts of these filename(s). The seen behavior is strange and incomprehensible.
Comment 5 tagwerk19 2021-03-20 16:46:18 UTC
(In reply to Knut Hildebrandt from comment #4)
> ... I wonder why baloo splits the filename...
I'm not sure what practical alternatives there are. One of the amazing characteristics of baloo is its speed. You can, with "basic indexing only", index from '/', do a filename search across your 'visible filesystem' and get a set of answers in a third of a second. That's impressive :-)

> ... baloo should be able to detect the file type without examining the extension.
It makes use of the 'mimetype' database...

> Anyhow, I would expect that the search results narrow down to the file(s)
> I'm actually looking for, when I continue entering parts of these
> filename(s). The seen behaviour is strange and incomprehensible.
Try feeding Dolphin/Krunner with bits of the filename, without the hyphens; think of it as looking for a filename containing "these words"...
Comment 6 Knut Hildebrandt 2021-03-20 17:00:48 UTC
(In reply to tagwerk19 from comment #5)
> I'm not sure what practical alternatives there are. One of the amazing
> characteristics of baloo is its speed. You can, with "basic indexing only",
> index from '/', do a filename search across your 'visible filesystem' and get
> a set of answers in a third of a second. That's impressive :-)
I don't see why storing the complete filename should affect search speed. Locate is pretty fast too, and it finds filenames without this strange behavior.

> Try feeding Dolphin/Krunner with bits of the filename, without the
> hyphens; think of it as looking for a filename containing "these words"...
Since another bug was fixed, krunner finds files much earlier, before running into this problem. Anyhow, if I follow your suggestion, all search results disappear in krunner after I enter a single character, just as they do in baloo. In my opinion, baloo, and thus dolphin and krunner, should not work this way.
Comment 7 tagwerk19 2022-10-05 07:24:33 UTC
*** Bug 459148 has been marked as a duplicate of this bug. ***
Comment 8 Nate Graham 2023-01-04 20:23:52 UTC
*** Bug 462787 has been marked as a duplicate of this bug. ***