Bug 412421 - 25+ char string without spaces search fails
Summary: 25+ char string without spaces search fails
Status: RESOLVED FIXED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Stefan Brüns
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-28 16:24 UTC by i16d0o+8ay17bwmgpcgw
Modified: 2023-11-13 21:41 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Dolphin Search - with 24 character search string (37.94 KB, image/png)
2021-06-02 20:19 UTC, tagwerk19
Dolphin Search - with 25 character search string (33.63 KB, image/png)
2021-06-02 20:19 UTC, tagwerk19

Description i16d0o+8ay17bwmgpcgw 2019-09-28 16:24:57 UTC
Sometimes files have long filenames consisting of 25+ characters without spaces.

Baloo fails to find these files when searching by the full file name. The search works when Baloo is disabled.

STEPS TO REPRODUCE
1. Create a file whose full name is 25 or more characters long and contains no spaces, e.g. abcdefghijklmnopqrstuvwxyz.

OBSERVED RESULT

2. With Baloo enabled, search for "abcdefghijklmnopqrstuvwxyz", either in a terminal (baloosearch abcdefghijklmnopqrstuvwxyz) or in Dolphin (Find, then enter abcdefghijklmnopqrstuvwxyz).
3. The file is not found.
4. Search for a string shorter than 25 characters, e.g. "abcd", "abcdefg" or "abcdefghijklmnopq". Baloo finds the file.

EXPECTED RESULT

Baloo finds the file when searched by its full filename.

SOFTWARE/OS VERSIONS

Linux/KDE Plasma: Kubuntu 18.04, 19.04, KDE Neon 20190919-1119, Debian Buster KDE

(available in About System)
KDE Plasma Version: 5.12.8
KDE Frameworks Version: 5.44.0
Qt Version: 5.9.5

ADDITIONAL INFORMATION

If the filename is split with spaces: "abcdefgh ijklmnopqrs tuvwxyz"
the file can be found by its full filename.
Comment 1 tagwerk19 2021-06-02 20:19:20 UTC
Created attachment 138962 [details]
Dolphin Search - with 24 character search string
Comment 2 tagwerk19 2021-06-02 20:19:57 UTC
Created attachment 138963 [details]
Dolphin Search - with 25 character search string
Comment 3 tagwerk19 2021-06-02 20:25:32 UTC
Well I never...

Make sure baloo is running...

    $ balooctl status
    $ echo "Hello Penguin" > abcdefghijklmnopqrstuvwxyz
    $ baloosearch abcdefghijklmnopqrstuvwxy

    /home/xxxx/abcdefghijklmnopqrstuvwxyz

So, baloo is fine...

Run Dolphin, Ctrl-F and type abcdefghijklmnopqrstuvwxy

    You see a file match

Add a 'z' and...

    The match disappears...

See the attachments, flagging as Confirmed...

This is with:
    Neon Unstable
    Plasma: 5.22.80
    Frameworks: 5.83.0
    Qt: 5.15.3
Comment 4 tagwerk19 2021-06-02 20:34:54 UTC
(In reply to tagwerk19 from comment #3)
> So, baloo is fine...
Whups, didn't finish the test:

    $ baloosearch abcdefghijklmnopqrstuvwxy
    /home/xxxx/abcdefghijklmnopqrstuvwxyz
    Elapsed: 0.258107 msecs

    $ baloosearch abcdefghijklmnopqrstuvwxyz
    Elapsed: 0.207821 msecs

So the problem is in baloo rather than dolphin:

    $ balooshow -x abcdefghijklmnopqrstuvwxyz
    143dd40000fc01 64513 1326548 abcdefghijklmnopqrstuvwxyz [/home/xxxx/abcdefghijklmnopqrstuvwxyz]
            Mtime: 1622665708 2021-06-02T22:28:28
            Ctime: 1622665708 2021-06-02T22:28:28
            Cached properties:
                    Line Count: 1

    Internal Info
    Terms: Mplain Mtext T5 T8 X20-1 hello penguin 
    File Name Terms: Fabcdefghijklmnopqrstuvwxy 
    XAttr Terms: 
    lineCount: 1
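A minimal C++ sketch of the truncation visible in the "File Name Terms" line above. It assumes a 25-character limit, inferred only from that output (the 26-character name was stored as the prefix F plus its first 25 characters). kMaxTermLength and makeFileNameTerm are hypothetical names, not Baloo's API, and the sketch assumes ASCII input.

```cpp
#include <cstddef>
#include <string>

// Assumed per-term limit, inferred from the balooshow output above
// (hypothetical constant, not taken from Baloo's source).
constexpr std::size_t kMaxTermLength = 25;

// Hypothetical helper: builds a file-name index term as the "F" field
// prefix followed by the word, truncated to kMaxTermLength characters.
// Assumes ASCII input, so one char is one character.
std::string makeFileNameTerm(const std::string& word)
{
    return "F" + word.substr(0, kMaxTermLength);
}
```

For the 26-character name, makeFileNameTerm yields Fabcdefghijklmnopqrstuvwxy, the same term listed under "File Name Terms"; names of 25 characters or fewer are stored unchanged.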
Comment 5 Stefan Brüns 2023-07-05 18:54:07 UTC
Baloo currently handles term truncation only for "equals" queries, not "contains" queries (the default).

$> baloosearch filename=abcdefghijklmnopqrstuvwxyz
returns the file, while the following does not:
$> baloosearch filename:abcdefghijklmnopqrstuvwxyz
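The asymmetry can be modelled with a toy index. This is a simplified sketch, not Baloo's implementation: it assumes the same 25-character limit, drops the field prefix, and models a "contains" query as prefix matching on the untruncated query word (consistent with step 4 of the report, where partial strings such as "abcd" find the file). truncateTerm, matchesEquals and matchesContains are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Assumed per-term limit (see the balooshow output in comment 4).
constexpr std::size_t kMaxTermLength = 25;

// Hypothetical: truncation applied when terms are written to the index.
std::string truncateTerm(const std::string& term)
{
    return term.substr(0, kMaxTermLength);
}

// Toy index: the set of (already truncated) terms stored for one file.
using TermSet = std::set<std::string>;

// "equals" query (filename=...): the query word is truncated exactly like
// the indexed term, then compared for equality.
bool matchesEquals(const TermSet& stored, const std::string& query)
{
    return stored.count(truncateTerm(query)) > 0;
}

// "contains" query (filename:...), modelled as prefix matching on the raw,
// untruncated query word. A stored term can never start with a query that
// is longer than the term itself, so 26+ character queries never match.
bool matchesContains(const TermSet& stored, const std::string& query)
{
    for (const std::string& term : stored) {
        if (term.compare(0, query.size(), query) == 0) {
            return true;
        }
    }
    return false;
}
```

With the 26-character name indexed, matchesEquals succeeds for the full name, while matchesContains only succeeds for queries of 25 characters or fewer, which is the behaviour observed in comments 3 and 4.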

$> baloosearch abcdefghijklmnopqrstuvwxyz
is internally expanded to:
$> baloosearch content:abcdefghijklmnopqrstuvwxyz OR filename:abcdefghijklmnopqrstuvwxyz
Comment 6 Bug Janitor Service 2023-07-06 19:27:41 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/158
Comment 7 Stefan Brüns 2023-07-06 22:28:57 UTC
Git commit b7c8ce1a999225f0362b8be274a9d5c786c3edda by Stefan Brüns.
Committed on 06/07/2023 at 19:21.
Pushed by bruns into branch 'master'.

[SearchStore] Always use TermGenerator instead of QueryParser

The QueryParser handles two fairly distinct tasks, parsing of quoting
characters, and splitting of phrases into terms.

The Phrase/Term splitting is similar to the TermGenerator, but slightly
different. Using a different implementation for searching and DB storage
can cause matching errors.

While the nested QueryParser quoting /can/ be used, it is fairly
redundant, and problematic:

- Quoting is already handled by the AdvancedQueryParser, which always
  sits in front of the SearchStore.
- The QueryParser is *only* used for "contains" queries (e.g.
  filename:foo.png) not "equal" queries ("filename=foo.png").
- Quoting of phrases for both variants is different,
  content:\"\'a b\'\" vs. content=\"a \"b".
- The QueryParser does not handle term truncation (see bug reference).

Use the TermGenerator in all cases, so term splitting and quoting are
uniform.

M  +0    -1    autotests/integration/querytest.cpp
M  +7    -3    src/lib/searchstore.cpp

https://invent.kde.org/frameworks/baloo/-/commit/b7c8ce1a999225f0362b8be274a9d5c786c3edda
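The effect of this change can be sketched by routing both the stored text and the query through one hypothetical generateTerms function, so a long query word is reduced to exactly the term that was stored. The sketch only splits on whitespace and truncates (assumed 25-character limit); the real TermGenerator does more, e.g. it lowercases (note "hello penguin" in the balooshow output). generateTerms and containsMatch are illustrative names, not Baloo's API.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Assumed per-term limit (hypothetical constant).
constexpr std::size_t kMaxTermLength = 25;

// Hypothetical single term generator shared by indexing and searching:
// splits on whitespace and truncates each word to kMaxTermLength chars.
std::vector<std::string> generateTerms(const std::string& text)
{
    std::vector<std::string> terms;
    std::istringstream stream(text);
    std::string word;
    while (stream >> word) {
        terms.push_back(word.substr(0, kMaxTermLength));
    }
    return terms;
}

// "contains" match after the fix: the query goes through the same generator
// first, then each query term must be a prefix of some stored term.
// An empty query trivially matches in this sketch.
bool containsMatch(const std::vector<std::string>& storedTerms, const std::string& query)
{
    for (const std::string& q : generateTerms(query)) {
        bool found = false;
        for (const std::string& t : storedTerms) {
            if (t.compare(0, q.size(), q) == 0) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false;
        }
    }
    return true;
}
```

Because the 26-character query is truncated the same way as the indexed name, containsMatch now finds the file, while short prefixes such as "abcd" keep working. The same generator also explains the reporter's workaround: "abcdefgh ijklmnopqrs tuvwxyz" produces three short terms, none of which is truncated.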
Comment 8 Stefan Brüns 2023-07-06 22:47:20 UTC
Git commit c85de29f33224e27f273f66fef09837d24fdfd2c by Stefan Brüns.
Committed on 06/07/2023 at 22:39.
Pushed by bruns into branch 'kf5_test'.

[SearchStore] Always use TermGenerator instead of QueryParser

M  +0    -1    autotests/integration/querytest.cpp
M  +7    -3    src/lib/searchstore.cpp

https://invent.kde.org/frameworks/baloo/-/commit/c85de29f33224e27f273f66fef09837d24fdfd2c
Comment 9 Stefan Brüns 2023-11-13 21:41:28 UTC
Git commit af0b611bced29e6cc00f120e9ff69470bd657a7d by Stefan Brüns.
Committed on 13/11/2023 at 21:41.
Pushed by bruns into branch 'kf5'.

[SearchStore] Always use TermGenerator instead of QueryParser

(cherry picked from commit b7c8ce1a999225f0362b8be274a9d5c786c3edda)

M  +0    -1    autotests/integration/querytest.cpp
M  +7    -3    src/lib/searchstore.cpp

https://invent.kde.org/frameworks/baloo/-/commit/af0b611bced29e6cc00f120e9ff69470bd657a7d