Bug 420339 - Baloo changes the mimetype of empty files to application/x-zerosize when content indexing
Status: RESOLVED INTENTIONAL
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.99.0
Platform: Neon Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Duplicates: 442898
Depends on:
Blocks:
 
Reported: 2020-04-20 13:22 UTC by tagwerk19
Modified: 2023-07-06 22:39 UTC
CC List: 7 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Comparison of Dolphin and Krunner search results (305.12 KB, image/png)
2020-04-20 13:22 UTC, tagwerk19
Dolphin and Krunner search comparison with empty text file (140.21 KB, image/png)
2021-01-17 08:38 UTC, tagwerk19
Dolphin and Krunner search comparison with sample text (128.19 KB, image/png)
2021-01-17 08:44 UTC, tagwerk19

Description tagwerk19 2020-04-20 13:22:29 UTC
Created attachment 127707
Comparison of Dolphin and Krunner search results

SUMMARY:

    If you search for 'image.jpg' where the file is empty, Dolphin and baloosearch list it but KRunner does not

STEPS TO REPRODUCE:

    Create a file:

        touch ~/Pictures/image.jpg

    confirm that Baloo has indexed it:

        balooshow ~/Pictures/image.jpg

        baloosearch image.jpg

    Look for "image.jpg" in Dolphin's search dialog and type "image.jpg" into KRunner

OBSERVED RESULTS:

    Dolphin shows image.jpg as a match, but it is not listed in the KRunner output

EXPECTED RESULTS:

    Dolphin and KRunner should give the same results

    (It may be that KRunner deliberately drops empty files from its results to save space; I wonder whether this is a good option.)

SOFTWARE/OS VERSIONS:

    Dolphin 20.04
    Baloosearch/Balooctl 5.70.0
    from Neon Testing 

    KDE Plasma 5.18.4
    KDE Frameworks 5.70.0
    Qt 5.14.1

ADDITIONAL INFORMATION:

    The attachment shows results from Dolphin (with the zero-size image.jpg
    and image2.jpg listed) and from KRunner, where they are listed as
    "Recent Documents"

    Querying via dbus:

        dbus-send --print-reply --dest=org.kde.runners.baloo /runner org.kde.krunner1.Match string:'jpg'

    does not list the zero-size files
Comment 1 tagwerk19 2021-01-16 09:34:01 UTC
Confirm that the issue is still there with

    Dolphin 20.12.1
    Krunner 5.20.5
    Baloosearch/Balooctl 5.79.0

    KDE Plasma 5.20.5
    KDE Frameworks 5.79.0
    Qt 5.15.2

It might have overlapped Bug 431664, but it seems to be independent.
Comment 2 Justin Zobel 2021-01-17 06:49:28 UTC
I have just tested this with image.jpg in ~/Pictures/ and I can see it in both.

Can you please test this on a new user and confirm that it's not a user-specific configuration that has been changed?
Comment 3 tagwerk19 2021-01-17 08:38:38 UTC
Created attachment 134954
Dolphin and Krunner search comparison with empty text file

Makes sense.

Yes, it is still there in a new user. I've replicated with a text file, the behaviour seems repeatable to me.

I'm attaching a couple of screenshots, one with an empty 'testfile.txt' in Documents, the second with the file holding "Hello World"
Comment 4 Justin Zobel 2021-01-17 08:43:45 UTC
Does it show up in `balooctl failed`?

What if you purge the index and recreate it?

`balooctl purge`
`balooctl resume`
Comment 5 tagwerk19 2021-01-17 08:44:49 UTC
Created attachment 134955
Dolphin and Krunner search comparison with sample text

After writing "Hello World" to the test file, both Dolphin and KRunner find it...
Comment 6 tagwerk19 2021-01-17 08:57:04 UTC
(In reply to Justin Zobel from comment #4)

> What if you purge the index and recreate it?

Baloo seems to be working fine

bug420339@holborn:~/Documents$ balooctl purge

Stopping the File Indexer .... - done
Deleted the index database
Restarting the File Indexer

bug420339@holborn:~/Documents$ balooshow testfile.txt
26315264687105 64513 6127 testfile.txt [/home/bug420339/Documents/testfile.txt]
        Mtime: 1610870940 2021-01-17T09:09:00
        Ctime: 1610870940 2021-01-17T09:09:00

bug420339@holborn:~/Documents$ baloosearch testfile
/home/bug420339/Documents/testfile.txt
Elapsed: 1,7386 msecs

bug420339@holborn:~/Documents$ echo "Hello World" > testfile.txt

bug420339@holborn:~/Documents$ balooshow testfile.txt
26315264687105 64513 6127 testfile.txt [/home/bug420339/Documents/testfile.txt]
        Mtime: 1610873372 2021-01-17T09:49:32
        Ctime: 1610873372 2021-01-17T09:49:32
        Cached properties:
                Line Count: 1

bug420339@holborn:~/Documents$ baloosearch testfile
/home/bug420339/Documents/testfile.txt
Elapsed: 1,79348 msecs
Comment 7 tagwerk19 2021-01-18 08:19:55 UTC
Double checked with a live image (in a KVM guest)

    neon-testing-20210112-1820.iso 

On the live image I needed to enable baloo and also to have a little more patience - however I still see it happening.

I've taken the liberty of setting the bug back to "Reported"; in reality it should probably also be flagged "Minor".

I tried catching details of what was happening with strace

    strace -o trace.log -f -t krunner

but without luck; I guess the real work is done elsewhere.
Comment 8 Natalie Clarius 2022-08-09 15:29:23 UTC
I think this is the same bug as https://bugs.kde.org/show_bug.cgi?id=457522.
Comment 9 tagwerk19 2022-08-30 14:56:19 UTC
*** Bug 457522 has been marked as a duplicate of this bug. ***
Comment 10 Nate Graham 2022-10-09 19:45:23 UTC
*** Bug 442898 has been marked as a duplicate of this bug. ***
Comment 11 tagwerk19 2022-10-10 07:48:40 UTC
(In reply to tagwerk19 from comment #0)
> STEPS TO REPRODUCE:
> 
>     Create a file:
> 
>         touch ~/Pictures/image.jpg
> 
>     confirm that Baloo has indexed it:
> 
>         balooshow ~/Pictures/image.jpg
> 
>         baloosearch image.jpg
> 
>     Look for "image.jpg" in dolphin's search dialog and type "image.jpg"

With a little bit of experience from elsewhere...

    https://bugs.kde.org/show_bug.cgi?id=457522#c17

If baloo is *not* content indexing then "kmimefiletype" and "balooshow -x" both show image.jpg as image/jpeg, whereas if baloo is content indexing, "balooshow -x" shows the mimetype as application/x-zerosize. 

Debug output after a "balooctl purge":

    kf.baloo: Indexing 5636161028553729 "/home/test/Pictures/image.jpg" "application/x-zerosize"
    kf.filemetadata: No extractor for "application/x-zerosize"

The behaviour seems pretty consistent. In this situation

    baloosearch -i image.jpg

does find the file and

    baloosearch -i -t image image.jpg

doesn't (and presumably KRunner follows the baloosearch behaviour...)
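The difference between the two baloosearch commands can be sketched as a type filter keyed on the recorded mimetype. This is a hypothetical illustration under a stated assumption: it assumes a filter such as "-t image" only matches mimetypes under "image/". Baloo's actual mimetype-to-type mapping is not reproduced here, and "coarse_type" and "matches_type_filter" are made-up names.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Baloo: map a mimetype to a coarse
# type name and test it against a "-t"-style filter.
coarse_type() {
    case "$1" in
        image/*) echo "image" ;;
        text/*)  echo "text" ;;
        *)       echo "" ;;   # e.g. application/x-zerosize gets no type
    esac
}

# Succeeds (exit status 0) only if the mimetype's coarse type equals the filter
matches_type_filter() {
    [ "$(coarse_type "$1")" = "$2" ]
}

for mime in image/jpeg application/x-zerosize; do
    if matches_type_filter "$mime" image; then
        echo "$mime: kept by -t image"
    else
        echo "$mime: dropped by -t image"
    fi
done
```

Under this assumption, the same empty image.jpg is kept when it was recorded as image/jpeg (no content indexing) and dropped when it was recorded as application/x-zerosize (content indexing).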
Comment 12 Natalie Clarius 2022-10-10 22:12:14 UTC
That KRunner won't find files of type "application/x-zerosize" will be solved once https://invent.kde.org/plasma/plasma-workspace/-/merge_requests/2006 is merged.

That Baloo categorizes some files as "application/x-zerosize" rather than a more descriptive type when content indexing is enabled is still a bug imo. Can you open a separate report for this?
Comment 13 tagwerk19 2022-10-11 06:44:37 UTC
(In reply to Natalie Clarius from comment #12)
> ... That Baloo categorizes some files as "application/x-zerosize" rather than a
> more descriptive type when content indexing is enabled is still a bug imo.
I would agree. I can see that it's trying to provide additional info but, from my perspective, trying to understand what baloosearch/KRunner are delivering, it's something extra to untangle. I would prefer: "If kmimetypefinder says it's that type, then Baloo treats it as that type".

> Can you open a separate report for this? ...
This is probably the right report, it's focused just on this issue. Let me see if I have powerful enough magic to change the title...
Comment 14 Stefan Brüns 2023-07-06 19:55:43 UTC
A zero-sized file called foo.jpg is an empty file, not an image.

$> touch zero.jpg
$> kmimetypefinder -c zero.jpg
application/x-zerosize
Comment 15 tagwerk19 2023-07-06 21:33:54 UTC
(In reply to Stefan Brüns from comment #14)
> A zero-sized file called foo.jpg is an empty file, not an image.

The gist is that the behaviour differs if you are content indexing or not...

(Quoting tagwerk19 from comment #11)
> If baloo is *not* content indexing then "kmimefiletype" and "balooshow -x"
> both show image.jpg as image/jpeg, whereas if baloo is content indexing,
> "balooshow -x" shows the mimetype as application/x-zerosize. 

(I suppose I meant kmimetypefinder...)
Comment 16 Bug Janitor Service 2023-07-06 22:30:27 UTC
A possibly relevant merge request was started @ https://invent.kde.org/frameworks/baloo/-/merge_requests/159
Comment 17 Stefan Brüns 2023-07-06 22:31:32 UTC
Git commit b5f5829b5b10ee96e0fcf022b5cb964041809651 by Stefan Brüns.
Committed on 06/07/2023 at 22:29.
Pushed by bruns into branch 'master'.

[BasicIndexingJob] Ignore filename based mimetype for empty files

The content based mimetype for empty files is "application/x-zerosize",
and this would be set by the FileContentIndexer.

Instead of postponing the mimetype update to the content indexer run,
just set it and skip the content indexer completely.

M  +6    -2    autotests/unit/file/basicindexingjobtest.cpp
M  +6    -2    src/file/basicindexingjob.cpp

https://invent.kde.org/frameworks/baloo/-/commit/b5f5829b5b10ee96e0fcf022b5cb964041809651
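The decision described in the commit message can be sketched in shell as follows. This is a hypothetical illustration, not the C++ in src/file/basicindexingjob.cpp. "indexing_mimetype" and "needs_content_indexing" are made-up names. The filename-based branch uses a toy extension table in place of the shared MIME database, and "[ -s ]" also treats a missing file as empty.

```shell
#!/bin/sh
# Hypothetical sketch of the post-fix decision; not Baloo's actual code.

# Print the mimetype the basic indexing step would record for a file.
indexing_mimetype() {
    if [ ! -s "$1" ]; then
        # Empty (or missing) file: use the content-based type directly,
        # instead of leaving the content indexer to replace it later.
        echo "application/x-zerosize"
        return
    fi
    # Non-empty file: toy filename-based lookup standing in for the MIME database
    case "$1" in
        *.jpg|*.jpeg) echo "image/jpeg" ;;
        *.txt)        echo "text/plain" ;;
        *)            echo "application/octet-stream" ;;
    esac
}

# Succeeds only if the file should still be queued for content indexing.
needs_content_indexing() {
    [ -s "$1" ]
}

# Demo with the two files used in this report
tmp=$(mktemp -d)
touch "$tmp/image.jpg"
echo "Hello World" > "$tmp/testfile.txt"
for f in "$tmp/image.jpg" "$tmp/testfile.txt"; do
    if needs_content_indexing "$f"; then c=yes; else c=no; fi
    echo "$(basename "$f"): $(indexing_mimetype "$f"), content indexing: $c"
done
rm -r "$tmp"
```

As the commit message describes it, the empty file gets its final mimetype in the basic indexing step, and the content indexer is skipped. The recorded type should then no longer depend on whether content indexing is enabled.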
Comment 18 Stefan Brüns 2023-07-06 22:39:37 UTC
Git commit 8c632f04b3234cb502ba742a2255994914c8c3e2 by Stefan Brüns.
Committed on 06/07/2023 at 22:39.
Pushed by bruns into branch 'kf5'.

[BasicIndexingJob] Ignore filename based mimetype for empty files

The content based mimetype for empty files is "application/x-zerosize",
and this would be set by the FileContentIndexer.

Instead of postponing the mimetype update to the content indexer run,
just set it and skip the content indexer completely.
(cherry picked from commit b5f5829b5b10ee96e0fcf022b5cb964041809651)

M  +6    -2    autotests/unit/file/basicindexingjobtest.cpp
M  +6    -2    src/file/basicindexingjob.cpp

https://invent.kde.org/frameworks/baloo/-/commit/8c632f04b3234cb502ba742a2255994914c8c3e2