Bug 464226 - Baloo and Nulls
Summary: Baloo and Nulls
Status: RESOLVED FIXED
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: Baloo File Daemon
Version: unspecified
Platform: Other Linux
Importance: NOR major
Target Milestone: ---
Assignee: baloo-bugs-null
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-13 08:03 UTC by tagwerk19
Modified: 2023-04-22 10:14 UTC
CC List: 0 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Text file containing a \000 (8 bytes, text/plain)
2023-01-13 08:03 UTC, tagwerk19

Description tagwerk19 2023-01-13 08:03:54 UTC
Created attachment 155254
Text file containing a \000

SUMMARY:
    Baloo seems to stumble when it meets a "null" character in a text file.

    A parallel or more general case of:

        https://invent.kde.org/frameworks/baloo/-/merge_requests/87

STEPS TO REPRODUCE:
    Download the test file into an indexed folder. The file contains:

        -> ^@ <-

    where the ^@ is a "null" byte. Ask baloo what it has as the indexed data:

        $ balooshow -x file-with-a-000.txt
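
    If the attachment is not to hand, an equivalent file can be generated.
    A minimal sketch, assuming the 8-byte attachment is exactly the text shown
    above followed by a newline (the helper name is hypothetical, not part of
    Baloo):

```python
from pathlib import Path

# Assumed content of the attachment: 8 bytes, with a single NUL between the arrows.
NULL_TEST_CONTENT = b"-> \x00 <-\n"


def write_null_test_file(folder: str, name: str = "file-with-a-000.txt") -> Path:
    """Write the test file into `folder` and return its path."""
    path = Path(folder) / name
    path.write_bytes(NULL_TEST_CONTENT)
    return path
```

    Calling write_null_test_file() with the path of an indexed folder should
    give Baloo a file to pick up on its next indexing pass.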

OBSERVED RESULTS:
    You get:

        1625990000fc01 64513 1451417 file-with-a-000.txt [/home/test/Documents/file-with-a-000.txt]
                Mtime: 1673373876 2023-01-10T18:04:36
                Ctime: 1673373876 2023-01-10T18:04:36
                Cached properties:
                        Line Count: 1

        Internal Info
        Terms:   < > Mplain Mtext T5 T8 X20-1
        File Name Terms: F000 Fa Ffile Ftxt Fwith
        XAttr Terms:
        Internal Error - malformed term (short): ''
        Internal Error - malformed term (short): ''
        lineCount: 1

EXPECTED RESULTS:

        Internal Info
        Terms:   < > Mplain Mtext T5 T8 X20-1
        File Name Terms: F000 Fa Ffile Ftxt Fwith
        XAttr Terms:
        lineCount: 1
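
    The only difference between the observed and expected output is the pair
    of "malformed term" lines, so a check can be scripted. A rough sketch,
    assuming the error text appears verbatim as above; which output stream
    carries it is not known here, so both are searched (function names are
    hypothetical):

```python
import subprocess

# Marker copied from the balooshow output quoted in this report.
MALFORMED_MARKER = "Internal Error - malformed term"


def count_malformed_terms(balooshow_output: str) -> int:
    """Count lines of `balooshow -x` output that report a malformed term."""
    return sum(1 for line in balooshow_output.splitlines() if MALFORMED_MARKER in line)


def check_file(path: str) -> int:
    """Run `balooshow -x` on `path` and return the number of malformed-term lines."""
    result = subprocess.run(
        ["balooshow", "-x", path],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # the output may contain bytes that are not valid UTF-8
        check=False,
    )
    # Assumption: the error lines may be on stdout or stderr, so scan both.
    return count_malformed_terms(result.stdout + result.stderr)
```

    A result of 0 for the test file would match the expected output above.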

ADDITIONAL INFORMATION
    Igor Poboiko's "baloo-checkdb.py" script:

        https://invent.kde.org/frameworks/baloo/uploads/bdc9f5f17fc96490b7bd4a22ac664843/baloo-checkdb.py

    gives a couple of errors:

        ...
        Checking whether posting[docterms[docid]] contains docid (can take some time)...
        ERROR: 6236232384314369 (/home/test/Documents/file-with-a-000.txt) has term  which wasn't found in PostingDB
        ERROR: 6236232384314369 (/home/test/Documents/file-with-a-000.txt) has term  which wasn't found in PostingDB
        ...

    and the merge request mentions

        ... TermGenerator then generates proper (yet meaningless) terms out of those
        characters, and they end up in database ...

    In this case it's happening for a "null" in a text file rather than a problematic
    PDF. I think it should *not* be possible for a file to corrupt the database.
    The worry is that a "specially crafted" file could perform mischief, so I am
    flagging this as "major".
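
    To see whether other files in an indexed folder might trigger the same
    problem, one could scan for NUL bytes. A sketch only: restricting the scan
    to "*.txt" is an assumption, and a hit marks a candidate file, not proof
    that Baloo has stored a malformed term for it:

```python
from pathlib import Path
from typing import Iterator


def contains_nul(path: Path, chunk_size: int = 65536) -> bool:
    """Return True if the file holds at least one NUL (0x00) byte."""
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        while chunk := handle.read(chunk_size):
            if b"\x00" in chunk:
                return True
    return False


def files_with_nul(folder: str, pattern: str = "*.txt") -> Iterator[Path]:
    """Yield files under `folder` matching `pattern` that contain a NUL byte."""
    for path in sorted(Path(folder).rglob(pattern)):
        if path.is_file() and contains_nul(path):
            yield path
```

    Each file reported this way can then be inspected with balooshow -x as in
    the steps above.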
Comment 2 tagwerk19 2023-04-19 13:48:56 UTC
The fix should arrive with Frameworks 5.105.
Comment 3 tagwerk19 2023-04-22 10:14:08 UTC
Flagging as Resolved/Fixed.