Created attachment 155254
Text file containing a \000

SUMMARY:
Baloo seems to stumble when it meets a "null" character in a text file. A parallel or more general case of:

STEPS TO REPRODUCE:
Download the test file into an indexed folder. The file contains:

    -> ^@ <-

where the ^@ is a "null" byte. Ask Baloo what it has as the indexed data:

    $ balooshow -x file-with-a-000.txt

OBSERVED RESULTS:
You get:

    1625990000fc01 64513 1451417 file-with-a-000.txt [/home/test/Documents/file-with-a-000.txt]
        Mtime: 1673373876 2023-01-10T18:04:36
        Ctime: 1673373876 2023-01-10T18:04:36
        Cached properties:
            Line Count: 1

    Internal Info
    Terms: < > Mplain Mtext T5 T8 X20-1
    File Name Terms: F000 Fa Ffile Ftxt Fwith
    XAttr Terms:
    Internal Error - malformed term (short): ''
    Internal Error - malformed term (short): ''
    lineCount: 1

EXPECTED RESULTS:

    Internal Info
    Terms: < > Mplain Mtext T5 T8 X20-1
    File Name Terms: F000 Fa Ffile Ftxt Fwith
    XAttr Terms:
    lineCount: 1

ADDITIONAL INFORMATION
Igor Poboiko's "" script:

gives a couple of errors:

    ...
    Checking whether posting[docterms[docid]] contains docid (can take some time)...
    ERROR: 6236232384314369 (/home/test/Documents/file-with-a-000.txt) has term which wasn't found in PostingDB
    ERROR: 6236232384314369 (/home/test/Documents/file-with-a-000.txt) has term which wasn't found in PostingDB
    ...

and the merge request mentions:

    ... TermGenerator then generates proper (yet meaningless) terms out of those characters, and they end up in database ...

In this case it is happening for a "null" in a text file rather than a problematic PDF.

I think it should *not* be possible for a file to corrupt the database. A worry might be that a "specially crafted" file could perform mischief, so I am flagging this as "major".
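For anyone without access to the attachment, the following is a minimal Python sketch that writes a comparable test file and confirms it contains a null byte. It assumes the attachment is a single line of the form `-> \0 <-` followed by a newline, which is a guess from the description above. The helper names (`make_test_file`, `count_null_bytes`, `strip_control_chars`) are hypothetical and are not part of Baloo. The last helper only illustrates the general idea of dropping control characters before text reaches a term generator; it is not a description of the actual fix.

```python
import os
import tempfile


def make_test_file(directory: str, name: str = "file-with-a-000.txt") -> str:
    """Write a one-line text file containing a single null byte (assumed layout)."""
    path = os.path.join(directory, name)
    with open(path, "wb") as fh:
        fh.write(b"-> \x00 <-\n")
    return path


def count_null_bytes(path: str) -> int:
    """Return how many null bytes the file contains."""
    with open(path, "rb") as fh:
        return fh.read().count(b"\x00")


def strip_control_chars(text: str) -> str:
    """Replace C0 control characters (except tab, newline, CR) with spaces.

    Hypothetical pre-processing step, shown only to illustrate the idea.
    """
    keep = {"\t", "\n", "\r"}
    return "".join(ch if ch in keep or ord(ch) >= 0x20 else " " for ch in text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = make_test_file(tmp)
        print(path, "null bytes:", count_null_bytes(path))
```

To reproduce, write the file into an indexed folder instead of a temporary directory and then run the `balooshow -x` command from the steps above.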
Possibly fixed with:
Should arrive with Frameworks 5.105
Flagging Resolved/Fixed