When running "balooctl index <folder>/*", (at least) the contents of the terms db are garbage afterwards. Running "for f in <folder>/*; do balooctl index "$f" ; done" does not have this effect and even restores the db to a good state. The latter obviously creates one write transaction per file, while the former uses a single write transaction for all files.
(In reply to Stefan Brüns from comment #0)
> When running "balooctl index <folder>/*", (at least) the contents of the
> terms db are garbage afterwards.

I could not reproduce. Maybe I overlooked the "garbage"; could you specify, please?

balooctl index <folder>/*
=> balooshow -x b.epub >multi.txt

balooctl disable + enable
for f in <folder>/*; do balooctl index "$f" ; done
=> balooshow -x b.epub >single.txt

diff multi.txt single.txt
14d13
< title: buddenbrooks einer familie verfall
15a15
> title: buddenbrooks einer familie verfall
Git commit e1d1b7e87ff1e8ce6a7e03ecdf2902322cb8624a by Stefan Brüns.
Committed on 29/05/2018 at 23:47.
Pushed by bruns into branch 'master'.

Avoid crash when reading corrupt data from document terms db

Summary:
The terms db contains terms, where each term is stored either independently (terminated with 0) or as a suffix to the previous term (terminated with 1).

In case of corrupted data, the first terminator seen may be a 1, which leads to a crash when trying to access the previous term with QVector<>::last().

Show a debug message to give a hint about the bad data, which can be fixed by reindexing the relevant file.

Related: bug 392878

Test Plan:
Corrupt the database
Run balooshow -x <affected file(s)>

Reviewers: #baloo, michaelh, ngraham, #frameworks, dhaumann
Reviewed By: dhaumann
Subscribers: dhaumann, kde-frameworks-devel, #frameworks
Tags: #frameworks, #baloo
Differential Revision: https://phabricator.kde.org/D12047

M +5 -0 src/codecs/doctermscodec.cpp
M +5 -1 src/engine/documentdb.cpp

https://commits.kde.org/baloo/e1d1b7e87ff1e8ce6a7e03ecdf2902322cb8624a
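To illustrate the failure mode described in the commit message, here is a minimal sketch of a decoder for such a term list. It is not the code from doctermscodec.cpp: the function name decodeTerms, the use of std::string and std::vector instead of QByteArray and QVector, and the bool return value are all assumptions made for illustration. It also assumes that term bytes never contain the values 0 or 1, and it ignores any trailing bytes that lack a terminator.

```cpp
#include <string>
#include <vector>

// Hypothetical decoder sketch; names and types are illustrative only and
// are not Baloo's actual API.
// Returns false if the data starts with a suffix entry (terminator 1)
// that has no previous term to attach to, i.e. the data is corrupt.
bool decodeTerms(const std::string& data, std::vector<std::string>& out)
{
    out.clear();
    std::string current;
    for (char c : data) {
        if (c == '\0') {
            // Terminator 0: the collected bytes form a complete term.
            out.push_back(current);
            current.clear();
        } else if (c == '\1') {
            // Terminator 1: the collected bytes are a suffix of the
            // previous term, so a previous term must exist.
            if (out.empty()) {
                // Corrupt data: bail out instead of reading the last
                // element of an empty container.
                return false;
            }
            out.push_back(out.back() + current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    return true;
}
```

With this sketch, the bytes "foo", 0, "bar", 1 would decode to the terms "foo" and "foobar", while data that begins with "bar", 1 would be rejected. A caller could then log a debug message and suggest reindexing the affected file, as the commit message describes.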
Is this 100% fixed now? Or is there still anything left to do?
(In reply to Nate Graham from comment #3)
> Is this 100% fixed now? Or is there still anything left to do?

Checked on an index of 2,000,000 files (recorded Go games). I used balooctl to clear the entries for 10000 files:

balooctl clear 2016-01*

and reindexed them in one transaction:

balooctl index 2016-01*

This was done on a system with constrained RAM, so the transaction filled the RAM and extended into swap. It completed, and the index was seemingly OK. One can never be sure, but good enough?
Cool.