Bug 459205 - Baloo: Files "failed to index" are indexed
Summary: Baloo: Files "failed to index" are indexed
Status: RESOLVED WORKSFORME
Alias: None
Product: frameworks-baloo
Classification: Frameworks and Libraries
Component: general
Version: 5.98.0
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: baloo-bugs-null
Reported: 2022-09-16 08:12 UTC by David Kredba
Modified: 2022-11-09 05:12 UTC

Description David Kredba 2022-09-16 08:12:52 UTC
SUMMARY
When Baloo finished content indexing, 'balooctl status' reported some files as failed to index.
Some of them were indexed after running 'balooctl index' FILE, but others were not.
balooshow knew those files but complained that the data it had for them was out of date.

I ran 'balooctl clear' on those files, used balooshow to confirm they were gone, and then ran 'balooctl index' FILE.
All of the files were indexed (checked with 'balooshow -x' FILE), but the same files were still reported as failed to index.
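
The clear-and-reindex sequence above can be sketched as a small shell function. It uses only the commands quoted in this report; the function name reindex_file is made up for illustration, and the sketch assumes each command has finished its work by the time it returns.

```shell
#!/bin/sh
# Hypothetical helper: replay the steps from the report for one file.
# Only the commands quoted in the report are used.
reindex_file() {
    file=$1
    balooctl clear "$file"    # drop the file from the index
    balooshow "$file"         # check that it is gone
    balooctl index "$file"    # index it again
    balooshow -x "$file"      # inspect what was indexed
}
```

It would be called as reindex_file FILE, once per file that 'balooctl status' lists as failed.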

I ran 'mdb_copy -n -c' on the index file (no error was reported, and the file shrank by around 10 GiB), logged out of the KDE session, logged back in, and tried 'balooctl check', which did not help. I cleared the files and indexed them again; they were indexed, but the same set of "failed" files was still reported.
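
The compaction step might look like the sketch below. The -n and -c flags are the ones used in this report; the helper name compact_index, the choice to write to a separate destination file, and the assumption that Baloo is not running while the copy is made are not from the report.

```shell
#!/bin/sh
# Hypothetical helper: write a compacted copy of an LMDB index file and
# print the size of the source and of the copy in bytes.
# ASSUMPTION: nothing else is writing to the index during the copy.
compact_index() {
    src=$1
    dst=$2
    # -n: the environment is a single file, not a directory
    # -c: compact while copying (free pages are omitted)
    mdb_copy -n -c "$src" "$dst" || return 1
    echo "before: $(wc -c < "$src" | tr -d ' ') bytes"
    echo "after: $(wc -c < "$dst" | tr -d ' ') bytes"
}
```

The function only produces the copy and leaves the original index file untouched, so swapping the compacted file into place stays a separate, deliberate step.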

I found a workaround: in the KDE Settings GUI I set the folders containing those files to not be indexed, waited for the file count to drop (balooctl status), and then re-enabled indexing for them one by one.
After that, no failed files are reported.

Can this be done in a more elegant way, please?



SOFTWARE/OS VERSIONS
KDE Plasma Version: 5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.5

ADDITIONAL INFORMATION
balooctl status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 386,606
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 27.77 GiB
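
For anyone who wants to watch the failed count from a script, here is a minimal sketch that pulls the number out of status output shaped like the listing above. The helper name failed_count is hypothetical, and the sketch assumes the "Files failed to index:" label is printed exactly as shown here.

```shell
#!/bin/sh
# Hypothetical helper: read 'balooctl status' output on stdin and print
# the number from the "Files failed to index:" line.
failed_count() {
    # Keep digits only, so a value printed with thousands separators
    # (like "386,606" above) can still be compared numerically.
    sed -n 's/^Files failed to index: *//p' | tr -cd '0-9\n'
}
```

It would be used as: balooctl status | failed_count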

balooctl indexSize
File Size: 27,77 GiB
Used:      2,34 GiB

        PostingDB:  684,04 MiB    28.579 %
       PositionDB:    2,22 GiB    95.046 %
         DocTerms:    3,34 GiB   143.018 %
 DocFilenameTerms:   26,52 MiB     1.108 %
    DocXattrTerms:    4,00 KiB     0.000 %
           IdTree:    5,30 MiB     0.221 %
       IdFileName:   28,42 MiB     1.187 %
          DocTime:   15,38 MiB     0.642 %
          DocData:   25,97 MiB     1.085 %
ContentIndexingDB:         0 B     0.000 %
      FailedIdsDB:         0 B     0.000 %
          MTimeDB:    5,80 MiB     0.242 %
Comment 1 tagwerk19 2022-10-10 19:20:50 UTC
(In reply to David Kredba from comment #0)
> When Baloo finished file content indexing there were some files reported as
> failed to index using 'balooctl status'.
> Some of them got indexed using 'balooctl index' FILE but some not.
> Balooshow knew the file but complained about old data about the file.
Have a look at Bug 438382

If you run into this behaviour again, check whether baloo has indexed the file multiple times. Try
    baloosearch -i FILE
and see if you get more than one hit for the file. The "-i" tells baloosearch to print the DocID. If you see several hits for FILE with different DocIDs, then you could (quite possibly) see the sort of issues you are reporting.
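
The duplicate check can be scripted roughly as follows. This is only a sketch: it assumes each hit from 'baloosearch -i' is printed on one line as the DocID, a space, and then the full path, which this report does not confirm. The helper name count_docids is made up.

```shell
#!/bin/sh
# Hypothetical helper: read 'baloosearch -i' output on stdin and print the
# number of distinct DocIDs reported for one exact path.
# ASSUMPTION: each hit is printed as "<DocID> <path>" on a single line.
count_docids() {
    awk -v path="$1" '
        {
            id = $1
            rest = substr($0, length($1) + 2)   # text after "<DocID> "
            if (rest == path && !(id in seen)) { seen[id] = 1; n++ }
        }
        END { print n + 0 }
    '
}
```

It would be used as: baloosearch -i FILE | count_docids /full/path/to/FILE. A result greater than 1 would correspond to the "several hits with different DocIDs" case described above.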

I see you were previously chasing a problem, Bug 437019. Is there any overlap in behaviour here?
Comment 2 Bug Janitor Service 2022-10-25 05:01:09 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 3 Bug Janitor Service 2022-11-09 05:12:27 UTC
This bug has been in NEEDSINFO status with no change for at least
30 days. The bug is now closed as RESOLVED > WORKSFORME
due to lack of needed information.