Bug 287469

Summary: [KDE 4.8 Beta 1] With an empty database, the file indexer indexes significantly fewer files than in KDE 4.7.
Product: [Unmaintained] nepomuk
Component: fileindexer
Status: RESOLVED FIXED
Severity: normal
Priority: NOR
Version: 4.8
Target Milestone: ---
Platform: Chakra
OS: Linux
Reporter: Alejandro Nova <alejandronova>
Assignee: Sebastian Trueg <sebastian>
CC: trueg

Description Alejandro Nova 2011-11-24 15:46:10 UTC
Version:           git master (using Devel) 
OS:                Linux

With KDE 4.7, indexing from an empty database produced a file count of about 6,000 in the index. With KDE 4.8 Beta 1, the file count is 800, and a lot of files get reindexed on every reboot.

I'm using libstreamanalyzer and libstreams from trunk, with FFMPEG enabled. I noticed that some PDFs are getting indexed as h263 videos. However, I'm seeing this regression with all kinds of files, and the only change has been the upgrade to KDE 4.8 Beta 1.

Reproducible: Always

Steps to Reproduce:
1. Recreate a database with Nepomuk (remove your KDE 4.7 database).
2. Let Nepomuk index your files.
3. Reboot. You'll notice that KDE 4.8 reindexes far more files than KDE 4.7 did.

Actual Results:  
Lots of reindexing, taking hours. Also, as PDFs don't get properly indexed, I can't use the deep search feature of Strigi.

Expected Results:  
Little or no reindexing. KDE stops reading the Nepomuk database in minutes. I can use the deep search feature of Strigi, just like in KDE 4.7.

This is a regression from KDE 4.7.
Comment 1 Sebastian Trueg 2011-11-25 07:09:20 UTC
Just to be sure: are shared-desktop-ontologies and strigi versions the same as before?
Comment 2 Alejandro Nova 2011-11-25 10:48:46 UTC
No. The Strigi version is the same, but I was using s-d-o 0.7 before, and that didn't work with KDE 4.8. I'm now using a git snapshot of s-d-o. That's the only change.
Comment 3 Sebastian Trueg 2011-11-25 11:05:52 UTC
Never mind, I reproduced the issue and know the cause but not the reason. I should be able to fix this today, even if I have to revert a feature commit. I want this working.
Comment 4 Sebastian Trueg 2011-11-25 13:44:05 UTC
Git commit 74cef9969db7d0d0dfbff94dfb7fd3f3b76cccd1 by Sebastian Trueg.
Committed on 25/11/2011 at 14:40.
Pushed by trueg into branch 'master'.

Ignore existing resources without any information to be added.

If a resource exists and we do not have any data to add we simply
ignore it. This fixes file indexing if the parent folder resource
already exists.

BUG: 287469

M  +4    -4    services/storage/datamanagementmodel.cpp

http://commits.kde.org/nepomuk-core/74cef9969db7d0d0dfbff94dfb7fd3f3b76cccd1
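
Illustration (not the actual patch): the commit message above describes skipping an existing resource when there is no data to add for it. The following is a minimal sketch of that idea only. It does not reproduce the code in datamanagementmodel.cpp; `ResourceData`, `Store`, and `storeResources` here are hypothetical stand-ins, and the in-memory map is an assumption in place of the real RDF storage.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one resource handed to the storage layer:
// its URI plus the property/value pairs the caller wants to add.
struct ResourceData {
    std::string uri;
    std::multimap<std::string, std::string> properties;
};

// Hypothetical in-memory store (assumption: stands in for the RDF database).
class Store {
public:
    bool contains(const std::string& uri) const {
        return data_.count(uri) > 0;
    }

    // Stores a batch of resources and returns how many were written.
    // A resource that already exists and carries no new data is ignored,
    // so an already-known parent folder does not affect the rest of the batch.
    std::size_t storeResources(const std::vector<ResourceData>& batch) {
        std::size_t written = 0;
        for (const ResourceData& res : batch) {
            const bool exists = contains(res.uri);
            if (exists && res.properties.empty()) {
                continue;  // existing resource, nothing to add: skip it
            }
            // Creates the entry if it is new, then merges the properties.
            auto& props = data_[res.uri];
            for (const auto& kv : res.properties) {
                props.emplace(kv.first, kv.second);
            }
            ++written;
        }
        return written;
    }

private:
    std::map<std::string, std::set<std::pair<std::string, std::string>>> data_;
};
```

In this sketch, a batch containing a bare reference to an already stored folder plus a new file in it writes only the file and leaves the folder untouched.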
Comment 5 Sebastian Trueg 2011-11-25 13:44:15 UTC
Git commit 2a96d191a5525ca927a7058a8a39e00e80c53399 by Sebastian Trueg.
Committed on 25/11/2011 at 14:43.
Pushed by trueg into branch 'master'.

Ignore existing resources without any information to be added.

If a resource exists and we do not have any data to add we simply
ignore it. This fixes file indexing if the parent folder resource
already exists.

BUG: 287469

M  +4    -4    nepomuk/services/storage/datamanagementmodel.cpp

http://commits.kde.org/kde-runtime/2a96d191a5525ca927a7058a8a39e00e80c53399
Comment 6 Sebastian Trueg 2011-12-06 16:05:30 UTC
Git commit 7f68206e668202f54c2fe3479d84b2ffb66933d9 by Sebastian Trueg.
Committed on 25/11/2011 at 14:40.
Pushed by trueg into branch 'symlinkHandling'.

Ignore existing resources without any information to be added.

If a resource exists and we do not have any data to add we simply
ignore it. This fixes file indexing if the parent folder resource
already exists.

BUG: 287469

M  +4    -4    services/storage/datamanagementmodel.cpp

http://commits.kde.org/nepomuk-core/7f68206e668202f54c2fe3479d84b2ffb66933d9