Bug 314559 - Fileindexer does not index recursively
Summary: Fileindexer does not index recursively
Status: RESOLVED FIXED
Alias: None
Product: nepomuk
Classification: Miscellaneous
Component: fileindexer
Version: git master
Platform: Arch Linux Linux
Importance: NOR major
Target Milestone: ---
Assignee: Nepomuk Bugs Coordination
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-06 22:09 UTC by Bastian Beischer
Modified: 2013-02-24 14:27 UTC

See Also:
Latest Commit:
Version Fixed In: 4.10.1


Attachments
output of commads (3.24 KB, application/octet-stream)
2013-02-07 02:32 UTC, Hrvoje Senjan

Description Bastian Beischer 2013-02-06 22:09:51 UTC
In KDE 4.9 my nepomuk database had about 70000 indexed files. After upgrading KDE to 4.10 final today I thought it would be a good idea to let the nepomuk fileindexer re-index all my files in my home directory.

So I shut down nepomuk and logged out from KDE, deleted the database and the nepomuk configuration files in ~/.kde4/share/apps/nepomuk and ~/.kde4/share/config/ respectively and logged back in.

Result: the file indexer only indexes very few files (about 30) and then just informs me that it's idle. I can't get it to index the remaining files. Even after waiting for an hour and leaving the system idle, the situation does not change.


Reproducible: Always

Steps to Reproduce:
1. Log out of KDE
2. Make sure that there are no more nepomuk processes with "ps axu | grep nepo". Shut them down if necessary
3. rm -fr ~/.kde4/share/apps/nepomuk
4. rm -f ~/.kde4/share/config/{nepomukserverrc,nepomukstrigirc,nepomukbackuprc}
5. Log back into KDE and wait for the indexing to start
Actual Results:  
Indexer only indexes a few files and then stops.

Expected Results:  
Indexer should re-index all my files.

Indexing works fine for new files. If I copy one of the directories in my home folder to a new name, then all its contents are correctly indexed - however I don't want to do that, because it will change all my timestamps.
Comment 1 Vishesh Handa 2013-02-06 22:46:09 UTC
Strange.

Can you please provide the following information -

1. $ nepomukcmd query 'select count(distinct ?r) where { ?r kext:indexingLevel ?l . FILTER( ?l = "1"^^<http://www.w3.org/2001/XMLSchema#int> ). }'

2. $ nepomukcmd query 'select count(distinct ?r) where { ?r nie:url ?url . }'
Comment 2 Vishesh Handa 2013-02-06 22:47:29 UTC
Oops. Forgot to mention -

alias nepomukcmd="sopranocmd --socket `kde4-config --path socket`nepomuk-socket --model main --nrl"
Comment 3 Bastian Beischer 2013-02-06 22:53:29 UTC
1) callret-0 -> "133"^^<http://www.w3.org/2001/XMLSchema#int>
Total results: 1
Execution time: 00:00:00.1

2) callret-0 -> "192"^^<http://www.w3.org/2001/XMLSchema#int>
Total results: 1
Execution time: 00:00:00.1
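
For anyone who wants to compare the two counts in a script, here is a minimal sketch that pulls the number out of output shaped like the lines above. The function name extract_count is made up for illustration, and the filter assumes the "callret-0 -> ..." line format stays exactly as pasted here.

```shell
# Hypothetical helper: print the integer from a line such as
#   callret-0 -> "133"^^<http://www.w3.org/2001/XMLSchema#int>
# Lines without a callret-0 value (e.g. "Total results: 1") are dropped.
extract_count() {
    sed -n 's/^.*callret-0 -> "\([0-9][0-9]*\)".*$/\1/p'
}
```

It reads from standard input, so the output of the nepomukcmd alias from comment 2 could be piped into it, e.g. nepomukcmd query '...' | extract_count.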
Comment 4 Vishesh Handa 2013-02-06 23:08:57 UTC
So stuff clearly isn't passing the first stage indexing. Hmm.

Please enable debug messages for Nepomuk via kdebugdialog. After that you'll need to switch off the nepomuk-file-indexer via:

$ qdbus org.kde.nepomuk.services.nepomukfileindexer /servicecontrol shutdown

Wait for a bit, and then please restart it:

$ nepomukservicestub "nepomukfileindexer"

I'd like to see the debug output.
Comment 5 Bastian Beischer 2013-02-06 23:36:21 UTC
I get the following output. Then it stops and does nothing.

nepomukfileindexer(11454)/kdecore (KSycoca) KSycocaPrivate::openDatabase: Trying to open ksycoca from "/var/tmp/kdecache-beischer/ksycoca4"
nepomukfileindexer(11454)/nepomuk (library) Nepomuk2::ResourceManagerPrivate::_k_storageServiceInitialized: Nepomuk Storage service up and initialized.
nepomukfileindexer(11454)/nepomuk (library) Nepomuk2::MainModel::Private::init: Connecting to local socket "/tmp/ksocket-beischer/nepomuk-socket"
nepomukfileindexer(11454)/nepomuk (library) Nepomuk2::MainModel::Private::init: Connected :)
nepomukfileindexer(11454)/nepomuk (strigi service) Nepomuk2::IndexCleaner::IndexCleaner: LegacyData:  false
nepomukfileindexer(11454)/nepomuk (strigi service) Nepomuk2::IndexCleaner::IndexCleaner: StrigiGraphData:  false
"/org/freedesktop/UDisks2/drives/TOSHIBA_MK5061GSY_32DLP0ACT" : property "Drive" does not exist 
"/org/freedesktop/UDisks2/drives/HL_DT_STDVDRAM_GT33N_M18BBH04208" : property "Drive" does not exist 
"/org/freedesktop/UDisks2/drives/TOSHIBA_MK5061GSY_32DLP0ACT" : property "DeviceNumber" does not exist 
"/org/freedesktop/UDisks2/drives/TOSHIBA_MK5061GSY_32DLP0ACT" : property "Device" does not exist 
nepomukfileindexer(11454) XSyncBasedPoller::XSyncBasedPoller: 3 1
nepomukfileindexer(11454) XSyncBasedPoller::XSyncBasedPoller: XSync seems available and ready
nepomukfileindexer(11454) XSyncBasedPoller::setUpPoller: XSync Inited
nepomukfileindexer(11454) XSyncBasedPoller::setUpPoller: Supported, init completed
nepomukfileindexer(11454)/nepomuk (strigi service) Nepomuk2::IndexScheduler::slotScheduleIndexing: Normal
nepomukfileindexer(11454)/nepomuk (strigi service) Nepomuk2::IndexCleaner::clearNextBatch: "select distinct ?r where { graph ?g { ?r nie:url ?url . } . ?r nfo:fileName ?fn . ?g nao:maintainedBy <nepomuk:/res/9bb4ed11-833d-4ea5-a4b5-e7dc4ddaaf2d> . FILTER(REGEX(STR(?url),"^file:/")) . FILTER(((?url!=<file:///home/beischer>)) && (REGEX(STR(?fn),"^autom4te$") || REGEX(STR(?fn),"^.*\\.rcore$") || REGEX(STR(?fn),"^CTestTestfile\\.cmake$") || REGEX(STR(?fn),"^.*\\.o$") || REGEX(STR(?fn),"^.*\\.omf$") || REGEX(STR(?fn),"^\\.hg$") || REGEX(STR(?fn),"^.*\\.m4$") || REGEX(STR(?fn),"^.*\\.orig$") || REGEX(STR(?fn),"^moc_.*\\.cpp$") || REGEX(STR(?fn),"^conftest$") || REGEX(STR(?fn),"^\\.xsession-errors.*$") || REGEX(STR(?fn),"^CMakeTmpQmake$") || REGEX(STR(?fn),"^.*\\.tmp$") || REGEX(STR(?fn),"^po$") || REGEX(STR(?fn),"^\\.svn$") || REGEX(STR(?fn),"^\\.histfile\\..*$") || REGEX(STR(?fn),"^lzo$") || REGEX(STR(?fn),"^\\.bzr$") || REGEX(STR(?fn),"^\\.git$") || REGEX(STR(?fn),"^litmain\\.sh$") || REGEX(STR(?fn),"^cmake_install\\.cmake$") || REGEX(STR(?fn),"^CMakeFiles$") || REGEX(STR(?fn),"^.*\\.pc$") || REGEX(STR(?fn),"^.*\\.nvram$") || REGEX(STR(?fn),"^.*\\.elc$") || REGEX(STR(?fn),"^.*\\.la$") || REGEX(STR(?fn),"^CMakeCache\\.txt$") || REGEX(STR(?fn),"^confdefs\\.h$") || REGEX(STR(?fn),"^.*\\.gmo$") || REGEX(STR(?fn),"^.*\\.csproj$") || REGEX(STR(?fn),"^.*\\.rej$") || REGEX(STR(?fn),"^config\\.status$") || REGEX(STR(?fn),"^lost\\+found$") || REGEX(STR(?fn),"^confstat$") || REGEX(STR(?fn),"^.*\\.pyc$") || REGEX(STR(?fn),"^_darcs$") || REGEX(STR(?fn),"^CVS$") || REGEX(STR(?fn),"^.*\\.part$") || REGEX(STR(?fn),"^libtool$") || REGEX(STR(?fn),"^.*\\.aux$") || REGEX(STR(?fn),"^.*\\.po$") || REGEX(STR(?fn),"^CMakeTmp$") || REGEX(STR(?fn),"^.*\\.root$") || REGEX(STR(?fn),"^Makefile\\.am$") || REGEX(STR(?fn),"^.*\\.lo$") || REGEX(STR(?fn),"^.*\\.loT$") || REGEX(STR(?fn),"^.*~$") || REGEX(STR(?fn),"^.*\\.moc$") || REGEX(STR(?fn),"^.*\\.vm.*$") || REGEX(STR(?fn),"^.*\\.class$") || 
REGEX(STR(?fn),"^core-dumps$"))) . } LIMIT 20"
nepomukfileindexer(11454)/nepomuk (strigi service) Nepomuk2::IndexCleaner::clearNextBatch: (QUrl("nepomuk:/res/e345c3e7-6eaf-4fe6-aafc-56b625cc986c") ,   QUrl( "nepomuk:/res/98f8c704-1714-4ec1-9d81-54dd7fc55068" )  ,   QUrl( "nepomuk:/res/daa097a3-3e3f-4289-afc1-52e265baacb9" )  ,   QUrl( "nepomuk:/res/c9e26589-219e-4bfc-8720-3e408e32546b" )  ,   QUrl( "nepomuk:/res/ba186fd0-5985-44bd-a838-f60fccb0abea" )  ,   QUrl( "nepomuk:/res/cc86eeac-68ae-4d93-80ce-28d17d9dd633" )  ,   QUrl( "nepomuk:/res/74731241-dceb-4425-bb9d-5d086f62fb8f" )  ,   QUrl( "nepomuk:/res/8a213fcf-5c57-49cb-9645-16da31d4f56f" )  ,   QUrl( "nepomuk:/res/4068fb90-6a23-4ec9-bb5d-64273577326f" )  ,   QUrl( "nepomuk:/res/b31ab952-c85c-4ba5-87ae-b451bcbb64f4" )  ,   QUrl( "nepomuk:/res/ee71363c-a490-4e84-bb1d-f64a84f62546" )  ,   QUrl( "nepomuk:/res/58767343-8396-4637-95a5-5f1038b6851f" )  ,   QUrl( "nepomuk:/res/722adffd-428d-4ead-8c7b-a43c6635e0f4" )  ,   QUrl( "nepomuk:/res/e04b7409-b05c-424d-a58e-24e4dacc2ec2" )  ,   QUrl( "nepomuk:/res/28ab1de7-e2db-4a5c-91e4-cb949f494897" )  ,   QUrl( "nepomuk:/res/f06b73d4-d5e9-4fc0-a45b-7b1f0d36f94c" )  )
nepomukfileindexer(11454)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/beischer"
nepomukfileindexer(11454)/nepomuk (strigi service) Nepomuk2::IndexScheduler::slotScheduleIndexing: Normal
Comment 6 Vishesh Handa 2013-02-06 23:42:06 UTC
Hmm. Sorry about asking for so much info. If possible could I also see the nepomukstrigirc file? Maybe I can use your file and try to duplicate it.
Comment 7 Bastian Beischer 2013-02-07 00:20:08 UTC
No problem :)

Here it is:

[General]
exclude filters=*.root,autom4te,*.rcore,CTestTestfile.cmake,*.o,*.omf,.hg,*.m4,*.orig,moc_*.cpp,conftest,.xsession-errors*,CMakeTmpQmake,*.tmp,po,.svn,.histfile.*,lzo,.bzr,.git,litmain.sh,cmake_install.cmake,CMakeFiles,*.pc,*.nvram,*.elc,*.la,CMakeCache.txt,confdefs.h,*.gmo,*.csproj,*.rej,config.status,lost+found,confstat,*.pyc,_darcs,CVS,*.part,libtool,*.aux,*.po,CMakeTmp,Makefile.am,*.lo,*.loT,*~,*.moc,*.vm*,*.class,core-dumps
exclude filters version=2
exclude folders[$e]=
exclude mimetypes=
folders[$e]=$HOME
index hidden folders=false

[RemovableMedia]
ask user=false
index newly mounted=false

[general]
legacyCleaning=false
Comment 8 Hrvoje Senjan 2013-02-07 02:32:57 UTC
Created attachment 76961
output of commads

I can also reproduce this with current master (well, not exactly *reproduce*).
After I suffered a corrupt sopranovirtuosobackend.db and removed it, second-level indexing never happens, apart from one file I indexed manually.
Comment 9 Hrvoje Senjan 2013-02-07 03:11:24 UTC
After removing nepomukstrigirc, the situation is the same: only a few files that were in $HOME have been indexed at level #2.
Looking at the data with nepomukshow, it seems to me that it doesn't do recursive indexing, e.g. if one has a Pictures folder (with subdirs) set up for indexing, it will only process *files* directly in the Pictures dir, but not those in its subdirs.
Comment 10 Bastian Beischer 2013-02-07 11:54:30 UTC
You may be right, Hrvoje,

if I try the following:

qdbus org.kde.nepomuk.services.nepomukfileindexer /nepomukfileindexer indexFolder $HOME/documents 1 1

where $HOME/documents is a folder which has several subfolders, it seems only the toplevel "documents" folder itself is indexed, all the files in subfolders are not. This is contradictory to the "1 1" option I passed, which should force the indexer to recurse.
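
To put numbers on that, here is a rough sketch that counts the regular files directly inside a folder versus everything below it; if the index only contains about as many entries as the first number, the indexer probably stopped at the top level. The function name count_files is made up for illustration, and -maxdepth assumes GNU or BSD find.

```shell
# Hypothetical helper: compare the top-level file count with the recursive one.
# File names containing newlines would be miscounted by wc -l.
count_files() {
    dir=$1
    top=$(find "$dir" -maxdepth 1 -type f | wc -l)   # files directly in $dir
    all=$(find "$dir" -type f | wc -l)               # files at any depth
    # $(( )) strips the padding some wc implementations add.
    echo "top-level: $((top)) recursive: $((all))"
}
```

For example: count_files "$HOME/documents"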
Comment 11 Vangelis 2013-02-07 12:52:27 UTC
Affected by this bug as well in Kubuntu.
A discussion thread I opened in the KDE forums is here and many other people reported facing the same problem:
http://forum.kde.org/viewtopic.php?f=154&t=109909&p=258397#p258397

If you force nepomuk to index the files with nepomukindexer it works, but it will not do it automatically as it should:

Try this to index all of your files recursively under the Desktop directory:
cd ~/Desktop
find . -exec nepomukindexer {} \;
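
The same workaround can be wrapped in a small function so it can be tried as a dry run first. This is only a sketch: the name index_tree and the optional second argument are made up for illustration, and it assumes nepomukindexer takes a single path, as in the command above.

```shell
# Hypothetical wrapper around the find workaround.
#   $1: directory to walk
#   $2: command to run per entry (defaults to nepomukindexer);
#       pass "echo" to only print what would be indexed.
index_tree() {
    dir=$1
    indexer=${2:-nepomukindexer}
    find "$dir" -exec "$indexer" {} \;
}
```

For example, index_tree ~/Desktop echo lists every path that would be handed to the indexer, and index_tree ~/Desktop runs it for real.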
Comment 12 Hrvoje Senjan 2013-02-07 13:40:01 UTC
OK, this is not specific to 2nd-level indexing: adding folders by hand, one by one, gets their content 1st-level indexed, which was not the case with e.g. only $HOME.
Comment 13 Hrvoje Senjan 2013-02-07 13:57:36 UTC
Caused by commit 2f33141; reverting it resolves the problem.
Comment 14 Bastian Beischer 2013-02-07 14:10:07 UTC
Confirmed Hrvoje,

one question on the side: How do you see how many files are indexed on 1st and 2nd level separately? Which number is reported  by nepomukcontroller?
Comment 15 Hrvoje Senjan 2013-02-07 20:18:03 UTC
(In reply to comment #14)
> Confirmed Hrvoje,
> 
> one question on the side: How do you see how many files are indexed on 1st
> and 2nd level separately? Which number is reported  by nepomukcontroller?

I am not aware there is a way to *see* those numbers separately. AFAIK nepomukcontroller reports all files indexed at any level, so, lvl #1
Comment 16 Janek Bevendorff 2013-02-08 19:46:47 UTC
I'm not sure if it is exactly the same issue (therefore I don't set this bug to confirmed yet), but I'm facing similar problems. Nepomuk just gets stuck at a very small number of indexed files and my nepomukcmd output looks like that in comment #3.

I enabled debug output in kdebugdialog as mentioned above and noticed that Nepomuk is only queuing and indexing the same files over and over again. It's endlessly looping over my IRC logs which are currently open in Kopete (those which are not open in Kopete at the moment are obviously not indexed).

Some sample debug output:

nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#plasma.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::shouldIndex: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::shouldIndex: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::shouldIndex: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/janek/logs/freenode_#plasma.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::shouldIndex: "/home/janek/logs/freenode_#plasma.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#plasma.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::shouldIndex: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/janek/logs/freenode_#kde.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::shouldIndex: "/home/janek/logs/freenode_#kde.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#kde.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::shouldIndex: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/janek/logs/freenode_#kde.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::shouldIndex: "/home/janek/logs/freenode_#kde.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#kde.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::enqueue: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::shouldIndex: "/home/janek/logs/freenode_#gentoo.log"
nepomukfileindexer(5628)/nepomuk (strigi service) Nepomuk2::BasicIndexingQueue::index: "/home/janek/logs/freenode_#gentoo.log"

This is repeating over and over again, no other output at all.
Comment 17 Janek Bevendorff 2013-02-08 19:52:17 UTC
Oh, one more thing I noticed: it's not simply the IRC logs for the channels that are currently open in Konversation (I wrote Kopete above, I meant Konversation BTW :-)), it's only those that have a conversation going on right now. Every time someone writes something in one of those channels (which results in a new entry being appended to the log file), Nepomuk writes a new line of debug output.

No other files seem to be indexed. Just those.
Comment 18 Vishesh Handa 2013-02-08 21:04:18 UTC
Git commit b651f9231ac30072418bb06d602951f0f05da22c by Vishesh Handa.
Committed on 08/02/2013 at 21:58.
Pushed by vhanda into branch 'KDE/4.10'.

Revert "BasicIndexingQueue: Use stacks instead of queues"

This reverts commit 2f33141aa6716550e38b11ec9a0b000dd74eea79.

The commit breaks recursive indexing. Doh!

M  +6    -12   services/fileindexer/basicindexingqueue.cpp
M  +2    -3    services/fileindexer/basicindexingqueue.h

http://commits.kde.org/nepomuk-core/b651f9231ac30072418bb06d602951f0f05da22c
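
For readers wondering what a queue-driven recursive walk looks like in general, here is a minimal shell sketch. It is not the nepomuk-core code and says nothing about what exactly went wrong in the reverted commit; it only illustrates the expected behaviour, where every sub-directory found is put on the queue so that its contents get visited later. The function name walk_tree is made up for illustration.

```shell
# Hypothetical illustration: breadth-first directory walk driven by a FIFO queue.
# The function's positional parameters are used as the queue of directories.
walk_tree() {
    set -- "$1"                     # the queue starts with the root folder
    while [ "$#" -gt 0 ]; do
        dir=$1                      # take the oldest directory off the queue
        shift
        for entry in "$dir"/*; do   # note: the * glob skips dotfiles
            [ -e "$entry" ] || continue          # empty dir: glob stays literal
            if [ -d "$entry" ] && [ ! -L "$entry" ]; then
                set -- "$@" "$entry"             # enqueue the sub-directory
            else
                printf '%s\n' "$entry"           # stand-in for "index this file"
            fi
        done
    done
}
```

Leaving out the enqueue step would give exactly the symptom reported in this bug: only the files directly in the starting folder are visited.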
Comment 19 Vishesh Handa 2013-02-14 08:17:19 UTC
*** Bug 314782 has been marked as a duplicate of this bug. ***