Version: git master (using Devel)
OS: Linux

Nepomuk generates queries that Virtuoso cannot finish processing, so Virtuoso ends up using 100% of all available cores.

From isql:
| 727944 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o. filter( ?p in (<ht |
| 687050 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o. filter( ?p in (<ht |
| 3413120 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o. filter( ?p in (<ht |
| 5501689 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o. filter( ?p in (<ht |
| 1295661 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o. filter( ?p in (<ht |
| 52 status() |
| 2805864 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o. filter( ?p in (<ht |
| 3190447 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o. filter( ?p in (<ht |

From konsole output, matching the time every one of those runaway queries starts:
[/home/rrix/dev/install/usr/bin/nepomukservicestub] nepomukstorage(30280) Nepomuk::Sync::ResourceIdentifier::runIdentification: "select distinct ?r count(?p) as ?cnt where { ?r ?p ?o. filter( ?p in (<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress>) ). ?r a <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#EmailAddress> . optional { ?r <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress> ?o0 . } . filter(!bound(?o0) || ?o0="mailman-bounces@lists.fedoraproject.org"^^<http://www.w3.org/2001/XMLSchema#string>). filter( bound(?o0) ) . } order by desc(?cnt)"

The method that generates that query is here: http://lxr.kde.org/source/kde/kdelibs/nepomuk-core/services/backupsync/lib/resourceidentifier.cpp#169

Reproducible: Always

Steps to Reproduce:
Start nepomuk, wait.

Actual Results:
virtuoso-t begins to spin all cores at 100%

Expected Results:
Not... doing that :)
Also, Virtuoso versions (from Fedora 16 i686):
virtuoso-opensource-6.1.4-2.fc16.i686
virtuoso-opensource-apps-6.1.4-2.fc16.i686
virtuoso-opensource-utils-6.1.4-2.fc16.i686
virtuoso-opensource-doc-6.1.4-2.fc16.noarch
virtuoso-opensource-conductor-6.1.4-2.fc16.noarch
Very nice report. Thanks a lot. It was already on my todo list. :)
Great to hear :) I spent a few hours with PvK trying to get things narrowed down, so I'm glad to see it's already being taken care of. Thanks Sebas, you rock hard :)
Git commit 41ecd6d72b242c856153c93dd4a2efaec3c2d8e2 by Sebastian Trueg.
Committed on 24/11/2011 at 18:02.
Pushed by trueg into branch 'master'.

Performance optimization in resource identification.

In case only a single identifying property exists we get a much much much much faster query when avoiding all the optional and filter terms. Since this is the case for email identification this optimization does actually make a difference in email indexing.

BUG: 286516

M +31 -13 services/backupsync/lib/resourceidentifier.cpp

http://commits.kde.org/nepomuk-core/41ecd6d72b242c856153c93dd4a2efaec3c2d8e2
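The commit describes two query shapes, and both are visible in the traces in this report: the original report's single nco:emailAddress query still wrapped in optional/filter terms, the two-property nco:PersonContact query traced in comment #10, and the single-property query traced in comment #12 as a plain triple pattern. The Python sketch below reconstructs that branching from those traced strings only. It is an assumption for illustration, not the actual ResourceIdentifier code in resourceidentifier.cpp: `build_identification_query` is a hypothetical name, and literal values are passed in already serialized (including any `^^` datatype).

```python
# Hypothetical sketch of the two query shapes seen in this report.
# build_identification_query() is an illustrative name, not Nepomuk API.


def build_identification_query(rdf_type, props):
    """props: list of (property IRI, already-serialized literal) pairs."""
    prop_list = ",".join(f"<{prop}>" for prop, _ in props)
    parts = [
        "select distinct ?r count(?p) as ?cnt where {",
        f"?r ?p ?o. filter( ?p in ({prop_list}) ).",
        f"?r a <{rdf_type}> .",
    ]
    if len(props) == 1:
        # Fast path: a single identifying property is matched with a plain
        # triple pattern, with no optional/filter terms at all.
        prop, value = props[0]
        parts.append(f"?r <{prop}> {value} .")
    else:
        # General path: each property is optional and must be either unbound
        # or equal to the expected value; at least one of them must be bound.
        bound_checks = []
        for i, (prop, value) in enumerate(props):
            var = f"?o{i}"
            parts.append(f"optional {{ ?r <{prop}> {var} . }} .")
            parts.append(f"filter(!bound({var}) || {var}={value}) .")
            bound_checks.append(f"bound({var})")
        parts.append(f"filter( {' || '.join(bound_checks)} ) .")
    parts.append("} order by desc(?cnt)")
    return " ".join(parts)


if __name__ == "__main__":
    NCO = "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#"
    XSD_STRING = "<http://www.w3.org/2001/XMLSchema#string>"
    # Single identifying property, as in email identification.
    print(build_identification_query(
        NCO + "EmailAddress",
        [(NCO + "emailAddress", '"ry@n.rix.si"^^' + XSD_STRING)],
    ))
```

Keeping the multi-property branch unchanged and only special-casing `len(props) == 1` mirrors what the commit message says: the general optional/filter form is still needed when several properties may identify a resource.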
Git commit 08683854eab048ff76b188233b7285f3e6234810 by Sebastian Trueg.
Committed on 24/11/2011 at 18:11.
Pushed by trueg into branch 'master'.

Performance optimization in resource identification.

In case only a single identifying property exists we get a much much much much faster query when avoiding all the optional and filter terms. Since this is the case for email identification this optimization does actually make a difference in email indexing.

CCBUG: 286516

M +31 -13 nepomuk/services/backupsync/lib/resourceidentifier.cpp

http://commits.kde.org/kde-runtime/08683854eab048ff76b188233b7285f3e6234810
Git commit e8fa5d5cee2070cccab5286cd0859baee07a618e by Sebastian Trueg.
Committed on 24/11/2011 at 18:11.
Pushed by trueg into branch 'KDE/4.7'.

Performance optimization in resource identification.

In case only a single identifying property exists we get a much much much much faster query when avoiding all the optional and filter terms. Since this is the case for email identification this optimization does actually make a difference in email indexing.

CCBUG: 286516

M +31 -13 nepomuk/services/backupsync/lib/resourceidentifier.cpp

http://commits.kde.org/kde-runtime/e8fa5d5cee2070cccab5286cd0859baee07a618e
This commit touches the backupsync service in kde-runtime. How does it affect email indexing, which happens in kdepim-runtime and talks to the storage service?
(In reply to comment #7)
> This commit is to the backupsync service in kde-runtime - how does it affect
> email indexing, happening in kdepim-runtime and talking to the storage service?

There is a weird dependency between the storage and backup services.
I am still having this issue, perhaps with a different filter query this time around. I will reopen this once I find which query is causing it.
nepomukservices A628AB40 ENTER SQLExecDirect
    SQLHSTMT 0x9fcae70
    SQLCHAR * 0xa0d7f30
sparql select distinct ?r count(?p) as ?cnt where {
  ?r ?p ?o. filter( ?p in (<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel>,<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname>) ).
  ?r a <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#PersonContact> .
  optional { ?r <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel> ?o0 . } .
  filter(!bound(?o0) || ?o0="Alugue Temporada SP") .
  optional { ?r <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname> ?o1 . } .
  filter(!bound(?o1) || ?o1="Alugue Temporada SP"^^<http://www.w3.org/2001/XMLSchema#string>).
  filter( bound(?o0) || bound(?o1) ) .
} order by desc(?cnt)

Looks like it's another query that is causing this issue for me. I hate to reopen bugs on you, but this one is missing a SQL_SUCCESS ;)
Oh wait, that one does have a SQL_SUCCESS >.< Let's see if I can find the right one while it's still in my trace.
sparql select distinct ?r count(?p) as ?cnt where {
  ?r ?p ?o. filter( ?p in (<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress>) ).
  ?r a <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#EmailAddress> .
  ?r <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress> "ry@n.rix.si"^^<http://www.w3.org/2001/XMLSchema#string> .
} order by desc(?cnt)

This one is not completing for some reason, and I'm not exactly sure why or how to look into it further. Any debugging help would be greatly appreciated :)
(In reply to comment #12)
> sparql select distinct ?r count(?p) as ?cnt where {
>   ?r ?p ?o. filter( ?p in (<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress>) ).
>   ?r a <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#EmailAddress> .
>   ?r <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress> "ry@n.rix.si"^^<http://www.w3.org/2001/XMLSchema#string> .
> } order by desc(?cnt)
>
> Is not completing, for some reason, not exactly sure why or how to look in to
> it further. Any debugging help would be greatly appreciated :)

Are you sure this is it? This one is lightning fast here. Can you maybe try it in nepomukshell? Just for simplicity, here is the cleaned-up query:

select distinct ?r count(?p) as ?cnt where {
  ?r ?p ?o. filter( ?p in (nco:emailAddress) ).
  ?r a nco:EmailAddress .
  ?r nco:emailAddress "ry@n.rix.si"^^xsd:string .
} order by desc(?cnt)
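The cleaned-up query in the comment above is the traced query with full IRIs shortened to prefixed names (nco:, xsd:). When copying long queries out of traces to rerun them, that shortening can be done mechanically. This is a hypothetical Python helper, not part of Nepomuk or nepomukshell: the `PREFIXES` mapping only covers namespaces that appear in this report, the IRI regex is deliberately naive, and the optional `declare` flag prepends `prefix` declarations for tools that may not predeclare these prefixes.

```python
import re

# Namespaces that appear in this report. This mapping is an illustrative
# assumption; extend it with whatever other ontologies your queries use.
PREFIXES = {
    "nco": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#",
    "nao": "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

# Naive IRI matcher: <...> with no whitespace or angle brackets inside.
# Comparisons written without spaces could also match, but such spans are
# only rewritten if they happen to start with a known namespace.
IRI = re.compile(r"<([^<>\s]+)>")
LOCAL_NAME = re.compile(r"[A-Za-z_][\w-]*")


def compact_iris(query, prefixes=PREFIXES, declare=False):
    """Rewrite <namespace + localName> as prefix:localName where possible."""
    used = set()

    def shorten(match):
        iri = match.group(1)
        for prefix, namespace in prefixes.items():
            if iri.startswith(namespace):
                local = iri[len(namespace):]
                if LOCAL_NAME.fullmatch(local):
                    used.add(prefix)
                    return f"{prefix}:{local}"
        return match.group(0)  # unknown namespace: keep the full IRI

    compacted = IRI.sub(shorten, query)
    if declare:
        # Make the query self-contained by declaring every prefix it now uses.
        header = "".join(f"prefix {p}: <{prefixes[p]}> " for p in sorted(used))
        compacted = header + compacted
    return compacted


if __name__ == "__main__":
    traced = "?r a <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#EmailAddress> ."
    print(compact_iris(traced))  # ?r a nco:EmailAddress .
```

IRIs from namespaces outside the mapping (for example the aneo ontology in the Akonadi query further down) are left untouched rather than guessed.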
I ran that query in nepsak three times, and I now have these in isql:
| 522069 sparql prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> p |
| 596383 sparql prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> p |
| 505492 sparql prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> p |

Does nepsak rewrite or translate these queries in any way when they are run?

sparql prefix nco:<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#>
SELECT DISTINCT ?person WHERE {
  graph ?g {
    ?person <http://akonadi-project.org/ontologies/aneo#akonadiItemId> ?itemId .
    ?person a nco:PersonContact ;
            nco:hasEmailAddress ?email .
    ?email nco:emailAddress "ry@n.rix.si"^^<http://www.w3.org/2001/XMLSchema#string> .
  }
}

Does that make sense at all, or am I chasing my tail in the wrong direction because it's 2:00 and I've been working since 10? :)
This is rather confusing since these are pretty simple queries that complete in no time for me. Did you finally see any results in nepomukshell?
Git commit 2936c781f01614a2e5c01f558e2e0f36affc0739 by Sebastian Trueg.
Committed on 24/11/2011 at 18:02.
Pushed by trueg into branch 'symlinkHandling'.

Performance optimization in resource identification.

In case only a single identifying property exists we get a much much much much faster query when avoiding all the optional and filter terms. Since this is the case for email identification this optimization does actually make a difference in email indexing.

BUG: 286516

M +31 -13 services/backupsync/lib/resourceidentifier.cpp

http://commits.kde.org/nepomuk-core/2936c781f01614a2e5c01f558e2e0f36affc0739
After about 600000 ms (10 minutes) it still has not completed :(
| 674988 sparql prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> p
So earlier this week I backed up what I could and nerfed my Nepomuk DB, to see whether this also happens on a 'fresh' database or whether some graphs had crept in that Virtuoso couldn't digest. I don't have this problem any more, so I guess RESOLVED FIXED :) but I do have one or two new ones that I'll report separately after searching. Happy holidays, btw, nepomukhackers :)
I've found another one that's doing this, this time for nepomukqueryservice:

select distinct ?r where {
  { ?r a <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag> .
    ?v2 <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag> ?r .
    ?v3 <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag> ?r . } .
  ?r <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible> ?v1 .
  FILTER(?v1>0) .
} ORDER BY DESC ( count(?v3) ) LIMIT 6