Bug 305903 - files in paths of which the name contains a special character are indexed over and over again (system running in ISO-8859-15-encoding, not UTF-8)
Status: RESOLVED FIXED
Alias: None
Product: nepomuk
Classification: Miscellaneous
Component: fileindexer
Version: git master
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Nepomuk Bugs Coordination
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-27 22:06 UTC by Gunter Ohrner
Modified: 2013-07-16 23:47 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Description Gunter Ohrner 2012-08-27 22:06:22 UTC
Every time I log in, the nepomuk file indexer service indexes a bunch of files.

The only thing these files seem to have in common is that they are located in different directories where at least one of the parent directories on the path contains special characters like German umlauts.

My whole system is running in ISO-8859-15 encoding instead of UTF-8 for historical reasons, so maybe this could be a URL or pathname encoding problem?
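The encoding hypothesis can be illustrated with a minimal Python sketch. It is not Nepomuk code: the path and the `round_trips` helper are hypothetical, and the sketch only shows why a filename stored as ISO-8859-15 bytes may stop matching the on-disk name once it is decoded as UTF-8.

```python
# Hypothetical path containing a German umlaut, stored as ISO-8859-15 bytes on disk.
raw = "/home/user/Dokumente/Übersicht/notes.txt".encode("iso-8859-15")


def round_trips(data: bytes, codec: str) -> bool:
    """True if decoding `data` with `codec` and re-encoding gives back the same bytes."""
    try:
        return data.decode(codec).encode(codec) == data
    except UnicodeError:
        # The bytes are not valid in this codec at all.
        return False


# Decoding with the system's real encoding preserves the name exactly.
correct = raw.decode("iso-8859-15")

# Decoding as UTF-8 with replacement turns the 0xDC byte ('Ü') into U+FFFD,
# so the decoded name no longer matches the file on disk.
lossy = raw.decode("utf-8", errors="replace")

print(round_trips(raw, "iso-8859-15"))  # True
print(round_trips(raw, "utf-8"))        # False
print(ascii(lossy))                     # '/home/user/Dokumente/\ufffdbersicht/notes.txt'
```

If an indexer compared names in this lossy form against what it finds on disk, the file would never look "already indexed", which would match the symptom described here. This report does not establish that the Strigi-based indexer actually failed in exactly this way.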

I'm willing to provide any additional information required, if you say what you need and how to obtain it.

Reproducible: Always

Steps to Reproduce:
Log in to KDE with file indexing enabled.
Actual Results:  
System load will go up and a "nice"d file indexer process will index a bunch of files which have not changed since the last login. (Actually, which have not changed for ages.)

After it finishes, which takes quite some time, the file indexer will go to sleep.

Expected Results:  
The file indexer should only index files which were changed or added after the last index run.
Comment 1 Christoph Feck 2012-08-27 22:27:47 UTC
> My whole system is running in ISO-8859-15 encoding instead of UTF-8 for historical reasons

Seriously? What are those historical reasons for not using Unicode? Unicode was created to avoid problems like the one you mention above.
Comment 2 Gunter Ohrner 2012-08-28 21:08:54 UTC
I was reluctant to change my system encoding so far, as I'm not sure whether all applications would cope with the contents of their existing files if the encoding suddenly changed.

All filenames also would have to be recoded, but AFAIK there are tools for doing just that. The file contents problem is harder to solve.

So far, I have had only a few problems with this choice, so I had no real reason to risk the switch. However, I have to admit that the number of bugs caused by applications that simply expect or require the system encoding to be UTF-8 has been rising recently.

Basically, applications should be able to work in whatever system locale has been configured - they are free to work with Unicode internally, of course, and libiconv and/or recode help with interfacing with the outside world...
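As a rough illustration of the "Unicode inside, locale encoding at the boundary" approach, here is a Python sketch that converts bytes between encodings, similar in spirit to `iconv -f ISO-8859-15 -t UTF-8`. The `recode` helper is hypothetical and is not part of libiconv or GNU recode; the codec names are hard-coded assumptions for the example.

```python
def recode(data: bytes, from_codec: str, to_codec: str) -> bytes:
    """Convert bytes between encodings, similar in spirit to `iconv -f FROM -t TO`.

    The text is held as a Unicode str in between; only the two
    boundaries use the external encodings.
    """
    text = data.decode(from_codec)  # external bytes -> internal Unicode
    return text.encode(to_codec)    # internal Unicode -> external bytes


latin = "Grüße".encode("iso-8859-15")
utf8 = recode(latin, "iso-8859-15", "utf-8")

print(latin)  # b'Gr\xfc\xdfe'
print(utf8)   # b'Gr\xc3\xbc\xc3\x9fe'
```

In a real application the external codec would come from the configured locale rather than being hard-coded, so the same code would work unchanged on an ISO-8859-15 or a UTF-8 system.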

Does KDE state that the Software Compilation will only work properly on UTF-8 systems?
Comment 3 Vishesh Handa 2013-07-07 21:20:01 UTC
Since your bug report, we have migrated from Strigi and are shipping our own indexers, which should probably not have a problem with non-UTF-8-encoded filenames.

Could you please test with KDE SC 4.10? If the problem still occurs, could you please enable debug messages via kdebugdialog and then run:

$ nepomukindexer ThatFile

and paste the output?
Comment 4 Gunter Ohrner 2013-07-16 19:48:50 UTC
Yes, seems to work now.

I now have the problem that a virtuoso-t process uses a lot of CPU for quite some time after each login - "qdbus org.kde.nepomuk.services.nepomukqueryservice" does not show any running query - but that's probably a different problem, or maybe even expected?

The systray applet talks about "searching for new changes" (backtranslated from my German locale), so that's probably the cause for virtuoso-t being busy?
Comment 5 Vishesh Handa 2013-07-16 19:51:04 UTC
Do you have akonadi email indexing enabled?
Comment 6 Gunter Ohrner 2013-07-16 20:46:30 UTC
Yes.
Comment 7 Vishesh Handa 2013-07-16 23:47:39 UTC
I have a potential fix for that in 4.11. Anyway, marking this bug as FIXED. Feel free to file another bug about the startup problem, although it's a big issue that I *need* to fix, so it is very unlikely that I'm going to forget it.