Bug 196402 - can nepomuk run with lower priority ?
Summary: can nepomuk run with lower priority ?
Status: RESOLVED FIXED
Alias: None
Product: nepomuk
Classification: Miscellaneous
Component: general
Version: unspecified
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Sebastian Trueg
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-13 22:42 UTC by Ferdinand Gassauer
Modified: 2011-01-06 15:44 UTC

See Also:
Latest Commit:
Version Fixed In:


Description Ferdinand Gassauer 2009-06-13 22:42:32 UTC
Version:           unknown (using 4.2.90 (KDE 4.2.90 (KDE 4.3 Beta2)) "release 138", KDE:KDE4:Factory:Desktop / openSUSE_11.1)
Compiler:          gcc
OS:                Linux (x86_64) release 2.6.27.23-0.1-default

During startup I see nepomukservices using a lot of CPU and causing heavy disk I/O, which slows down the startup process.

Could this service please be started with a lower priority?
Comment 1 Sebastian Trueg 2009-08-04 16:23:49 UTC
which service exactly? There are several services running.
Comment 2 Ferdinand Gassauer 2009-08-04 20:44:04 UTC
My feeling is that all these services (indexing etc.) should always run at a lower priority.
Comment 3 Ferdinand Gassauer 2009-08-06 00:10:15 UTC
After installing 4.3.0 and enabling Nepomuk and Strigi I see

Cpu(s): 58.7%us, 28.3%sy,  0.0%ni,  0.0%id, 11.7%wa,  0.7%hi,  0.7%si,  0.0%st
Mem:   1020908k total,   998544k used,    22364k free,    44056k buffers
Swap:  1052248k total,   475920k used,   576328k free,   250308k cached
801m 254m  25m S 46.9 25.6  41:28.52 nepomukservices
imho not much is left for user requests
Comment 4 Markus Kohls 2009-08-07 13:01:24 UTC
Maybe they could run with SCHED_IDLE policy, at least for initial indexing? (see man chrt)

SCHED_IDLE: Scheduling very low priority jobs
       (Since  Linux  2.6.23.)   SCHED_IDLE  can only be used at
       static priority 0; the process nice value has  no  influ-
       ence  for  this policy.  This policy is intended for run-
       ning jobs at extremely low priority (lower  even  than  a
       +19  nice value with the SCHED_OTHER or SCHED_BATCH poli-
       cies).

For me it increases overall responsiveness.
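
As a rough illustration of the suggestion above, a command can be started under SCHED_IDLE with chrt from util-linux. The helper name run_idle is hypothetical, and the fallback to nice is an assumption of this sketch, not something Nepomuk is known to do.

```shell
#!/bin/sh
# run_idle: hypothetical helper, not part of Nepomuk or KDE.
# Runs a command under SCHED_IDLE when chrt can apply that policy,
# and falls back to the highest nice value otherwise.
run_idle() {
    # Probe first: chrt may be missing, or the policy may be refused.
    if command -v chrt >/dev/null 2>&1 && chrt --idle 0 true 2>/dev/null; then
        # SCHED_IDLE only accepts static priority 0.
        chrt --idle 0 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Example: print the scheduling policy the child shell actually received.
# run_idle sh -c 'chrt -p $$'
```

The probe with `true` keeps the helper usable on systems where the policy cannot be set, at the cost of one extra process start.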
Comment 5 uprooter 2009-08-26 22:21:52 UTC
I like nepomuk but:
nepomuk is heavy and fat!!
It constantly consumes 27.7% of my RAM (crazy!!) and consumes lots of CPU when indexing.
This is why I find it very useful to put this in my startup scripts:
/usr/sbin/cpulimit  -e "/usr/bin/nepomukservicestub" -l 20

The only drawback is that searching also becomes slower, since it is limited to 20% of the CPU as well, but I prefer a slow search to a CPU hog.
Comment 6 mutlu inek 2009-09-14 06:56:55 UTC
I think what most users experience is a very high i/o load rather than cpu usage. Surely, Nepomuk uses cpu, but this does scale on my system. What does not scale is the heavy reading and writing from and to disk while other programs are still starting up. Launching Firefox or Amarok after KDE startup takes about a minute on my system if I do not suspend Nepomuk. This should really be adjusted.
Comment 7 uetsah 2009-09-26 14:20:08 UTC
I also experience this problem every time I copy a folder, extract an archive, etc...
My suggestion is to implement user-controllable throttling (I've described it in more detail in the feature request here: https://bugs.kde.org/show_bug.cgi?id=208592)

I also agree with mutlu inek that it's probably mostly a disk i/o rather than a cpu issue.
Still, I think throttling CPU load (and minimizing memory usage!) will help a lot, because
  - when forced to do slower indexing (less CPU load), it will also take longer until it reads the next file (==> less disk load)
  - less memory usage means less swapping (==> less disk load)

Other than that, I guess putting the indexer to sleep for a specified number of milliseconds after every small indexing step could probably decrease the disk usage enough to make the system responsive and usable while strigi is indexing stuff.
Of course that should not necessarily be the default setting, as then people would complain about strigi indexing being slow even on fast machines, but when offering "slow indexing in the background" as an additional (non-default) indexing mode as described in my aforementioned feature request, this would not be a problem.
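
The "sleep after every small indexing step" idea above can be sketched as a loop that pauses between files. This is only an illustration: throttled_walk is a hypothetical name, and the command passed to it stands in for whatever a real indexer would do per file.

```shell
#!/bin/sh
# throttled_walk: hypothetical sketch of pausing after every indexing step.
# Usage: throttled_walk <dir> <pause-seconds> <command> [args...]
# Runs the command once per regular file, then sleeps before the next one.
# File names containing newlines are not handled.
throttled_walk() {
    dir=$1
    pause=$2
    shift 2
    find "$dir" -type f | while IFS= read -r file; do
        # Detach stdin so the command cannot swallow the file list.
        "$@" "$file" </dev/null
        sleep "$pause"
    done
}

# Example: list files with a 50 ms pause between them
# (fractional seconds are accepted by GNU sleep).
# throttled_walk "$HOME/Documents" 0.05 echo
```

A fixed pause is the simplest form of throttling. It trades total indexing time for lower disk pressure regardless of how busy the system actually is.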
Comment 8 jos poortvliet 2009-10-07 12:30:48 UTC
I'd like to remind everybody that there is no problem if Nepomuk uses 100% CPU and disk IO (it'll finish faster), as long as that doesn't interfere with other applications' work. So I agree IO and CPU priority for indexing should be set as low as possible, using sched_idle and ionice prio 0, class 3:

ionice - sets or gets process io scheduling class and priority.

Usage:
  ionice [ options ] -p <pid> [<pid> ...]
  ionice [ options ] <command> [<arg> ...]

Options:
  -n <classdata>      class data (0-7, lower being higher prio)
  -c <class>          scheduling class
                      0: none, 1: realtime, 2: best-effort, 3: idle

Limiting the max used % of CPU makes no difference in interactivity of the system (might even make it worse due to the way the linux scheduler works, depending on the implementation of the limitation), lower priority should.
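
A minimal sketch of applying both settings to a process that is already running, assuming ionice and chrt from util-linux are installed. The helper name lower_priority is hypothetical. As far as the ionice documentation goes, the idle class takes no priority level, so only the class is passed here.

```shell
#!/bin/sh
# lower_priority: hypothetical helper that moves a running process to the
# idle I/O class and to the SCHED_IDLE CPU policy.
# Usage: lower_priority <pid>
lower_priority() {
    pid=$1
    # Reject anything that is not a plain decimal PID.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: lower_priority <pid>" >&2
            return 2
            ;;
    esac
    # I/O scheduling class 3 is "idle".
    ionice -c 3 -p "$pid" || return 1
    # SCHED_IDLE requires static priority 0.
    chrt --idle -p 0 "$pid" || return 1
}

# Example: demote the current shell, then read the settings back.
# lower_priority $$ && ionice -p $$ && chrt -p $$
```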

I do want to note the linux IO and CPU schedulers imho are still not very good at keeping supposedly 'idle' scheduled jobs really out of the way, but that's something we can't fix.
Comment 9 mutlu inek 2009-10-07 18:06:35 UTC
Jos, you seem to not at all experience what we do and miss the point. While you are right that scheduling is not perfect on Linux, both tracker and beagle perform their function without getting in my way. And trackerd is set to niceness 12, while the nepomukservicestub that handles strigi is set to niceness 19. Clearly, this is not simply about "reniceing" the process, but about the way strigi crawls folders.

Actually, I get the worst performance hits when all my files are already indexed. What happens is that strigi crawls a large amount of folders on startup and again every few hours. Many of these folders have been indexed/crawled many times and haven't changed at all in years. This very fast crawling of folders does not actually contribute to indexing, but it causes incredibly high i/o usage for about a minute and completely freezes my computer. I really cannot even open a menu during this time.

Claiming that Linux does bad scheduling does not address this and does not explain why other indexers do not completely freeze my computer.
Comment 10 Sebastian Trueg 2009-10-08 12:11:58 UTC
That is correct. The Strigi service does a rather bad job at that. I am open to all possible improvements of the situation. How do tracker and beagle keep track of file changes anyway? I see no other solution than regular checks at the moment since systems like inotify have this upper limit on the folders you can watch.
Comment 11 mutlu inek 2009-10-08 23:12:18 UTC
I think I have good news.

I checked: tracker uses inotify. And you are right that there is an upper limit on the number of folders it can watch. But it has been increased drastically. It used to be 8192 and is now 524288. It can be set in /etc/sysctl.conf by changing (or adding) the variable fs.inotify.max_user_watches.

In their README, the tracker people recommend changing that value. But, if I read it correctly, they seem to assume that this user/admin intervention is not necessary any more: "default used to be 8192 and now is 524288." Thus, it seems to me that this is not only the new limit, but in fact the default.
http://git.gnome.org/cgit/tracker/tree/README

I checked what Ubuntu does and it seems they moved to 524288 back in spring 2008: https://bugs.launchpad.net/timevault/+bug/178067

I did a quick search of my $HOME and found that I have 230,000 files in 14,000 directories, consuming 170GB of space. I am sure there are people who have much more data, but I do think that I have many small files (I don't have any videos, for example) and doubt that there are many people out there who have more than 40 times as many folders as I do.
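
The comparison above can be reproduced with a small check. Each inotify watch covers a single directory, so the number of directories under a tree is what has to fit under fs.inotify.max_user_watches. With the figures quoted, 524288 / 14,000 is about 37. The helper name watch_headroom and its optional second argument are hypothetical, added only so the check can be tried with an arbitrary limit.

```shell
#!/bin/sh
# watch_headroom: hypothetical helper comparing the number of directories
# under a tree with the per-user inotify watch limit.
# Usage: watch_headroom <dir> [limit]
# Without a limit argument, the current kernel setting is read from /proc.
watch_headroom() {
    limit=${2:-$(cat /proc/sys/fs/inotify/max_user_watches)}
    # Arithmetic expansion strips any padding that wc may add.
    dirs=$(( $(find "$1" -type d 2>/dev/null | wc -l) ))
    echo "directories: $dirs, watch limit: $limit"
    # Succeed only if every directory could get its own watch.
    [ "$dirs" -le "$limit" ]
}

# Example:
# watch_headroom "$HOME" || echo "raise fs.inotify.max_user_watches"
```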

The only thing that has been annoying to me when using the tracker daemon is that it starts (re)indexing as soon as a file is created or changes. When editing or downloading large documents (PDFs, for example), this leads to intense cpu and i/o activity, exactly when you need to use both cpu and harddrive. So an even better solution would be to watch for changes using inotify, but to (re)index slightly delayed, when there is little user activity. I found a project that attempted something like this (though they wanted to do indexing way too much delayed): http://freshmeat.net/projects/kfsmd/
Comment 12 GJ 2010-02-05 10:14:49 UTC
I'm not sure if my problem is related to this bug but since I enabled Strigi I've found that my login time has increased. I have to wait for up to 80s after entering my KDE login credentials before I get to the desktop. Disabling strigi reduces my time from login to desktop to around 8-10s.

It seems my disk i/o is max'ed out during login with strigi enabled. Would it be possible to delay strigi from doing anything until the desktop has fully loaded?

I'm using kde4.4rc3 on openSUSE11.2 with a Thinkpad X60s laptop.
If this is a different bug then let me know and I'll open a new bug for it.
Comment 13 mutlu inek 2010-04-07 16:41:40 UTC
I believe this bug report can be closed now thanks to Sebastian Trüg's reworking of the indexing infrastructure. See:

http://websvn.kde.org/?view=revision&revision=1104720
http://websvn.kde.org/?view=revision&revision=1104721
Comment 14 Ferdinand Gassauer 2010-04-07 23:12:33 UTC
On openSUSE 11.2 with KDE 4.4.2 I see no "feelable" problems.
Comment 15 Sebastian Trueg 2011-01-06 15:44:44 UTC
The original request has been implemented: all of Nepomuk runs with low cpu and io scheduling priority.