Bug 64608

Summary:	searching/indexing large collection takes prohibitively long
Product:	[Applications] juk	Reporter:	bryanv
Component:	general	Assignee:	Scott Wheeler <wheeler>
Status:	RESOLVED FIXED
Severity:	wishlist
Priority:	NOR
Version First Reported In:	unspecified
Target Milestone:	---
Platform:	Gentoo Packages
OS:	Linux
Latest Commit:		Version Fixed/Implemented In:
Sentry Crash Report:

Description bryanv 2003-09-20 18:54:30 UTC

Version:           1.95 (using KDE KDE 3.1.4)
Installed from:    Gentoo Packages
Compiler:          gcc version 3.3.1 20030904 (Gentoo Linux 3.3.1-r1, propolice) 
OS:          Linux

Hi guys, I have a ~15000 song collection, and JuK essentially becomes non-functional trying to operate on a collection of this size. (If I minimize or switch desktops and come back while searching, the GUI is often frozen.) I deleted my old cache file, and JuK has now been reindexing for over ten minutes and is still not done (dual AMD 2000, 7200 RPM ATA100 drives).  Frankly this is the biggest problem I have seen with every single jukebox-type music sorting/playing app I have tried (about 6 of them) except one.  I don't know how the Yammi (http://yammi.sourceforge.net/) guys do it, but their indexing and their very useful and cool fuzzy search are *blazingly* fast on my collection.  I mean results come back instantly and initial indexing was pretty spiffy too; there is no comparison with JuK or any of the many others I tried in my search. Unfortunately, Yammi is just a frontend for noatun or xmms, which I think really, really sucks.  I also want to use an integrated KDE app. So I hope you will take things the right way when I suggest you go check out whatever magic they are using that allows it to work so well on large collections and steal it to make JuK better.

Regards.

Comment 1 Stephan Kulow 2003-09-20 18:59:28 UTC

Scott, see? I'm not the only one

Comment 2 bryanv 2003-09-20 19:18:49 UTC

Just as an update, JuK took almost 10 more minutes to finish indexing.  (And by
the way, why was it doing this at all without my explicitly asking it to and
telling it what directories to look in?) Amazingly, I think Yammi uses simple
flat XML files for secondary storage of its cache. I don't know what it does in
memory, but please go take a look and steal it.  I really have tried *many*
player/sorting apps and they all suffer incredibly from problems with large
collections except Yammi.

Comment 3 Iván Sánchez Ortega 2003-09-20 23:54:40 UTC

bryanv, I'm just curious... How's juk's memory usage in your case? Is your computer 
swapping while indexing?

Comment 4 bryanv 2003-09-21 01:05:25 UTC

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  Command
10838 bryan      9   0 66300  64m  15m S  0.0  6.4   0:10.86 juk

This is after just opening and running a few searches on the index I created
earlier. I will delete the cache and re-index a little later and get back to you
about swap. This system has 1 GiB RAM though, and swap is on a 15K RPM U160 SCSI
disk with an 8 MiB cache, so I don't really think thrashing is an issue.

Comment 5 Scott Wheeler 2003-09-22 22:00:58 UTC

The primary limitations here have been in KListView, which Yammi does not use.
There were some pretty major painting bottlenecks that slow things down on large
collections -- especially while searching.

I've done a lot of optimizations for this in KListView and performance has been quite
acceptable here even when profiling JuK with 12,000+ items. (I don't have this many
in my collection but created about 10,000 dummy files for testing.) I'm surprised that
you say that you're running KDE 3.1.4 because I backported the most significant of
these optimizations to the 3.1 branch (I thought) before 3.1.4 was tagged -- are you
sure you're running the real 3.1.4 and not a snapshot that Gentoo took at some point
before the release?

It's also worth noting that the cache doesn't affect performance at any time other
than startup, and even with really large collections (again, I was profiling with a little
over 12,000 items) JuK's startup time was under 10 seconds or so (much less before the
splash screen appears), which I consider to be acceptable for a collection of that size.
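
As a rough illustration of why a tag cache matters mainly at startup, here is a minimal Python sketch of a cache keyed by file path and modification time. This thread does not describe JuK's actual cache format, so `read_tags`, `scan`, and `demo` are hypothetical names and the layout is an assumption.

```python
import os
import tempfile


def read_tags(path):
    """Hypothetical stand-in for an expensive tag reader; counts its calls."""
    read_tags.calls += 1
    return {"title": os.path.splitext(os.path.basename(path))[0]}


read_tags.calls = 0


def scan(paths, cache):
    """Return tags for each path, re-reading only new or modified files."""
    result = {}
    for path in paths:
        mtime = os.stat(path).st_mtime_ns
        entry = cache.get(path)
        if entry is None or entry["mtime"] != mtime:
            # Cache miss or stale entry: pay the cost of reading the tags.
            entry = {"mtime": mtime, "tags": read_tags(path)}
            cache[path] = entry
        result[path] = entry["tags"]
    return result


def demo():
    """Scan three temporary files twice; return tag reads made per pass."""
    with tempfile.TemporaryDirectory() as folder:
        paths = []
        for name in ("a.mp3", "b.mp3", "c.mp3"):
            path = os.path.join(folder, name)
            with open(path, "wb") as handle:
                handle.write(b"dummy")
            paths.append(path)
        cache = {}
        before = read_tags.calls
        scan(paths, cache)
        first = read_tags.calls - before
        scan(paths, cache)
        second = read_tags.calls - before - first
        return first, second
```

In this sketch the second pass over unchanged files reads no tags at all, which is the kind of saving a cache can offer at startup; once the collection is loaded, the cache plays no further part.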

Now on to scanning: There are three major bottlenecks right now. The first is the
same KListView painting issue that I mentioned above, which is partly fixed now,
though painting a list as you insert into it is still slow -- I plan to work on that. The
second is the KFileMetaInfo system, which is freakishly slow. I've got an audio-only
replacement already in KDE CVS and on the 3.2 feature plan (I'll also be rewriting the
existing KFMI plugins to use this replacement, though the architecture itself is slow).
Fixing these two should eventually leave scanning limited by the third and largest
factor -- disk access, which is difficult to get around. ;-) (In terms of speed, my tag
reading solution probably won't be the bottleneck anymore -- it can read 2000 tags in
about 1 second of CPU time. :-) )
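
To illustrate why painting a list while inserting into it is costly, here is a minimal Python sketch using a toy list view that counts repaints instead of drawing. It is not KListView or JuK code; `ToyListView`, `insert_one_by_one`, and `insert_batched` are hypothetical names, and the batching approach is only similar in spirit to suspending updates on a Qt widget (for example with `QWidget::setUpdatesEnabled`).

```python
class ToyListView:
    """Toy stand-in for a list widget; counts repaints instead of drawing."""

    def __init__(self):
        self.items = []
        self.repaints = 0
        self._updates_enabled = True

    def set_updates_enabled(self, enabled):
        """Suspend or resume repainting; resuming triggers one repaint."""
        self._updates_enabled = enabled
        if enabled:
            self._repaint()

    def insert(self, item):
        self.items.append(item)
        if self._updates_enabled:
            self._repaint()

    def _repaint(self):
        self.repaints += 1


def insert_one_by_one(view, items):
    """Naive insertion: every insert causes a repaint."""
    for item in items:
        view.insert(item)


def insert_batched(view, items):
    """Suspend repainting during the bulk insert, then repaint once."""
    view.set_updates_enabled(False)
    try:
        for item in items:
            view.insert(item)
    finally:
        view.set_updates_enabled(True)
```

With 15,000 items the naive path repaints 15,000 times in this model, while the batched path repaints once. A real widget would still want periodic updates during a long scan so the user sees progress.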

Comment 6 Scott Wheeler 2004-01-23 04:15:44 UTC

Since this was reported, JuK has moved to using TagLib exclusively, which is much faster than the old KFileMetaInfo interface.  I can now index my 3500 files in about 45 seconds or less.  It's also worth noting that I compared this to some of the popular software on other platforms (iTunes, RealOne) over Christmas -- and they both take close to half an hour just to index 3000 files.
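
To put those figures side by side, here is a back-of-the-envelope calculation in Python. It takes the quoted times at face value (45 seconds for 3500 files, 30 minutes for 3000 files) and assumes indexing time scales linearly with the number of files, which the thread does not establish; the function names are made up for this example.

```python
def files_per_second(files, seconds):
    """Average indexing throughput."""
    return files / seconds


def estimated_seconds(files, rate):
    """Projected indexing time, assuming linear scaling (an assumption)."""
    return files / rate


juk_rate = files_per_second(3500, 45)         # about 78 files per second
other_rate = files_per_second(3000, 30 * 60)  # about 1.7 files per second

# Projection for the reporter's ~15000-song collection.
juk_estimate = estimated_seconds(15000, juk_rate)      # a little over 3 minutes
other_estimate = estimated_seconds(15000, other_rate)  # about 2.5 hours
```

These are rough projections only; actual times depend heavily on disk access, as noted in comment 5.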

I've also done some searches now with as many as 15000 items, and the response time is a fraction of a second (roughly a tenth of a second, as nearly as I can measure it), which is probably the best that things are going to get given the limitations imposed by the base classes.

Anyway, marking this one as closed for the moment since I've squeezed just about everywhere that I can.  :-)