Bug 76438

Summary: Data DVD project - Drag & drop of directories with many files takes a very long time
Product: [Applications] k3b Reporter: Manfred Schneider <manfred>
Component: Data ProjectAssignee: Sebastian Trueg <trueg>
Status: RESOLVED FIXED    
Severity: normal CC: amantia, mail
Priority: NOR    
Version: 0.10   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description Manfred Schneider 2004-02-29 19:52:27 UTC
Version:           0.10 (using KDE 3.1.4)
Installed from:    SuSE
Compiler:          gcc version 3.3.1 (SuSE Linux)
OS:          Linux (i686) release 2.4.21-192-athlon

Hi,

first of all: thanks for this great and very helpful tool :-)

I would like to backup 3 directories with about 70,000 files on a DVD+R. After dragging the 3 selected directories and dropping it on the DVD project the CPU consumption increases to 100% and stays at this level for about 10-30 minutes.

When I lateron press the burn button to start burning it takes again a long time (about 10 minutes and again 100% CPU consumption) before the burning actually starts.

It seems to me that such a large number of files "confuses" K3B. Maybe there are problems with searching, sorting, or data access algorithms.

I would appreciate it very much you could reduce this time and resource consumption.

Thanks.

Manfred
Comment 1 is 2004-06-30 00:45:25 UTC
I too have noticed this. The problem is swap. I guess K3B tries to keep the entire tree in RAM, becuase if I select a 4G partition for backup very soon the computer is using 400MB of swap. This is on a machine with 256MB of RAM, and I typically use zero to very little (certainly less than 100MB) of swap.
Comment 2 Manfred Schneider 2004-06-30 08:13:55 UTC
Hi,

thanks for the explanation. I am unsure if this is the cause of the problem. My PC has 1 GB RAM and 1 GB swap space.

If such a large amount of physical and virtual memory is not enough for K3B then I would suggest to use other algorithms for adding many files to a K3B project (e.g. dividing the files in chunks, partitioning the work in parts etc.).

When I use other DVD burning programs (on other platforms ;-) then I have not experienced such performance problems.

Thanks for your efforts and the great K3B tool.

Manfred
Comment 3 Christoph Burger-Scheidlin 2006-09-05 17:52:39 UTC
I can confirm this bug for 0.12.17.

The memory issues seems to be solved at least on my version, top reports RES 
52m, SHR 16m and there is still free RAM on my machine.

What I did:
mkdir Test && cd Test && for i in `seq 1 70000`; do touch $i; done
Add Test to DataDVD
After 40 minutes of 100% CPU, I got bored and hit cancel.

I checked the image and it had 64732 of the 70000 files added, so I got close. 
Clicking on the folder in the cd image takes a significant amount of time 
until the files are displayed, I'd say about 20 sec. For comparison it takes 
konqueror about 9 seconds to display the folder.
Comment 4 Sebastian Trueg 2006-09-06 12:07:44 UTC
this has been improved a little for k3b 1.0 but adding many files will always be slow since for each file an entry has to be created.
Comment 5 Christoph Burger-Scheidlin 2006-09-10 03:09:28 UTC
True, the question is if this can be improved. For 70000 files 40 minutes on a 
fairly new computer (dual core 2.16 GHz).  On the other hand, it is just 34 
ms per file, but still...
Comment 6 Sebastian Trueg 2006-09-10 16:16:11 UTC
I am open to suggestions. If anyone has an idea (based on the code please) go ahead.
Comment 7 Christoph Burger-Scheidlin 2006-09-24 18:43:58 UTC
Just confirmed this on 1.0pre2. How difficult is it to add a counter to determine how many files have been done already (or a percent done)? find | wc -l should not take so long and the user at least has feedback about what is happening.

What needs to be done per file that it takes such a long time?
Comment 8 Sebastian Trueg 2006-09-25 10:24:15 UTC
On Sunday 24 September 2006 18:43, Christoph Burger-Scheidlin wrote:
> What needs to be done per file that it takes such a long time?


just think about it for a moment. You have to create one object for each file 
and directory. Before that you have to stat the file. Then you have to create 
view items for the new files and directories that are displayed.

As for the progress: I think the best idea would be to count the files while 
already adding them. Maybe this also can be a jj?
Comment 9 Christoph Burger-Scheidlin 2006-10-08 17:21:33 UTC
*** Bug 112546 has been marked as a duplicate of this bug. ***
Comment 10 Sebastian Trueg 2006-11-20 15:17:50 UTC
svn version now shows the progress when adding urls to a data project.
Comment 11 Maciej Pilichowski 2006-12-02 13:50:55 UTC
Sebastian what about "lazy" GUI (similar to lazy evaluation in programming languages).

I think most of them goes for GUI. So how about creating in the left bottom pane not real tree, but fake one. When you click on the node _then_ k3b gets the structure from the disk and makes fake subtree again.

This way I want to only add files (many! of them) and burn it, adding would be much faster.
Comment 12 Maciej Pilichowski 2006-12-02 16:02:17 UTC
PS. I use 1.0 and 0.12.17, here I tested the latter. Adding 647263 files (2.8GB) took ~15 minutes, burning -- not finished. I stopped after two hours with 0% and I didn't notice the burner led even blinked. However I guess burning is the growisofs thing.
Comment 13 Sebastian Trueg 2006-12-02 19:36:46 UTC
no, that would make k3b A LOT less powerful.
Comment 14 Christoph Burger-Scheidlin 2006-12-02 20:25:54 UTC
what would be affected by the lazy approach?
Comment 15 Maciej Pilichowski 2006-12-02 20:26:47 UTC
Lazy GUI ("build GUI tree on demand")? I can't see why:
* if user want to see each node -- no problem, total time remains unchanged
* if user want to see only some nodes (or even none) -- here are the time savings

It could be an option for "power safety" :-) As long I use k3b I always drag files and burn them. I didn't mix directories -- so for me it would be a big advantage.
Comment 16 Sebastian Trueg 2006-12-03 16:02:15 UTC
Most important reason: Way too much work for such a (in my opinion not very important) feature.
And what about the project size? And file replacing? No, I won't do that. sorry.
Comment 17 Maciej Pilichowski 2006-12-03 16:38:37 UTC
Sebastian, I respect your wontfix (it is ok) but technically -- would it make k3b crippled in some way (#13)? Apart for developing time :-)
I am thinking -- when you click on the dir in file pane there is a yellow gear and here you already have "show on demand" solution. So maybe there is easy way to add this feature. The reason I think about it because I need to backup those tiny files, and in time, I will get them even more...

Comment 18 Sebastian Trueg 2006-12-03 19:41:03 UTC
you have to look at each file anyway to 1. get the size and 2. actually create 
the iso image.
Comment 19 Maciej Pilichowski 2006-12-04 14:45:29 UTC
ad.1) it is fast
ad.2) no. There is nothing about iso yet

I am trying to comment out completely building GUI tree to estimate the time (beta1 unmodified -- 647263 files (2.8GB) 26 minutes)  -- I am at k3bdatadoc.cpp. Still reading what is going on :-) Btw. sebastian, do you use some kind of IDE to develop k3b?  

Ok, but here some thoughts for now -- I am adding files, right? Now, there is progress bar in progress dialog, there are filenames in progress dialog, there is file counter in progress dialog, there is green volume bar in status bar and there is another file counter in status bar. And all this stuff is all the time animated.
For progress bar -- all except the progress bar should be removed. For status bar animation should be stopped. 
Reason: if adding is fast, nobody even notice, if it is slow, somebody has to animate this and I guess it would be CPU ;-) so better not adding additional work.
Comment 20 Maciej Pilichowski 2006-12-04 17:22:13 UTC
PS. For k3b 0.12.17 it is exactly 11 minutes, so I could say beta1 is a downgrade considering performance. Am I right, but this "k3b data project" tree file is also refreshed constantly while files are being added? Oh, free space (on disc) is recalculated too :-)
Comment 21 Sebastian Trueg 2006-12-04 17:48:39 UTC
I would love to speed this up without loosing functionality. But that is not easy (except for the current added file display).
Comment 22 Sebastian Trueg 2006-12-04 17:58:20 UTC
try the dcop interface for as-fast-as-possible adding of urls. It blocks the 
whole gui.
Comment 23 Maciej Pilichowski 2006-12-04 20:16:51 UTC
#21
Ok, but this animation is not funcionality really. As I said -- few files, no difference for the user -- many files, major gain.

#22
Ha, thanks, I learn a lot from this discussion :-) It is hard to tell exactly, but it is about 1.5-2 minutes. Such speed I like :-)))

Btw. I would like to help with this file tree (bottom-left) but I get to k3bdiritem and k3bfileitem and I am a bit lost (and my scientific supervisor will kill me, that for sure) so I quit searching for now. However, brute-force but easy to do speed-up (instead of lazy GUI). Some hidden (or not) option to show that tree. If the tree is not visible -- do not build the GUI tree. Since I used it once, or twice in my whole life (and such backups are every-month task) for me it would be a big advantage. The top entries in bottom right view are sufficient.
Comment 24 Sebastian Trueg 2006-12-04 22:08:58 UTC
On Monday 04 December 2006 20:16, Maciej Pilichowski wrote:
> #22
> Ha, thanks, I learn a lot from this discussion :-) It is hard to tell
> exactly, but it is about 1.5-2 minutes. Such speed I like :-)))
>
> Btw. I would like to help with this file tree (bottom-left) but I get to
> k3bdiritem and k3bfileitem and I am a bit lost (and my scientific
> supervisor will kill me, that for sure) so I quit searching for now.
> However, brute-force but easy to do speed-up (instead of lazy GUI). Some
> hidden (or not) option to show that tree. If the tree is not visible -- do
> not build the GUI tree. Since I used it once, or twice in my whole life
> (and such backups are every-month task) for me it would be a big advantage.
> The top entries in bottom right view are sufficient.


actually I just got an idea and maybe I misunderstood the whole time: the lazy 
gui might be possible after all....
Comment 25 András Manţia 2006-12-06 17:51:53 UTC
I reported bug #112546, a duplicate of this one. I made some test now 
using the svn version from today including some last minute performance 
fixes and the 0.12.17 version as it is shipped by SUSE.

The test is the some sources (means lots of small files), which is - 
according to Konqueror - 352.5 MB, 60588 files and 15671 folders.
The machine is an AMD 64 3200+, 1 GB RAM and a Seagate SATA 250GB with 
16MB cache disk.

0.94svn version: 2:09m
0.12.17: 3:30m
Konqueror calculating the size of the directory (thus stating every file 
for its size): 1:20m

Timing is not correct to seconds level, as I had to confirm some dialogs 
in K3B, but  it shoudn't really matter. Also chaches were not cleaned 
meantime, but the order was Konqueror-svn version-old version, so 
Konqueror should have been the slowest with cold cache.

So obviously the current version is much faster, but it is not fast 
enough, because I tried to do the test with my full KDE checkout, and I 
aborted after 10 minutes (it was at 1.3GB with about 150000 files and 
30000 subfolders, using 85%of my CPU and 22% of my memory) when 
the "add dialog" did not switch to the progress indicator yet. 
Konqueror reached counting that amount of files in 2:30 minutes, 
calculating the whole directory ( 2.0 GB, 338713 files and 60559 
folders) in 11:45 minutes.

Needing the same amount of time would be what I say reasonable. Still a 
lot of time, but I cannot expect to do it fast as Konqueror doesn't 
build a new tree or something like that. Of course as it was suggested, 
tree building could be delayed.

I had some suggestions in the bug report, but I haven't checked how much 
of it was actually implemented. The basic idea was to do as little 
KNetAccess calls as possible, cache the results and pass it over when 
needed. If it is possible to use direct system calls instead of 
KNetAccess (for local files should be possible), use it.

But the final say about where the time is spent can be obtained only by 
valgridning (with cachegrind) k3b.
Comment 26 Sebastian Trueg 2006-12-06 18:18:01 UTC
comparisions with konqueror don't make much sense since k3b has to do a lot 
more than just stating each file.

and k3b does not use KNetAccess at all anymore. It is all local. I don't think 
I can improve the GUI updating anymore (except less updates to gain a few 
seconds but I don't like that). But I will search for additional bottlenecks 
in the implementation.
Comment 27 András Manţia 2006-12-06 18:22:47 UTC
Some values from cachegrind:

- unfortunately it shows that most of the time is spent in an unknown 
state (possibly some tweaking is needed to get better information): 
64.81%
- 13.50% is spent in k3bdiritem.cpp, mostly in the isRemoveable() call 
(13%). The problematic line is inside the for loop:
 rem = rem && it.current()->isRemoveable();
- 8.53% is spent in K3bDataItem::isRemoveable() (called from the above 
one, I guess)

If the 64.81%is broke down in smaller pieces, 12.69% goes to an unknown 
function (in crti.S, crtn.S files), while 12.08% goes to 
QGListIterator::operator++. Another function call to look for if it can 
be omptimized (using const iterators for example? Reducing the number 
of iterations over lists?). Another 3.02% is spend constructin 
QGListIterator objects.

The rest are small values, the only other noticable one (1.75%) is for 
QPtrListIterator<K3bDataItem>::current().

Ok, so here is some food for thought. ;-)
Comment 28 András Manţia 2006-12-06 18:23:43 UTC
On Wednesday 06 December 2006 19:18, Sebastian Trueg wrote:
> comparisions with konqueror don't make much sense since k3b has to do
> a lot more than just stating each file.


Sure, I said that konqueror does much less. ;-)

> and k3b does not use KNetAccess at all anymore. It is all local. I
> don't think I can improve the GUI updating anymore (except less
> updates to gain a few seconds but I don't like that). But I will
> search for additional bottlenecks in the implementation.


Great to hear that KNetAccess is not used anymore!

See my other comment with cachegrind values.
Comment 29 Maciej Pilichowski 2006-12-07 10:31:11 UTC
The latest svn is speed deamon -- tested on the same data -- 2 minutes. So it is no difference really between nice GUI adding and via dcop. 
Comment 30 Thomas Fischer 2007-12-15 16:22:37 UTC
Although this bug is marked as FIXED, I'd like to add a comment.
As the original post states, it takes some time to add a directory with many files. According to my observation, this is still true for both 0.12.17 and 1.0.4.

I did some time measurement: On my system, it takes about 5 sec to add 100 files, 32 sec to add 500, but already over 4 minutes to add 2000 files.
This suggests that there is (at least) quadratic runtime required to add files.

Although I'm not familiar with the code, some digging revealed that K3bDataUrlAddingDialog::slotAddUrls() (which is called once per file to be added) calls K3bDirItem::find(..) which in its turn checks all previously inserted files. This results in a time complexity of O(n^2) for n many files.
Using some tree (e.g. using hashes on filenames) might reduce the time to O(n log n).

There may be more similar calls resulting in high running times, so I suggest a code review for the "add files" case.