Summary: | Data DVD project - Drag & drop of directories with many files takes a very long time | ||
---|---|---|---|
Product: | [Applications] k3b | Reporter: | Manfred Schneider <manfred> |
Component: | Data Project | Assignee: | Sebastian Trueg <trueg> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | amantia, mail |
Priority: | NOR | ||
Version: | 0.10 | ||
Target Milestone: | --- | ||
Platform: | openSUSE | ||
OS: | Linux | ||
Latest Commit: | Version Fixed In: | ||
Sentry Crash Report: |
Description
Manfred Schneider
2004-02-29 19:52:27 UTC
I too have noticed this. The problem is swap. I guess K3B tries to keep the entire tree in RAM, becuase if I select a 4G partition for backup very soon the computer is using 400MB of swap. This is on a machine with 256MB of RAM, and I typically use zero to very little (certainly less than 100MB) of swap. Hi, thanks for the explanation. I am unsure if this is the cause of the problem. My PC has 1 GB RAM and 1 GB swap space. If such a large amount of physical and virtual memory is not enough for K3B then I would suggest to use other algorithms for adding many files to a K3B project (e.g. dividing the files in chunks, partitioning the work in parts etc.). When I use other DVD burning programs (on other platforms ;-) then I have not experienced such performance problems. Thanks for your efforts and the great K3B tool. Manfred I can confirm this bug for 0.12.17. The memory issues seems to be solved at least on my version, top reports RES 52m, SHR 16m and there is still free RAM on my machine. What I did: mkdir Test && cd Test && for i in `seq 1 70000`; do touch $i; done Add Test to DataDVD After 40 minutes of 100% CPU, I got bored and hit cancel. I checked the image and it had 64732 of the 70000 files added, so I got close. Clicking on the folder in the cd image takes a significant amount of time until the files are displayed, I'd say about 20 sec. For comparison it takes konqueror about 9 seconds to display the folder. this has been improved a little for k3b 1.0 but adding many files will always be slow since for each file an entry has to be created. True, the question is if this can be improved. For 70000 files 40 minutes on a fairly new computer (dual core 2.16 GHz). On the other hand, it is just 34 ms per file, but still... I am open to suggestions. If anyone has an idea (based on the code please) go ahead. Just confirmed this on 1.0pre2. How difficult is it to add a counter to determine how many files have been done already (or a percent done)? find | wc -l should not take so long and the user at least has feedback about what is happening. What needs to be done per file that it takes such a long time? On Sunday 24 September 2006 18:43, Christoph Burger-Scheidlin wrote:
> What needs to be done per file that it takes such a long time?
just think about it for a moment. You have to create one object for each file
and directory. Before that you have to stat the file. Then you have to create
view items for the new files and directories that are displayed.
As for the progress: I think the best idea would be to count the files while
already adding them. Maybe this also can be a jj?
*** Bug 112546 has been marked as a duplicate of this bug. *** svn version now shows the progress when adding urls to a data project. Sebastian what about "lazy" GUI (similar to lazy evaluation in programming languages). I think most of them goes for GUI. So how about creating in the left bottom pane not real tree, but fake one. When you click on the node _then_ k3b gets the structure from the disk and makes fake subtree again. This way I want to only add files (many! of them) and burn it, adding would be much faster. PS. I use 1.0 and 0.12.17, here I tested the latter. Adding 647263 files (2.8GB) took ~15 minutes, burning -- not finished. I stopped after two hours with 0% and I didn't notice the burner led even blinked. However I guess burning is the growisofs thing. no, that would make k3b A LOT less powerful. what would be affected by the lazy approach? Lazy GUI ("build GUI tree on demand")? I can't see why: * if user want to see each node -- no problem, total time remains unchanged * if user want to see only some nodes (or even none) -- here are the time savings It could be an option for "power safety" :-) As long I use k3b I always drag files and burn them. I didn't mix directories -- so for me it would be a big advantage. Most important reason: Way too much work for such a (in my opinion not very important) feature. And what about the project size? And file replacing? No, I won't do that. sorry. Sebastian, I respect your wontfix (it is ok) but technically -- would it make k3b crippled in some way (#13)? Apart for developing time :-) I am thinking -- when you click on the dir in file pane there is a yellow gear and here you already have "show on demand" solution. So maybe there is easy way to add this feature. The reason I think about it because I need to backup those tiny files, and in time, I will get them even more... you have to look at each file anyway to 1. get the size and 2. actually create the iso image. ad.1) it is fast ad.2) no. There is nothing about iso yet I am trying to comment out completely building GUI tree to estimate the time (beta1 unmodified -- 647263 files (2.8GB) 26 minutes) -- I am at k3bdatadoc.cpp. Still reading what is going on :-) Btw. sebastian, do you use some kind of IDE to develop k3b? Ok, but here some thoughts for now -- I am adding files, right? Now, there is progress bar in progress dialog, there are filenames in progress dialog, there is file counter in progress dialog, there is green volume bar in status bar and there is another file counter in status bar. And all this stuff is all the time animated. For progress bar -- all except the progress bar should be removed. For status bar animation should be stopped. Reason: if adding is fast, nobody even notice, if it is slow, somebody has to animate this and I guess it would be CPU ;-) so better not adding additional work. PS. For k3b 0.12.17 it is exactly 11 minutes, so I could say beta1 is a downgrade considering performance. Am I right, but this "k3b data project" tree file is also refreshed constantly while files are being added? Oh, free space (on disc) is recalculated too :-) I would love to speed this up without loosing functionality. But that is not easy (except for the current added file display). try the dcop interface for as-fast-as-possible adding of urls. It blocks the whole gui. #21 Ok, but this animation is not funcionality really. As I said -- few files, no difference for the user -- many files, major gain. #22 Ha, thanks, I learn a lot from this discussion :-) It is hard to tell exactly, but it is about 1.5-2 minutes. Such speed I like :-))) Btw. I would like to help with this file tree (bottom-left) but I get to k3bdiritem and k3bfileitem and I am a bit lost (and my scientific supervisor will kill me, that for sure) so I quit searching for now. However, brute-force but easy to do speed-up (instead of lazy GUI). Some hidden (or not) option to show that tree. If the tree is not visible -- do not build the GUI tree. Since I used it once, or twice in my whole life (and such backups are every-month task) for me it would be a big advantage. The top entries in bottom right view are sufficient. On Monday 04 December 2006 20:16, Maciej Pilichowski wrote: > #22 > Ha, thanks, I learn a lot from this discussion :-) It is hard to tell > exactly, but it is about 1.5-2 minutes. Such speed I like :-))) > > Btw. I would like to help with this file tree (bottom-left) but I get to > k3bdiritem and k3bfileitem and I am a bit lost (and my scientific > supervisor will kill me, that for sure) so I quit searching for now. > However, brute-force but easy to do speed-up (instead of lazy GUI). Some > hidden (or not) option to show that tree. If the tree is not visible -- do > not build the GUI tree. Since I used it once, or twice in my whole life > (and such backups are every-month task) for me it would be a big advantage. > The top entries in bottom right view are sufficient. actually I just got an idea and maybe I misunderstood the whole time: the lazy gui might be possible after all.... I reported bug #112546, a duplicate of this one. I made some test now using the svn version from today including some last minute performance fixes and the 0.12.17 version as it is shipped by SUSE. The test is the some sources (means lots of small files), which is - according to Konqueror - 352.5 MB, 60588 files and 15671 folders. The machine is an AMD 64 3200+, 1 GB RAM and a Seagate SATA 250GB with 16MB cache disk. 0.94svn version: 2:09m 0.12.17: 3:30m Konqueror calculating the size of the directory (thus stating every file for its size): 1:20m Timing is not correct to seconds level, as I had to confirm some dialogs in K3B, but it shoudn't really matter. Also chaches were not cleaned meantime, but the order was Konqueror-svn version-old version, so Konqueror should have been the slowest with cold cache. So obviously the current version is much faster, but it is not fast enough, because I tried to do the test with my full KDE checkout, and I aborted after 10 minutes (it was at 1.3GB with about 150000 files and 30000 subfolders, using 85%of my CPU and 22% of my memory) when the "add dialog" did not switch to the progress indicator yet. Konqueror reached counting that amount of files in 2:30 minutes, calculating the whole directory ( 2.0 GB, 338713 files and 60559 folders) in 11:45 minutes. Needing the same amount of time would be what I say reasonable. Still a lot of time, but I cannot expect to do it fast as Konqueror doesn't build a new tree or something like that. Of course as it was suggested, tree building could be delayed. I had some suggestions in the bug report, but I haven't checked how much of it was actually implemented. The basic idea was to do as little KNetAccess calls as possible, cache the results and pass it over when needed. If it is possible to use direct system calls instead of KNetAccess (for local files should be possible), use it. But the final say about where the time is spent can be obtained only by valgridning (with cachegrind) k3b. comparisions with konqueror don't make much sense since k3b has to do a lot more than just stating each file. and k3b does not use KNetAccess at all anymore. It is all local. I don't think I can improve the GUI updating anymore (except less updates to gain a few seconds but I don't like that). But I will search for additional bottlenecks in the implementation. Some values from cachegrind: - unfortunately it shows that most of the time is spent in an unknown state (possibly some tweaking is needed to get better information): 64.81% - 13.50% is spent in k3bdiritem.cpp, mostly in the isRemoveable() call (13%). The problematic line is inside the for loop: rem = rem && it.current()->isRemoveable(); - 8.53% is spent in K3bDataItem::isRemoveable() (called from the above one, I guess) If the 64.81%is broke down in smaller pieces, 12.69% goes to an unknown function (in crti.S, crtn.S files), while 12.08% goes to QGListIterator::operator++. Another function call to look for if it can be omptimized (using const iterators for example? Reducing the number of iterations over lists?). Another 3.02% is spend constructin QGListIterator objects. The rest are small values, the only other noticable one (1.75%) is for QPtrListIterator<K3bDataItem>::current(). Ok, so here is some food for thought. ;-) On Wednesday 06 December 2006 19:18, Sebastian Trueg wrote: > comparisions with konqueror don't make much sense since k3b has to do > a lot more than just stating each file. Sure, I said that konqueror does much less. ;-) > and k3b does not use KNetAccess at all anymore. It is all local. I > don't think I can improve the GUI updating anymore (except less > updates to gain a few seconds but I don't like that). But I will > search for additional bottlenecks in the implementation. Great to hear that KNetAccess is not used anymore! See my other comment with cachegrind values. The latest svn is speed deamon -- tested on the same data -- 2 minutes. So it is no difference really between nice GUI adding and via dcop. Although this bug is marked as FIXED, I'd like to add a comment. As the original post states, it takes some time to add a directory with many files. According to my observation, this is still true for both 0.12.17 and 1.0.4. I did some time measurement: On my system, it takes about 5 sec to add 100 files, 32 sec to add 500, but already over 4 minutes to add 2000 files. This suggests that there is (at least) quadratic runtime required to add files. Although I'm not familiar with the code, some digging revealed that K3bDataUrlAddingDialog::slotAddUrls() (which is called once per file to be added) calls K3bDirItem::find(..) which in its turn checks all previously inserted files. This results in a time complexity of O(n^2) for n many files. Using some tree (e.g. using hashes on filenames) might reduce the time to O(n log n). There may be more similar calls resulting in high running times, so I suggest a code review for the "add files" case. |