Version: 1.0.0-beta6 (rev.: 1041332) (using 4.3.2 (KDE 4.3.2), 4.3.2-4.fc11 Fedora)
Compiler: gcc
OS: Linux (i686) release 2.6.30.9-90.fc11.i686.PAE

Scrolling when under-exposure indicator is on is quite slow.
SVN commit 1044633 by aclemens: Fix speed again in pureColorMask(), do not use DColor at all, this is way too slow. Use direct data array access. CCBUG:213001 M +37 -32 dimg.cpp WebSVN link: http://websvn.kde.org/?view=rev&revision=1044633
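To illustrate what "direct data array access" means in the commit above, here is a minimal sketch. It is not the code from dimg.cpp: the names `ExposureSketch` and `pureColorMaskSketch` are hypothetical, and it assumes 8-bit pixels stored as 4 bytes each in B, G, R, A order. The point is only that the loop reads channel bytes straight from the buffer instead of building a color object per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the exposure settings (not digiKam's
// ExposureSettingsContainer).
struct ExposureSketch {
    bool underExposure = true;
    bool overExposure = true;
    std::uint32_t underColor = 0xFFFFFFFFu; // ARGB overlay for pure black pixels
    std::uint32_t overColor = 0xFF000000u;  // ARGB overlay for pure white pixels
};

// Builds an overlay mask by walking the raw pixel buffer directly.
// Assumption: 8 bits per channel, 4 bytes per pixel, B, G, R, A order.
std::vector<std::uint32_t> pureColorMaskSketch(const unsigned char* data,
                                               std::size_t width,
                                               std::size_t height,
                                               const ExposureSketch& s)
{
    assert(data != nullptr);
    std::vector<std::uint32_t> mask(width * height, 0u); // 0 = transparent

    const unsigned char* p = data;
    for (std::size_t i = 0; i < width * height; ++i, p += 4) {
        // Read the three color channels without any temporary object.
        const unsigned b = p[0], g = p[1], r = p[2];
        if (s.underExposure && r == 0 && g == 0 && b == 0) {
            mask[i] = s.underColor;
        } else if (s.overExposure && r == 255 && g == 255 && b == 255) {
            mask[i] = s.overColor;
        }
    }
    return mask;
}
```

A 16-bit variant would follow the same pattern with `unsigned short` channels and a maximum of 65535.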
Display operations in digiKam are still very slow, and the bottleneck in most cases is DImgScale::dimgScaleAARGB(). As far as I can see with OProfile (a much better profiler than valgrind + callgrind: no system slowdown, and it profiles system-wide), pureColorMask() cannot be optimized any further. So we need to make dimgScaleAARGB() faster (but I don't understand this algorithm right now).

samples   %        image name  app name  symbol name
56319     31.2541  no-vmlinux  no-vmlinux  /no-vmlinux
32226     17.8838  libdigikamcore.so.1.0.0  libdigikamcore.so.1.0.0  Digikam::DImgScale::dimgScaleAARGB(Digikam::DImgScale::__dimg_scale_info*, unsigned int*, int, int, int, int, int, int, int, int)
25042     13.8970  libQtGui.so.4.5.3  libQtGui.so.4.5.3  /usr/lib/libQtGui.so.4.5.3
15489      8.5956  nvidia_drv.so  nvidia_drv.so  /usr/lib/xorg/modules/drivers/nvidia_drv.so
13449      7.4635  libdigikamcore.so.1.0.0  libdigikamcore.so.1.0.0  Digikam::DImg::pureColorMask(Digikam::ExposureSettingsContainer*)
8897       4.9374  libdigikamcore.so.1.0.0  libdigikamcore.so.1.0.0  Digikam::DImg::bitBlt(unsigned char const*, unsigned char*, int, int, int, int, int, int, unsigned int, unsigned int, unsigned int, unsigned int, bool, int, int)
5542       3.0755  libGLcore.so.190.42  libGLcore.so.190.42  /usr/lib/libGLcore.so.190.42
3404       1.8890  Xorg  Xorg  /usr/bin/Xorg
2591       1.4379  anon (tgid:16625 range:0xb3e9f000-0xb473f000)  eclipse  anon (tgid:16625 range:0xb3e9f000-0xb473f000)
1547       0.8585  libQtCore.so.4.5.3  libQtCore.so.4.5.3  /usr/lib/libQtCore.so.4.5.3
949        0.5266  libc-2.10.1.so  libc-2.10.1.so  _int_malloc
633        0.3513  libpthread-2.10.1.so  libpthread-2.10.1.so  pthread_mutex_lock
554        0.3074  libpthread-2.10.1.so  libpthread-2.10.1.so  __pthread_mutex_unlock_usercnt
529        0.2936  libc-2.10.1.so  libc-2.10.1.so  malloc
519        0.2880  oprofiled  oprofiled  /usr/bin/oprofiled
471        0.2614  libpixman-1.so.0.16.2  libpixman-1.so.0.16.2  pixman_rasterize_edges_accessors
397        0.2203  libc-2.10.1.so  libc-2.10.1.so  free
368        0.2042  [vdso] (tgid:9156 range:0xb7788000-0xb7789000)  Xorg  [vdso] (tgid:9156 range:0xb7788000-0xb7789000)
327        0.1815  libc-2.10.1.so  libc-2.10.1.so  strcmp
314        0.1743  librt-2.10.1.so  librt-2.10.1.so  clock_gettime
312        0.1731  libc-2.10.1.so  libc-2.10.1.so  _int_free
281        0.1559  libc-2.10.1.so  libc-2.10.1.so  memmove
276        0.1532  libc-2.10.1.so  libc-2.10.1.so  memcpy
264        0.1465  libc-2.10.1.so  libc-2.10.1.so  memset
209        0.1160  libglib-2.0.so.0.2200.2  libglib-2.0.so.0.2200.2  g_main_context_check
183        0.1016  libjvm.so  libjvm.so  SymbolTable::lookup(int, char const*, int, unsigned int)
172        0.0955  libdigikamcore.so.1.0.0  libdigikamcore.so.1.0.0  Digikam::DImgScale::dimgCalcApoints(int, int, int)
151        0.0838  libglib-2.0.so.0.2200.2  libglib-2.0.so.0.2200.2  g_main_context_prepare
146        0.0810  libc-2.10.1.so  libc-2.10.1.so  __i686.get_pc_thunk.bx
146        0.0810  libc-2.10.1.so  libc-2.10.1.so  malloc_consolidate
145        0.0805  libc-2.10.1.so  libc-2.10.1.so  pthread_mutex_lock
141        0.0782  libc-2.10.1.so  libc-2.10.1.so  pthread_mutex_unlock
133        0.0738  bash  bash  /bin/bash
124        0.0688  libglib-2.0.so.0.2200.2  libglib-2.0.so.0.2200.2  g_main_context_iterate
114        0.0633  libjvm.so  libjvm.so  instanceKlass::oop_oop_iterate_nv(oopDesc*, FastScanClosure*)
106        0.0588  [vdso] (tgid:9321 range:0xb773a000-0xb773b000)  kwin  [vdso] (tgid:9321 range:0xb773a000-0xb773b000)
95         0.0527  libdigikamcore.so.1.0.0  libdigikamcore.so.1.0.0  Digikam::DImgScale::dimgCalcYPoints(unsigned int*, int, int, int)
91         0.0505  libglib-2.0.so.0.2200.2  libglib-2.0.so.0.2200.2  g_main_context_dispatch
89         0.0494  libjvm.so  libjvm.so  jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)
85         0.0472  libpthread-2.10.1.so  libpthread-2.10.1.so  __i686.get_pc_thunk.bx
84         0.0466  libjvm.so  libjvm.so  Runtime1::primitive_arraycopy(HeapWord*, HeapWord*, int)
76         0.0422  libc-2.10.1.so  libc-2.10.1.so  mbrtowc
76         0.0422  libc-2.10.1.so  libc-2.10.1.so  strlen
74         0.0411  libc-2.10.1.so  libc-2.10.1.so  strcspn
74         0.0411  libjvm.so  libjvm.so  DefNewGeneration::copy_to_survivor_space(oopDesc*)
Bartek, scrolling was never slow for me, did you mean "zooming in / out"?
yes, I meant zooming using scroll ;)
I don't understand the dimgscale.cpp code either. It originates from imlib and was ported to C++ by Daniel Duley (known as mosfet) a very long time ago. At least two graphics experts were involved here, so I don't expect to be able to optimize anything there. Using a different algorithm would be an option. Some time ago a fast-scaling algorithm was suggested, but it had severe quality problems and was not ported to 16bit. To sum up: today I don't know of any algorithm that offers the same quality, is available for 16bit, and is significantly faster than Imlib scaling.
> I dont understand the dimgscale.cpp code either. It originates from imlib and
> was ported to C++ by Daniel Duley (known as mosfet a very long time ago). At
> least two graphics experts involved here. For me, I dont expect to optimize
> anything there.

Mosfet is in CC, in case he can help us a little bit...

> Using a different algorithm would be an option. Some time ago a fast-scaling
> algorithm was suggested, but it had severe quality problems and was not ported
> to 16bit.

I have tested this code: it does not take the neighborhood region into account when scaling an image up or down, so the resized image has visible artifacts. It's unsuitable. The current code from Mosfet has this feature.

Note: this code also exists in Krita and Gwenview, if I remember correctly...

Gilles
We could of course save the pureColorMask as a member object and scale it accordingly, so it needs to be generated only once. We do the same in FreeRotation with the gray overlay now (before, it was generated on every move of the selection widget, which is surely too slow). However, I cannot see from the profiling whether the slow part is combining the two images (pureColorMask and image) or generating the mask.
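The caching idea could look roughly like the following sketch. It is only an illustration under stated assumptions: `CachedMaskSketch` and its methods are hypothetical names, the mask is modeled as a plain vector of ARGB values, and the "scale it accordingly" step is left out. The mask is generated on first use and reused until something (for example, changed exposure settings or a new image) invalidates it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: keeps the generated mask as a member and only
// regenerates it after invalidate() has been called.
class CachedMaskSketch {
public:
    using Mask = std::vector<std::uint32_t>;
    using Generator = std::function<Mask()>;

    explicit CachedMaskSketch(Generator gen) : m_generate(std::move(gen))
    {
        assert(static_cast<bool>(m_generate));
    }

    // Call when the image or the exposure settings change.
    void invalidate() { m_valid = false; }

    // Returns the cached mask, generating it only if needed.
    const Mask& mask()
    {
        if (!m_valid) {
            m_mask = m_generate();
            m_valid = true;
            ++m_generations;
        }
        return m_mask;
    }

    // Number of times the expensive generator actually ran.
    std::size_t generations() const { return m_generations; }

private:
    Generator m_generate;
    Mask m_mask;
    bool m_valid = false;
    std::size_t m_generations = 0;
};
```

With this shape, repeated zoom steps would call `mask()` many times but pay for generation once; whether that helps depends on whether generation or compositing dominates, which the profile does not show.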
I found another speed improvement with this commit: http://websvn.kde.org/trunk/extragear/graphics/digikam/utilities/imageeditor/canvas/dimginterface.cpp?r1=1067871&r2=1069591 Please test again. Gilles Caulier
Still the difference is quit big :/ I remember seeing in some other app that the indicator was blinking, so it must have had a separate thread. If the editor also has an additional thread, it could be done there. I mean that the user won't mind if the indicator is not visible during zooming and appears 500 ms after the last zoom.
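The "show the indicator 500 ms after the last zoom" idea is essentially a debounce. A minimal sketch, assuming a hypothetical `DeferredIndicatorSketch` class that is told when a zoom happened and is asked whether the indicator should be drawn; it takes explicit time points so it does not depend on any toolkit timer, and it says nothing about how digiKam's editor would actually schedule the repaint.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: hides the indicator while zooming and lets it
// reappear once no zoom event has arrived for `delay`.
class DeferredIndicatorSketch {
public:
    using Clock = std::chrono::steady_clock;

    explicit DeferredIndicatorSketch(std::chrono::milliseconds delay)
        : m_delay(delay)
    {
        assert(delay.count() >= 0);
    }

    // Record a zoom step; every call restarts the waiting period.
    void zoomed(Clock::time_point now)
    {
        m_lastZoom = now;
        m_hasZoomed = true;
    }

    // True if the indicator overlay should be drawn at time `now`.
    bool indicatorVisible(Clock::time_point now) const
    {
        return !m_hasZoomed || (now - m_lastZoom) >= m_delay;
    }

private:
    std::chrono::milliseconds m_delay;
    Clock::time_point m_lastZoom{};
    bool m_hasZoomed = false;
};
```

In a real editor the caller would pass `Clock::now()` and trigger one repaint when the delay expires.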
quite ... ;)
Blinking? Here on my dual-core CPU, it's very, very fast. What computer do you use? A Linux box? Gilles Caulier
Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz, on Fedora. I can see that the CPU usage on a core is higher when the indicators are on.
Bartek, yes, sure, it is slower with the exposure indicator switched on, but it is still usable here. Johannes, perhaps we can use OpenMP there. The relevant code is in this method: http://lxr.kde.org/source/extragear/graphics/digikam/libs/dimg/dimg.cpp#1569 Do you think performance can be improved with OpenMP? Gilles Caulier
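For reference, an OpenMP version of a per-pixel loop of this kind might look like the sketch below. This is not the method linked above: `overExposedFlagsSketch` is a hypothetical name, the buffer is assumed to be 8-bit with 4 bytes per pixel, and only the pure-white test is shown. Each row writes to its own slice of the output, so rows can be processed independently. Without OpenMP enabled at compile time (e.g. `-fopenmp` with gcc) the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flags pure white pixels (all three color channels at 255), one row per
// loop iteration. Assumption: 8 bits per channel, 4 bytes per pixel.
std::vector<std::uint8_t> overExposedFlagsSketch(const unsigned char* data,
                                                 int width, int height)
{
    assert(data != nullptr && width >= 0 && height >= 0);
    std::vector<std::uint8_t> flags(
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height), 0);

    // Rows are independent: each iteration reads and writes only its own row.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        const unsigned char* p = data + static_cast<std::size_t>(y) * width * 4;
        std::uint8_t* out = flags.data() + static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x, p += 4) {
            out[x] = (p[0] == 255 && p[1] == 255 && p[2] == 255) ? 1 : 0;
        }
    }
    return flags;
}
```

Whether this pays off depends on image size and on how much of the time is actually spent in the mask loop rather than in scaling.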
I think it is faster since the last commit, for me it is working ok now...
I also don't see any reason to parallelize this code. It's working fast enough now.
When I open a 12 MPix image with 8 bits per channel, I can toggle the indicators on and zooming via ctrl+scroll is fast. If I open the raw file of the same image, zooming via ctrl+scroll is also fast as long as no exposure indicators are set. If I enable the indicators and the full image is visible in the editor window, zooming is pretty slow, but it gets fast when only a part of the image is shown. I notice that on an Intel Core2 Duo. Jens
SVN commit 1070199 by cgilles: again, another speed-up improvements there. CCBUGS: 213001 M +2 -2 dimginterface.cpp WebSVN link: http://websvn.kde.org/?view=rev&revision=1070199
Gilles, the last commit gave me the last needed bit of improvement. Zooming is fast overall now.
No difference now. For me ok.
What do you mean exactly? Is it better now? Can I close this report? Gilles Caulier
I mean that it is much better now and there is practically no difference in speed whether the indicators are on or off (both in the editor and in the raw import tool). It can definitely be closed.