Bug 268212

Summary: Contiguous area selection (and fill tool) is TOO SLOW!
Product: [Applications] krita Reporter: animtim
Component: Tools Assignee: Krita Bugs <krita-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: halla, pentalis
Priority: NOR    
Version: git master (please specify the git hash!)   
Target Milestone: ---   
Platform: Unlisted Binaries   
OS: Linux   
Latest Commit: Version Fixed In:
Attachments: callgrind output

Description animtim 2011-03-11 12:04:38 UTC
Version:           svn trunk (using KDE 4.4.5) 
OS:                Linux

I've noticed this tool is very slow, so slow that it becomes a pain to use, and I often find it's faster to select a well-closed area by hand... Too bad!

I think this really needs some enhancement.

Reproducible: Always
Comment 1 Halla Rempt 2011-03-26 13:04:44 UTC
Right... Not sure what we can do here, but let's put these tools on the map for the next performance drive.
Comment 2 Sven Langkamp 2011-06-03 19:07:27 UTC
*** Bug 273859 has been marked as a duplicate of this bug. ***
Comment 3 Halla Rempt 2011-10-15 11:46:11 UTC
Git commit 5eed87e6aa717ad1ce9ce9093b333bad4680e6c0 by Boudewijn Rempt.
Committed on 15/10/2011 at 13:40.
Pushed by rempt into branch 'master'.

Do not pass a qbitarray of channelflags if they are all true

The composite ops optimize when an empty channelflags variable is passed:
all channels are then assumed to be turned on. This makes a difference
when doing a valgrind run, where before this patch filling an area
would spend ~20% of cpu time in testBit and after the patch only ~2%.

CCBUG:268212

M  +17   -6    krita/image/kis_async_merger.h

http://commits.kde.org/calligra/5eed87e6aa717ad1ce9ce9093b333bad4680e6c0
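
For illustration, the idea in the commit message above can be sketched roughly as follows. This is a minimal sketch and not the code from kis_async_merger.h: `copyPixels` and `normalizeChannelFlags` are hypothetical names, and `std::vector<bool>` stands in for the QBitArray so the example compiles without Qt. An empty flag set is treated as "all channels on", which lets the inner loop skip the per-channel bit test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a composite op (not Krita's actual API).
// An empty channelFlags means "all channels enabled".
void copyPixels(const std::uint8_t* src, std::uint8_t* dst,
                std::size_t pixelCount, std::size_t channels,
                const std::vector<bool>& channelFlags)
{
    assert(channelFlags.empty() || channelFlags.size() == channels);
    const std::size_t n = pixelCount * channels;

    if (channelFlags.empty()) {
        // Fast path: no per-channel flag test at all.
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = src[i];
        }
        return;
    }

    // Slow path: one flag lookup per channel of every pixel.
    for (std::size_t i = 0; i < n; ++i) {
        if (channelFlags[i % channels]) {
            dst[i] = src[i];
        }
    }
}

// Caller side: if every flag is true, pass an empty set instead,
// so the composite op can take the fast path.
std::vector<bool> normalizeChannelFlags(const std::vector<bool>& flags)
{
    for (bool on : flags) {
        if (!on) {
            return flags; // at least one channel is off: keep the flags
        }
    }
    return {}; // all on: empty means "everything enabled"
}
```

A caller would write something like `copyPixels(src, dst, count, 4, normalizeChannelFlags(flags))`, so the flag test only runs when a channel is actually disabled.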
Comment 4 Halla Rempt 2011-10-15 11:47:32 UTC
Okay, this is the big, bad boy now:

    virtual quint8 difference(const quint8* src1, const quint8* src2) const {
        quint8 lab1[8], lab2[8];
        cmsCIELab labF1, labF2;

        if (this->opacityU8(src1) == OPACITY_TRANSPARENT_U8
            || this->opacityU8(src2) == OPACITY_TRANSPARENT_U8)
            return (this->opacityU8(src1) == this->opacityU8(src2) ? 0 : 255);
        Q_ASSERT(this->toLabA16Converter());
        this->toLabA16Converter()->transform(src1, lab1, 1),
        this->toLabA16Converter()->transform(src2, lab2, 1),
        cmsLabEncoded2Float(&labF1, (cmsUInt16Number *)lab1);
        cmsLabEncoded2Float(&labF2, (cmsUInt16Number *)lab2);
        qreal diff = cmsDeltaE(&labF1, &labF2);
        if (diff > 255)
            return 255;
        else
            return qint8(diff);
    }

An insanely expensive function, which takes about 40% of the CPU time of a fill according to valgrind. We need a cheaper difference, specialized for the most common colorspaces.
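
As a rough idea of what a cheaper, specialized difference could look like for 8-bit RGBA, here is a minimal sketch. It is not Krita's implementation: `cheapDifferenceRgba8` is a hypothetical name, the layout (4 bytes per pixel, alpha last) and the value 0 for a fully transparent alpha are assumptions, and the metric is a plain maximum per-channel difference rather than the perceptual Lab delta E computed by the function above, so the results will not match it exactly.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>

// Hypothetical cheap difference for 8-bit RGBA pixels.
// Assumes 4 bytes per pixel with alpha in the last byte, and that
// an alpha of 0 means fully transparent.
std::uint8_t cheapDifferenceRgba8(const std::uint8_t* a, const std::uint8_t* b)
{
    const std::uint8_t alphaA = a[3];
    const std::uint8_t alphaB = b[3];

    // Same transparency rule as the quoted function: two transparent
    // pixels are equal, a transparent and a visible one differ fully.
    if (alphaA == 0 || alphaB == 0) {
        return alphaA == alphaB ? 0 : 255;
    }

    // Largest absolute difference over the three color channels.
    // No color transform and no floating point involved.
    int maxDiff = 0;
    for (int c = 0; c < 3; ++c) {
        maxDiff = std::max(maxDiff, std::abs(int(a[c]) - int(b[c])));
    }
    return static_cast<std::uint8_t>(maxDiff);
}
```

The trade-off is accuracy for speed: a per-channel metric ignores how differently the eye weighs the channels, which may or may not matter for a fill threshold.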
Comment 5 Halla Rempt 2011-10-15 13:24:50 UTC
Git commit 551b0e25ed049db65ed36b815447072f544ee38c by Boudewijn Rempt.
Committed on 15/10/2011 at 15:14.
Pushed by rempt into branch 'master'.

Optimize the creation of a flood selection

Flood selection uses KoColorSpace::difference which is very, very
expensive. Implement a simple cache that works in the most common
situation: 8 bit rgba (or any other colorspace where a pixel fits
exactly in a quint32).

It's still too slow, though.

CCBUG: 268212

M  +46   -8    krita/image/kis_fill_painter.cc

http://commits.kde.org/calligra/551b0e25ed049db65ed36b815447072f544ee38c
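
The caching idea from the commit message above might look roughly like this. It is a sketch under assumptions, not the code from kis_fill_painter.cc: `DifferenceCache` is a hypothetical class, the expensive difference is passed in as a callback, and the pixel is assumed to be exactly 4 bytes so that it can be used directly as a 32-bit key.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical cache of "difference to the reference pixel", keyed on
// the raw 32-bit value of a 4-byte pixel (e.g. 8-bit RGBA).
class DifferenceCache
{
public:
    using DifferenceFn =
        std::function<std::uint8_t(const std::uint8_t*, const std::uint8_t*)>;

    DifferenceCache(const std::uint8_t* referencePixel, DifferenceFn expensive)
        : m_expensive(std::move(expensive))
    {
        std::memcpy(m_reference, referencePixel, sizeof m_reference);
    }

    std::uint8_t difference(const std::uint8_t* pixel)
    {
        // Reinterpret the 4 pixel bytes as one integer key.
        std::uint32_t key;
        std::memcpy(&key, pixel, sizeof key);

        const auto it = m_cache.find(key);
        if (it != m_cache.end()) {
            return it->second; // seen this color before: no recomputation
        }

        const std::uint8_t d = m_expensive(m_reference, pixel);
        m_cache.emplace(key, d);
        return d;
    }

    std::size_t size() const { return m_cache.size(); }

private:
    std::uint8_t m_reference[4];
    DifferenceFn m_expensive;
    std::unordered_map<std::uint32_t, std::uint8_t> m_cache;
};
```

This pays off when the filled region contains few distinct colors, which is typical for flat areas; on noisy images most lookups would miss and the cache would mainly add overhead.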
Comment 6 Halla Rempt 2012-02-29 12:47:57 UTC
Hm, this callgrind output shows that for this 4000x4000 image the redisplay takes a lot of the time: about 40% is spent in the redisplay, and only ~30% in the actual flood fill.

The callgrind output was generated by starting without instrumentation, painting a loop, starting instrumentation, filling, waiting until the fill had shown up, and then stopping instrumentation.
Comment 7 Halla Rempt 2012-02-29 12:48:26 UTC
Created attachment 69179
callgrind output
Comment 8 Halla Rempt 2012-05-08 09:29:46 UTC
After the ng iterator patch, filling is now pretty fast; I was amazed, actually. On this system, there's no difference between GIMP 2.8 and Krita anymore, though Photoshop 7 is still much faster. Animtim, can you check whether the performance is now acceptable for you?

I tested with a 3000x3000 image and a loop that enclosed most of the image.
Comment 9 animtim 2012-05-08 10:50:57 UTC
Yep, I confirm it's much better already now, good work!
(can close this bug I think)
Comment 10 Halla Rempt 2012-05-08 10:52:13 UTC
great!