Bug 400604

Summary: excessive stat() syscalls
Product: [Applications] digikam Reporter: Johannes Berg <johannes>
Component: DImg-Core Assignee: Digikam Developers <digikam-bugs-null>
Status: CLOSED UPSTREAM    
Severity: normal CC: caulier.gilles
Priority: NOR    
Version: 5.9.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In: 7.5.0
Sentry Crash Report:
Attachments: (postprocessed) strace -e stat log

Description Johannes Berg 2018-11-02 23:17:29 UTC
Created attachment 116057
(postprocessed) strace -e stat log

SUMMARY


STEPS TO REPRODUCE
1. add an album containing PNG files

OBSERVED RESULT
*far* too many stat() calls, to the point where this becomes the bottleneck (though admittedly that might in part also be because this is on NFS for me)

EXPECTED RESULT
Not limited by stat() calls as the result is not expected to change within milliseconds.

SOFTWARE VERSIONS
(available in About System)
KDE Plasma Version: unknown (mostly gnome system)
KDE Frameworks Version: unknown (mostly gnome system)
Qt Version: unknown (mostly gnome system)

ADDITIONAL INFORMATION

I suspect that somehow there's a read() routine in or for the PNG decompressor that does a stat() call for each 8k block being read. When doing the strace fully without limiting to stat() [which I did in the attached file] I also see the following pattern (over and over again):

stat("/path/to/picture.png", {st_mode=S_IFREG|0644, st_size=112550827, ...}) = 0
stat("/path/to/picture.png", {st_mode=S_IFREG|0644, st_size=112550827, ...}) = 0
lseek(36, 29933568, SEEK_SET)           = 29933568
lseek(36, 29958144, SEEK_SET)           = 29958144
read(36, "<snip>"..., 8192) = 8192
stat("/path/to/picture.png", {st_mode=S_IFREG|0644, st_size=112550827, ...}) = 0
stat("/path/to/picture.png", {st_mode=S_IFREG|0644, st_size=112550827, ...}) = 0
lseek(36, 29966336, SEEK_SET)           = 29966336
lseek(36, 29990912, SEEK_SET)           = 29990912
read(36, "<snip>"..., 8192) = 8192

for every 8k in the file.
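
To make the pattern easier to see, the excerpt above can be tallied per syscall. This is only a minimal sketch: it assumes strace's default one-call-per-line output, the helper name count_syscalls is hypothetical, and the sample is just the ten lines quoted above, not the full attached log.

```python
import re
from collections import Counter

# The ten trace lines quoted above (read buffers elided as in the report).
TRACE = '''\
stat("/path/to/picture.png", {st_mode=S_IFREG|0644, st_size=112550827, ...}) = 0
stat("/path/to/picture.png", {st_mode=S_IFREG|0644, st_size=112550827, ...}) = 0
lseek(36, 29933568, SEEK_SET)           = 29933568
lseek(36, 29958144, SEEK_SET)           = 29958144
read(36, "<snip>"..., 8192) = 8192
stat("/path/to/picture.png", {st_mode=S_IFREG|0644, st_size=112550827, ...}) = 0
stat("/path/to/picture.png", {st_mode=S_IFREG|0644, st_size=112550827, ...}) = 0
lseek(36, 29966336, SEEK_SET)           = 29966336
lseek(36, 29990912, SEEK_SET)           = 29990912
read(36, "<snip>"..., 8192) = 8192
'''

# A syscall line starts with the call name followed by "(".
SYSCALL = re.compile(r'^(\w+)\(')


def count_syscalls(trace):
    """Count how often each syscall name appears in strace-style text."""
    counts = Counter()
    for line in trace.splitlines():
        match = SYSCALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = count_syscalls(TRACE)
stats_per_read = counts["stat"] / counts["read"]
print(dict(counts), stats_per_read)
```

In this excerpt that gives two stat() and two lseek() calls for every single 8k read().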
Comment 1 Johannes Berg 2018-11-02 23:22:35 UTC
I said every 8k, and I also said >>7000 stat calls - and indeed (some of) those PNG files are in fact >100MB. That still means something is off in my calculations (every 8k for >100MB should be ~13k stat calls), but still.
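
One possible reading of the numbers, sketched below under the assumption that the two groups in the excerpt are representative of the whole file: the first lseek() offsets of consecutive groups are 32768 bytes apart, so the trace would show two stat() calls per 32k rather than per 8k. The constant names are hypothetical; the values are copied from the trace in the description.

```python
FILE_SIZE = 112_550_827  # st_size reported by stat() in the trace

# First lseek() offset of the two consecutive groups in the excerpt.
GROUP_OFFSETS = (29_933_568, 29_966_336)
STATS_PER_GROUP = 2  # stat() calls seen before each group's read()

# Estimate if there were one stat() per 8k block of the file.
per_8k = FILE_SIZE // 8192

# Estimate from the spacing actually visible in the excerpt.
stride = GROUP_OFFSETS[1] - GROUP_OFFSETS[0]
per_stride = STATS_PER_GROUP * FILE_SIZE // stride

print(per_8k, stride, per_stride)
```

Under that assumption the estimate is about 6.9k calls instead of ~13k, which is closer to the roughly 7000 calls reported in this thread. It says nothing about why the calls happen.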

I also see something similar with TIF files, though not nearly as many calls, so perhaps it is per PNG chunk for some reason, and should be assigned to FilesIO-PNG instead.
Comment 2 caulier.gilles 2018-11-03 05:07:21 UTC
The right question is: which sub-component (library) processes PNG chunks like this?

Two possibilities: Exiv2 or libpng.

In digiKam, we use libpng only to read or save PNG images, that's all. This must be safe.

I suspect Exiv2 metadata processing while scanning items or writing information to files.

Ideally, it would be very instructive to reproduce this kind of unit operation with exiv2 and look at how the low-level system APIs are called, to see whether the bottleneck appears.

Note: for TIFF images it's the same case. digiKam uses libtiff only to read or write the image. Exiv2 handles metadata extraction and updates, and can be system-call intensive.

Gilles Caulier
Comment 3 Johannes Berg 2018-11-03 06:54:25 UTC
(In reply to caulier.gilles from comment #2)
> The right question is: which sub-component (library) processes PNG chunks
> like this?
> 
> Two possibilities: Exiv2 or libpng.
> 
> In digiKam, we use libpng only to read or save PNG images, that's all. This
> must be safe.

I figured not necessarily, since (I think) libpng supports going through the application's read() methods, and I figured perhaps you were using something like that.

> I suspect Exiv2 metadata processing while scanning items or writing
> information to files.

You're right:

$ strace -e stat exiv2 /path/to/picture.png 2>&1|grep stat|wc -l
7265

Any idea where to find the exiv2 bug tracker?
Comment 4 Johannes Berg 2018-11-03 06:59:14 UTC
> Any idea where to find the exiv2 bug tracker?

Never mind, I found a link on the homepage.
Comment 5 Johannes Berg 2018-11-03 07:15:40 UTC
http://dev.exiv2.org/issues/1374
Comment 6 caulier.gilles 2018-11-03 07:54:29 UTC
Johannes,

Since Exiv2 has migrated to GitHub, use this bug tracker in priority:

https://github.com/Exiv2/exiv2/issues

I know that Exiv2 is migrating all older Redmine issues to GitHub step by step, and that is not yet complete. This is why the older way of reporting an issue on Redmine still exists.

So I recommend using GitHub instead of Redmine.

Best

Gilles Caulier
Comment 7 Johannes Berg 2018-11-03 08:10:53 UTC
(In reply to caulier.gilles from comment #6)
> Johannes,
> 
> Since Exiv2 has migrated to GitHub, use this bug tracker in priority:
> 
> https://github.com/Exiv2/exiv2/issues

Oh. *sigh*, you'd think they'd link the preferred one on their homepage ...

https://github.com/Exiv2/exiv2/issues/515
Comment 8 caulier.gilles 2018-11-03 10:43:09 UTC
Let the Exiv2 team manage your entry (:=)))... Robin knows well the right place to post an issue and whether it needs to be moved.

Gilles Caulier