Bug 152877

Summary: thumbnails: URI does not follow Thumbnail Managing Standard
Product: [Applications] digikam Reporter: Heiko Schröder <heiko.schroeder>
Component: Thumbs-Image Assignee: Digikam Developers <digikam-bugs-null>
Status: RESOLVED FIXED    
Severity: normal    
Priority: HI    
Version: 0.9.2   
Target Milestone: ---   
Platform: openSUSE   
OS: Linux   
Latest Commit: Version Fixed In: 1.0.0
Sentry Crash Report:

Description Heiko Schröder 2007-11-25 15:44:04 UTC
Version:           0.9.2 (using KDE 3.5.4)
Installed from:    SuSE RPMs
OS:                Linux

Characters in file or directory names are converted to Latin-1 by digikam. The resulting names are then used to form the Thumb::URI tag for thumbnails.

According to the Thumbnail Managing Standard the URI should be formed using the octet sequence that is actually used by the filesystem (in my case UTF-8) so that it can be used to access the file.

According to RFC 2396 (which is referenced by the TMS), a URI may only contain US-ASCII characters; octets that are not part of the US-ASCII set have to be escaped using the "%<hex>" syntax. digikam uses the unescaped Latin-1 octets instead.

As other programs (e.g. konqueror) follow the standard, two thumbnail files may be generated for the same file using different URIs.

Additionally, the re-encoding may fail for characters that cannot be represented in this character set, leading to wrong or incomplete URIs.
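
To illustrate the re-encoding failure, here is a minimal sketch in plain C++ (standard library only, not digikam code). The helper name utf8ToLatin1 is hypothetical. It assumes the file name arrives as UTF-8 octets and shows that any character above U+00FF has no Latin-1 octet, so the conversion has to give up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of digikam): convert UTF-8 octets to Latin-1.
// Returns false when a character is not representable in Latin-1
// (code point above U+00FF) or when the input is malformed.
bool utf8ToLatin1(const std::string& in, std::string& out)
{
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // Plain US-ASCII octet, identical in both encodings.
            out.push_back(static_cast<char>(c));
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
            // Two-byte sequences led by C2 or C3 cover U+0080..U+00FF.
            const unsigned char d = static_cast<unsigned char>(in[++i]);
            if ((d & 0xC0) != 0x80) {
                return false; // malformed continuation octet
            }
            out.push_back(static_cast<char>(((c & 0x03) << 6) | (d & 0x3F)));
        } else {
            return false; // above U+00FF or malformed: no Latin-1 octet exists
        }
    }
    return true;
}
```

With this sketch, "ö" (UTF-8 octets C3 B6) becomes the single Latin-1 octet F6, while a character such as "€" (UTF-8 octets E2 82 AC) cannot be converted at all.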
Comment 1 Heiko Schröder 2007-11-25 15:45:49 UTC
I accidentally selected the wrong KDE version: it's 3.5.7 (using openSUSE 10.3).
Comment 2 Arnd Baecker 2007-11-26 08:44:28 UTC
Hi Heiko,

thanks for the report, I set the priority to high.
I think that this is an important issue.
The places I found where the actual path to the thumbnails is constructed are

digikam/pixmapmanager.cpp:
    uri = md5.hexDigest();
kioslave/digikamthumbnail.cpp:
    thumbPath += QFile::encodeName( md5.hexDigest() ) + ".png";
libs/thumbbar/thumbbar.cpp:
    uri = md5.hexDigest();
utilities/batch/batchthumbsgenerator.cpp:
    uri = md5.hexDigest();
Are these all?

The code usually looks like
    QString uri = "file://" + QDir::cleanDirPath(url.path());
    KMD5 md5(QFile::encodeName(uri));
    uri = md5.hexDigest();

    QString smallThumbPath = d->thumbCacheDir + "normal/" + uri + ".png";

So the encodeName should be replaced, but I am not familiar with the hex
encoding. One way might be something like
  QString hex;
  hex.sprintf("%%%02X", uri);
(I am not sure about the first argument of sprintf ...)
Heiko, do you maybe have a pointer to the source code that konqueror uses?
Comment 3 Heiko Schröder 2007-11-26 10:13:48 UTC
Hi,

I must admit, I didn't even look at the code...

I wrote a Perl script to clean up my .thumbnails directory and noticed that some URIs contained Latin-1 characters while my filesystems are in UTF-8. For the same file there usually was another entry using the hex-encoded URI. Checking the Software tag revealed the producers (digikam vs. konqueror).

For the encoding:
I'm afraid I can't help you much with the real code (I am not familiar with Qt), but generally: you have to replace all characters outside the US-ASCII range with their hex code (using capital letters) prefixed by "%". So your format string looks right, but of course it has to be applied only to those characters that are outside the US-ASCII set (and before converting the path to Latin-1).

And it's not just the MD5 code for the thumbnail that is affected. The Thumb::URI tag written to the thumbnail file has to be encoded correctly, too.
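
As a rough illustration of the escaping described above, here is a sketch in plain C++ without Qt. The function name fileUriFromPath is hypothetical, and the set of characters left unescaped is an assumption based on the "unreserved" set of RFC 2396 plus "/" as the path separator. It works on the raw octets of the path, so a UTF-8 file name yields "%xx"-escaped UTF-8.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not digikam or KDE code): build a file URI from the
// raw octets of an absolute path, escaping every octet that is not
// an ASCII letter, an ASCII digit, an RFC 2396 "mark" character, or '/'.
std::string fileUriFromPath(const std::string& path)
{
    static const char hexDigits[] = "0123456789ABCDEF";
    const std::string keep = "-_.!~*'()/"; // assumed unescaped set

    std::string uri = "file://";
    for (std::size_t i = 0; i < path.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(path[i]);
        const bool isAscii = c < 0x80;
        const bool isAlnum = isAscii && std::isalnum(c) != 0;
        const bool isKept = isAscii
            && keep.find(static_cast<char>(c)) != std::string::npos;

        if (isAlnum || isKept) {
            uri.push_back(static_cast<char>(c));
        } else {
            // "%<hex>" syntax with capital hex digits, one escape per octet.
            uri.push_back('%');
            uri.push_back(hexDigits[c >> 4]);
            uri.push_back(hexDigits[c & 0x0F]);
        }
    }
    return uri;
}
```

The same escaped string would have to be used both as the input to the MD5 sum and as the value of the Thumb::URI tag, otherwise the two no longer match.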
Comment 4 caulier.gilles 2008-12-04 21:01:49 UTC
Heiko,

Any news on this report? Is it still valid with digiKam 0.9.4?

Gilles Caulier
Comment 5 Heiko Schröder 2008-12-05 07:56:25 UTC
I'm afraid the problem's still there. For example, look at the following thumbnails (Software is the "Software" tag, URI is "Thumb::URI"):

/home/heiko/.thumbnails/normal/0f2371678317882e56dcc233226617de.png: Software=Digikam Thumbnail Generator, URI=file:///home/media/photos/2008_Kölner_Zoo/dsc01908.jpg, mtime=1216725031

/home/heiko/.thumbnails/normal/190601fd9a42077eb8b170db6caf3704.png: Software=KDE Thumbnail Generator, URI=file:///home/media/photos/2008_K%C3%B6lner_Zoo/dsc01908.jpg, mtime=1216725031

Digikam uses Latin-1 to encode the German "ö", while KDE (correctly) uses "%xx"-encoded UTF-8.
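
A cleanup script can tell the two kinds of entries apart with a simple check on the Thumb::URI value. The following is only a sketch in plain C++; the function name isAsciiOnlyUri is hypothetical. It relies on the rule that a valid URI contains US-ASCII octets only, so any octet of 0x80 or above points to an unescaped name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when every octet of the URI is US-ASCII,
// which is a necessary condition for a correctly escaped URI.
bool isAsciiOnlyUri(const std::string& uri)
{
    for (std::size_t i = 0; i < uri.size(); ++i) {
        if (static_cast<unsigned char>(uri[i]) >= 0x80) {
            return false; // raw Latin-1 or UTF-8 octet, not escaped
        }
    }
    return true;
}
```

Applied to the two entries above, the Digikam URI (with the raw Latin-1 octet F6) fails the check, while the KDE URI with "%C3%B6" passes it.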
Comment 6 caulier.gilles 2008-12-05 09:02:39 UTC
I found the problem in the digiKam thumbnail creator. Look at this code:

http://websvn.kde.org/branches/extragear/kde3/graphics/digikam/kioslave/digikamthumbnail.cpp?view=markup

At line 132, you can see:

QString uri = "file://" + QDir::cleanDirPath(url.path(-1));

uri is set as embedded text in the PNG file at line 197:

img.setText(QString("Thumb::URI").latin1(), 0, uri);

It sounds like the problem is at line 132, with the QDir::cleanDirPath() method.

Now compare with KDE KIO thumb creator:

http://websvn.kde.org/branches/KDE/3.5/kdelibs/kio/kio/previewjob.cpp?revision=496090&view=markup

At line 505, the URL is recorded in the PNG file in the same way as in digiKam:

thumb.setText("Thumb::URI", 0, d->origName);

and at line 384, the statResultThumbnail() method does not use QDir::cleanDirPath().

Andi, Marcel, what are your viewpoints?

Note: this report is very important, because fixing it will speed up thumbnail rendering for non-Latin file paths.

Gilles
Comment 7 caulier.gilles 2009-05-24 13:21:31 UTC
I am updating this report with the KDE4 code to hack:

The Gwenview thumbnail generator also uses a QString directly to record the file URI in the PNG text chunk:

http://lxr.kde.org/source/KDE/kdegraphics/gwenview/lib/thumbnailloadjob.cpp#225

I'm afraid the KDE thumbnail loader does not set the URI like this:

http://lxr.kde.org/source/KDE/kdebase/runtime/kioslave/thumbnail/thumbnail.cpp

Gilles Caulier
Comment 8 caulier.gilles 2009-05-24 13:31:04 UTC
Heiko,

As far as I can see, the code from digiKam 0.10.0 (KDE4) does not save the text in the PNG with a Latin-1 conversion:

http://lxr.kde.org/source/extragear/graphics/digikam/libs/threadimageio/thumbnailcreator.cpp#247

Can you try again?

Gilles Caulier
Comment 9 caulier.gilles 2009-05-24 16:01:15 UTC
OK,

I can reproduce the problem here with KDE4, comparing Gwenview and digiKam.

Good news: I have a fix, and these are the results:

[gilles@localhost large]$ exiftool b11cb1d3e54783446c86d995683882c0.png
ExifTool Version Number         : 7.67
File Name                       : b11cb1d3e54783446c86d995683882c0.png
Directory                       : .
File Size                       : 51 kB
File Modification Date/Time     : 2009:05:24 15:58:05+02:00
File Type                       : PNG
MIME Type                       : image/png
Image Width                     : 256
Image Height                    : 170
Bit Depth                       : 8
Color Type                      : RGB with Alpha
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Pixels Per Unit X               : 3780
Pixels Per Unit Y               : 3780
Pixel Units                     : Meters
Software                        : Digikam Thumbnail Generator
Thumb M Time                    : 1241199026
Thumb URI                       : file:///mnt/data/photo/test/batch%20queue%20manager/test%20with%20utf8%20char%20as%20'%C3%B6'/PICT2079.png
Image Size                      : 256x170

[gilles@localhost normal]$ exiftool b11cb1d3e54783446c86d995683882c0.png
ExifTool Version Number         : 7.67
File Name                       : b11cb1d3e54783446c86d995683882c0.png
Directory                       : .
File Size                       : 14 kB
File Modification Date/Time     : 2009:05:24 15:59:00+02:00
File Type                       : PNG
MIME Type                       : image/png
Image Width                     : 128
Image Height                    : 85
Bit Depth                       : 8
Color Type                      : RGB
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Pixels Per Unit X               : 3780
Pixels Per Unit Y               : 3780
Pixel Units                     : Meters
Software                        : Gwenview
Thumb Image Height              : 2428
Thumb Image Width               : 3646
Thumb M Time                    : 1241199026
Thumb Mimetype                  : image/png
Thumb Size                      : 24544311
Thumb Uri                       : file:///mnt/data/photo/test/batch%20queue%20manager/test%20with%20utf8%20char%20as%20'%C3%B6'/PICT2079.png
Image Size                      : 128x85

Gilles Caulier
Comment 10 caulier.gilles 2009-05-24 16:02:38 UTC
SVN commit 972293 by cgilles:

fix URI path encoding. Use KUrl::url() now, like Gwenview.
BUG: 152877


 M  +2 -2      thumbnailbasic.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=972293