Bug 452458

Summary: Broken URLs generated for non-ascii character filenames
Product: [Applications] kphotoalbum Reporter: Pierre Etchemaïté <pe-kde>
Component: HTML generator Assignee: KPhotoAlbum Bugs <kphotoalbum-bugs-null>
Status: RESOLVED FIXED    
Severity: normal CC: tl
Priority: NOR    
Version First Reported In: GIT master   
Target Milestone: ---   
Platform: Compiled Sources   
OS: Linux   
Latest Commit: Version Fixed/Implemented In:
Sentry Crash Report:

Description Pierre Etchemaïté 2022-04-10 09:53:23 UTC
SUMMARY
Filenames containing accented characters are latin1(?) percent-encoded in URLs (e.g. é -> &#233;) in the generated index.html, leading to broken links both locally and when browsed through an Apache server.
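
For reference, the encodings that could be in play for é can be told apart with a short Python sketch. This is only an illustration and not KPhotoAlbum code; it suggests that the &#233; form quoted above is an HTML numeric character reference (decimal code point 233, which happens to equal the Latin-1 byte value) rather than URL percent-encoding.

```python
from urllib.parse import quote

name = "é"

# HTML-level masking: numeric character reference with the decimal code point.
ncr = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

# URL-level percent-encoding of the UTF-8 bytes (the default for quote()).
pct_utf8 = quote(name)

# URL-level percent-encoding of the Latin-1 byte, shown for comparison.
pct_latin1 = quote(name, encoding="latin-1")

print(ncr)         # &#233;
print(pct_utf8)    # %C3%A9
print(pct_latin1)  # %E9
```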

STEPS TO REPRODUCE
1. Create an image with non-ASCII characters in its name (é, è, ê, ...):
    $ ls *Carré\).* 
    '200416 a (Carré).jpg'
    $ ls *Carré\).*|od -c
    0000000   2   0   0   4   1   6       a       (   C   a   r   r 303 251
    0000020   )   .   j   p   g  \n
    0000026
2. Generate an HTML page containing that image
3. (publish result on a web server)
4. Browse the page (tested with Konqueror, Firefox, Chromium)
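
The od -c output in step 1 can be checked independently. A minimal Python sketch, assuming only the octal bytes shown there, confirms that the file name is stored as UTF-8 on disk:

```python
# File name bytes as printed by od -c; \303 \251 are octal escapes.
raw = b"200416 a (Carr\303\251).jpg"

# Decoding as UTF-8 succeeds and yields the expected accented name.
name = raw.decode("utf-8")
print(name)  # 200416 a (Carré).jpg

# The two bytes 0xC3 0xA9 form a single character, U+00E9 (decimal 233).
accent = bytes([0o303, 0o251]).decode("utf-8")
print(accent, hex(ord(accent)), ord(accent))  # é 0xe9 233
```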

OBSERVED RESULT
The thumbnail for the image is okay, but the mouse-over preview and the full image are broken links.

EXPECTED RESULT
All images appear in the generated page.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:  Ubuntu 21.10 with KDE libs
KDE Plasma Version: 5.22.5
KDE Frameworks Version: 5.86.0
Qt Version: 5.15.2
Comment 1 Pierre Etchemaïté 2022-04-10 10:09:11 UTC
Extra information:
filesystem: ext4
locale:
$ locale   
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=
Comment 2 Tobias Leupold 2022-04-10 10:37:19 UTC
Git commit 99ca48526ec6b6609af674be0401e43f7e39bb19 by Tobias Leupold.
Committed on 10/04/2022 at 10:34.
Pushed by tleupold into branch 'master'.

Use UTF-8 characters without masking them when generating HTML

M  +2    -14   HTMLGenerator/Generator.cpp
M  +0    -1    HTMLGenerator/Generator.h

https://invent.kde.org/graphics/kphotoalbum/commit/99ca48526ec6b6609af674be0401e43f7e39bb19
Comment 3 Tobias Leupold 2022-04-10 10:39:02 UTC
Thanks for your report! I see this as well for non-ASCII characters. I don't think the locale matters.

As we use UTF-8 for the HTML page anyway, I think we can simply leave out the masking of special characters and leave them as-is. This fixes the gallery for me for non-ascii characters.
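
A rough sketch of the idea in Python (not the actual Generator.cpp change; the function name and the HTML snippet are made up for illustration): when the page is written as UTF-8, the file name can go into the link unchanged, and only HTML-significant characters need escaping.

```python
import html


def image_link(filename: str) -> str:
    # Hypothetical helper: escape only &, <, > and quotes.
    # Non-ASCII characters such as é pass through untouched.
    safe = html.escape(filename, quote=True)
    return f'<a href="{safe}">{safe}</a>'


# The charset declaration tells the browser how to read the raw bytes.
page = '<meta charset="utf-8">\n' + image_link("200416 a (Carré).jpg")

# Bytes as they would be written to index.html.
data = page.encode("utf-8")
print(page)
```

In this sketch the é ends up in the output as the two UTF-8 bytes 0xC3 0xA9, the same bytes the file name has on disk.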
Comment 4 Pierre Etchemaïté 2022-04-10 15:44:40 UTC
Problem fixed indeed, thanks for this quick reply!