Bug 418873

Summary: For .torrent files with file names in cyrillic win-cp1251 encoding, KTorrent creates files with UTF8 replacement symbols EF BF BD
Product: [Applications] ktorrent Reporter: Wladimir Mutel <mwg>
Component: general Assignee: Joris Guisson <joris.guisson>
Status: REPORTED ---    
Severity: normal    
Priority: NOR    
Version First Reported In: 5.1   
Target Milestone: ---   
Platform: Kubuntu   
OS: Linux   
Attachments: ( copy of the .torrent file from https://rutracker.org/forum/viewtopic.php?t=1587139 )

Description Wladimir Mutel 2020-03-15 13:18:20 UTC
Created attachment 126807
( copy of the .torrent file from https://rutracker.org/forum/viewtopic.php?t=1587139 )

SUMMARY

I downloaded this .torrent file from https://rutracker.org/forum/viewtopic.php?t=1587139 and am attaching the copy that KTorrent stored at ~/.local/share/ktorrent/tor1/torrent . When I ask KTorrent to download it, every Cyrillic character in the created file and folder names is replaced by the UTF-8 byte sequence EF BF BD, which encodes U+FFFD REPLACEMENT CHARACTER ( https://apps.timwhitlock.info/unicode/inspect?s=%EF%BF%BD ). Viewing the .torrent file with 'less' shows a UTF-8 Cyrillic name for every file, along with what I suppose is the same name encoded in Windows cp1251.
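
To check which names the .torrent file actually carries, rather than relying on 'less', something like the following minimal bencode reader could be used. It is only a sketch: it assumes each entry of the `files` list holds the standard `path` key plus the common but unofficial `path.utf-8` key, which matches what 'less' shows but has not been verified for this attachment. The helper names `bdecode` and `path_keys` are hypothetical.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead in (b"l", b"d"):              # list: l...e, dictionary: d...e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        if lead == b"l":
            return items, pos + 1
        # A dictionary is stored as alternating keys and values.
        return dict(zip(items[0::2], items[1::2])), pos + 1
    colon = data.index(b":", pos)         # byte string: <length>:<bytes>
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def path_keys(torrent_bytes: bytes):
    """Yield (key, raw_components) for every path-like key of every file entry."""
    meta, _ = bdecode(torrent_bytes)
    # Single-file torrents have no "files" list; nothing is yielded for them.
    for entry in meta[b"info"].get(b"files", []):
        for key in sorted(entry):
            if key.startswith(b"path"):   # assumed keys: "path", "path.utf-8"
                yield key, entry[key]
```

Reading the attachment in binary mode and printing each `(key, raw_components)` pair with `repr()` would show whether the `path` bytes are cp1251 and the `path.utf-8` bytes are valid UTF-8.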

STEPS TO REPRODUCE
1. Get the .torrent file from the URL above, or use the attached copy
2. Open the .torrent file in KTorrent and let it download the files and folders
3. Inspect the resulting file and folder tree

OBSERVED RESULT

Running 'find', 'ls' or 'pwd' in the downloaded file and folder tree shows question marks inside diamond or hexagonal shapes. Piping that output through 'xxd' shows the UTF-8 triple EF BF BD repeated for every such character.
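
One plausible explanation for these bytes, not confirmed against the KTorrent source, is that the cp1251 name is decoded as if it were UTF-8, with invalid bytes replaced instead of rejected. The sketch below reproduces that effect in Python. The sample name "Музыка" and the helper `lossy_utf8` are hypothetical and only illustrate the mechanism.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER, EF BF BD in UTF-8


def lossy_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for every invalid sequence."""
    return raw.decode("utf-8", errors="replace")


# A Cyrillic name as it would be stored in a cp1251-encoded field.
cp1251_name = "Музыка".encode("cp1251")

# Decoding those bytes as UTF-8 loses every character.
garbled = lossy_utf8(cp1251_name)

# Re-encoding the garbled name shows the byte pattern that 'xxd' reports.
print(garbled.encode("utf-8").hex(" "))
```

Once a name has passed through such a lossy decode, the original characters cannot be recovered from the file name alone, because every invalid sequence maps to the same replacement character.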

EXPECTED RESULT

Files and folders should be named according to the UTF-8 names specified inside the .torrent file. The cp1251-encoded names should be ignored, converted to UTF-8, or (the least of evils) created in the filesystem as cp1251 byte sequences, which could then be renamed to their UTF-8 equivalents with an additional script.
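
The three options above can be sketched as one fallback chain. This is not KTorrent code. The function `decode_name`, its parameters, and the choice of cp1251 as the last guessed encoding are assumptions made for this report. `utf8_raw` stands for the bytes of a UTF-8 name field if the torrent has one, `legacy_raw` for the bytes of the plain name field, and `declared_encoding` for an optional encoding declared in the torrent.

```python
def decode_name(utf8_raw, legacy_raw, declared_encoding=None):
    """Pick a text name for one path component without emitting U+FFFD."""
    # Option 1: trust the dedicated UTF-8 field when it is present and valid.
    if utf8_raw is not None:
        try:
            return utf8_raw.decode("utf-8")
        except UnicodeDecodeError:
            pass
    # Option 2: convert the legacy field, trying strict decoders in order.
    for encoding in ("utf-8", declared_encoding, "cp1251"):
        if encoding is None:
            continue
        try:
            return legacy_raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Option 3 (least of evils): keep the raw bytes recoverable. Each
    # undecodable byte becomes a lone surrogate that can be turned back
    # into the original byte when the name is encoded again.
    return legacy_raw.decode("utf-8", errors="surrogateescape")
```

With this ordering a name is only left in its raw byte form when no strict decoder accepts it, so a later rename script would still have the original bytes to work from.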

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Ubuntu 20.04 (development version)
KDE Plasma Version: 5.18.3
KDE Frameworks Version: 5.67.0
Qt Version: 5.12.5
