Bug 418873 - For .torrent files with file names in cyrillic win-cp1251 encoding, KTorrent creates files with UTF8 replacement symbols EF BF BD
Status: REPORTED
Alias: None
Product: ktorrent
Classification: Applications
Component: general (other bugs)
Version First Reported In: 5.1
Platform: Kubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Joris Guisson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-03-15 13:18 UTC by Wladimir Mutel
Modified: 2020-03-15 13:18 UTC (History)
CC List: 0 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:


Attachments
( copy of the .torrent file from https://rutracker.org/forum/viewtopic.php?t=1587139 ) (42.20 KB, application/x-bittorrent)
2020-03-15 13:18 UTC, Wladimir Mutel
Description Wladimir Mutel 2020-03-15 13:18:20 UTC
Created attachment 126807
( copy of the .torrent file from https://rutracker.org/forum/viewtopic.php?t=1587139 )

SUMMARY

I downloaded this .torrent file from https://rutracker.org/forum/viewtopic.php?t=1587139 and am attaching the copy that KTorrent stored at ~/.local/share/ktorrent/tor1/torrent . When I have KTorrent download it, every Cyrillic character in the created file and folder names is replaced by the UTF-8 byte sequence EF BF BD, the encoding of U+FFFD REPLACEMENT CHARACTER ( https://apps.timwhitlock.info/unicode/inspect?s=%EF%BF%BD ). When I view the .torrent file with 'less', I see a UTF-8 Cyrillic name for every file, along with what I suppose is the same name encoded in Windows cp1251.

STEPS TO REPRODUCE
1. Get the .torrent file from the URL above, or use the attached copy
2. Ask KTorrent to download the files and folders described by this .torrent
3. Look at the resulting tree of files and folders

OBSERVED RESULT

When I run 'find', 'ls' or 'pwd' in the downloaded file and folder tree, I see question marks inside diamond or hexagonal shapes. When I pipe the output through 'xxd', I see the UTF-8 triple EF BF BD repeated for every such character.
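
One plausible way to arrive at this symptom, shown as a minimal Python sketch: bytes that are really cp1251 get decoded as UTF-8 with substitution of invalid input, and the result is written back out as UTF-8. Whether KTorrent takes exactly this path is an assumption; the sample name is made up for illustration and does not come from the attached torrent.

```python
# Hypothetical sample name; not taken from the attached .torrent file.
name = "Привет"

# The name as it would appear in a cp1251-encoded field.
cp1251_bytes = name.encode("cp1251")

# Decoding cp1251 bytes as UTF-8 fails for Cyrillic letters; with
# errors="replace" every undecodable byte is substituted by U+FFFD.
mangled = cp1251_bytes.decode("utf-8", errors="replace")

# Writing the mangled string back as UTF-8 yields EF BF BD per U+FFFD.
on_disk = mangled.encode("utf-8")

print(cp1251_bytes.hex(" "))
print(on_disk.hex(" "))
```

The substitution is lossy: once the name consists only of U+FFFD, the original letters cannot be recovered from the file name alone.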

EXPECTED RESULT

Files and folders should be named according to the UTF-8 names specified inside the .torrent file. The cp1251-encoded names should be ignored, converted to UTF-8, or (as the least of evils) created in the filesystem as raw cp1251 byte sequences, which could then be renamed to their UTF-8 equivalents with an additional script.
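
The first two options could look roughly like the Python sketch below. It works on an already bdecoded per-file dictionary and assumes that the UTF-8 names are stored under a `path.utf-8` key next to the legacy `path` key. That key name is a common convention, but I have not verified that this torrent uses it. The helper names `decode_component` and `pick_path` are hypothetical and are not KTorrent code.

```python
from typing import Dict, List


def decode_component(raw: bytes, fallback: str = "cp1251") -> str:
    """Decode one path component: strict UTF-8 first, then a legacy code page."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 names in this torrent are cp1251.
        return raw.decode(fallback)


def pick_path(entry: Dict[bytes, List[bytes]]) -> List[str]:
    """Prefer the explicit UTF-8 path; otherwise convert the legacy one."""
    if b"path.utf-8" in entry:  # assumed key name, see the note above
        return [c.decode("utf-8") for c in entry[b"path.utf-8"]]
    return [decode_component(c) for c in entry[b"path"]]


# Hypothetical file entries, built by hand for illustration.
entry = {
    b"path": ["Музыка".encode("cp1251"), "песня.mp3".encode("cp1251")],
    b"path.utf-8": ["Музыка".encode("utf-8"), "песня.mp3".encode("utf-8")],
}
legacy_only = {b"path": entry[b"path"]}

print(pick_path(entry))
print(pick_path(legacy_only))
```

Trying strict UTF-8 before the code page keeps torrents that already use UTF-8 in `path` working, since Cyrillic cp1251 text is very unlikely to be valid UTF-8 by accident.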

SOFTWARE/OS VERSIONS
Linux/KDE Plasma: Ubuntu 20.04 (development version)
KDE Plasma Version: 5.18.3
KDE Frameworks Version: 5.67.0
Qt Version: 5.12.5

ADDITIONAL INFORMATION