Bug 402298 - Invalid Unicode chars in file/foldernames appear to make file copies abort
Status: RESOLVED FIXED
Alias: None
Product: frameworks-kio
Classification: Frameworks and Libraries
Component: general
Version: git master
Platform: Other Linux
Importance: NOR normal
Target Milestone: ---
Assignee: David Faure
URL:
Keywords:
Duplicates: 402697
Depends on:
Blocks:
 
Reported: 2018-12-18 13:13 UTC by bluescreenavenger
Modified: 2021-08-09 22:28 UTC

See Also:
Latest Commit:
Version Fixed In:


Attachments
Archive containing the logs, and the tree, and the Seed.txt file to pass to the kiocopy script (18.56 KB, application/x-xz)
2018-12-18 13:13 UTC, bluescreenavenger
Bash script to recreate the file tree that failed to copy. Pass the path to Seed.txt to it. (17.80 KB, application/x-shellscript)
2018-12-18 13:14 UTC, bluescreenavenger

Description bluescreenavenger 2018-12-18 13:13:51 UTC
Created attachment 116993
Archive containing the logs, and the tree, and the Seed.txt file to pass to the kiocopy script

Hi

While trying to replicate Bug #162211, I made a script that mass "seeds" files, saves the file info to a text file, can later recreate the files from that text file, and then tries to copy them with kioclient5.

I inadvertently created invalid Unicode chars while trying to cut long file names down to exactly 255 bytes (though I guess something similar can happen on a corrupt file system). I think that in the process of doing this, I replicated it in a way.
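
For illustration only, here is a minimal Python sketch (not the attached Bash script) of how cutting a name at a fixed byte count can leave half a multibyte character at the end. The helper names are hypothetical, and the 255-byte limit is the one mentioned above.

```python
def truncate_bytes(name: str, limit: int = 255) -> bytes:
    """Cut the UTF-8 encoding at a byte offset, ignoring character boundaries."""
    return name.encode("utf-8")[:limit]


def truncate_chars_safe(name: str, limit: int = 255) -> bytes:
    """Cut at the last complete character that still fits in `limit` bytes."""
    raw = name.encode("utf-8")[:limit]
    # The input was valid UTF-8, so only the tail can be incomplete;
    # errors="ignore" drops that dangling partial sequence.
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 253 ASCII bytes followed by one 4-byte character (257 bytes in total).
name = "a" * 253 + "\N{PENGUIN}"
naive = truncate_bytes(name)
safe = truncate_chars_safe(name)
print(len(naive), is_valid_utf8(naive))
print(len(safe), is_valid_utf8(safe))
```

The naive cut keeps exactly 255 bytes but ends in the first two bytes of the penguin, which is the kind of name that ended up in the test tree. The safe cut gives up the partial character instead.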

Copying with kioclient5, with export QT_LOGGING_RULES=kf5.kio.*=true, it silently aborts, and the final line of output is:
"Could not make folder /tmp/kiocopy/1545096048/folder_dest/testdir/0l>╚>/1H{n.4🐧N5>2ä<╚#uDJjà57vÉ7BFI*;9jÄ>w@Vdäâ2@Op*8w>i5-by4P{èåhÄÄDw9î1RÆ4x3©h23uÉè)<W?&&,🐧WÄ31Hbv5î,7VTDCrÉï'So🐧D#\"}127z🐧4ij8f®uZ58®pÄJHê®9ÄÅ4Æ-Dj zpN0èg1JB𖡒?Zh{Hom6𖡒B9i1\"èEms2.1.5¾ch@ÅWÆ<äU})2CW1h\t𖡒V9g�."

kioclient5 then exits with a status of 1.

When I run tree against the source, and the destination, then diff the tree output, the last lines are:
-17 directories, 413 files
+7 directories, 0 files
meaning only 7 folders got copied
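
As a rough alternative to diffing tree output, the same comparison can be sketched in Python. This is only an illustration: `count_tree` is a hypothetical helper, the demo paths are made up, and its totals are not guaranteed to match how tree counts. It walks with byte paths so that names that are not valid text do not get in the way.

```python
import os
import tempfile


def count_tree(root: bytes) -> tuple:
    """Return (directories, files) found below root, not counting root itself."""
    dirs = files = 0
    # With a bytes root, os.walk yields bytes names and never decodes them.
    for _path, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return dirs, files


# Tiny demo tree: src/a, src/a/b and one file src/a/f.txt
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src", "a", "b"))
    open(os.path.join(tmp, "src", "a", "f.txt"), "w").close()
    src_counts = count_tree(os.fsencode(os.path.join(tmp, "src")))

print(src_counts)
```

Running it on both the source and the destination and comparing the two tuples shows at a glance how much of the tree made it across.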


Let me know if I need to provide any info. 
Admittedly this was KIO 5.44, but I also get similar results with a version built from Git in mid-October 2018.

But it looks like I possibly found a case of silent failure?


I do get somewhat better results, if I change these two variables
LC_ALL=C 
LANG=C
(which I did to force Bash to split Unicode chars)

I then get
-17 directories, 413 files
+8 directories, 221 files
But I also get a bunch of dialogs warning me that some files "can't be found", but it's better than a silent failure...


I will attach 2 files. One contains the Seed file, the command-line output of kioclient5, and the trees of the source and the destination.
The other is the script. Pass it the path to the Seed file as the first argument and it creates the exact same tree that failed to copy, so that the experiment can be recreated exactly.

Let me know if you need anything
Comment 1 bluescreenavenger 2018-12-18 13:14:16 UTC
Created attachment 116994
Bash script to recreate the file tree that failed to copy. Pass the path to Seed.txt to it.
Comment 2 David Edmundson 2018-12-18 13:46:25 UTC
LANG definitely can make a difference to the way things run. 

Filesystems don't have a concept of UTF-8 or any other encoding, just a raw bytestream. We then put that into a QUrl, which obviously follows the locale it runs in, though the worst this should do is (correctly) fail on a file conflict where one didn't previously exist.

I think your script is creating a byte array that isn't valid text in any format. I can understand KIO failing to copy that because we don't treat things as a byte stream all the way through.

However I /think/ KIO is generating an error correctly.

The reason you're not getting the dialog in kioclient is a bug in kioclient fixed with D17652 and D17653. With those changes, using your test with the seed here, I do get an error.
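
To illustrate the raw-bytestream point above, here is a small Python sketch. It says nothing about how QUrl or KIO handle names internally. It only shows, under the assumption of a name ending in a truncated UTF-8 sequence, that such a name survives a trip through text only if the undecodable bytes are preserved somehow. Python's `surrogateescape` error handler is used as one example of that.

```python
# A name ending in the first two bytes of a 4-byte UTF-8 sequence.
raw_name = b"report_\xf0\x9f"

# Strict decoding rejects the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate code point,
# so the original bytes can be recovered when encoding again.
as_text = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = as_text.encode("utf-8", errors="surrogateescape")

# Lossy replacement produces text that encodes to a different name.
lossy = raw_name.decode("utf-8", errors="replace").encode("utf-8")

print(strict_ok, round_trip == raw_name, lossy == raw_name)
```

Any layer that goes through the lossy path ends up asking the filesystem for a name that does not exist, which would be consistent with the "can't be found" dialogs described in the report.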
Comment 3 i.Dark_Templar 2018-12-18 18:13:50 UTC
(In reply to Christoph Feck from comment https://bugs.kde.org/show_bug.cgi?id=162211#c128)
> The filename limit is 255 bytes, not 255 characters. In UTF-8, any non-ASCII
> character needs more than 1 byte. Additionally, the last character looks
> cropped, causing an illegal UTF-8 name, which Qt does not handle.

Not sure if answer should go here or in old bug, so I'll post here.
While 255 bytes is the POSIX-compliant limit, the NTFS filesystem has a different limit on maximum filename length: 255 characters, which may be more than 255 bytes when UTF-8 is used. ntfs-3g may return such long filenames. I've already hit an issue with such limit inconsistencies once:

https://phabricator.kde.org/D8413
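
A quick Python illustration of the bytes-versus-characters mismatch described above. The constant and helper names are made up for this sketch, and it assumes the NTFS limit is counted in UTF-16 code units.

```python
NAME_MAX_BYTES = 255        # per-component byte limit discussed above
NTFS_MAX_UTF16_UNITS = 255  # assumed: NTFS counts UTF-16 code units


def utf8_len(name: str) -> int:
    """Length of the name in bytes when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def utf16_units(name: str) -> int:
    """Length of the name in UTF-16 code units."""
    return len(name.encode("utf-16-le")) // 2


name = "ä" * 255  # 255 characters, each 2 bytes in UTF-8

fits_ntfs = utf16_units(name) <= NTFS_MAX_UTF16_UNITS
fits_bytes = utf8_len(name) <= NAME_MAX_BYTES
print(utf16_units(name), utf8_len(name), fits_ntfs, fits_bytes)
```

The name is within the character-based limit but twice the byte-based one, so code that sizes a buffer or validates against 255 bytes can trip over names that the filesystem itself considers legal.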
Comment 4 bluescreenavenger 2018-12-19 01:22:52 UTC
Seeing that the issue is in kioclient5, I locally modified the script here and made it bring up Dolphin, so I could copy the files.

I DID indeed get a dialog this time saying it can't enter that folder... ...however it completely stopped the copy I started. On the first attempt, only one item went through. Copying other items without the corrupt name separately worked fine. It wasn't COMPLETELY silent, because I DID get an error dialog, but I would assume that it should have tried to just skip that one...

I guess the only case where such a corrupt file name might be created for MOST users is a corrupt file system... ...maybe. Guess I didn't replicate the alleged silent failure after all?
Comment 5 Christoph Feck 2019-01-09 03:01:14 UTC
Invalid UTF-8 filenames mostly appear when non-aware tools (e.g. old archivers) create filenames with a different encoding, despite UTF-8 set in the locale. I am right now investigating if the encoding hack done by Róbert (see bug 165044 comment #142) can be ported to Qt 5.
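
As a rough illustration of that scenario (this is not the encoding hack referenced above, and Latin-1 is just an assumed legacy encoding), a name written by a non-aware tool fails UTF-8 decoding, and a reader has to guess a fallback to display it. `decode_name` is a hypothetical helper.

```python
def decode_name(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode a filename as UTF-8, falling back to a guessed legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails,
        # but the result is only a guess at what the creator meant.
        return raw.decode(fallback)


# What an old archiver might write on disk despite a UTF-8 locale.
legacy = "Übersicht.txt".encode("latin-1")
modern = "Übersicht.txt".encode("utf-8")

print(legacy, decode_name(legacy))
print(modern, decode_name(modern))
```

Both byte strings display as the same text here, which also shows why a displayed name alone is not enough to find the file again: the original bytes have to be kept alongside it.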
Comment 6 bluescreenavenger 2019-01-21 04:20:46 UTC
I assume https://phabricator.kde.org/D18161 is your attempt. Just to let you know, it works for me and allows the test to pass. The paths with the corrupted names copy perfectly.
Comment 7 Ahmad Samir 2019-09-25 11:29:29 UTC
@Christoph: is this bug fixed? (Looking at the patch on Phabricator, I'd say yes, but I am not sure.)
Comment 8 Christoph Feck 2019-10-24 08:24:19 UTC
According to comment #6, the legacy encoding hack fixed this issue.
Comment 9 Ahmad Samir 2021-08-09 22:28:08 UTC
*** Bug 402697 has been marked as a duplicate of this bug. ***