Bug 364809

Summary: can't print file that contains invalid UTF-8 sequence
Product: [Applications] okular Reporter: peter.maloney
Component: printing Assignee: Okular developers <okular-devel>
Status: RESOLVED WORKSFORME    
Severity: normal CC: aacid, andrew.crouthamel, luigi.toscano, m.weghorn, ralfixx, simonandric5, spammail01
Priority: NOR Keywords: triaged
Version: 0.25.0   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Attachments: test pdf file

Description peter.maloney 2016-06-27 11:09:28 UTC
maybe 2 related bugs reported together

(1) can't open a file with an invalid utf8 character in its name, even though it's a valid filename
    okular 'kindergeld'$'\303\244''nderung3.pdf'
        
    error window pops up "Could not open kindergeldänderung3.pdf"

    can work around this by renaming the file, or symlinking to it
     
    the file is available at this link, and I'll try attaching it too: https://www.arbeitsagentur.de/web/wcm/idc/groups/public/documents/webdatei/mdaw/mdc0/~edisp/l6019022dstbai385303.pdf

(2) can't print the file because the "job name" it gives the printer has an invalid utf8 character
    on console, you can see output like:
        lpr: Bad job-name value: "job-name": Bad name value "Ver�nderungsmitteilung" - bad UTF-8 sequence (RFC 2911 section 4.1.2).

    the command it uses (obtained by replacing the lpr command with a script that logs it)
        /usr/bin/lpr -P okidoki -#1 -J Ver�nderungsmitteilung -r /tmp/kde-peter/okularH13352.ps

And while it's true that the sequence is invalid UTF-8, I don't really care what the job name is, and would rather it just fake it and print; it wouldn't affect the hard copy. And okular sets the job name... it doesn't give me control of it. I wouldn't even know what the job name was to use it, since it doesn't match the title or filename or anything.
        
So my workaround was to write a wrapper script that uses iconv to check args and replaces invalid ones with $RANDOM. And that works.
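
A wrapper along these lines could look like the sketch below. It is not the reporter's actual script: the function names, the LPR_REAL variable, and the /usr/bin/lpr.orig location of the real binary are assumptions made for illustration. Each argument is round-tripped through iconv, and any argument that is not valid UTF-8 is replaced by a placeholder built from $RANDOM.

```shell
#!/bin/bash
# Sketch of an lpr wrapper. LPR_REAL, /usr/bin/lpr.orig and the
# function names are hypothetical, not taken from the bug report.

# Print the argument unchanged when it is valid UTF-8,
# otherwise print a random placeholder.
sanitize_arg() {
    if printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        printf '%s' "$1"
    else
        printf 'job-%s' "$RANDOM"
    fi
}

# Rebuild the argument list with every argument sanitized,
# then run the real lpr with the cleaned list.
sanitized_lpr() {
    local args=() arg
    for arg in "$@"; do
        args+=("$(sanitize_arg "$arg")")
    done
    "${LPR_REAL:-/usr/bin/lpr.orig}" "${args[@]}"
}
```

To use it as a drop-in replacement, the script would end with a call such as `sanitized_lpr "$@"`.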

Reproducible: Always
Comment 1 peter.maloney 2016-06-27 11:10:49 UTC
Created attachment 99720 [details]
test pdf file

a tar.gz with the pdf to test with... the tar is intended to preserve the bad encoding in the filename
Comment 2 Michael Weghorn 2017-07-18 08:32:31 UTC
Do you still have the described problems?
I just tried to reproduce them with the Okular versions included in KDE neon (Devedition gitstable) and in Debian testing and could not reproduce them.

Both opening the file and printing worked fine for me.

If the problems are still present: Can you please give some more information about your system (e.g. which Linux distribution, what version,...) and the output of the command `locale`?
Comment 3 Albert Astals Cid 2017-07-29 10:21:46 UTC
Please answer Michael
Comment 4 Florian E.J. Fruth 2017-07-29 22:15:17 UTC
I still can't print this or other pdf files with "umlauts" in the title of the document. But I also can't reproduce this or another error message. And it's definitely okular:
1. if I replace /usr/bin/lpr with a bash script which does a "/usr/bin/lpr.orig $* -J dummy" it does work
2. firefox prints the same pdf file without problems
Comment 5 Albert Astals Cid 2017-07-29 22:25:15 UTC
Which CUPS version are you using?
Comment 6 Florian E.J. Fruth 2017-07-30 18:34:51 UTC
Standard Ubuntu 16.04.2:

dpkg --get-selections | grep -i cups | sed -r 's/\t.+//g' | xargs apt-get --reinstall -s install
[...]
Inst cups-browsed [1.8.3-2ubuntu3.1] (1.8.3-2ubuntu3.1 Ubuntu:16.04/xenial-updates [amd64])
Inst cups-daemon [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst cups-pdf [2.6.1-21] (2.6.1-21 Ubuntu:16.04/xenial [amd64])
Inst bluez-cups [5.37-0ubuntu5] (5.37-0ubuntu5 Ubuntu:16.04/xenial [amd64])
Inst cups-filters-core-drivers [1.8.3-2ubuntu3.1] (1.8.3-2ubuntu3.1 Ubuntu:16.04/xenial-updates [amd64])
Inst libcups2 [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst libcups2:i386 [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [i386])
Inst libcupsmime1 [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst cups-core-drivers [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst cups-server-common [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [all])
Inst cups-client [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst cups [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst cups-filters [1.8.3-2ubuntu3.1] (1.8.3-2ubuntu3.1 Ubuntu:16.04/xenial-updates [amd64])
Inst cups-common [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [all])
Inst cups-ppdc [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst libcupscgi1 [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst libcupsfilters1 [1.8.3-2ubuntu3.1] (1.8.3-2ubuntu3.1 Ubuntu:16.04/xenial-updates [amd64])
Inst libcupsimage2 [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst libcupsppdc1 [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Inst printer-driver-cups-pdf [2.6.1-21] (2.6.1-21 Ubuntu:16.04/xenial [amd64])
Inst printer-driver-hpcups [3.16.3+repack0-1] (3.16.3+repack0-1 Ubuntu:16.04/xenial [amd64])
Inst python3-cups [1.9.73-0ubuntu2] (1.9.73-0ubuntu2 Ubuntu:16.04/xenial [amd64])
Inst python3-cupshelpers [1.5.7+20160212-0ubuntu2] (1.5.7+20160212-0ubuntu2 Ubuntu:16.04/xenial [all])
Inst cups-bsd [2.1.3-4] (2.1.3-4 Ubuntu:16.04/xenial [amd64])
Comment 7 Michael Weghorn 2017-07-30 20:50:29 UTC
Can you provide the output of the command 'locale' run from command line?
Comment 8 Michael Weghorn 2017-07-30 21:10:36 UTC
I just experimented a bit. When I use an invalid locale, I can reproduce the message given in the bug report when printing: 

$ export LC_ALL=foobar
bash: warning: setlocale: LC_ALL: cannot change locale (foobar): No such file or directory
$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_GB.utf8
LANGUAGE=en_GB:en
LC_CTYPE="foobar"
LC_NUMERIC="foobar"
LC_TIME="foobar"
LC_COLLATE="foobar"
LC_MONETARY="foobar"
LC_MESSAGES="foobar"
LC_PAPER="foobar"
LC_NAME="foobar"
LC_ADDRESS="foobar"
LC_TELEPHONE="foobar"
LC_MEASUREMENT="foobar"
LC_IDENTIFICATION="foobar"
LC_ALL=foobar
$ okular kindergeldänderung3.pdf
[...]
lpr: Bad job-name value: "job-name": Bad name value "Ver�nderungsmitteilung" - bad UTF-8 sequence (RFC 2911 section 4.1.2).


As mentioned in my previous comment, the output of the 'locale' command might be very helpful to see whether the problem you are experiencing might also have to do with incorrect locale settings.
Comment 9 Florian E.J. Fruth 2017-07-31 14:32:10 UTC
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
Comment 10 Luigi Toscano 2017-07-31 14:33:41 UTC
(In reply to Florian E.J. Fruth from comment #9)
> LANG=
> LANGUAGE=
> LC_CTYPE="POSIX"
> LC_NUMERIC="POSIX"
> LC_TIME="POSIX"
> LC_COLLATE="POSIX"
> LC_MONETARY="POSIX"
> LC_MESSAGES="POSIX"
> LC_PAPER="POSIX"
> LC_NAME="POSIX"
> LC_ADDRESS="POSIX"
> LC_TELEPHONE="POSIX"
> LC_MEASUREMENT="POSIX"
> LC_IDENTIFICATION="POSIX"
> LC_ALL=

Please try to use a proper locale (something which ends with .UTF-8, even if only en_US.UTF-8).
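
One way to check this from a terminal is sketched below. The helper names are hypothetical, and the fallback assumes that the en_US.UTF-8 locale is actually installed on the system.

```shell
#!/bin/bash
# Succeeds when the character map of the current locale is UTF-8.
is_utf8_locale() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

# Hypothetical helper: run a command as-is under a UTF-8 locale,
# otherwise force en_US.UTF-8 (assumed to be installed) for that command only.
run_in_utf8_locale() {
    if is_utf8_locale; then
        "$@"
    else
        LC_ALL=en_US.UTF-8 "$@"
    fi
}
```

For example, `run_in_utf8_locale okular file.pdf` would start Okular with a UTF-8 locale without changing the settings of the calling shell.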
Comment 11 Florian E.J. Fruth 2017-07-31 15:47:37 UTC
I can't reproduce the problem today :( - doesn't matter what the locale says...
Comment 12 Albert Astals Cid 2017-08-08 20:15:38 UTC
*** Bug 383274 has been marked as a duplicate of this bug. ***
Comment 13 ralfixx 2017-08-09 08:06:39 UTC
Here is my locale (redirected from Bug-Id 383274)

% locale
LANG=de_DE.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C

% uname -a
Linux panther 4.4.74-18.20-default #1 SMP Fri Jun 30 19:01:19 UTC 2017 (b5079b8) x86_64 x86_64 x86_64 GNU/Linux

If required I could also provide the failing document. The "offending" job name is set inside this document somehow, the file name itself does not matter.
Comment 14 Luigi Toscano 2017-08-09 08:13:48 UTC
(In reply to ralfixx from comment #13)
> Here is my locale (redirected from Bug-Id 383274)
> 
> % locale
> LANG=de_DE.UTF-8
> LC_CTYPE="C"
> LC_NUMERIC="C"
> LC_TIME="C"
> LC_COLLATE="C"
> LC_MONETARY="C"
> LC_MESSAGES="C"
> LC_PAPER="C"
> LC_NAME="C"
> LC_ADDRESS="C"
> LC_TELEPHONE="C"
> LC_MEASUREMENT="C"
> LC_IDENTIFICATION="C"
> LC_ALL=C

That looks like the source of the issue. Try to unset LC_ALL (which should be fine unset) and see if the other LC_* are the same as LANG. If it does not work, please set LC_ALL to de_DE.UTF-8 too.
Comment 15 ralfixx 2017-08-09 12:40:29 UTC
Indeed.

% unsetenv LC_ALL
% locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC=POSIX
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
% okular doc.pdf
[prints ok]

% setenv LC_ALL C
% okular doc.pdf
[error printing]

I had set LC_ALL to C due to i18n issues in other programs. I can work around this via a wrapper script to okular unsetting LC_ALL.
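
Such a wrapper might look like the following bash sketch (the commands above are csh-style; the function name is an assumption). It removes LC_ALL only for the launched program, so other programs started from the same shell keep LC_ALL=C.

```shell
#!/bin/bash
# Run a command with LC_ALL removed from its environment.
# The subshell keeps the caller's own LC_ALL untouched.
without_lc_all() {
    ( unset LC_ALL; exec "$@" )
}
```

A wrapper script would then call something like `without_lc_all okular "$@"`.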

It seems somewhat unfortunate that there is no feedback at all when the printing fails, but that's life I suppose.

As someone else mentioned, I really don't care what the print job name is, and I find it somewhat unfortunate to make it depend on some random stuff in the document. I hope this does not open up new code injection paths, like setting the job name inside the document to ';rm -rf /'.
Comment 16 Andrew Crouthamel 2018-09-28 02:40:57 UTC
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information as soon as possible and set the bug status as REPORTED. Due to regular bug tracker maintenance, if the bug is still in NEEDSINFO status with no change in 30 days, the bug will be closed as RESOLVED > WORKSFORME due to lack of needed information.

For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please set the bug status as REPORTED so that the KDE team knows that the bug is ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!
Comment 17 peter.maloney 2018-09-28 11:59:58 UTC
I don't know what info you are waiting for.

Could you just work around the lpr problem by checking for valid UTF-8 and replacing invalid bits with "?" instead? Simply forbid sending invalid output to the lpr command's input. It's a good software development practice to "be generous with input and strict about output".
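
As a command-line illustration of the idea (not Okular code, and the function name is hypothetical), iconv's -c option omits bytes that are invalid in the input encoding. Note that this variant drops the invalid bytes rather than substituting "?".

```shell
#!/bin/bash
# Remove bytes that are not valid UTF-8 from a string.
strip_invalid_utf8() {
    # iconv may still exit non-zero after skipping bytes,
    # so its exit status is deliberately ignored.
    printf '%s' "$1" | iconv -c -f UTF-8 -t UTF-8 2>/dev/null || true
}
```

The cleaned string could then be passed as the job name, e.g. `lpr -J "$(strip_invalid_utf8 "$title")" file.ps`, where `$title` stands for whatever name the application would otherwise send.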

And for the other problem about the filename, you can probably just avoid encoding/decoding the filename at all, just using it as latin1 / iso-8859-1 in your strings, which won't change anything. Opening files basically uses syscalls that work already, so you just have to avoid lossy conversions of this malformed character data.
Comment 18 Andrew Crouthamel 2018-09-28 16:57:37 UTC
I'll mark this reported for follow-up.
Comment 19 Albert Astals Cid 2018-09-28 20:37:14 UTC
I'm sorry but no, I'm not just going to do stuff in latin1 just because you think something is wrong.

You have to prove something is wrong, and AFAICS that has not happened.
Comment 20 peter.maloney 2018-10-04 13:19:34 UTC
so looks like it works fine today... could have been fixed 2 years ago since this report is so old (don't know if cups fixed it, or okular/KDE did)


[pid 27359] execve("/usr/bin/lpr", ["/usr/bin/lpr", "-P", "okidoki", "-#1", "-J", "Ver\303\244nderungsmitteilung", "-o", "media=A4", "-o", "portrait", "-o", "sides=one-sided", "-o", "outputorder=normal", "-o", "Collate=True", "-o", "page-left=13", "-o", "page-top=13", "-o", "page-right=13", "-o", "page-bottom=13", "-o", "fit-to-page", "-o", "number-up=1", "-o", "number-up-layout=lrtb", "-o", "job-billing", ...], 0x7ffc868b9130 /* 60 vars */ <unfinished ...>

cups-nosystemd 2.2.3-1
okular 18.08.1-3

Since there is no way to know who fixed it or whether a regression test has been added, I'll mark it WORKSFORME instead of FIXED.