Bug 172242

Summary: Phonon xine backend will not play files with non-ascii names in non-UTF8 locales
Product: [Frameworks and Libraries] Phonon
Reporter: Alex Merry <alex.merry>
Component: Xine backend
Assignee: Matthias Kretz <kretz>
Status: RESOLVED FIXED
Severity: normal
CC: amarok-bugs-dist, b.buschinski, david, Henning.Fleddermann, hillman.dai, icephoenix.nx1729+kde, jpalecek, kdebugs, kevin.kofler, marcosgdavid, martin.sandsmark, michelbriand, s7mon, sami.kyostila, simon, sjuengling, spamaccountmeister, tijl, tmonnereau
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: Compiled Sources
OS: Linux
Version Fixed In: 4.4.3
Attachments:
  patch to support file name encoding other than UTF-8
  and another one to test
  amarok debug output

Description Alex Merry 2008-10-05 23:24:35 UTC
Version:            (using Devel)
Compiler:          gcc version 4.3.2 
OS:                Linux
Installed from:    Compiled sources

My computer is in the en_GB.iso88591 locale (so this is the encoding for filenames).  Playing ASCII files is no problem.  However, attempting to play a file with an e acute (é) in the name (with tutorial1 from the phonon examples, say) doesn't work.  The GStreamer backend works fine.

Debug output:


----------------------------------------------- audio_port created

UpdateVolumeEvent
################################ Event:  UpdateVolume
XineThread Rewire event:
      MediaObject(0x8be9008)  ->  AudioOutput(0x8ecee78)
"file:///home/music/Les_Mis%C3%A9rables/CD1_01_Work_Song.ogg" ,  1
################################ Event:  MrlChanged
MediaObject is connected to  1  nodes
creating xine_stream with null video port
 PLAY
 start faking
fake state change: reached BufferingState after  0
XINE_PARAM_EARLY_FINISHED_EVENT: 1
calling xineOpen from MrlChanged
XINE_EVENT_UI_MESSAGE
xine_open failed for m_mrl = file:///home/music/Les_Mis%C3%A9rables/CD1_01_Work_Song.ogg
1 "22:19:55: input_file: File not found: >file:///home/music/Les_Mis%C3%A9rables/CD1_01_Work_Song.ogg<
"
reached error state
XINE_EVENT_QUIT
################################ Event:  PlayCommand
 end faking
reached  5  after  3
Phonon::Xine::MediaObject(0x8ec7db0) XX Phonon::Xine::AudioOutput(0x8c0a7e0)
0x8be9008 XX AudioOutput(0x8ecee78)

Phonon::Xine::NullSink(0x8f81bf0) is neither a source nor a sink
XineThread Rewire event:
      MediaObject(0x8be9008)  XX  AudioOutput(0x8ecee78)
----------------------------------------------- audio_port destroyed



The file name in question is
/home/music/Les_Misérables/CD1_01_Work_Song.ogg

Some searching through the paths involved shows that QUrl uses a Unicode representation of the filename internally.  However, when it is asked for an encoded URL, it does a %-encoding of the UTF-8 encoding of the filename.  This is then passed to Xine.

I would guess that Xine simply reverses the %-encoding and then tries to use the resulting UTF-8 encoding of the filename.  This, of course, doesn't exist since it needs recoding to the system locale.
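To illustrate the mismatch guessed at above, here is a minimal C++ sketch (standard library only, not xine's actual code) that reverses the %-encoding of the directory name from the debug output. The names percentDecode and hexValue are hypothetical helpers. The decoded name contains the two UTF-8 bytes 0xC3 0xA9 for é, whereas an ISO-8859-1 filesystem stores that character as the single byte 0xE9, so a lookup with the decoded name cannot match.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Convert one hexadecimal digit to its numeric value.
// The caller must pass a valid hex digit.
static int hexValue(char c)
{
    assert(std::isxdigit(static_cast<unsigned char>(c)));
    if (c >= '0' && c <= '9')
        return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Reverse %-encoding: every well-formed "%XX" becomes the single byte 0xXX.
// Malformed escapes are copied through unchanged.
std::string percentDecode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(hexValue(in[i + 1]) * 16 + hexValue(in[i + 2]));
            i += 2; // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Calling percentDecode("Les_Mis%C3%A9rables") yields a 15-byte string, one byte longer than the 14-byte ISO-8859-1 directory name on disk.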

I'm not sure whether phonon-xine, Xine or Qt is at fault here.
Comment 1 Matthias Kretz 2009-01-04 23:56:21 UTC
Created attachment 29914 [details]
patch to support file name encoding other than UTF-8

Please test this patch. I can't get KDE4 to work with any file encoding other than UTF-8 so I'm unable to test myself.
Comment 2 Matthias Kretz 2009-01-05 00:33:03 UTC
Created attachment 29916 [details]
and another one to test

This patch, tested with Latin-1 in Dolphin and Dragon, seems to work. Please confirm.
Comment 3 Alex Merry 2009-01-05 00:57:23 UTC
Unfortunately, my filesystems are now in UTF-8 encoding, so I can't test it.
Comment 4 Mark Kretschmann 2009-01-05 07:09:34 UTC
*** Bug 175041 has been marked as a duplicate of this bug. ***
Comment 5 Rodrigo Fresneda 2009-02-05 01:44:39 UTC
I can confirm this bug with KDE 4.2.0 from Debian experimental (encoding en_US.ISO-8859-15).
Comment 6 simon 2009-02-05 01:49:33 UTC
Sorry, I can't test this, as I have also changed to UTF-8.
Comment 7 Matthias Kretz 2009-02-05 09:54:55 UTC
Rodrigo: I know that it breaks. I need testers to confirm that the attached patch fixes it. Would you please apply the last attachment to phonon-4.3.0/xine/ and recompile and install the xine backend and test again?
Comment 8 Rodrigo Fresneda 2009-02-05 20:42:20 UTC
Hi, I applied patch 172242.patch to the sources I obtained from Debian experimental,
and the problem seems solved. Now I can play my en_US.ISO-8859-15 encoded MP3 files
in Amarok, and additionally, the ID3 tags show up correctly in the playlist.
Here's what I did:
apt-get source -t experimental phonon-backend-xine
cd phonon-4.3.0/xine
patch -p0 < 172242.patch
cd ..
dpkg-buildpackage
cd ..
dpkg -i  phonon-backend-xine_4.3.0-1_i386.deb
Comment 9 Gunter Ohrner 2009-02-10 00:52:31 UTC
I just tested patch No. 2 on my Debian system, using a patched phonon package from experimental, and it seems to work - at least it does play without crashing immediately. ;)
Comment 10 Matthias Kretz 2009-02-10 10:17:29 UTC
SVN commit 924144 by mkretz:

Let MediaObject encode the URL to a Xine MRL. This is not as easy as
QUrl::toEncoded as xine then can't know anymore what encoding to use for local
files. Instead, for local files, use QFile::encodeName to create an 8-bit string
that is then percent encoded as needed and prefixed with "file:/".

BUG: 172242


 M  +26 -7     mediaobject.cpp  
 M  +0 -12     xinestream.cpp  
 M  +0 -2      xinestream.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=924144
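For readers without Qt at hand, the approach in the commit message can be sketched in plain C++. This is an illustration under assumptions, not the committed code: localFileToMrl is a hypothetical name, the input is assumed to be the file name already converted to the locale's 8-bit encoding (the job QFile::encodeName does in the commit), and the set of escaped bytes follows the condition quoted in comment 18.

```cpp
#include <cstdio>
#include <string>

// Build an MRL from a file name that is already in the local 8-bit encoding.
// Bytes with the high bit set, backslashes, control characters and '%' are
// percent-encoded; everything else is copied as-is. The result is prefixed
// with "file:/" as described in the commit message.
std::string localFileToMrl(const std::string &localName)
{
    std::string mrl = "file:/";
    for (unsigned char c : localName) {
        if ((c & 0x80) || c == '\\' || c < 32 || c == '%') {
            char enc[4]; // '%' + two hex digits + terminating NUL
            std::snprintf(enc, sizeof enc, "%%%02X", static_cast<unsigned>(c));
            mrl += enc;
        } else {
            mrl += static_cast<char>(c);
        }
    }
    return mrl;
}
```

With an ISO-8859-1 name, the é in "Les_Misérables" (byte 0xE9) becomes %E9 rather than %C3%A9, so the bytes xine sees after decoding match the bytes on disk.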
Comment 11 Matthias Kretz 2009-02-10 10:20:08 UTC
SVN commit 924145 by mkretz:

Let MediaObject encode the URL to a Xine MRL. This is not as easy as
QUrl::toEncoded as xine then can't know anymore what encoding to use for local
files. Instead, for local files, use QFile::encodeName to create an 8-bit string
that is then percent encoded as needed and prefixed with "file:/".

BUG: 172242


 M  +26 -7     mediaobject.cpp  
 M  +0 -12     xinestream.cpp  
 M  +0 -2      xinestream.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=924145
Comment 12 Dan Meltzer 2009-02-16 17:48:32 UTC
*** Bug 184525 has been marked as a duplicate of this bug. ***
Comment 13 Maksim Khokhlov 2009-02-17 01:52:10 UTC
Tested the patch with ru_RU.CP1251 encoded file names on Debian: works fine. Thanks a lot!
Comment 14 Dan Meltzer 2009-03-01 04:22:43 UTC
*** Bug 185848 has been marked as a duplicate of this bug. ***
Comment 15 Mark Kretschmann 2009-03-23 20:24:02 UTC
*** Bug 187940 has been marked as a duplicate of this bug. ***
Comment 16 Mark Kretschmann 2009-03-28 22:17:51 UTC
*** Bug 188362 has been marked as a duplicate of this bug. ***
Comment 17 Mark Kretschmann 2009-03-28 22:18:47 UTC
*** Bug 188363 has been marked as a duplicate of this bug. ***
Comment 18 Kevin Kofler 2009-06-18 19:32:57 UTC
Line 324 of mediaobject.cpp, in mrlEncode:
> if (c & 0x80 || c == '\\' || c < 32 || c == '%') {
is missing:
|| c == '#'

This breaks file names with # in them.
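A likely reason, assuming xine treats '#' in an MRL as the start of stream options (an assumption about xine's MRL syntax, not something stated in this report): everything from the first unescaped '#' onwards is no longer part of the file name. The sketch below is a simplified model of such a split, not xine's parser, and pathPartSeenByParser is a hypothetical name.

```cpp
#include <string>

// Simplified model: return the part of the MRL before the first literal '#'.
// An escaped "%23" is left alone, so the file name stays intact.
std::string pathPartSeenByParser(const std::string &mrl)
{
    const std::string::size_type hash = mrl.find('#');
    return hash == std::string::npos ? mrl : mrl.substr(0, hash);
}
```

Under this model, "file:/music/Track #1.ogg" is cut off at "file:/music/Track ", while "file:/music/Track %231.ogg" survives whole.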
Comment 19 Rex Dieter 2009-06-18 19:58:00 UTC
SVN commit 983650 by rdieter:

encode #'s too

CCBUG: 172242


 M  +1 -1      branches/phonon/4.3/xine/mediaobject.cpp  
 M  +1 -1      trunk/kdesupport/phonon/xine/mediaobject.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=983650
Comment 20 Raphael Kubo da Costa 2009-09-03 13:32:00 UTC
*** Bug 201806 has been marked as a duplicate of this bug. ***
Comment 21 Jonathan Thomas 2009-12-16 22:53:16 UTC
*** Bug 197252 has been marked as a duplicate of this bug. ***
Comment 22 simon 2010-05-25 18:01:15 UTC
Created attachment 43881 [details]
amarok debug output

Unfortunately, this is not resolved for me (using Amarok git 20.05.2010, phonon-4.3.80).
Phonon in this version seems to have the patch without the changes to mrlEncode(). If I add these as well, I only get a double encoding, as below:
amarok:   [EngineController] [WARNING!] Phonon failed to play this URL. Error:  "16:25:05: input_file: File not found: >file:///home/simon/mp3/111_-_Wolfgang_Ambros_-_W%25C3%25BCst_oda_w%25C3%25BCst_ned.mpc<

Any hints about what is still wrong here would be appreciated, as this used to work for years.
Comment 23 simon 2010-05-25 20:14:57 UTC
I found out that it works with phonon-4.3.50_pre20090520 (the previous version I had in Gentoo). So I guess one of the latest patches broke this for my system.
Comment 24 Michael 2010-06-03 02:01:51 UTC
I can confirm that downgrading from phonon-4.3.80 to phonon-4.3.50_pre20090520
fixes the issue on Gentoo, and upgrading to phonon-4.4.1-r1 still has the
issue.  This may possibly be related to bug 198008 since it seems that
somewhere between December 2009 and January 2010 the problem reappeared.
Comment 25 Thierry MONNEREAU 2010-06-06 12:48:12 UTC
Have I found the same bug?
https://bugs.kde.org/show_bug.cgi?id=206085
Comment 26 Myriam Schweingruber 2010-06-07 00:17:04 UTC
*** Bug 206085 has been marked as a duplicate of this bug. ***
Comment 27 Myriam Schweingruber 2010-06-07 00:17:40 UTC
Indeed, those are related. Make sure you use only UTF-8 on your system and for your tags; that should solve the issue. You can use kid3 or easytag to retag in UTF-8 quite quickly.
Comment 28 Thierry MONNEREAU 2010-06-08 20:32:44 UTC
It works!

I switched my system to UTF-8 and retagged my directories and files.

Thanks! Thanks! Thanks!

(For help with UTF-8, I used this page: http://wiki.debian.org/fr/UTF-8)
Comment 29 Myriam Schweingruber 2010-06-08 23:10:14 UTC
You are welcome :)
Comment 30 simon 2010-06-18 16:18:49 UTC
Same here: I migrated to UTF-8 yesterday and now recent Phonon works (4.3.80-r1).
It was a lot of manual work, as many files were using different encodings.
For other users it might be good if there were hints on how to detect what you have. I opened the directories in vim, checked which characters were shown, and looked those up on the net (which characters stand for the expected character in which charset).
I still think this bug should either be reopened or marked as won't fix, as the description (non-ASCII names in non-UTF-8 locales) is clearly not solved in current Phonon releases.

Thanks anyway for all the support with the migration (on IRC), and keep up the good work.
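On the detection question raised above: one rough first step is to check whether a file name's raw bytes are well-formed UTF-8. The sketch below is a minimal standalone validator (isValidUtf8 is a hypothetical name); it cannot tell which legacy charset a non-UTF-8 name uses, and pure ASCII names pass in any case.

```cpp
#include <string>

// Return true if the byte string is well-formed UTF-8: correct lead and
// continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF.
bool isValidUtf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { // plain ASCII byte
            ++i;
            continue;
        }
        std::size_t extra;      // number of continuation bytes expected
        unsigned long minimum;  // smallest code point allowed for this length
        unsigned long cp;       // code point being assembled
        if ((c & 0xE0) == 0xC0)      { extra = 1; minimum = 0x80;    cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; minimum = 0x800;   cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; minimum = 0x10000; cp = c & 0x07; }
        else return false;           // stray continuation byte or invalid lead byte
        if (i + extra >= s.size())
            return false;            // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;        // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minimum || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

A name such as "Wüst" stored as ISO-8859-1 (byte 0xFC for ü) fails this check, while the same name stored as UTF-8 (bytes 0xC3 0xBC) passes.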
Comment 31 Tijl Coosemans 2010-06-19 17:52:29 UTC
This bug has been reintroduced by this commit:
http://gitorious.org/phonon/phonon/commit/012980b10ede6df8947bea0d3e0d923deee422ae

The old code converted a local file MRL to the local encoding. The current code always converts to UTF-8. I'm not sure I fully understand the case that that commit is trying to fix. I can only imagine it must be a system with UTF-8 filesystems but where the user has set his locale to something else. In that case the old code would fail. However, I'm not sure that's worth fixing, because it looks more like a misconfiguration on the user's end. Either he needs to set his locale to UTF-8 or he has to mount the filesystem with the correct character set conversion options to have the UTF-8 converted to his locale.

In any case, playing files with special characters on non-UTF-8 filesystems is currently broken again with phonon-xine.
Comment 32 Myriam Schweingruber 2010-06-21 00:44:05 UTC
Thank you for notifying.
Comment 33 Martin Sandsmark 2010-06-21 22:44:37 UTC
commit b27366bc08834c5b1033d2733cff7009f971e082
Author: Martin T. H. Sandsmark <sandsmark@samfundet.no>
Date:   Mon Jun 21 22:43:27 2010 +0200

    Try to detect if the locale is Unicode, and manually encode
    appropriately if it is not, since QUrl::toEncoded() encodes to UTF-8.
    
    Needs testing.
    
    CCBUG: 172242

diff --git a/xine/mediaobject.cpp b/xine/mediaobject.cpp
index 82c316d..58af3b8 100644
--- a/xine/mediaobject.cpp
+++ b/xine/mediaobject.cpp
@@ -32,6 +32,7 @@
 #include <QMultiMap>
 #include <QtDebug>
 #include <QMetaType>
+#include <QTextCodec>
 #include <QUrl>
 
 #include <cmath>
@@ -319,11 +320,13 @@ void MediaObject::setSource(const MediaSource &source)
 
 static QByteArray mrlEncode(QByteArray mrl)
 {
+    bool localeUnicode = qgetenv("LANG").contains("UTF"); // test this
+
+    unsigned char c;
     for (int i = 0; i < mrl.size(); ++i) {
-        const unsigned char c = static_cast<unsigned char>(mrl.at(i));
-        // we assume that the other invalid characters
-        // are already escaped due to the call to QUrl.toEncoded()
-        if (c == '#') {
+        c = mrl.at(i);
+        if ((localeUnicode && c=='#') || //TODO: remove this abomination in the far future when everyone has gotten sane locales :-D
+            (!localeUnicode && (c & 0x80 || c == '\\' || c < 32 || c == '%' || c == '#'))) {
             char enc[4];
             qsnprintf(enc, 4, "%%%02X", c);
             mrl = mrl.left(i) + QByteArray(enc, 3) + mrl.mid(i + 1);
@@ -357,9 +360,9 @@ void MediaObject::setSourceInternal(const MediaSource &source, HowToSetTheUrl ho
             return;
         }
         {
-            const QByteArray &mrl = (source.url().scheme() == QLatin1String("") ?
-                    "file:/" + mrlEncode (source.url().toEncoded()) :
-                    mrlEncode (source.url().toEncoded()));
+            const QByteArray &mrl = (source.url().scheme() == QLatin1String("file") ?
+                    "file:/" + mrlEncode (source.url().toLocalFile().toLocal8Bit()) :
+                    source.url().toEncoded());
             switch (how) {
                 case GaplessSwitch:
                     m_stream->gaplessSwitchTo(mrl);
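One caveat about the locale test in the patch, offered as a hedged observation: it inspects only LANG and looks for the upper-case substring "UTF", so a locale spelled "en_US.utf8", or one set through LC_ALL or LC_CTYPE, would presumably be classified as non-Unicode. A standalone sketch of a more tolerant check follows. The function names are hypothetical, and the precedence LC_ALL, then LC_CTYPE, then LANG is the usual POSIX rule.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <initializer_list>
#include <string>

// Case-insensitive test for "utf-8" or "utf8" in a locale name
// such as "en_GB.UTF-8" or "en_US.utf8".
bool namesUtf8(std::string locale)
{
    std::transform(locale.begin(), locale.end(), locale.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return locale.find("utf-8") != std::string::npos
        || locale.find("utf8") != std::string::npos;
}

// Decide from the first non-empty of LC_ALL, LC_CTYPE and LANG.
// Null or empty values are skipped; if all are unset, assume non-Unicode.
bool localeLooksUnicode(const char *lcAll, const char *lcCtype, const char *lang)
{
    for (const char *value : {lcAll, lcCtype, lang}) {
        if (value && *value)
            return namesUtf8(value);
    }
    return false;
}

// Convenience wrapper that reads the real environment.
bool currentLocaleLooksUnicode()
{
    return localeLooksUnicode(std::getenv("LC_ALL"), std::getenv("LC_CTYPE"),
                              std::getenv("LANG"));
}
```

Taking the three values as parameters keeps the decision logic testable without touching the process environment.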
Comment 34 Tijl Coosemans 2010-06-22 14:45:47 UTC
Martin, the patch works for me (UFS2 filesystem, ISO-8859-15 locale). Thank you very much.
Comment 35 Myriam Schweingruber 2010-06-23 12:06:35 UTC
Closing then, thanks for the feedback.
Comment 36 Michel Briand 2011-04-08 11:15:06 UTC
Hello,

on Debian Squeeze, phonon 4.6 (4:4.6.0really4.4.2-1), this bug seems to be NOT FIXED.

Test:

$ locale
LANG=fr_FR@euro
$ ll $myfile
-rw-r--r-- 1 43022725  8 avril 10:28 01_-_Chori-_Kommt_Ihr_Töchter,_helft_mit_Klagen.flac

Amarok knows about the file: it is present in its database, and any operation involving tags, search, etc. works well.

Amarok sends this URL to Phonon:

file:///bigdisk/archives_ro/musique/classique/Bach/BWV%20244/01_-_Chori-_Kommt_Ihr_T%C3%B6chter,_helft_mit_Klagen.flac

In Firefox this URL is valid, and the target file can be saved with "Save as..." without problems.

Phonon produces this error:

amarok:   [EngineController] [WARNING!] Phonon failed to play this URL. Error:  "11:07:35: input_file: Fichier non trouv?: >file:///bigdisk/archives_ro/musique/classique/Bach/BWV%20244/01_-_Chori-_Kommt_Ihr_T%C3%B6chter,_helft_mit_Klagen.flac<

Clearly it's a Phonon problem in converting the UTF-8 URL into the charset of the filesystem.
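To make the missing conversion concrete, here is a minimal sketch of transcoding decoded UTF-8 bytes to ISO-8859-1. It is only an illustration: utf8ToLatin1 is a hypothetical name, and a fr_FR@euro locale most likely uses ISO-8859-15, which differs from ISO-8859-1 in a few positions (ö is 0xF6 in both).

```cpp
#include <string>

// Transcode UTF-8 to ISO-8859-1. Returns false for malformed input or for
// characters above U+00FF, which ISO-8859-1 cannot represent.
bool utf8ToLatin1(const std::string &utf8, std::string &out)
{
    out.clear();
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {
            out += static_cast<char>(c); // ASCII maps to itself
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
            // U+0080..U+00FF are encoded as 0xC2 0x80 .. 0xC3 0xBF
            const unsigned char next = static_cast<unsigned char>(utf8[i + 1]);
            if ((next & 0xC0) != 0x80)
                return false; // second byte is not a continuation byte
            out += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
            ++i;
        } else {
            return false; // not representable or malformed
        }
    }
    return true;
}
```

Applied to the decoded name, the two bytes 0xC3 0xB6 in "Töchter" collapse to the single byte 0xF6 that a Latin-1 style filesystem would store.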
Comment 37 Myriam Schweingruber 2011-04-09 15:49:33 UTC
(In reply to comment #36)
> Hello,
> 
> on Debian Squeeze, phonon 4.6 (4:4.6.0really4.4.2-1), this bug seems to be NOT
> FIXED.

That is simply because your KDE 4.6.0 ships a Phonon that is really 4.4.2, whereas the bug is fixed in 4.4.3. Please talk to your distribution; KDE 4.6.0 should ship at least Phonon 4.4.3.