Bug 137824 - Collection Scan does not support unicode filenames
Alias: None
Product: amarok
Classification: Applications
Component: Collections/Local
Version: 1.4.4
Platform: Ubuntu Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Amarok Developers
Duplicates: 137195
Depends on:
Reported: 2006-11-24 15:27 UTC by Marek
Modified: 2007-01-27 19:16 UTC

Description Marek 2006-11-24 15:27:18 UTC
Version:           1.4.4 (using KDE 3.5.5)
Installed from:    Ubuntu Packages
OS:                Linux

Collection scan aborts with the following error:

Sorry, the Collection Scan was aborted, since too many problems were encountered.
Advice: A common source for this problem is a broken 'TagLib' package on your computer. Replacing this package may help fixing the issue.
The following files caused problems:

... list of 3 files follows, the 3rd one repeated many times

This bug was introduced recently; I did not have any problems before. Perhaps it is caused by a strlen implementation, because the listed file names have their last 2 characters stripped.
Comment 1 Martin Eitzenberger 2006-11-27 08:49:18 UTC
I have the same problem,
Kubuntu Edgy Eft, Amarok 1.4.4
Clean Kubuntu-Install with latest Kubuntu Amarok,
UTF-8 System

Fails on Files with multibyte-chars
Tried to recompile taglib without result
Tried other db-backends (sqlite, mysql) without result

The stripping of the last chars seems to be a result of the wrong length caused by the multibyte chars (in my case: German umlauts äöüß) ... something in the multibyte string / encoding handling of amarok or taglib seems broken...
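
The length hypothesis above can be illustrated with a small sketch. This is not Amarok or TagLib code: the filename and the helper `truncate_by_char_count` are hypothetical, and the sketch only assumes that a character count is somewhere used as a byte count for a UTF-8 encoded name.

```python
# Illustration only: what happens when a character count is used as a byte
# count for a UTF-8 encoded file name. All names here are hypothetical.

def truncate_by_char_count(name: str) -> str:
    """Encode `name` as UTF-8, then keep only len(name) bytes."""
    utf8_bytes = name.encode("utf-8")
    # Bug pattern: len(name) counts characters, but the slice is in bytes.
    return utf8_bytes[:len(name)].decode("utf-8", errors="replace")

name = "Grüße.mp3"                        # two multibyte characters: ü and ß
char_count = len(name)                    # characters
byte_count = len(name.encode("utf-8"))    # bytes; ü and ß take two bytes each
truncated = truncate_by_char_count(name)

print(char_count, byte_count, truncated)
```

With two 2-byte characters in the name, the encoded form is two bytes longer than the character count, so the slice loses the last two characters, which matches the symptom described in this report.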
Comment 2 Mark Kretschmann 2006-11-27 11:07:24 UTC
I have this file in my collection:
Comment 3 Marek 2006-11-29 20:52:41 UTC
Seems the problem is caused by one file. I moved the first file from the list to another location outside collection, and the scan went fine.

The file is an .rm movie that can be downloaded here:

Comment 4 Martin Aumueller 2006-12-28 23:29:47 UTC
#3: I downloaded the .rm, but it has no tags. However, it does not crash the scanner anymore because of this.

#1: Could you please create a .tar archive (for preserving the exact name) containing a file that causes trouble and make that available to us?
Comment 5 Alexandre Oliveira 2006-12-31 22:13:15 UTC
*** Bug 137195 has been marked as a duplicate of this bug. ***
Comment 6 Andrew Ash 2007-01-03 21:25:33 UTC
The same bug has been reported into Ubuntu's launchpad at https://launchpad.net/bugs/72673

It includes several crash reports from four different users as well.
Comment 7 Mark Kretschmann 2007-01-04 00:23:07 UTC
Andrew, we figured this out a while ago. The scanner crashes sometimes on Ubuntu, because the package was compiled with Stack Smashing Protection. This option is apparently known to be a little trigger happy or buggy.

At any rate, the Stack Smashing Protection sometimes aborts the scanner. The solution is to rebuild Amarok without SSP (which is enabled by default in Ubuntu's GCC).
Comment 8 Andrew Ash 2007-01-27 01:01:50 UTC
Thanks for that info, Mark.  I've filed that bug in launchpad as https://launchpad.net/bugs/81768
Comment 9 Mark Kretschmann 2007-01-27 01:28:59 UTC
Wait, we've just found something weird in the scanner; we're investigating. I'll keep you posted.
Comment 10 kat 2007-01-27 02:34:16 UTC
I have the same problem but it takes it one step further. I used a Unicode character in the Artist name field and let Amarok 1.4.4 write the filename. It both wrote the filename incorrectly (not in Unicode) and could not re-read the file it had just written.
Steps to recreate the problem.
(I'm using Xubuntu Edgy Eft, with Amarok 1.4.4, MySQL UTF-8 Database)
1) Add an OGG Vorbis file to the playlist (you can play it to test that it works)
 1a) Right click on the file and choose 'edit track information'
2) Change the Artist field to José Nuñez 
 2a) (If the bug tracking system doesn't show Unicode: I've written Jos(e with an acute accent) Nu(Spanish n with a tilde)ez.)
3) Right Click on the file and choose File Administration... Organize File
4) I left the file organization options at their installed defaults:
 4a) collection folder /media/nfs/Music
 4b) Use cover art as icon (OFF)
 4c) Ignore 'the' in the artists name (ON)
 4d) File Naming Convention Custom (ON)
 4e) %folder/%filetype/%initial/%albumartist/%album{ (Disc %discnumber)}/{[%track] }%artist - %title.%filetype
 4f) the rest of the options (OFF)
5) Click OK
6) Rightclick on the file in amarok and choose Edit Track Information
7) click on the folder icon next to the full path of the file
8) a file manager opens (doesn't really matter which one, I tried with both Konqueror and Thunar).
 8a) You will see that Amarok has turned the Unicode characters ñ (n with a tilde) and é into diamonds with question marks or some other illegible gobbledygook.
9) Close the 'edit track information dialog'
10) Play the file
11) Amarok shows a Toaster "File not in collection"
12) It is possible to drag the file from Thunar to Amarok and it will play. AND Amarok shows the correct unicode character in the Artist field. 
13) Yet it is impossible to add the file to the collection or search for the file.
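
The "diamonds with question-marks" in step 8a look like the Unicode replacement character (U+FFFD), which is what a UTF-8 viewer typically shows for bytes that are not valid UTF-8. The following sketch reproduces that effect under one assumption that this report does not confirm: that the file name was written to disk in a single-byte encoding such as Latin-1 instead of UTF-8.

```python
# Illustration only. Assumption (not confirmed in this report): the file name
# was written in Latin-1 while the file manager interprets it as UTF-8.

artist = "José Nuñez"

on_disk = artist.encode("latin-1")                 # one byte per character
shown = on_disk.decode("utf-8", errors="replace")  # how a UTF-8 viewer reads it

# The bytes on disk also differ from the UTF-8 form of the same name, so a
# lookup by the UTF-8 path would not find the file.
same_bytes = on_disk == artist.encode("utf-8")

print(shown)       # é and ñ come out as U+FFFD replacement characters
print(same_bytes)
```

If that assumption holds, it would also be consistent with steps 11 and 13: the path stored for the track and the bytes actually on disk no longer agree.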
Comment 11 Alexandre Oliveira 2007-01-27 19:16:57 UTC
SVN commit 627686 by aoliveira:

When the scanner logs the file that made it crash, make it write/read proper UTF-8 strings. If the file path had some non-ASCII chars in it, the scanner wouldn't restart from the right point, causing some pretty bizarre behaviour.
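
A minimal sketch of the failure mode the commit message describes, not the actual change (Amarok's scanner is C++/Qt code). The log file name, the helper functions, and the choice of Latin-1 for the mismatched case are all assumptions made for illustration.

```python
# Illustration only: a crash log that is written and read with different
# encodings hands back a different path than the one that was logged.
import os
import tempfile

def write_crash_log(log_path: str, file_path: str, encoding: str) -> None:
    """Record the file being scanned (hypothetical helper)."""
    with open(log_path, "w", encoding=encoding) as log:
        log.write(file_path)

def read_crash_log(log_path: str, encoding: str) -> str:
    """Read back the last file scanned (hypothetical helper)."""
    with open(log_path, "r", encoding=encoding, errors="replace") as log:
        return log.read()

path = "/media/nfs/Music/José Nuñez - Track.ogg"  # hypothetical file

with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "scanner.log")

    # Mismatch: written in one encoding, read back as UTF-8.
    write_crash_log(log_file, path, "latin-1")
    mismatched = read_crash_log(log_file, "utf-8")

    # Consistent UTF-8 on both sides.
    write_crash_log(log_file, path, "utf-8")
    round_trip = read_crash_log(log_file, "utf-8")

print(mismatched == path, round_trip == path)
```

In the mismatched case the path read back no longer equals any real file, so a scanner resuming from it cannot find the right restart point; with UTF-8 on both sides the path survives unchanged.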