Version: 2.3.0 (using 4.4.2 (KDE 4.4.2), Arch Linux) Compiler: gcc OS: Linux (i686) release 2.6.33-ARCH Since one or two months Amarok is not able to add every track to library. I'm using one folder ~/Music where all music is stored. There are missing randomly tracks in complete albums (theses albums - mostly mp3s - have correct ID tags). When using cmus, Songbird or Exaile (haven't tried other) everything is fine. I'll pick up one example: 21:37:29::$ ls -1 ~/Music/Haggard/Tales\ Of\ Itheria\ \(2008\)/ folder.jpg Haggard - Tales of Ithiria - 01 - The Origin.mp3 Haggard - Tales of Ithiria - 02 - Chapter I - Tales of Ithiria.mp3 Haggard - Tales of Ithiria - 03 - From Deep Within.mp3 Haggard - Tales of Ithiria - 04 - Chapter II - Upon Fallen Autumn Leaves.mp3 Haggard - Tales of Ithiria - 05 - In des Königs Hallen (Allegretto Siciliano).mp3 Haggard - Tales of Ithiria - 06 - Chapter III - La Terra Santa.mp3 Haggard - Tales of Ithiria - 07 - Vor dem Sturme.mp3 Haggard - Tales of Ithiria - 08 - Chapter IV - The Sleeping Child.mp3 Haggard - Tales of Ithiria - 09 - Hijo de la Luna.mp3 Haggard - Tales of Ithiria - 10 - On These Endless Fields.mp3 Haggard - Tales of Ithiria - 11 - Chapter V - The Hidden Sign.mp3 Amarok's database only lists 1,6,7,8,10 When dragging all files using "Files" all mp3s are inserted but look how: http://www.ubuntu-pics.de/bild/56068/131_sz75DX.jpg All mp3s which are not in the database have a greyed metadata section, don't know why. File attributes are the same for every file.
This may be connected to "duplicate directory entries" bug I also came across while using amarok. Could you plese post the results on following query run on your Amarok SQL database? SELECT * FROM `directories` A LEFT JOIN directories B ON B.dir REGEXP CONCAT( A.dir, ".$" ) WHERE A.dir NOT REGEXP '/$' If you are not sure how to do it, plese tell me if you use an embedded mysql server or a standalone one, I'll try to give you appropriate instructions.
Setting status correctly
I am using the internal database and have no clue, where the db is located and how to access it ;)
First, sorry for for such delayed response. The best way to identify your bug will be to investigate your whole database. If you're concerned about privacy (database contains metadata, statistics, lyrics, labels and names of scanned folders), we can figure out other ways to do it. How to dump your Amarok database? Please completely turn off Amarok and run following commands (as a normal user) in a terminal (inspired by section "Migrating from MySQL Embedded to MySQL Server" of [1]): cd ~/.kde4/share/apps/amarok # may have to replace .kde4 with .kde /usr/sbin/mysqld --defaults-file=`pwd`/my.cnf --default-storage-engine=MyISAM --datadir=`pwd`/mysqle --socket=`pwd`/sock --skip-grant-tables & mysqldump -S sock amarok > amarok.mysql mysqladmin -S sock shutdown bzip2 amarok.mysql mv amarok.mysql.bz2 ~/Desktop/ This procedure should leave a amarok.mysql.bz2 file in your desktop directory with your complete Amarok database dump which you can upload somewhere. (it will be probably too big to attach to this bugreport) [1] http://amarok.kde.org/wiki/MySQL_Server
Oh, all commands should be typed on one line, especially the mysqld command, which got accidentally broken into 2 lines.
Created attachment 48230 [details] amarok mysql dump
Thanks for the dump, Martin. The database really has only tracks 1,6,7,8,10, strange. I suppose you've already tried to "fully rescan collection" from within "Configure Amarok" dialogue. This is a strange bug, for additional information please provide output of following commands: ls -la ~/Music/Haggard/Tales\ Of\ Itheria\ \(2008\)/ file ~/Music/Haggard/Tales\ Of\ Itheria\ \(2008\)/* Thank you for your efforts with dealing with this bug.
Hi Matěj, I've tried a full scan several times. Even when I add other music, some tracks still disappear. 09:24:03::$ ls -la ~/Music/Haggard/Tales\ Of\ Itheria\ \(2008\)/ insgesamt 99M drwxr-x--x 2 unix unix 4,0K 17. Mai 2009 ./ drwxr-x--x 5 unix unix 4,0K 23. Apr 2009 ../ -rw-r----- 1 unix unix 7,2K 24. Apr 2009 folder.jpg -rw-r----- 1 unix unix 4,6M 16. Mai 2009 Haggard - Tales of Ithiria - 01 - The Origin.mp3 -rw-r----- 1 unix unix 19M 16. Mai 2009 Haggard - Tales of Ithiria - 02 - Chapter I - Tales of Ithiria.mp3 -rw-r----- 1 unix unix 1,1M 16. Mai 2009 Haggard - Tales of Ithiria - 03 - From Deep Within.mp3 -rw-r----- 1 unix unix 16M 16. Mai 2009 Haggard - Tales of Ithiria - 04 - Chapter II - Upon Fallen Autumn Leaves.mp3 -rw-r----- 1 unix unix 4,8M 8. Feb 2009 Haggard - Tales of Ithiria - 05 - In des Königs Hallen (Allegretto Siciliano).mp3 -rw-r----- 1 unix unix 12M 8. Feb 2009 Haggard - Tales of Ithiria - 06 - Chapter III - La Terra Santa.mp3 -rw-r----- 1 unix unix 1,4M 8. Feb 2009 Haggard - Tales of Ithiria - 07 - Vor dem Sturme.mp3 -rw-r----- 1 unix unix 15M 8. Feb 2009 Haggard - Tales of Ithiria - 08 - Chapter IV - The Sleeping Child.mp3 -rw-r----- 1 unix unix 10M 8. Feb 2009 Haggard - Tales of Ithiria - 09 - Hijo de la Luna.mp3 -rw-r----- 1 unix unix 2,5M 8. Feb 2009 Haggard - Tales of Ithiria - 10 - On These Endless Fields.mp3 -rw-r----- 1 unix unix 15M 8. Feb 2009 Haggard - Tales of Ithiria - 11 - Chapter V - The Hidden Sign.mp3 09:25:44::$ file ~/Music/Haggard/Tales\ Of\ Itheria\ \(2008\)/* /home/unix/Music/Haggard/Tales Of Itheria (2008)/folder.jpg: JPEG image data, JFIF standard 1.01 /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 01 - The Origin.mp3: Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 02 - Chapter I - Tales of Ithiria.mp3: Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 03 - From Deep Within.mp3: Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 04 - Chapter II - Upon Fallen Autumn Leaves.mp3: Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 05 - In des Königs Hallen (Allegretto Siciliano).mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 06 - Chapter III - La Terra Santa.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 07 - Vor dem Sturme.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 08 - Chapter IV - The Sleeping Child.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 09 - Hijo de la Luna.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 10 - On These Endless Fields.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo /home/unix/Music/Haggard/Tales Of Itheria (2008)/Haggard - Tales of Ithiria - 11 - Chapter V - The Hidden Sign.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 320 kbps, 44.1 kHz, Stereo
Maybe http://paste.ubuntu.com/453783/ is better
Martin, did you try setting your LOCALE encoding to UTF-8 or UTF-16 and retag all your music to UTF encoding? Esaytag allows to do that easily. Could be an unrecognized character. ISO formatted files should be avoided.
At least the tags from the file listing above have correct encodings (utf-8), but changing it to utf-16 makes no different. Meanwhile, the next full collection rescans made some more mistakes. For example, splitting an album into two visual different parts (see http://www.ubuntu-pics.de/bild/89285/bildschirmfoto2_8GR35A.png) and an album disappears, but one is a duplicate. And in addition there are much more albums with missing tracks. When I use one more folder as collection folder, I get 10% duplicate artists. Really weird. I'm in the opininion it is a bug of amarok, not of my music. I've tagged all my music using first MusicBrainz Picard and then Kid3. Amarok 1.4 (tm) had no problem with this collection nor cmus, Exaile and so on... Thanks for your support by the way.
Martin, how about upgrading to the latest stable version 2.3.1? Also making a backup of your current database and creating a new one might give some indications. Running amarokcollectionscanner in a console also gives you an output where you can detect eventual errors (see amarokcollectionscanner --help for the available options).
I am already using the latest amarok. Today I made a total cleanup of amarok's config files and folders and did a complete restart of amarok - debugging turned on. When doing a scan using amarokcollectionscanner, all files are recognized, but there are some uuids missing: http://pastebin.com/QgVHukvE When you take a look at the amarok debug log, you will see that, all missing tracks (of Haggard) are listed: http://paste.ubuntu.com/454348/
Hm, those tracks have no unique identifiers and are therefore not scanned. Does touching the folder change something? Maybe renaming the folder slightly could help, e.g. removing some spaces.
I think this is clarly a bug in collectionscanner. Now we have to find out why it doesn't generate uniqueids. Or maybe amarok should cope with missing uniqueids and don't treat them as uniqueids with value ""? Martin, I fear that we wouldn't be able to reproduce unless we had the same files (at least) one as you. But sharing these may be a legal problem. Is there any free (as in beer) and legal way for us to get at least one of these mp3s that collectionscanner fails to generate uniqueid for?
@Myriam, moving the folder to a folder without spaces makes no difference :-/ I have sent a sample file to Matěj, I don't know a simple but legal way to share the affected files. Just I did a grep on the amarokcollectionscanner to find out, how many files have no uniqueid: 189 out of nearly 5500 (3,4%). But quite randomly; never a whole album is affected.
Thanks, Martin. I can confirm the collectionscanner fails to add uniqueid to track 2, while it adds it to track 1 properly on my box. I'm now investigating the collectionscanner code. The mp3s look fine, no problems found by utils such as MP3Diags.
Thank you for the feedback.
Martin, I got it! The problem is that some of your files contain musicbrainz uniqueid (in UFID id3v2 frame) that contains non-printable characters. Amarok blindly uses this uniqueid only to drop it when generating resulting xml. (note that I still consider it a bug in amarok, not your files) I've created a patch and published it in my personal clone on gitorious, it's the last commit in collectionscanner-fixes branch [1]. The extensive bug description is in the commit message, the summary is that it makes amarok more robust when dealing with embedded uniqueids. [1] http://gitorious.org/~strohel/amarok/strohels-amarok/commits/collectionscanner-fixes As a workaround, you can run amarok_afttagger on offending files or your whole music folder (+ full collection rescan afterwars), but please do keep some of the affected files to be able to test the patch in future. Myriam, how to submit patches now that amarok's moved out of gitorious? Should I wait for a reviewboard or stick the patch here?
Matej, for now please attach the patch to this bug report, marked as Patch. Once ReviewBoard is set up for Amarok we will notify on the mailing list about the new proceedings. Thanks a lot for sorting this out :)
Created attachment 48294 [details] aftutility-ignore-embedded-uniqueids-with-invalid-chars.patch This patch should fix the problem. The name of the new function (currently stringHasUncategorisedChars()) is something I'm not happy with, otherwise it should be okay.
(In reply to comment #19) > id3v2 frame) that contains non-printable characters. Amarok blindly uses this > uniqueid only to drop it when generating resulting xml. (note that I still > consider it a bug in amarok, not your files) Please do explain how Amarok having difficulty with corrupted files is a "bug in amarok, not your files". That doesn't mean we shouldn't work around it. But, put blame where it's due.
(In reply to comment #22) > (In reply to comment #19) > > id3v2 frame) that contains non-printable characters. Amarok blindly uses this > > uniqueid only to drop it when generating resulting xml. (note that I still > > consider it a bug in amarok, not your files) > > Please do explain how Amarok having difficulty with corrupted files is a "bug > in amarok, not your files". > > That doesn't mean we shouldn't work around it. But, put blame where it's due. Alright. I pretty much incorrectly used the word "corrupt files" to describe files with non-printable characters in UFID (musicbrainz-like) id3v3 frame [1]. However, these frames are perfectly valid according to id3v2 spec [2] which states that UFID is "<text string> followed by null byte followed by <up to 64 bytes binary data>". Amarok imposes internal limitation on contents of uniqueid (which is okay), see utilities/collectionscanner/CollectionScanner.cpp:836. But I think the limitation is applied incorrectly which results in the problem described in this bug. Technical description how it happens is in the patch itself. Jeff, I don't want to discredit amarok in any way, rather the otherwise: I love amarok and made this patch so it can be even greater than it is now. [1] The frame's exact contents in one such file are: http://musicbrainz.org{\0x00}3c9dc66d-2121-4307-a736-0589e66a76827{\0x19}{\0x4a}{\0x02}{\0x10} where {\0xNN} denotes character wich hex value NN. [2] http://www.id3.org/id3v2.3.0#head-86fc458c0f12976c6866d12610551218827dca80
> However, these frames are perfectly valid according to id3v2 spec [2] which > states that UFID is "<text string> followed by null byte followed by <up to 64 > bytes binary data>". > > Amarok imposes internal limitation on contents of uniqueid (which is okay), see > utilities/collectionscanner/CollectionScanner.cpp:836. But I think the > limitation is applied incorrectly It isn't, because only two types of UFIDs are ever used by Amarok -- AFT UFIDs that it generates itself and are guaranteed to always have printable ASCII characters, and MusicBrainz IDs, which after doing some looking around I can't find any sort of an official spec saying what they are comprised of (other than "ASCII") but having seen a few tens of thousands of these in my time I've always seen as being in the same [0-9a-f] range. This makes sense too -- these are designed to be displayed to users and for users to be able to reference to compare tracks, and to copy and paste into the website, etc. -- there is no sense in having nonprintable characters in a user-facing value. Is it possible for you to attach (or privately email me a link to) one such file? I'd like to try to figure out if this is a problem in some version of picard, or in the MB database itself. Regardless, the patch can't work as-is. escape() is used for all string values, not just the UFID. Just put in a simple check in AFTUtility that ensures that a passed-in value contains only [0-9a-f-] (note the hyphen), and add the check to lines like: if( !storedMBId.isEmpty() && ( storedMBId != mbDefaultUUID ) ) Also, the commit message is overlong and can really be boiled down to "Ignore MBIDs containing non-hex characters."
> Regardless, the patch can't work as-is. escape() is used for all string values, > not just the UFID. I thing escape() is needed nowhere (it has been used only for the check), because the result of the check is guaranteed to be tha same for escaped and unescaped string. Regardless I won't touch CollectionScanner.cpp in reworked patch anyway. > Just put in a simple check in AFTUtility that ensures that a > passed-in value contains only [0-9a-f-] (note the hyphen), and add the check to > lines like: > > if( !storedMBId.isEmpty() && ( storedMBId != mbDefaultUUID ) ) > > Also, the commit message is overlong and can really be boiled down to "Ignore > MBIDs containing non-hex characters." Okay, I will rework it. Should I add a check also to amarok-generated UUIDs? Otherwise it would be theoretically possible that (unprobable) disk corruption (or afttagger malfunction) would render the file inaccesible by collectionscanner.
Created attachment 48407 [details] aftutility-ignore-mbids-containing-non-hex-characters.patch Reworked minimal patch.
FYI, I just asked in #musicbrainz -- MBIDs are UUIDs -- see http://en.wikipedia.org/wiki/UUID#Version_4_.28random.29 So, they should indeed always be hex characters, and any MBIDs with nonprintable characters are somehow corrupt.
(In reply to comment #27) > FYI, I just asked in #musicbrainz -- MBIDs are UUIDs -- see > http://en.wikipedia.org/wiki/UUID#Version_4_.28random.29 > > So, they should indeed always be hex characters, and any MBIDs with > nonprintable characters are somehow corrupt. Okay, thanks for digging it out. What about the new patch?
commit 318427e502208349d6279584c3b653379446028d Author: Jeff Mitchell <mitchell@kde.org> Date: Thu Jul 1 16:29:15 2010 -0400 Ignore embedded IDs containing non-hex characters. Fixes files not being scanned if they had corrupted MBID or AFT IDs. Patch by MatÄj Laitl <matej@laitl.cz>. BUG: 236227 diff --git a/ChangeLog b/ChangeLog index 73d15df..8babebb 100644 --- a/ChangeLog +++ b/ChangeLog @@ -32,6 +32,8 @@ VERSION 2.3.2-Beta 1 Patch by Richard Longland <rlongland@hotmail.com>. BUGFIXES: + * Fixed some tracks not being scanned when they had corrupted MusicBrainz + IDs. Thanks to MatÄj Laitl <matej@laitl.cz> for the patch. (BR 236227) * Fixed crash when navigating using "Places" in the file browser. (BR 240338) * Fixed error dialog popup if the stored directory is no longer accessible when using the file browser. (BR 234286) diff --git a/utilities/collectionscanner/AFTUtility.cpp b/utilities/collectionscanner/AFTUtility.cpp index 22d3a28..76b91c6 100644 --- a/utilities/collectionscanner/AFTUtility.cpp +++ b/utilities/collectionscanner/AFTUtility.cpp @@ -39,6 +39,7 @@ #include <xiphcomment.h> #include <QFile> +#include <QRegExp> #include <QTime> #define Qt4QStringToTString(s) TagLib::String(s.toUtf8().data(), TagLib::String::UTF8) @@ -103,6 +104,9 @@ AFTUtility::readEmbeddedUniqueId( const TagLib::FileRef &fileref ) QString mbId = QString( "http://musicbrainz.org" ); QString storedMBId; QString mbDefaultUUID = QString( "[mb track uuid]" ); + QRegExp aftPattern( "[a-f0-9]+", Qt::CaseInsensitive ); + QRegExp mbIdPattern( "[-a-f0-9]+", Qt::CaseInsensitive ); + if( TagLib::MPEG::File *file = dynamic_cast<TagLib::MPEG::File *>( fileref.file() ) ) { if( !file->ID3v2Tag( false ) ) @@ -118,12 +122,19 @@ AFTUtility::readEmbeddedUniqueId( const TagLib::FileRef &fileref ) { QString owner = TStringToQString( currFrame->owner() ); if( owner.compare( ourId, Qt::CaseInsensitive ) == 0 ) - return TStringToQString( TagLib::String( currFrame->identifier() ) ).toLower(); + { + QString identifier = TStringToQString( TagLib::String( currFrame->identifier() ) ).toLower(); + if( aftPattern.exactMatch( identifier ) ) + return identifier; + else + return QString(); + } else if( owner.compare( mbId, Qt::CaseInsensitive ) == 0 ) storedMBId = TStringToQString( TagLib::String( currFrame->identifier() ) ).toLower(); } } - if( !storedMBId.isEmpty() && ( storedMBId != mbDefaultUUID ) ) + if( !storedMBId.isEmpty() && ( storedMBId != mbDefaultUUID ) && + mbIdPattern.exactMatch( storedMBId ) ) return QString( "mb-" ) + storedMBId; } //from here below assumes a file with a XiphComment; put non-conforming formats up above... @@ -148,12 +159,16 @@ AFTUtility::readEmbeddedUniqueId( const TagLib::FileRef &fileref ) if( comment->contains( Qt4QStringToTString( ourId.toUpper() ) ) ) { QString identifier = TStringToQString( comment->fieldListMap()[Qt4QStringToTString(ourId.toUpper())].front()).toLower(); - return identifier; + if( aftPattern.exactMatch( identifier ) ) + return identifier; + else + return QString(); } else if( comment->contains( Qt4QStringToTString( mbId.toUpper() ) ) ) { QString identifier = TStringToQString( comment->fieldListMap()[Qt4QStringToTString(mbId.toUpper())].front()).toLower(); - if( !identifier.isEmpty() && ( identifier != mbDefaultUUID ) ) + if( !identifier.isEmpty() && ( identifier != mbDefaultUUID ) && + mbIdPattern.exactMatch( storedMBId ) ) return QString( "mb-" ) + identifier; }