Bug 99791

Summary: amarok fails to recognize moved folders
Product: [Applications] amarok
Component: general
Reporter: Rodney Gordon II <meff>
Assignee: Amarok Developers <amarok-bugs-dist>
Status: RESOLVED FIXED
Severity: wishlist
Priority: NOR
Version: 1.2
Target Milestone: ---
Platform: unspecified
OS: Linux
CC: ana, dato, frido.roose, magnus, simon.roby
Latest Commit:
Version Fixed In:

Description Rodney Gordon II 2005-02-19 15:53:12 UTC
Version:           1.2 (using KDE 3.3.2,  (3.1))
Compiler:          gcc version 3.3.5 (Debian 1:3.3.5-8)
OS:                Linux (i686) release 2.6.10-ck5-meff1

After moving or renaming a folder, I have to do 'Rescan Collection' to get amaroK to recognize the change.

The 'Monitor folders for change' option IS checked on, and this behaviour seemed to start in 1.2, IIRC 1.1 worked file.
Comment 1 Rodney Gordon II 2005-02-19 15:54:36 UTC
file=fine, sorry for typo.
Comment 2 Alexandre Oliveira 2005-08-14 21:25:55 UTC
*** Bug 110774 has been marked as a duplicate of this bug. ***
Comment 3 der Graph 2005-08-17 18:20:41 UTC
*** This bug has been confirmed by popular vote. ***
Comment 4 Frido Roose 2005-08-17 20:55:43 UTC
Like some people have already mentioned, moving files and directories (and probably renaming files too) results in the loss of scoring info. Statistics/scores are only meaningful over time, and sometimes people have to move their collection to other, bigger devices in the meantime, etc. Or people use removable storage, which can end up with different mount points under automounting, etc...

The statistics table uses the full path (URL) as its unique key, which causes this kind of problem. The difficulty is that you have to have something unique to identify your songs. I know the xmms-imms project handles this correctly by using partial checksums of songs. ID3v1 tags are appended to an mp3 file, while ID3v2 tags are prepended, so changing that information wouldn't affect such a 'partial' checksum.

A drawback is that it initially takes some time to fill the database since every checksum has to be calculated.  Maybe an option to "enable" or "disable" statistics with a default of "disabled" could be useful for people who aren't interested in stats anyway.  I think people who are interested in stats would prefer this behaviour over losing them.

Or maybe there are other more suitable solutions too...
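
To make the partial-checksum idea concrete (only a sketch, not xmms-imms' actual algorithm; the hash function, buffer size and error handling are arbitrary choices): ID3v2 data sits at the front of the file and ID3v1 data occupies the last 128 bytes, so a checksum computed over just the bytes in between survives tag edits.

// Sketch: checksum over an MP3's audio region only, skipping ID3v1/ID3v2 tags.
// Illustrative only -- not the algorithm xmms-imms actually uses.
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

// Size of an ID3v2 tag at the start of the file (0 if none present).
// The four size bytes are "synchsafe": 7 significant bits each.
static std::size_t id3v2Size(std::ifstream& f)
{
    char h[10];
    f.clear();
    f.seekg(0);
    if (!f.read(h, 10) || std::string(h, 3) != "ID3")
        return 0;
    std::size_t size = ((h[6] & 0x7F) << 21) | ((h[7] & 0x7F) << 14)
                     | ((h[8] & 0x7F) << 7)  |  (h[9] & 0x7F);
    return size + 10; // tag body plus the 10-byte header
}

unsigned long audioChecksum(const std::string& path)
{
    std::ifstream f(path.c_str(), std::ios::binary);
    if (!f)
        return 0;

    f.seekg(0, std::ios::end);
    const std::size_t fileSize = f.tellg();

    const std::size_t begin = id3v2Size(f);

    // An ID3v1 tag is always the last 128 bytes and starts with "TAG".
    std::size_t end = fileSize;
    if (fileSize >= 128) {
        char tag[3];
        f.clear();
        f.seekg(fileSize - 128);
        if (f.read(tag, 3) && std::string(tag, 3) == "TAG")
            end = fileSize - 128;
    }

    unsigned long hash = 5381; // djb2-style accumulator, purely illustrative
    std::vector<char> buf(64 * 1024);
    f.clear();
    f.seekg(begin);
    std::size_t remaining = (end > begin) ? end - begin : 0;
    while (remaining > 0) {
        const std::size_t want = std::min(buf.size(), remaining);
        if (!f.read(&buf[0], want))
            break;
        for (std::size_t i = 0; i < want; ++i)
            hash = hash * 33 + static_cast<unsigned char>(buf[i]);
        remaining -= want;
    }
    return hash;
}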
Comment 5 Alexandre Oliveira 2005-08-17 21:35:35 UTC
*** Bug 105577 has been marked as a duplicate of this bug. ***
Comment 6 der Graph 2005-08-18 00:40:09 UTC
Such a checksum could be a nice idea, as it may also help to identify duplicates. Additionally, that checksum could be calculated during playback with minimal overhead.

However, the algorithm used to generate the checksum is crucial. If, for example, only the dominant frequency in each frame were used for the checksum calculation, tools like mp3gain (which losslessly adjusts the loudness of MP3 files for Replay Gain) wouldn't change the checksum. Alternatively, the MusicBrainz TRM (which e.g. the MBTagger stores in a special ID3v2 frame), or the checksum submitted to the TRM server, could be used.
Comment 7 Vincent Panel 2005-08-20 02:09:55 UTC
And what about using the TRM (used by MusicBrainz) instead of a checksum?
Comment 8 Alexandre Oliveira 2005-08-20 02:37:39 UTC
Creating the collection already takes a lot of time, and we really can't start indexing the files using something slow to generate, such as a full checksum or the TRM.
Some operations, like manually setting the score, can create the statistics information for songs that have never been played, so we can't calculate the index only at play time; though yes, it could be made "on demand".
Unless something fast (not much slower than reading the tags; maybe it could use only a few seconds from different parts of the song), safe, and good (returning few collisions) can be done, it's not really a good option. Besides, we'd need to handle collisions.
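
As a rough illustration of that "a few seconds from different parts of the song" idea (the offsets, window size and mixing constants below are arbitrary, and it samples raw bytes rather than decoded audio, so treat it as a sketch of the I/O cost, not a concrete proposal):

// Sketch of a "sample a few spots" fingerprint: hash three 64 KB windows at
// fixed fractions of the file plus the total size.
#include <fstream>
#include <string>
#include <vector>

unsigned long sampledFingerprint(const std::string& path)
{
    std::ifstream f(path.c_str(), std::ios::binary);
    if (!f)
        return 0;
    f.seekg(0, std::ios::end);
    const unsigned long size = f.tellg();

    unsigned long hash = size;            // the length itself already helps disambiguate
    const double spots[] = { 0.25, 0.50, 0.75 };
    std::vector<char> buf(64 * 1024);

    for (int s = 0; s < 3; ++s) {
        f.clear();
        f.seekg(static_cast<std::streamoff>(size * spots[s]));
        f.read(&buf[0], buf.size());
        const std::streamsize got = f.gcount();
        for (std::streamsize i = 0; i < got; ++i)
            hash = hash * 31 + static_cast<unsigned char>(buf[i]);
    }
    return hash;
}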
Comment 9 Frido Roose 2005-08-22 22:53:26 UTC
Initializing the database would be the most resource-consuming part, since you have a lot of checksum calculations plus inserts into your database. And it's questionable whether an up-front initialisation is even required in the first place, as you could just add a track to the stats table at the end of the song when there is no record for it yet. Stats are generated at the moment you skip or stop the track (I suppose). If you analyse/calculate while the song is playing, would you even notice the overhead?

Ok, for manually editing scores, it would take some more resources...  but if scores are not reliable, there is no real point in editing them anyway since you could lose them again.

TRM may not be the most resource-friendly way of analyzing... maybe another, simpler (partial?) checksum of the song would be more efficient (although it may not recognize different versions of the same song as one... but it would already be a step forward in persistence).

Comment 10 Ian Monroe 2005-08-23 03:22:02 UTC
Unless you've got some great super-checksum idea, it's not an option. Checksums are slow.

I'm beginning to see Apple's wisdom in throwing junk in the COMMENT field. :| 

If it was done at the taglib level, then other taglib apps would have it hidden from them. Perhaps there is another field that could be used.
Comment 11 der Graph 2005-08-23 11:09:14 UTC
I don't think writing extra info into the media file is a good option. Not only because of read-only files, but also because of what would happen if every media player did the same...

A rather minimalistic way to identify media files would be to store only the play time, or the number of frames and the bitrate. Sure, there would be hundreds of collisions, but since this system is not for identifying duplicates but for tracking moved/renamed files, collisions could be ignored until a file that should be in the collection goes missing. Later, if some files are missing and some new files are found, compare the new files with what you've stored about the old ones. If the match confidence is below a given limit, ask the user.

Sure, it would be nice to have some system to perfectly identify the media files, or to find duplicates (I'll have to spend some days on my collection to get rid of all those duplicates). But that might be done in the future. And if we can keep track of moved files without any kind of checksum, and if checksum calculation takes too long, why are we discussing checksums?
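
A sketch of what such a minimal record could look like (the fields and the tolerance are arbitrary choices; the point is only how little data would be needed to re-associate a moved file with its old statistics):

#include <cstdlib>
#include <string>

// Tiny per-track record kept alongside the statistics row.
struct TrackFingerprint {
    std::string url;     // last known location
    int lengthSeconds;   // play time as reported by the tag reader
    int frames;          // number of frames, if available
    int bitrate;         // kbit/s
};

// Loose match: the same audio should pass, unrelated tracks rarely will.
// Collisions are acceptable here -- ambiguous cases get handed to the user.
bool probablySameTrack(const TrackFingerprint& a, const TrackFingerprint& b)
{
    return std::abs(a.lengthSeconds - b.lengthSeconds) <= 1
        && a.frames == b.frames
        && a.bitrate == b.bitrate;
}

When a rescan notices entries missing and new, unknown files appearing, the missing entries would be compared against the new files with something like probablySameTrack(), and only the ambiguous cases would be put to the user, as described above.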
Comment 12 Stefan Siegel 2005-08-23 13:47:40 UTC
Could it be an option to use device and inode numbers to track simple moving/renaming/tagging operations?
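
For illustration, the POSIX version of that idea is tiny (a hypothetical helper; on non-POSIX platforms it would need a different implementation):

#include <sys/stat.h>
#include <utility>

// (device, inode) pair for a path, or (0, 0) on error. A move or rename
// within the same filesystem keeps both values, so a pair stored next to
// the statistics row would survive such operations.
std::pair<unsigned long, unsigned long> fileIdentity(const char* path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return std::make_pair(0UL, 0UL);
    return std::make_pair(static_cast<unsigned long>(st.st_dev),
                          static_cast<unsigned long>(st.st_ino));
}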
Comment 13 der Graph 2005-08-23 14:04:08 UTC
It's a good idea to store them in addition, but I wouldn't rely only on them. For example, I have some media files encoded in crappy quality, and sometimes I replace them with re-encoded files. Also, there are some programs which don't simply alter the file in place, but create a modified copy and then rename it. In both cases it's still the same song, and it should have the same statistics.
Comment 14 Mark Kretschmann 2005-08-23 14:37:13 UTC
On Tuesday 23 August 2005 13:47, Stefan Siegel wrote:
> Could it be an option to use device and inode numbers to track simple
> moving/renaming/tagging operations?


I doubt that's portable to Windows or MacOS.
Comment 15 Mike Diehl 2005-08-28 18:19:01 UTC
A checksum is not an option here, IMHO. Doing a checksum over my collection could take days. For instance:

┌─(mike@SledgeHammer)(01:16:34)
└─(~)-> dcop amarok collection query "SELECT COUNT( url )FROM tags;"
19945

Do you really think that doing a checksum against my collection would be an enjoyable experience? Maybe we just need a migration operation in the collection browser which changes the path for tracks automagically on moving files. 
Comment 16 Frido Roose 2005-08-28 18:34:32 UTC
That's the whole point... how will you know a file has moved? How will amaroK identify it as being the old file and keep the scores? It won't, unless you "move" them from within amaroK. But what about tagger applications that mass rename/move files? You wouldn't be able to use them anymore, or you'd lose statistics. And you can't build the whole tagging stuff into amaroK either.
Comment 17 Mark Kretschmann 2005-08-29 08:51:29 UTC
On Sunday 28 August 2005 18:19, Mike Diehl wrote:
> Maybe we just need a migration operation in the
> collection browser which changes the path for tracks automagically on
> moving files.


That's a cool idea. Simple and effective.
Comment 18 Roland 2005-09-06 23:19:26 UTC
To move files (and do anything else) without losing the database entries for every file, maybe add a file manager to amaroK that moves the files and updates the entries in the database.
Comment 19 Alexandre Oliveira 2005-09-07 06:19:31 UTC
*** Bug 101043 has been marked as a duplicate of this bug. ***
Comment 20 der Graph 2005-09-07 09:08:22 UTC
I wouldn't consider a file manager an option either. After all, Linux still is a multi-user platform. But if you use amaroK for moving files, only one user's statistics will be saved.

I still believe that the most efficient and easiest way would be to gather extra info, track deleted or unmounted files as well as previously unrecognized ones, and when in doubt... ask the user.
Comment 21 Mike Diehl 2005-10-24 04:34:02 UTC
SVN commit 473565 by mdiehl:

Renaming of selected files by tags in the collection browser. Deleting of selected files from the collection browser. I have tested on 20k files, but test this only if you have a backup, in case your collection is moved to the bit bucket :). Renaming includes support for cover art used as folder icons. New dcop call: collection moveFile( oldURL, newURL ). Oh yeah, it keeps stats intact too.

CCBUG: 104448
CCBUG: 99791
CCBUG: 93915
CCBUG: 75211


 M  +5 -0      amarokcore/amarokdcophandler.cpp  
 M  +1 -0      amarokcore/amarokdcophandler.h  
 M  +1 -0      amarokcore/amarokdcopiface.h  
 M  +165 -3    collectionbrowser.cpp  
 M  +5 -1      collectionbrowser.h  
 M  +44 -1     collectiondb.cpp  
 M  +1 -0      collectiondb.h  
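
For reference, the new call should be usable from the command line in the same way as the existing query call shown in comment 15; the exact URL form expected for the two arguments is an assumption:

dcop amarok collection moveFile "file:///music/old/track.mp3" "file:///music/new/track.mp3"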
Comment 22 Tristan Miller 2005-11-25 08:41:56 UTC
I don't know if this has been suggested already, but it would be great if I could just right-click on a specific folder and select "Rescan this folder". Then I wouldn't have to wait several minutes for amaroK to rescan my entire collection.
Comment 23 Mark Kretschmann 2006-03-08 13:52:32 UTC
In 1.4, there's a sophisticated "Manage Collection" tool which lets you shuffle stuff around without losing statistics.
Comment 24 der Graph 2006-03-08 16:35:51 UTC
In my very humble opinion, this is not a solution to the problem. Especially because Linux is a multi-user system, and this is a single-user approach. If two users share the same collection, only one of them can keep her statistics. Also it's nice to be _able_ to manage the collection with amaroK, but it's bad to be _forced_ to do so.
Comment 25 Martin Aumueller 2006-03-08 16:47:02 UTC
So you are suggesting to attach the rating information to the track metadata? If you do that, then the ratings for other users would break as soon as the tags of a song are edited.
Comment 26 der Graph 2006-03-08 17:13:00 UTC
No, I don't! How would you do that on a multi-user system anyway?

However, it would be an option to set a (unique?) identifier tag, or to use the MusicBrainz ArtistID, AlbumID and TrackID tags if they are already stored. That's pretty much like the checksum approach discussed above, but without the need to read the entire file.
Comment 27 richlv 2006-04-20 15:01:48 UTC
I'd be ready to give amaroK a couple of days to checksum all my files (~18k) if I got permanent stats and the ability to shuffle the files around.

That's not a feature that should be enabled by default, but if somebody really wants to keep things in order, enabling it wouldn't be the biggest sacrifice.
Anyway, any other solution would result in a lot of conflicts that users would not be happy to manage manually (or a very sophisticated interface would be needed to help them do it).
Comment 28 Tristan Miller 2006-04-20 15:48:20 UTC
I think people here are vastly overestimating the time it takes to generate a hash or checksum of a file.  For an experiment, I tried generating SHA1 hashes of the first 1000 songs in my collection.  I'm running a 1.6 GHz Intel Centrino, and all the songs are on a network drive on another computer in our office LAN.  The results?

$ date;find . -type f -name '*.ogg' | head -1000 | tr '\n' '\0' | xargs -0 sha1 > /dev/null;date
Thu Apr 20 14:36:09 BST 2006
Thu Apr 20 14:42:25 BST 2006

So it processed 1000 files in about six minutes.  Unless people have hundreds of thousands or millions of music files, nobody is going to have to wait "a couple of days" for amaroK to build a hash database.
Comment 29 Ian Monroe 2006-04-20 18:58:30 UTC
Six minutes for 1000 files means 42 minutes for my collection. That's just too long.

Jeff is currently working on a solution using a randomly generated id to be written to appropriate fields in the music file. 
Comment 30 Tristan Miller 2006-04-21 03:51:18 UTC
Ian (and Jeff), writing to the music file is *not* appropriate, as many others have pointed out in this discussion.  You cannot assume that you have write access to the files, as they may be stored on read-only media such as a CD-ROM, or in a shared folder belonging to another user.  Besides, media files are not the place to store application-specific data.  What if every media player out there decided to insert its own crap into mp3 and Ogg files?  Imagine a networked multiuser system with a writable drive dedicated to shared media, and a hundred users all using different media players.  A lot of junk is going to end up in each file.  

You also overlook the fact that my experiment was nothing more than a proof of concept. There is no reason we would have to use a complicated cryptographic hash function such as SHA-1. There are plenty of much faster algorithms -- see, for example, Paul Hsieh's SuperFastHash algorithm <http://www.azillionmonkeys.com/qed/hash.html>, which is implemented in Safari and possibly already in Konqueror.
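
To make the point concrete (this is not Hsieh's SuperFastHash, just FNV-1a, another well-known fast non-cryptographic hash), the per-byte work of such a function is a single XOR and a multiply:

#include <cstddef>

// FNV-1a, 32-bit: shown only to illustrate how cheap a non-cryptographic
// hash is per byte compared to SHA-1.
unsigned int fnv1a(const unsigned char* data, std::size_t len)
{
    unsigned int hash = 2166136261u;   // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        hash ^= data[i];
        hash *= 16777619u;             // FNV prime
    }
    return hash;
}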
Comment 31 richlv 2006-04-21 11:05:28 UTC
Besides, it would more or less be required only once for all the files. And, as already suggested, the checksums could be generated only when a file is played (and thus its statistics change), or it could be an extremely-low-priority background task that indexes the whole collection.
And it would be off by default, so only people who really need it could enable it and cope with the time required to manage their collections efficiently.
Comment 32 Ian Monroe 2006-04-21 21:40:56 UTC
...and then a tag is edited and the hash is changed.

We're using the ID3v2 standard for the unique ID. Regardless, I think your worst case of a networked music drive with a hundred different media players all accessing it isn't very realistic. Where are all these different varieties of music player coming from? :)

You can refer to the amarok-devel list archive; this has already been discussed and implemented.
Comment 33 der Graph 2006-04-23 22:04:44 UTC
Pardon, but the comment about a tag being edited and the hash changing is just stupid. Whoever implements something like this should be smart enough to calculate a hash of the audio frames only, and those rarely change.

Still, the hash would have to be calculated for the whole collection, or at least for all files with statistics plus all new files. Otherwise, if you calculate it only on playback, you will not be able to find a moved file until after playing it from start to end without seeking.

But what good is creating a pseudo-unique ID if the tags already contain a unique ID? All of the music files in my collection already have a MusicBrainz Track ID, which is definitely unique. And if there are two files with the same TrackID, one of them is a duplicate, and I most likely won't want to keep it in my collection anyway. So why not honour the existing data first and only add another tag if it is necessary?

From my POV the optimal solution would be:
1.) check for existing unique identifiers, like the MBTrackID (see the sketch below)
2.) if write privileges are granted, add a uniqueid tag (optional!)
3.) calculate a hash in a low-priority fork
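
As a minimal sketch of point 1 (assuming a TagLib version that ships the UFID frame class; "http://musicbrainz.org" is the owner string MusicBrainz taggers conventionally use):

#include <mpegfile.h>
#include <id3v2tag.h>
#include <uniquefileidentifierframe.h>
#include <string>

// Returns the MusicBrainz Track ID stored in the ID3v2 UFID frame,
// or an empty string if none is present (fall back to points 2 and 3).
std::string musicbrainzTrackId(const char* path)
{
    TagLib::MPEG::File file(path);
    TagLib::ID3v2::Tag* tag = file.ID3v2Tag();
    if (!tag)
        return std::string();

    const TagLib::ID3v2::FrameList& frames = tag->frameList("UFID");
    for (TagLib::ID3v2::FrameList::ConstIterator it = frames.begin();
         it != frames.end(); ++it) {
        TagLib::ID3v2::UniqueFileIdentifierFrame* ufid =
            dynamic_cast<TagLib::ID3v2::UniqueFileIdentifierFrame*>(*it);
        if (ufid && ufid->owner() == "http://musicbrainz.org") {
            TagLib::ByteVector id = ufid->identifier();
            return std::string(id.data(), id.size());
        }
    }
    return std::string();
}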
Comment 34 Simon Roby 2006-06-25 22:49:46 UTC
Why is this marked Fixed? It's obviously not.