Bug 201101 - Fix sorting of file names
Summary: Fix sorting of file names
Status: RESOLVED FIXED
Alias: None
Product: dolphin
Classification: Applications
Component: general
Version: 16.12.2
Platform: unspecified Unspecified
Importance: NOR wishlist
Target Milestone: ---
Assignee: Peter Penz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-22 13:50 UTC by markuss
Modified: 2010-02-21 18:18 UTC
CC List: 5 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Description markuss 2009-07-22 13:50:58 UTC
Version:            (using KDE 4.2.96)

Create two files:
"text.txt" and "text2.txt".

You'll notice that Dolphin will put text2.txt before text.txt, which may be the correct order by machine standards, but certainly not by human expectations.
Comment 1 Peter Paulsen 2009-08-27 12:07:25 UTC
German language? I don't know who got the idea that Germans want to sort their files that way, it's just a pain to find files with "German" sorting.
Comment 2 Peter Penz 2009-08-27 12:13:31 UTC

*** This bug has been marked as a duplicate of bug 169883 ***
Comment 3 Peter Penz 2009-09-01 08:15:03 UTC
(internal note, as it had been accidentally marked as a duplicate of bug 169883: POSIX ordering is also text2.txt, text.txt, so this wish wants the natural sorting extended to handle "no number" as 0 internally)
Comment 4 markuss 2009-09-01 10:13:09 UTC
Thanks for reopening this bug
I just noticed that Dolphin currently also sorts

text 2.txt
text.txt

(see the space there)

So this request is for both

text.txt
text2.txt

 and

text.txt
text 2.txt

Is this somewhat easy to implement? I hope with the newly implemented feature to disable natural sorting, Peter Paulsen's objections no longer apply.
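
A minimal sketch of the ordering requested here, following the "no number as 0" idea from comment 3. It is written in Python purely for illustration and is not the Dolphin/kdelibs code; the function name `requested_key` is hypothetical, and the sketch assumes the file extension is split off before comparing.

```python
import os
import re

def requested_key(name):
    """Hypothetical sort key: a missing number after a text run counts as 0."""
    stem, ext = os.path.splitext(name)
    pairs = [
        (text, int(digits) if digits else 0)
        for text, digits in re.findall(r"(\D*)(\d*)", stem)
        if text or digits  # drop the empty match at the end of the string
    ]
    # The raw name is only a tie-breaker, e.g. between "text.txt" and "text0.txt".
    return (pairs, ext, name)

names = ["text 2.txt", "text2.txt", "text.txt"]
print(sorted(names, key=requested_key))
```

With this key "text.txt" sorts before both "text2.txt" and "text 2.txt", because the bare stem "text" is compared as ("text", 0).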
Comment 5 Peter Penz 2009-09-01 12:49:44 UTC
> Is this somewhat easy to implement?

No, it is not easy to get this right + fast... :-( On the other hand, fixing this issue might also hide the "wrong locale sorting" issue of Qt, where special characters like -, #, ... are not considered for sorting. I doubt that the "wrong locale sorting" issue in Qt will be fixed (the real root cause is in glibc, and the maintainers claim that they are "standard conform" in ignoring the characters).
Comment 6 markuss 2009-09-01 14:13:56 UTC
How about a directory/file sorting cache? Or (as "beta test") a hidden setting via config file editing? Maybe it turns out to be not a huge issue in real world scenarios where >1000 files inside a directory are seldom encountered.
Comment 7 Janet 2009-09-02 01:44:11 UTC
I don't understand all that "new" sorting... For me it is "natural" that "nothing" (= no letter, no digit, no symbol, just the file extension) is less than the digit zero and less than anything else, so e.g. folder.png should always come before any other name like folder1.png or folder-1.png. There is nothing after the last r, so it *must* be the first file beginning with "folder": the name has 6 letters, and all other files beginning with the same sequence of letters "folder" have more characters.

Is it possible to have this combined with no natural sorting of numbers and combined with ignoring of the file type extension when sorting and ignoring of language specific sorting? Just kind of "telephone book" sorting?
Comment 8 markuss 2009-09-02 08:58:47 UTC
What is language-specific sorting? Roman numbers?
Comment 9 Peter Penz 2009-09-02 11:23:10 UTC
@Markus:
> What is language-specific sorting? Roman numbers?

It means that e.g. the German 'ü' is after 'u' and not after 'z'. This also depends on the language in use and is handled in Qt.

@Janet: There is no official definition of "natural sorting" AFAIK. The discussions in the "sorting bug reports" at bugs.kde.org talk about "natural sorting" as defined by the following algorithm: http://sourcefrog.net/projects/natsort

The input from Markus is a valid request, but has nothing to do with "natural sorting" as defined at http://sourcefrog.net/projects/natsort
Comment 10 markuss 2009-09-02 12:01:51 UTC
(In reply to comment #5)
> the real root cause is in
> glibc and the maintainers claim that they are "standard conform" with ignoring
> the characters).

Have you tried contacting the eglibc authors, instead of plain glibc? Debian decided to switch to eglibc for various reasons, one being that attitude of the glibc maintainer. With Debian switching, all derivatives (incl. Ubuntu) will switch, too.
What about other platforms? FreeBSD, Windows, etc. don't use glibc. While I use Linux, my request is not limited to Linux only. :-)


(In reply to comment #9)

> It means that e.g. the German 'ü' is after 'u' and not after 'z'.
> This depends also on the used language

That sounds just wrong. Why should the Romanian letter ă be sorted after z, but ä after a, just because I use a German and not a Romanian locale setting? An a with any diacritic sign should be located among the a's, not after z.
I really don't think that Japanese users see that any differently.
Comment 11 Peter Penz 2009-09-02 14:03:50 UTC
> Have you tried contacting the eglibc authors, instead of plain glibc?

This is the responsibility of Qt Software; there is nothing I can do on this topic.

> What about other platforms? FreeBSD, Windows, etc. don't
> use glibc. While I use Linux, my request is not limited to Linux only. :-)

AFAIK Qt only uses glibc for this task on Linux, so on other platforms this issue does not occur.

> That sounds just wrong.

No, this is correct. See http://msdn.microsoft.com/en-us/library/aa292178%28VS.71%29.aspx (section "String Sorting and Comparison") or http://stackoverflow.com/questions/127913/sorting-strings-is-much-harder-than-you-thought.

Example from link #2: "In German, 'Ä' often comes at the beginning of the alphabet, whereas in Swedish it's towards the end."
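
To illustrate why the position of 'Ä' depends on the language, here is a rough Python sketch with two hand-written sort keys. Both keys (`german_like_key`, `swedish_like_key`) are hypothetical simplifications for this example, not real locale collation and not what Qt or glibc do; real code would rely on the platform's collation (for instance `locale.strxfrm` in Python).

```python
import unicodedata

# Assumption: the Swedish alphabet places å, ä, ö after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def german_like_key(word):
    """Fold diacritics so that 'ä' is compared like 'a' (simplified)."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (folded, word)  # the raw word only breaks ties

def swedish_like_key(word):
    """Rank letters by the Swedish alphabet; unknown characters go last."""
    composed = unicodedata.normalize("NFC", word.casefold())
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in composed]

words = ["zebra", "äpple", "apa"]
print(sorted(words, key=german_like_key))
print(sorted(words, key=swedish_like_key))
```

The first key places "äpple" among the a-words, the second places it after "zebra", so the same list has two legitimate orders depending on the language rules applied.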
Comment 12 Janet 2009-09-03 00:40:37 UTC
> The input from Markus is a valid request,

and a very good one that I completely agree with: text.txt should always come before any other "text something.txt"! That's what is "natural" for me: you have a four-letter word and a file extension. With "text .txt" you have a five-"letter" word starting with the same four letters, so it should be sorted behind the four-letter word. A space is more than nothing, so a word with a following space comes after the word with nothing behind it.
Comment 13 manolis 2009-10-31 21:43:39 UTC
I don't know if it is related... but I found another natural stupid way of sorting in dolphin.
I have 4 files: k.cpp, kgui.cpp, kgui.h, k.h and the sorting is done like this:
k.cpp
kgui.cpp
kgui.h
k.h

Apparently the 'g' is ahead of 'h' as the '.' is ignored...
naturally stupid.
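
The effect described here can be reproduced with a small Python sketch, assuming (as a simplification) that the collation simply skips punctuation before comparing. The key `punctuation_blind_key` is hypothetical and only imitates that behaviour; it is not the Qt or glibc implementation.

```python
import string

def punctuation_blind_key(name):
    """Compare names as if punctuation such as '.' were not there."""
    stripped = "".join(ch for ch in name if ch not in string.punctuation)
    return (stripped, name)  # the raw name only breaks ties

files = ["k.h", "kgui.h", "k.cpp", "kgui.cpp"]
print(sorted(files))                             # plain code-point order
print(sorted(files, key=punctuation_blind_key))  # order reported above
```

With the dot skipped, "k.h" is compared as "kh" and therefore lands after "kguih", which matches the listing in this comment. Plain code-point order keeps both k.* files together.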
Comment 14 Peter Penz 2009-11-01 13:36:11 UTC
@manolis: I agree, this sorting is stupid. But as mentioned in comment #11 this is done inside Qt... I'm thinking of bypassing this issue in kdelibs, but I doubt I can do this until KDE 4.4.
Comment 15 Todd 2009-11-09 18:16:07 UTC
@ Peter Penz: "The input from Markus is a valid request, but has nothing to do with the "natural sorting" as defined by the http://sourcefrog.net/projects/natsort"

According to that link,

a < a0 < a1 < a1a < a1b < a2 < a10 < a20 

That seems to be exactly what this bug report is all about. That works if files don't have extensions, but if you add extensions you get this:

a0.txt < a1a.txt < a1b.txt < a1.txt < a2.txt < a10.txt < a20.txt < a.txt

The sort order of these files is totally different from the order listed in the link you provided. Granted, the files have extensions while the ordering on the website doesn't, but I do think it is a valid point that, ignoring the extension, the sorting should be the same as it is on the website you provided.
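
As a rough Python illustration of the point about extensions (a sketch, not the patch that was eventually committed): a simple natural-sort key applied to the whole file name pushes a.txt to the end, while the same key applied to the name without its extension gives the order from the website. The helper names `natural_key` and `stem_first_key` are made up for this example.

```python
import os
import re

def natural_key(text):
    """Split into text and digit runs; digit runs compare as integers."""
    # re.split with a capturing group alternates: text, digits, text, ...
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def stem_first_key(name):
    """Apply the natural key to the stem first, then to the extension."""
    stem, ext = os.path.splitext(name)
    return (natural_key(stem), natural_key(ext))

names = ["a20.txt", "a.txt", "a1b.txt", "a10.txt",
         "a0.txt", "a2.txt", "a1.txt", "a1a.txt"]
print(sorted(names, key=natural_key))     # a.txt ends up last
print(sorted(names, key=stem_first_key))  # a.txt comes first
```

The whole-name variant does not reproduce the exact a1a.txt / a1.txt order listed above, which presumably also depends on how the collation treats the dot; it only shows why a.txt drifts to the end once the extension takes part in the comparison.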
Comment 16 Jo Schulze 2010-01-03 15:53:07 UTC
The way Dolphin sorts files and directories is IMHO unpredictable. AFAICS Dolphin checks if a file/directory name evaluates to a numerical expression (e.g. "20100103") and, if so, sorts by the numeric value. This seems to be a good idea, but it isn't, as the following Dolphin-sorted order of a directory containing subdirectories shows:
1999
2007
199910
200908
19991010
20040526
20071024

The IMHO correct sorting (as displayed via ls -1) is:
1999
199910
19991010
20040526
2007
20071024
200908
20091001

Now, for more fun with Dolphin sorting, let's see how a bunch of MD5 hashes are sorted. This is a real-life example: having to find a file with a given MD5 hash in a directory containing some hundreds of MD5 hashes gives you an impression of how useful the current Dolphin sorting is.
The numeric sort interferes if parts of the MD5 could be interpreted as a decimal number, resulting in the following sort order:

0E798DEA6AFD3AB5344C9F9B912845D0
04420B8D290F937D88D569FC7C57853A
1A2A355A6655E32600974C8F4036FD74
88BEBEF0CBEFFCDBF89F9BB2A0DE4A13
116D41E214B4A03B4B5AA4D9D469FFD6
951A811DDA02ABAF2AB5922100F04BD5
1315A4A45DF9F94D80C71DDA64875435
8513D6CF82299817CCC59F49EB1C5F41
25143A3ECE08556533C318B75269F1D1

Again, ls -1 displays the correct order:
04420B8D290F937D88D569FC7C57853A
0E798DEA6AFD3AB5344C9F9B912845D0
116D41E214B4A03B4B5AA4D9D469FFD6
1315A4A45DF9F94D80C71DDA64875435
1A2A355A6655E32600974C8F4036FD74
25143A3ECE08556533C318B75269F1D1
8513D6CF82299817CCC59F49EB1C5F41
88BEBEF0CBEFFCDBF89F9BB2A0DE4A13
951A811DDA02ABAF2AB5922100F04BD5
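
The difference between the two directory listings can be sketched in Python, assuming the numeric-aware sort simply compares all-digit names by value. The key `numeric_value_key` is hypothetical and is not Dolphin's actual algorithm.

```python
def numeric_value_key(name):
    """All-digit names compare by numeric value, other names as plain text."""
    if name.isdecimal():
        return (0, int(name), name)
    return (1, 0, name)

dirs = ["20071024", "1999", "200908", "19991010",
        "2007", "20040526", "199910"]
print(sorted(dirs, key=numeric_value_key))  # by value, shorter numbers first
print(sorted(dirs))                         # character by character
```

Sorting by value puts 2007 right after 1999, as in the first listing, while the plain string comparison keeps everything starting with "1999" together. The hash listing is not modelled here, since its order would also depend on how leading zeros are treated.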
Comment 17 Peter Penz 2010-01-05 10:10:52 UTC
@Jo: This has been fixed in Dolphin for KDE SC 4.4.0. An option is available to get the sorting you'd like (-> exactly the same sorting as ls -l). But this is unrelated to the original bug report, which wants text.txt sorted before text2.txt.
Comment 18 Peter Penz 2010-02-21 18:16:03 UTC
SVN commit 1093884 by ppenz:

Fix sorting issues of bugs 181211 and 201101. Thanks to Todd for the patch!

BUG: 201101
BUG: 181211
CCMAIL: toddrme2178@gmail.com

 M  +11 -2     kstringhandler.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1093884
Comment 19 Peter Penz 2010-02-21 18:18:50 UTC
SVN commit 1093886 by ppenz:

Backport of SVN commit 1093884 (should be part of KDE SC 4.4.1): Fix sorting issues of bugs 181211 and 201101. Thanks to Todd for the patch!

CCBUG: 201101
CCBUG: 181211
CCMAIL: toddrme2178@gmail.com

 M  +11 -2     kstringhandler.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1093886