Bug 179399 - listing of files in "Name" order seems to be in incorrect order
Summary: listing of files in "Name" order seems to be in incorrect order
Status: RESOLVED DUPLICATE of bug 169883
Alias: None
Product: dolphin
Classification: Applications
Component: general (show other bugs)
Version: 16.12.2
Platform: Ubuntu Linux
: NOR wishlist
Target Milestone: ---
Assignee: Peter Penz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-02 18:33 UTC by doc.evans
Modified: 2009-11-25 18:38 UTC (History)
4 users (show)

See Also:
Latest Commit:
Version Fixed In:


Attachments
Screenshot showing the problem (in the left panel). (382.75 KB, image/png)
2009-01-02 18:35 UTC, doc.evans
Details

Note You need to log in before you can comment on or make changes to this bug.
Description doc.evans 2009-01-02 18:33:32 UTC
Version:            (using KDE 4.1.3)
OS:                Linux
Installed from:    Ubuntu Packages

There are several old bug reports (really, really old) on sorting issues and marked as RESOLVED (apparently resolved long ago).

However, I will attach a screenshot from 4.1.3 showing mis-sorting. So it looks like a regression bug might have been introduced in KDE4.

In particular, note the presence of:
1. several files starting with "2008" after "495";
2. the "2008" files are followed by "493" and others that should have been before the "495" ones;
3. A "35" after a "493" (which itself is long after "495" and "494" files.

There must be some sense to the order presented, but I sure can't see what algorithm Konqueror is using: it's certainly far from the alphanumeric order one expects.
Comment 1 doc.evans 2009-01-02 18:35:37 UTC
Created attachment 29833 [details]
Screenshot showing the problem (in the left panel).

Screenshot showing the problem (in the left panel).
Comment 2 FiNeX 2009-01-03 20:45:18 UTC
Look to this two filenames:

a) 4958f22044afc
b) 49370dc85e308

split the strings to digits and chars:
a) 4958 f 22044 afc
b) 49370 dc 85 e 308

the first substring of [a] is 4958 which is less than the first substring of [b] (49370). The "natural order" has been respected.

This is the expected behaviour of the natural sorting.

@Peter: do you agree that this is the correct behaviour?

Comment 3 Peter Penz 2009-01-03 22:06:04 UTC
@FiNeX: yes, this is the correct sort order for "natural sorting". i currently don't intend to introduce an option to turn off the natural sorting, but we could classify this report as wish so that users get the chance to vote for this.
Comment 4 doc.evans 2009-01-03 22:40:37 UTC
I don't know what "natural sorting" is.

I do know that I stared at this sorting for ages and couldn't figure out what algorithm was being used (obviously, *some* algorithm was being used, and hence, if one regarded that algorithm as correct then by definition the order is correct).

Ultimately, all I know is that this sorting is (frankly) nuts. A filename is a set of characters. It's a file NAME. The sorting should be done according to what seems like the obvious principle for names: define an ordering for the character set, then apply that ordering character-by-character.

When I do "ls -al" I don't have to think for a microsecond about whether a file is present or not: I just look down the list at the right place. I should absolutely be able to do the same thing with a listing in konqueror.

So if this has to be voted on, then I heartily endorse whatever mechanism is used to allow people to vote. I find it weird that this is even an issue. How someone could ever think that 
"4958f" is somehow a predecessor to "49370" is amazing to me. I can't think of a single programming language that would give that result.





Comment 5 FiNeX 2009-01-03 23:39:01 UTC
Please, read bug #179399 comment #2
Comment 6 doc.evans 2009-01-04 20:22:57 UTC
Yes, I had read comment 2. I just re-read it at your request, but I don't understand why you wanted me to re-read it.

The comment explains the algorithm that Konqueror is using, but it doesn't explain why it makes any kind of sense to use that algorithm. (Well, actually, it defines a part of the algorithm that Konqueror uses; it takes a lot of experimentation to discover the complete algorithm; see below.)

I have given a lot of thought to this over the past 24 hours (lost quite a bit of sleep over it, actually). I was trying very hard to see if I could understand a reasonable justification for Konqueror's behaviour, because I have to assume that smart people have decided that it makes sense to do things this way -- and I simply, after much thought, have failed to do so.

I think that the basic reason for this is as follows.

Starting by defining some terms:

Let α, β and γ be arbitrary strings.
Let + denote concatenation.
Let SEQ(α, β) be the ordered sequence (according to some algorithm) for the strings α, β.
Let SEQ0(α, β) be the function that orders α and β in normal alphanumeric order as implemented in (say) the C++ standard library. (Which I believe matches common experience as used throughout the English-speaking world, with one minor emendation.)
Let SEQK(α, β) be the function that orders α and β as cirrently implemented in Konqueror.
Let {x, y} be the ordered set containing x, y (i.e., in that order).

So with these definitions, ordinary experience corresponds to:
SEQ0(γ + α, γ + β) ≡ SEQ0(α, β)                      [ Lemma 1 ]
That is, a simple rule, immediately applicable so that a person can instantly decide whether two strings are ordered correctly. (For simplicity consider only the cases where the lengths of α and β are the same; another rule needs to be added for the case where the lengths are different, but that simply complicates the issue here.)

Now let's see what the rule is as implemented in Konqueror.

Well, the first problem is that there is no single rule. In fact, there are several rules. And for each of many different circumstances, the developers have had to make a choice how the particular rule should work -- and there is no way a user can know in advance the decision that the developers made. So occasions in which these rules are invoked lead to what appears (until the user has actually experimented) to unpredicatable results.

Let's look at some examples.

In the following, let α = "9999f" and β = "11111".

Now, according to comment 2, we should have:
SEQK(α, β) = {α, β}
and that is indeed what we find.

Now, let γ = "+". We should have:
SEQK(γ + α, γ + β) = {γ + α, γ + β}
and yes, that is consistent.

Now let γ = "-". Hmmm.... what should it do now? There's no way to know. According to comment 2, 
SEQK(γ + α, γ + β) = {γ + β, γ + α}
in order to reflect the "natural order" of the numbers { -11111, -9999 }.

But performing the experiment, we see that actually:
SEQK( "-9999f", "-11111") = { "-9999f", "-11111" }
So apparently the developers made a decision here that the user could not have guessed beforehand. (Not that the developer's decision is unreasonable: the point here is that either ordering is justifiable according to the rationale given earlier, and hence the user has to guess what Konqueror in fact does, and is as likely to get the answer wrong as right.)

What about a more complex case? Let γ = "123." Now, logically:
SEQK(γ + α, γ + β) = {γ + β, γ + α}
But that's not what Konqueror does. Again, the user could not have guessed that beforehand. It's not an unreasonable thing to do, but it's not the "natural order".

(But the strings in the last examples represent numbers only in some locales. So, to be perfectly consistent, the order in the last example should be locale-dependent.)

What about the case where γ = "0x1"? Again, the user can't guess in advance what Konqueror will do. (I was actually tempted to write there that a user will normally guess a particular result, but I can't pretend to know what most users would guess. For me, it seemed by far most likely that the order would be { "0x111111", "0x19999f" } -- but it's not. In fact,
SEQK( "0x19999f", "0x111111") = { "0x19999f", "0x111111" }

One final example that caught me by surprise given the argument that has been made about "natural ordering":
SEQK ("aa9999f", "aa11111") = { "aa9999f", "aa11111" }
From which one concludes that one should look at the first "number" represented in the filename; but in fact
SEQK ("ab9999f", "aa11111") = { "aa11111", "ab9999f" }
So the actual algorithm used by Konqueror is more complicated, and at that point it's simply too difficult for my poor brain to try to figure out how to write the behaviour as a lemma.

So it seems that what Konqueror actually implements is it's own idea of "natural order" is essentially impossible to describe in a concise rule. I've tried, and given up. Maybe you guys can do it for me. Contrast this situation with Lemma 1, above, with which every English speaker is familiar.

----

The fundamental problem here is that names are not numbers (bizarrely, I was an expert in a $400m lawsuit last year which turned out to revolve in part around this very issue; the world is funny sometimes). Anyway, I agree that filenames can sometimes represent orderable numbers, and on the occasions when they do so, it makes sense to allow the system to treat them as such (the "-n" option to "sort" is a similar case where this happens). But in general, the strings "0" ... "9" are simply alphanumeric characters embedded in a longer string. Can you name me one computer language (or library, or anything else with which users might be familiar) in which sorting occurs in the manner implemented in Konqueror, as opposed to that represented by Lemma 1?

Please, please, please if this doesn't convince you (and I have no hopes that it will) somehow put this up for a vote.
Comment 7 Peter Penz 2009-01-04 20:54:55 UTC
@Doc Evans: natural sorting just means that the items

"Image 1.jpg", "Image 2.jpg", "Image 10.jpg" are sorted like this:

Image 1.jpg
Image 2.jpg
Image 10.jpg

and not like this (like done in KDE 3):

Image 10.jpg
Image 1.jpg
Image 2.jpg

This kind of sorting is expected by users, as 1 is smaller than 10. This is also why file managers like Mac OS X Finder or Windows Explorer since XP use this way of sorting.

However I understand that in your usecase when having hexadecimal numbers as filenames this is not the expected behavior. But this usecase does not occur for the target user group of Dolphin (see http://dolphin.kde.org/philosophy.html) and hence I will not add an option for configuring this in Dolphin. I'm open to provide an internal option, so that at least this can be configured in Konqueror...

I've changed the severity to "wishlist" and have confirmed the issue.
Comment 8 doc.evans 2009-01-05 00:07:16 UTC
Just for the record, this has nothing to do with hexadecimal numbers; it has to do with simple ordering of strings as opposed to numbers.

I knew I wouldn't be able to convince you that, per the "ls" command,  "2008-12-22-Note-12-56.xoj.bg.pdf" should precede "493d90e20f59d" (or, come to that, that "2008 photographs" should precede "95th street interchange picture"), but I honestly can't believe that an ordinary person would believe otherwise.



Comment 9 Peter Penz 2009-01-05 08:50:43 UTC
> Just for the record, this has nothing to do with hexadecimal
> numbers; it has to do with simple ordering of strings
> as opposed to numbers. 

*sigh* It is not really efficient for a discussion if you're not at least trying to understand what FiNeX and I try to explain...

> but I honestly can't believe that an ordinary person would believe otherwise.

So if you think an "ordinary person" wants items sorted like this (and this is exactly what happens with your approach):

Image 10.jpg
Image 11.jpg
Image 12.jpg
Image 1.jpg
Image 2.jpg
Image 3.jpg
Image 4.jpg
Image 5.jpg
Image 6.jpg
Image 7.jpg
Image 8.jpg
Image 9.jpg

instead of

Image 1.jpg
Image 2.jpg
Image 3.jpg
Image 4.jpg
Image 5.jpg
Image 6.jpg
Image 7.jpg
Image 8.jpg
Image 9.jpg
Image 10.jpg
Image 11.jpg
Image 12.jpg

then I really fail to understand you... Regarding your example "95th street interchange picture" and "2008 photographs": Just ask some people what should be sorted first and you'll be surprised that even when asking developers you'll get a 50:50 answer.
Comment 10 doc.evans 2009-01-05 16:43:36 UTC
(In reply to comment #9)
> > Just for the record, this has nothing to do with hexadecimal
> > numbers; it has to do with simple ordering of strings
> > as opposed to numbers. 
> 
> *sigh* It is not really efficient for a discussion if you're not at least
> trying to understand what FiNeX and I try to explain...
> 

How on Earth did you conclude that I am not trying to understand?????? I spent literally hours trying to understand what you mean by "natural order", and failing to come up with the algorithm that you are using. Whatever it is, it's not obvious, and there are lots of conditions under which I for one cannot guess what order Konqueror is going to put filenames. Maybe you could clarify the whole thing for me: where can I find a *precise* definition of this "natural order" concept? (In other words, something that is equivalent to Lemma 1; I don't think I care about the case where the lengths of the names are different: that should be an obvious extension.) Then at least I would be able to understand the logic behind what Konqueror does.

> 
> then I really fail to understand you... Regarding your example "95th street
> interchange picture" and "2008 photographs": Just ask some people what should
> be sorted first and you'll be surprised that even when asking developers you'll
> get a 50:50 answer.

Which is the whole point: you can't ascribe semantic information to filenames, because if you do, you're going to get disagreement. You've gone from an absolute condition ("there is no semantic information in this string") to a relative one ("we're going to take some parts of the string and assume that they mean X"; which has the corollary "and we're therefore going to not ascribe this meaning to other parts of the string"), in which reasonable people will disagree ("which parts of the string do I assume mean X"), and one cannot guess what another person will do. So I am asking you to take the guesswork out for me by telling me where the definition is for this "natural order" semanticism (yeah, I had to look up that word :-) ).




Comment 11 Peter Penz 2009-01-05 18:36:24 UTC
Please check http://sourcefrog.net/projects/natsort for details about the used algorithm.
Comment 12 Angel Blue01 2009-03-11 17:29:14 UTC
FWIW I prefer "natural sort order" and I can't stand that BASH/UNIX sorts:

Image 10.jpg
Image 11.jpg
Image 12.jpg
Image 1.jpg
Image 2.jpg
Image 3.jpg
Image 4.jpg
Image 5.jpg
Image 6.jpg
Image 7.jpg
Image 8.jpg
Image 9.jpg

That was one of the few nice new things in Windows XP ;-)
Comment 13 doc.evans 2009-03-11 18:46:21 UTC
But is that any reason why the rest of us *have* to put up with something that puts files in the order:

0
0x4f4f
0x444f
1
1,000
3
3.14
95th street interchange
493d90e20f59d
2008-12-22-Note-12-56.xoj.bg.pdf
2008 photographs
-1,000
-3
-3.14
-10
+3

Maybe you like that, and that's fine. I have no problem with other people preferring something other than "ls" ordering. Indeed, I would argue that you *should* be allowed to see things in the order you prefer.

But forcing the above order on those of us who have spent nearly 30 years expecting tools to apply strict alphabetic POSIX ordering seems to me uncomfortably like imposing one's will on a population because those in power know what's "good" for that population.

Incidentally, I'm not arguing that "ls" is perfect [for some definition of the word "perfect"]. But it's at least familiar and predictable. I defy anyone to sort the above list in the same order as Konqueror without thinking very hard and knowing how the innards work.

Just FYI:
[HN:sort] ls -al ["." and ".." removed]
-rw-r--r--  1 n7dr n7dr     0 Mar 11 11:10 +3
-rw-r--r--  1 n7dr n7dr     2 Mar 11 11:09 -1,000
-rw-r--r--  1 n7dr n7dr     2 Mar 11 11:08 -10
-rw-r--r--  1 n7dr n7dr     2 Mar 11 11:09 -100
-rw-r--r--  1 n7dr n7dr     2 Mar 11 11:09 -3
-rw-r--r--  1 n7dr n7dr     2 Mar 11 11:09 -3.14
-rw-r--r--  1 n7dr n7dr     0 Mar 11 11:09 0
-rw-r--r--  1 n7dr n7dr     0 Mar 11 11:11 0x444f
-rw-r--r--  1 n7dr n7dr     0 Mar 11 11:11 0x4f4f
-rw-r--r--  1 n7dr n7dr     0 Mar 11 11:09 1
-rw-r--r--  1 n7dr n7dr     0 Mar 11 11:10 1,000
-rw-r--r--  1 n7dr n7dr     2 Mar 11 11:37 2008 photographs
-rw-r--r--  1 n7dr n7dr     2 Mar 11 11:37 2008-12-22-Note-12-56.xoj.bg.pdf
-rw-r--r--  1 n7dr n7dr     0 Mar 11 11:10 3
-rw-r--r--  1 n7dr n7dr     0 Mar 11 11:10 3.14
-rw-r--r--  1 n7dr n7dr     2 Mar 11 11:37 493d90e20f59d
-rw-r--r--  1 n7dr n7dr     2 Mar 11 11:37 95th street interchange
[HN:sort]

Maybe irrationally, this issue is one of the few things about KDE that makes my blood pressure soar several times per day.

A "Sort in POSIX order" checkbox really, honestly, truly doesn't seem like an unreasonable request.

(Actually, I'm a little surprised that some governments don't insist on it, but maybe KDE hasn't reached the penetration yet where it matters.)
Comment 14 Frank Reininghaus 2009-07-05 21:32:56 UTC
There's another wish report about disabling the "Natural Compare" sorting, I'll mark it as a duplicate.

*** This bug has been marked as a duplicate of bug 169883 ***
Comment 15 spir 2009-11-25 17:36:00 UTC
(In reply to comment #7)
> @Doc Evans: natural sorting just means that the items
> 
> "Image 1.jpg", "Image 2.jpg", "Image 10.jpg" are sorted like this:
> 
> Image 1.jpg
> Image 2.jpg
> Image 10.jpg
> 
> and not like this (like done in KDE 3):
> 
> Image 10.jpg
> Image 1.jpg
> Image 2.jpg
> 
> This kind of sorting is expected by users, as 1 is smaller than 10. This is
> also why file managers like Mac OS X Finder or Windows Explorer since XP use
> this way of sorting.
> 
> However I understand that in your usecase when having hexadecimal numbers as
> filenames this is not the expected behavior. But this usecase does not occur
> for the target user group of Dolphin (see
> http://dolphin.kde.org/philosophy.html) and hence I will not add an option for
> configuring this in Dolphin. I'm open to provide an internal option, so that at
> least this can be configured in Konqueror...
> 
> I've changed the severity to "wishlist" and have confirmed the issue.

I support the request by "doc.evans". And don't understand why you refuse to "add an option for configuring this in Dolphin". Dolphin is now the standard file manager. Konqueror is thus less concerned by the issue.
Just add a checkbow for "standard" filename sorting, as opposed to "natural".

Denis
Comment 16 Frank Reininghaus 2009-11-25 18:38:23 UTC
(In reply to comment #15)
> I support the request by "doc.evans". And don't understand why you refuse to
> "add an option for configuring this in Dolphin".

If you look at bug 169883, which the present report is a duplicate of, you'll see that Dolphin will have this option in KDE 4.4 :-)