Bug 198477 - KMimeType::findByUrl() misdetects a C++ header named "Core" as corefile
Status: RESOLVED FIXED
Alias: None
Product: kdelibs
Classification: Unmaintained
Component: kdecore
Version: SVN
Platform: Compiled Sources Linux
Importance: NOR normal
Target Milestone: ---
Assignee: kdelibs bugs
 
Reported: 2009-07-01 01:31 UTC by Benoît Jacob
Modified: 2009-08-20 23:09 UTC

Description Benoît Jacob 2009-07-01 01:31:51 UTC
Version:           3.9.94 (using Devel)
OS:                Linux
Installed from:    Compiled sources

In my project I have global header files without extension:

http://bitbucket.org/eigen/eigen2/src/tip/Eigen/

All of these files should be detected as C++ headers. Result:

* The file named Core can't be opened at all in KDevelop. Opening it does nothing. It shouldn't be hard to detect as C++, since it contains "#include" and "namespace".

* Only Eigen and Dense are detected correctly (as C source files, which is OK). These files consist of just #includes.

* Most other files are detected as plain text files. This is strange, as they shouldn't be hard to detect as C++, for the same reason as above.
Comment 1 Andreas Pakulat 2009-07-01 10:39:56 UTC
Can you explain where they are mis-detected? I'm assuming the documents list on the left?
Comment 2 Benoît Jacob 2009-07-05 00:52:06 UTC
This is happening:
- on the "filesystem" vertical tab on the left
- in the File->Open... dialog

I'm not talking about the "documents" vertical tab: since it doesn't show icons, I can't say whether its file type detection is good or bad. Also, there's no way that I can make the file Core appear in this "documents" tab, because I can't open it in the first place.
Comment 3 Benoît Jacob 2009-07-05 00:54:50 UTC
Oh yes and I should clarify that when I speak of bad file detection I mean three separate bad effects:
1) Core can't be opened at all
2) Files shown with wrong icon in the filesystem tab and in file/open dialog
3) Once opened, the files are not correctly syntax-highlighted (in my case, since they're detected as "plain text" they're not highlighted at all).
Comment 4 Andreas Pakulat 2009-07-05 08:51:36 UTC
Ok, point 2) is purely a kdelibs issue as the filedialog and the filesystem browser are implemented there.

Point 3) is a kate issue, which simply doesn't support C++ header files without extensions I guess (either that, or it's again kdelibs which doesn't properly detect the mimetype).

Please file separate reports for these two against kate and kdelibs.

So that leaves point 1), which could as well be a kdelibs issue (I'm thinking that maybe KMimeType tells us "Core" is a gdb core file, which kdevelop cannot open). I'll try to reproduce.
Comment 5 Andreas Pakulat 2009-07-05 22:54:51 UTC
Well, it turns out that even 3) is a fault of kdelibs. KDevelop gets "application/x-core" as the mimetype, and that's what's throwing things off. Reassigning to kdelibs, as core files are usually not named with a capital C and the mimetype detection should at least check the first few bytes for some readable text.
Comment 6 Christoph Feck 2009-07-06 16:32:39 UTC
Unfortunately, this is a bug in shared-mime-info. The "application/x-core" type is detected with <glob pattern="core"/> and according to specs, case insensitive comparison is required.

See http://standards.freedesktop.org/shared-mime-info-spec/latest/ar01s02.html#id2571168
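
To illustrate the failure mode described in this comment, here is a minimal sketch in standard C++. It is not the kdelibs or shared-mime-info code; `toLower` and `matchesLiteralGlob` are hypothetical helpers introduced only for this example. It shows how a case-insensitive comparison against the literal pattern "core" also accepts a header named "Core", while a case-sensitive comparison does not.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not a kdelibs API): lowercase an ASCII string.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper (not a kdelibs API): compare a file name against a
// literal glob, i.e. a pattern without wildcards such as "core".
bool matchesLiteralGlob(const std::string& fileName,
                        const std::string& pattern,
                        bool caseSensitive) {
    if (caseSensitive) {
        return fileName == pattern;
    }
    // Case-insensitive: "Core" and "core" become indistinguishable.
    return toLower(fileName) == toLower(pattern);
}
```

With `caseSensitive` set to false, `matchesLiteralGlob("Core", "core", false)` is true, which is the misdetection reported above; with it set to true, only a file named exactly "core" matches.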
Comment 7 Andreas Pakulat 2009-07-06 17:47:53 UTC
(In reply to comment #6)
> Unfortunately, this is a bug in shared-mime-info. The "application/x-core" type
> is detected with <glob pattern="core"/> and according to specs, case
> insensitive comparison is required.

Well, we could try to do some fallbacks inside kdelibs. As far as I understood we do get to find out the weight that was used for the matching and hence we could do a content-based check if the weight is below 80 or something like that.

I can fix the "doesn't open in kdevelop" problem within kdevelop, but I cannot fix kate applying the wrong syntax highlighting. Or any other problems arising from the findByUrl() mimetype being wrong.

Anyway, I've filed an upstream bug (but I'll leave this open to discuss possible workarounds inside kdelibs):
https://bugs.freedesktop.org/show_bug.cgi?id=22634
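
The weight-based fallback suggested in this comment could look roughly like the following sketch. It is plain C++ and only an assumption about how such a workaround might be structured: `GlobMatch`, `looksLikeText` and `resolveMimeType` are hypothetical names, the threshold of 80 is the number floated above, and "text/plain" stands in for whatever a real content check would return.

```cpp
#include <string>

// Hypothetical result of a glob lookup: the matched type and the weight
// of the glob that produced it.
struct GlobMatch {
    std::string mimeType;
    int weight;
};

// Hypothetical content check: the first bytes count as text when they are
// non-empty and contain no NUL or other control bytes besides whitespace.
bool looksLikeText(const std::string& head) {
    for (unsigned char c : head) {
        if (c < 0x20 && c != '\n' && c != '\r' && c != '\t') {
            return false;
        }
    }
    return !head.empty();
}

// Trust the glob when its weight is high enough; otherwise let the
// content check override it if the data looks like text.
std::string resolveMimeType(const GlobMatch& glob,
                            const std::string& head,
                            int minTrustedWeight = 80) {
    if (glob.weight >= minTrustedWeight) {
        return glob.mimeType;
    }
    if (looksLikeText(head)) {
        return "text/plain";  // placeholder for real content-based detection
    }
    return glob.mimeType;
}
```

Under these assumptions, a low-weight "application/x-core" match on a file starting with "#include" would be overridden, while a binary core dump would keep its glob-based type.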
Comment 8 David Faure 2009-08-18 15:40:42 UTC
I disagree with changing the mimetype detection algorithm to differ from the one in the spec (which I helped write, and which I completely agree with). Known extensions _do_ have priority over content, otherwise listing a directory in a file manager would take forever. All we need is case-sensitive matching for the "core" pattern; I have added a comment to that effect in fdo bug 22634 and on the xdg list.
Comment 9 Benoît Jacob 2009-08-18 15:55:30 UTC
Indeed matching only lowercase "core" would solve the problem in my case, but it's still annoying that basically a project having a header named "core" can't be edited in KDevelop.

The argument,

> Known extensions _do_ have priority over content, otherwise listing
> a directory in a file manager would take forever

is basically an argument against sharing a one-size-fits-all file type detection mechanism. Perhaps what's needed is for it to be configurable: a file manager would say, "make it fast please, so prioritize the filename matching", while KDevelop's "Open file..." would say, "take your time, prioritize analyzing the content".
Comment 10 David Faure 2009-08-18 19:30:37 UTC
I disagree, this would lead to inconsistent results depending on "who you ask for the mimetype". You'd click on a file in the file manager, and the associated application, doing a different kind of mimetype checking, would say "sorry I don't support it". Bad. 
On top of this: the idea of giving extensions priority is that the user is in control; content-based detection is never 100% correct, so the user can rename the file to ".txt" to force e.g. a file to be recognized as text.

If there is a case where the extension should not be trusted (like *.doc), then we remove the <glob> line, so that content is looked at. Indeed "core" might be such a case, since it's not really an extension (possibly chosen by the user) but rather a filename. But do we really have a use case for a file named core?
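
The ordering argued for in this comment can be sketched as follows. This is a simplified illustration in standard C++, not the kdelibs implementation: `detectMimeType` is a hypothetical function, the glob table only handles "last dot" suffixes, and content sniffing is passed in as a callback so that it is visibly skipped whenever a glob matches.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical, simplified detection: a known suffix wins outright, and
// the (expensive) content sniffing runs only when no glob matches.
std::string detectMimeType(const std::string& fileName,
                           const std::map<std::string, std::string>& globsBySuffix,
                           const std::function<std::string()>& sniffContent) {
    const auto dot = fileName.rfind('.');
    if (dot != std::string::npos) {
        const auto it = globsBySuffix.find(fileName.substr(dot));
        if (it != globsBySuffix.end()) {
            return it->second;  // no file access needed
        }
    }
    // No glob for this name (for example because the <glob> line was
    // removed on purpose), so the content decides.
    return sniffContent();
}
```

In this sketch, listing a directory of files with known suffixes never calls `sniffContent`, and removing a suffix from `globsBySuffix` is what hands the decision over to the content.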
Comment 11 David Faure 2009-08-20 23:09:27 UTC
SVN commit 1013836 by dfaure:

Mimetype determination: Fix glob matching according to discussion on xdg list, so that *.C, *.c and core are strictly case-sensitive-only.

This fixes the case of "foo.PS.gz", which the previous algorithm would treat as "simply gzip",
and it makes the matching of mixed-case extensions like "textfile.TxT" 3 times faster (QBENCHMARK+callgrind rocks).

This commit does -not- require the next release of shared-mime-info; we emulate what it will
output i.e. force those three hardcoded globs to case-sensitive if shared-mime-info
is <= 0.60 (which is the case for everyone right now), to avoid regressions.

BUG: 198477


 M  +16 -22    kdecore/services/kmimetypefactory.cpp  
 M  +8 -3      kdecore/services/kmimetypefactory.h  
 M  +1 -1      kdecore/sycoca/ksycoca.cpp  
 M  +5 -1      kdecore/tests/kmimetypetest.cpp  
 M  +10 -5     kded/kbuildmimetypefactory.cpp  
 M  +37 -10    kded/kmimefileparser.cpp  
 M  +8 -6      kded/kmimefileparser.h  
 M  +34 -15    kded/test/kmimefileparsertest.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1013836
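
The emulation described in the commit message can be pictured with the sketch below. It is not the code from the commit: `SharedMimeInfoVersion` and `globIsCaseSensitive` are hypothetical names, and the `declaredCaseSensitive` parameter is an assumption standing in for whatever a newer shared-mime-info database reports for a glob. Only the three patterns and the 0.60 cutoff come from the message above.

```cpp
#include <set>
#include <string>

// Hypothetical version record for the installed shared-mime-info.
struct SharedMimeInfoVersion {
    int majorPart;
    int minorPart;
};

// Decide whether a glob must be matched case-sensitively.
bool globIsCaseSensitive(const std::string& pattern,
                         const SharedMimeInfoVersion& version,
                         bool declaredCaseSensitive) {
    // The three globs named in the commit message.
    static const std::set<std::string> hardcoded = {"*.C", "*.c", "core"};

    const bool oldDatabase =
        version.majorPart == 0 && version.minorPart <= 60;
    if (oldDatabase) {
        // Old databases carry no case-sensitivity information, so the
        // expected future output is emulated for the hardcoded globs.
        return hardcoded.count(pattern) > 0;
    }
    // Assumption: newer databases state this per glob.
    return declaredCaseSensitive;
}
```

With a database at 0.60 or older, "core" is then matched case-sensitively, so a header named "Core" no longer resolves to application/x-core.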