Bug 143573 - Incorrect MIME type detection for UTF-8 text files with very long lines
Status: RESOLVED WAITINGFORINFO
Alias: None
Product: kdelibs
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: David Faure
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-28 22:36 UTC by deleted_email_KsJQa
Modified: 2010-02-26 17:33 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Attachments

Description deleted_email_KsJQa 2007-03-28 22:36:13 UTC
Version:           (using KDE 3.5.5)
Installed from:    Gentoo Packages
OS:                Linux

In KDE (in Konqueror or KMail, for example), UTF-8 text files labelled as containing "very long lines" are typed as "Unknown". In KDE, a "very long line" means more than 300 bytes. `file`, however, counts any Unicode character as one character, but since KDE does not inform the user that a file contains "very long lines", the difference does not really matter.

The result is problems with file associations, no preview, possible problems with antivirus/antispam filters for emails, and the general confusion that follows from these problems. In emails, these files are sent with the "application/octet-stream" MIME type and without a charset, though the missing charset does not matter much, as the files cannot be previewed from the email client.
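
The byte-versus-character distinction mentioned above can be illustrated with a minimal Python sketch. The 300-byte limit is the figure quoted in this report, not a value checked against the KDE sources, and the helper name `longest_line` is hypothetical.

```python
from typing import Tuple

# Illustration only: LIMIT is the 300-byte figure quoted in this report,
# not a value verified against the KDE sources.
LIMIT = 300


def longest_line(data: bytes) -> Tuple[int, int]:
    """Return (length in bytes, length in characters) of the longest line."""
    longest = max(data.split(b"\n"), key=len)
    return len(longest), len(longest.decode("utf-8"))


# U+00E9 ("é") takes two bytes in UTF-8, so 200 characters become 400 bytes.
sample = ("\u00e9" * 200 + "\n").encode("utf-8")
n_bytes, n_chars = longest_line(sample)
print(n_bytes, n_chars)  # 400 200
```

A line like this one is over the limit when measured in bytes but under it when measured in characters.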

There is no problem with ASCII files, even if they contain "very long lines" (they are properly typed as "Plain Text Document").

UTF-8 text files without "very long lines" are properly typed too.


To test, create a file with a single line of fewer than 300 bytes that includes some non-ASCII UTF-8 characters (like "é"), and another file with a line of more than 300 bytes (also including some non-ASCII UTF-8 characters). Then check the type of each file in Konqueror or KMail.
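
The two test files described above could be generated with a short Python sketch like the following. The file names `testfile` and `testfile_long` are borrowed from the transcript in comment 4, the line lengths (50 and 200 characters) are arbitrary choices on either side of the 300-byte figure, and the function name `write_test_files` is hypothetical.

```python
import tempfile
from pathlib import Path

E_ACUTE = "\u00e9"  # "é", two bytes in UTF-8


def write_test_files(directory: Path) -> dict:
    """Write a short-line and a long-line UTF-8 file; return their sizes in bytes."""
    contents = {
        "testfile": E_ACUTE * 50 + "\n",        # 100 bytes before the newline
        "testfile_long": E_ACUTE * 200 + "\n",  # 400 bytes before the newline
    }
    sizes = {}
    for name, text in contents.items():
        data = text.encode("utf-8")
        # write_bytes avoids any platform-specific newline translation
        (directory / name).write_bytes(data)
        sizes[name] = len(data)
    return sizes


target = Path(tempfile.mkdtemp())  # fresh temporary directory for the files
print(target)
print(write_test_files(target))
```

The files end up in a temporary directory whose path is printed, so they can be opened from Konqueror or attached in KMail.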


KDE should detect the correct MIME type ("text/plain") for UTF-8 text files that contain "very long lines".
Comment 1 Perry WHITE 2007-08-20 17:54:05 UTC
This occurs both with Konqueror 3.4.0 on KDE 3.4.0 (SUSE 9.3) and with
Konqueror 3.5.6 (Kubuntu 7.04).

Perry
Comment 2 Jaime Torres 2008-05-26 18:33:39 UTC
Reproduced in KDE 4.0.3 (Fedora 9).
The `file` program reports "UTF-8 Unicode text, with very long lines".
Comment 3 Rui G. 2008-08-22 17:09:00 UTC
The problem is still present in openSUSE 11.0 with KDE 3.5.9 and 4.1.
Comment 4 David Faure 2010-02-26 17:33:27 UTC
Please provide test files (and reopen the bug). My own tests work.

$ kmimetypefinder testfile
text/plain
$ kmimetypefinder testfile_long
text/plain

$ file testfile*
testfile:      UTF-8 Unicode text
testfile_long: UTF-8 Unicode text, with very long lines
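
For anyone preparing test files, the comparison in the transcript above can be scripted. This is a minimal Python sketch, assuming files named `testfile` and `testfile_long` exist in the current directory; the helper name `run_detector` is hypothetical, and a tool that is not installed is reported as `None` instead of raising an error.

```python
import shutil
import subprocess
from typing import List, Optional


def run_detector(command: List[str], path: str) -> Optional[str]:
    """Run a detection command on path; return its output, or None if the tool is absent."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(
        command + [path], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


for name in ("testfile", "testfile_long"):
    kde_type = run_detector(["kmimetypefinder"], name)
    file_type = run_detector(["file", "--brief", "--mime-type"], name)
    print(name, kde_type, file_type)
```

A mismatch between the two columns for `testfile_long` would indicate that the problem described in this report is reproduced on that system.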