166222 – [PATCH] universal charset encoding detection in katepart kencodingdetector

Summary: [PATCH] universal charset encoding detection in katepart kencodingdetector

Status:	RESOLVED FIXED

Alias:	None

Product:	kdelibs
Classification:	Unmaintained
Component:	general (other bugs)
Version First Reported In:	unspecified
Platform:	Compiled Sources Linux

Importance:	NOR wishlist
Target Milestone:	---
Assignee:	kdelibs bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-07-10 12:39 UTC by Wang Hoi
Modified:	2008-12-29 07:50 UTC (History)
CC List:	0 users

See Also:
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description Wang Hoi 2008-07-10 12:39:53 UTC

Version:           4.0.85 (using Devel)
Installed from:    Compiled sources
Compiler:          gcc 4.1.2 --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran --disable-libgcj --with-cpu=generic --host=i686-pc-linux
OS:                Linux

Hi, I have ported Firefox's charset detection and written some patches against KDE 4.0.85 to make universal charset autodetection work in KWrite (Kate) and Konqueror (and all other apps that use KEncodingDetector).

ftp://orafy:public@public.sjtu.edu.cn/mozilla-chardet-0.1.tar.bz2
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-cmake.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-katedialogs.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-kcodecaction.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-kencoding.patch

Screenshot: KWrite's config dialog, showing the newly added "Universal" option in the
charset detection combo box.
http://img61.imageshack.us/my.php?image=80961170fw7.png

To build, untar mozilla-chardet-0.1.tar.bz2 and run cmake && make && make install.
mozilla-chardet has no dependencies, so it would also be easy to include it directly in the source tree.

I'm a Chinese KDE user.
After applying these patches, I tested them by opening Big5, GB18030 and EUC-JP encoded documents in KWrite; detection was correct nearly 100% of the time.
The encoding detection algorithm is much more complex than the one in kdecore/localization/*.
A paper describes Mozilla's methods:
 http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
Mozilla's related source code:
http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/
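
For readers unfamiliar with this kind of detection, here is a rough illustration of one ingredient such detectors commonly rely on: ruling out a candidate encoding as soon as the bytes are illegal in it. This is only a minimal sketch and is not taken from the patches or from Mozilla's code. The helper name `looksLikeUtf8` is hypothetical, and the check is deliberately simplified (it does not reject overlong three- and four-byte forms, surrogates, or code points above U+10FFFF).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not a KDE or Mozilla API).
// Returns true if `bytes` has the lead/continuation byte structure of UTF-8.
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t remaining = 0;  // continuation bytes still expected
    for (unsigned char c : bytes) {
        if (remaining > 0) {
            // Inside a multi-byte character: must be 10xxxxxx.
            if ((c & 0xC0) != 0x80) return false;
            --remaining;
        } else if (c < 0x80) {
            // Plain ASCII byte, nothing to track.
        } else if (c >= 0xC2 && c <= 0xDF) {
            remaining = 1;  // lead byte of a two-byte character
        } else if (c >= 0xE0 && c <= 0xEF) {
            remaining = 2;  // lead byte of a three-byte character
        } else if (c >= 0xF0 && c <= 0xF4) {
            remaining = 3;  // lead byte of a four-byte character
        } else {
            // 0x80-0xC1 and 0xF5-0xFF can never start a character.
            return false;
        }
    }
    // A character cut off at the end of the input is also illegal.
    return remaining == 0;
}
```

With this sketch, the UTF-8 bytes of a Chinese word pass the check, while the GB18030 bytes of the same word are rejected, so UTF-8 can be ruled out for that file. A real detector has to go much further, because many legacy encodings accept almost any byte sequence and have to be told apart statistically.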

Comment 1 Wang Hoi 2008-12-29 07:50:48 UTC

Fixed.
It is included as KEncodingProber in KDE 4.2.