Bug 166222 - [PATCH] universal charset encoding detection in katepart kencodingdetector
Summary: [PATCH] universal charset encoding detection in katepart kencodingdetector
Status: RESOLVED FIXED
Alias: None
Product: kdelibs
Classification: Frameworks and Libraries
Component: general
Version: unspecified
Platform: Compiled Sources Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: kdelibs bugs
Reported: 2008-07-10 12:39 UTC by Wang Hoi
Modified: 2008-12-29 07:50 UTC
CC List: 0 users

Description Wang Hoi 2008-07-10 12:39:53 UTC
Version:           4.0.85 (using Devel)
Installed from:    Compiled sources
Compiler:          gcc 4.1.2 --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran --disable-libgcj --with-cpu=generic --host=i686-pc-linux
OS:                Linux

Hi, I have ported Firefox's charset detection and added some patches to KDE 4.0.85 to make universal charset autodetection work in KWrite (Kate) and Konqueror (and all other apps that use KEncodingDetector).

ftp://orafy:public@public.sjtu.edu.cn/mozilla-chardet-0.1.tar.bz2
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-cmake.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-katedialogs.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-kcodecaction.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-kencoding.patch

Screenshot: KWrite's config dialog, showing the newly added "Universal" option in the
charset detection combo box.
http://img61.imageshack.us/my.php?image=80961170fw7.png

To build, untar mozilla-chardet-0.1.tar.bz2 and run cmake && make && make install.
mozilla-chardet has no dependencies, so it would also be easy to include it in the source branch.

I'm a Chinese KDE user.
After applying these patches, I tested them by opening Big5/GB18030/EUC-JP encoded documents in KWrite; detection was correct in nearly 100% of cases.
The encoding detection algorithm is far more complex than the existing code in kdecore/localization/*
A paper describing their methods:
 http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
Mozilla's related source code:
http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/
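
One of the methods described in the Mozilla paper is the coding scheme method: a candidate encoding is ruled out as soon as a byte sequence that is illegal in it shows up. The sketch below illustrates that idea for a single candidate (UTF-8) only. It is not the ported code, and the function name couldBeUtf8 is hypothetical, not part of kdelibs or mozilla-chardet.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only.
// Returns false as soon as the bytes cannot be UTF-8, true otherwise.
// Simplification: a sequence truncated at the end of the buffer counts as
// illegal; a streaming detector would wait for more data instead.
bool couldBeUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t trailing = 0;            // number of continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the first continuation byte

        if (lead <= 0x7F) {
            trailing = 0;                    // plain ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            trailing = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trailing = 2;
            if (lead == 0xE0) lo = 0xA0;     // excludes overlong forms
            if (lead == 0xED) hi = 0x9F;     // excludes UTF-16 surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trailing = 3;
            if (lead == 0xF0) lo = 0x90;     // excludes overlong forms
            if (lead == 0xF4) hi = 0x8F;     // excludes values above U+10FFFF
        } else {
            return false;                    // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }

        if (trailing > n - i - 1) {
            return false;                    // sequence cut off by end of input
        }
        for (std::size_t k = 1; k <= trailing; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if (c < lo || c > hi) {
                return false;                // illegal continuation byte
            }
            lo = 0x80;                       // later continuation bytes use the full range
            hi = 0xBF;
        }
        i += trailing + 1;
    }
    return true;
}
```

For example, the GB18030 bytes D6 D0 CE C4 are rejected at the second byte, so UTF-8 can be dropped from the candidate list. A full detector would run one such check per candidate encoding and fall back on statistical methods when several candidates remain.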
Comment 1 Wang Hoi 2008-12-29 07:50:48 UTC
Fixed.
It's included as KEncodingProber in KDE 4.2.