Bug 287690

Summary: KWebkitPart does not apply correct locale encoding settings on some pages with CJK characters.
Product: [Frameworks and Libraries] kwebkitpart
Component: general
Version: 1.2.0
Status: RESOLVED UPSTREAM
Severity: normal
Priority: NOR
Target Milestone: ---
Platform: Gentoo Packages
OS: Linux
Reporter: moriramar
Assignee: webkit-devel
CC: adawit

Description moriramar 2011-11-27 16:11:53 UTC
Version:           unspecified (using KDE 4.7.2) 
OS:                Linux

When I open pages that contain both simplified and traditional Chinese characters, some characters are not displayed correctly. Pages that mix Chinese and Japanese characters may be affected as well.

Personal guess:
These pages might be encoded in GBK or GB18030 (GB18030 covers even more characters than GBK), while KWebkitPart might decode them as GB2312, which is generally considered a subset of GBK.
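One way to check this guess is to test which of the three character sets can represent a given string at all. The sketch below is only an illustration: it uses Python's standard `gb2312`, `gbk` and `gb18030` codecs rather than anything in KWebkitPart or QtWebKit, the helper name `unencodable` is hypothetical, and the sample string `繁體中文` is an assumed example, not text from the affected page. The traditional character `體` has no code point in GB2312, so it should be the only character reported for that charset.

```python
# Sketch: which Chinese charsets can represent a mixed simplified/traditional string?
# The sample text is a hypothetical example, not taken from the reported page.

CHARSETS = ("gb2312", "gbk", "gb18030")


def unencodable(text: str, charset: str) -> list:
    """Return the characters of `text` that `charset` cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    sample = "繁體中文"  # "traditional Chinese", written with the traditional form 體
    for cs in CHARSETS:
        print(cs, unencodable(sample, cs))
```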

Reproducible: Always

Steps to Reproduce:
1. Install a font that covers CJK characters, such as Bitstream Cyberbit, WenQuanYi Zen Hei, WenQuanYi Micro Hei or Droid.
2. Make sure the zh_CN.GBK, zh_CN.GB2312, zh_CN.GB18030 and zh_CN.UTF-8 locales are available on the system.
3. Open Konqueror 4.7.2 and switch to the WebKit rendering mode.
4. Go to http://www.acfun.tv/v/ac265957/ (the page may be slow to load).
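On glibc systems `locale -a` lists the installed locales. As a hedged alternative, the sketch below checks the four locales from step 2 from Python by trying each one with `locale.setlocale()` and restoring the previous setting afterwards; the helper name `available_locales` is hypothetical. Locale spellings vary between distributions, so a `False` result may only mean the locale is installed under a different name.

```python
# Sketch: check whether the locales named in step 2 can be activated.
# Each locale is tried for LC_CTYPE only, and the original setting is restored.
import locale

REQUIRED = ("zh_CN.GBK", "zh_CN.GB2312", "zh_CN.GB18030", "zh_CN.UTF-8")


def available_locales(names=REQUIRED) -> dict:
    """Map each locale name to True if setlocale() accepts it, else False."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, does not change anything
    result = {}
    try:
        for name in names:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                result[name] = True
            except locale.Error:
                result[name] = False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # put the original locale back
    return result


if __name__ == "__main__":
    for name, ok in available_locales().items():
        print(f"{name}: {'available' if ok else 'missing'}")
```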

Actual Results:  
In the bold title line at the top of the page content, a black box with a white question mark appears. The next line shows two black boxes separated by a "W" character, followed by an "o" character.
Selecting any of the GB* encodings under "View >> Encoding >> Simplified Chinese" does not solve the problem.
Opening pages like this sometimes crashes Konqueror.

Expected Results:  
No black boxes or stray "W" and "o" characters in these two lines.
For reference, KHTML displays this page correctly when the encoding is set to "Simplified Chinese >> GBK" or "Simplified Chinese >> GB18030".
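The symptoms are consistent with GBK-encoded bytes being decoded as GB2312. The sketch below uses Python's standard codecs as a stand-in for the browser's decoder (an assumption, not how QtWebKit is implemented) and a hypothetical sample string; the helper name `decode_as` is also hypothetical. A character outside GB2312 has a GBK lead or trail byte that GB2312 does not allow, so the GB2312 decoder substitutes U+FFFD, the replacement character that many fonts draw as a black shape with a white question mark. GBK trail bytes may also fall in the ASCII range 0x40 to 0x7E; depending on how a decoder resynchronizes after the error, such a byte can then appear as an ordinary ASCII letter, which is one possible, unconfirmed explanation for the stray "W" and "o". Decoding the same bytes as GBK or GB18030 restores the original text, matching the KHTML observation.

```python
# Sketch: what happens when GBK-encoded bytes are decoded with each charset?
# The sample text is hypothetical; it is not the actual title of the reported page.

SAMPLE = "繁體中文"


def decode_as(data: bytes, charset: str) -> str:
    """Decode `data`, substituting U+FFFD for byte sequences the charset rejects."""
    return data.decode(charset, errors="replace")


if __name__ == "__main__":
    page_bytes = SAMPLE.encode("gbk")  # what a GBK-encoded page would send
    for cs in ("gb2312", "gbk", "gb18030"):
        print(cs, repr(decode_as(page_bytes, cs)))
```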

Portage 2.1.10.38 (hardened/linux/x86/desktop, gcc-4.5.3, glibc-2.13-r4,
3.0.4-hardened-r5 i686)
=================================================================
System uname:
Linux-3.0.4-hardened-r5-i686-AMD_Athlon-tm-_II_Neo_K345_Dual-Core_Processor-with-gentoo-2.1
Timestamp of tree: Sat, 26 Nov 2011 16:30:01 +0000
app-shells/bash:          4.2_p10
dev-lang/python:          2.7.2-r3, 3.2.2
dev-util/cmake:           2.8.6-r1
dev-util/pkgconfig:       0.26
sys-apps/baselayout:      2.1
sys-apps/openrc:          0.9.4
sys-apps/sandbox:         2.5
sys-devel/autoconf:       2.68
sys-devel/automake:       1.10.3, 1.11.1-r1
sys-devel/binutils:       2.21.1-r1
sys-devel/gcc:            4.5.3-r1
sys-devel/gcc-config:     1.4.1-r1
sys-devel/libtool:        2.4-r4
sys-devel/make:           3.82-r3
sys-kernel/linux-headers: 2.6.39 (virtual/os-headers)
sys-libs/glibc:           2.13-r4
Repositories: gentoo gentoo-zh gentoo-haskell science kde sunrise local
ACCEPT_KEYWORDS="x86 ~x86"
ACCEPT_LICENSE="* -@EULA skype-eula"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O2 -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt
/usr/share/openvpn/easy-rsa"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf
/etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo
/etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d
/etc/texmf/web2c"
CXXFLAGS="-march=i686 -O2 -pipe -fomit-frame-pointer"
DISTDIR="/var/pkg/dist"
EMERGE_DEFAULT_OPTS="--keep-going y --with-bdeps y"
FEATURES="assume-digests binpkg-logs distlocks ebuild-locks fixlafiles news
parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn
unmerge-logs unmerge-orphans userfetch"
FFLAGS=""
GENTOO_MIRRORS="http://mirrors.163.com/gentoo"
LANG="en_GB.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="zh_TW zh af ak am ar as as_IN ast az be be_BY bg bn bn_BD bn_IN bo br
brx bs ca ca_XV ca@valencia crh cs csb cy da de de_FR dgo dz ee el en en_CA
en_GB en_US en_ZA eo es es_AR es_CL es_CR es_ES es_LA es_MX et et_EE eu fa fi
fil fo fr fr_CA fy fy_NL ga ga_IE gd gl gu gu_IN he hi hi_IN hne hr hsb hu hy
hy_AM ia id is it ja ka kk km kn kn_IN ko ko_KR kok ks ku ky la lb lg lo lt lv
mai me mk ml ml_IN mn mni mr mr_IN ms mt my nb nb_NO nds ne nl nn nn_NO no nr
ns nso oc om or or_IN pa pa_IN pap pl ps pt pt_BR pt_PT rm ro ru rw sa_IN sat
sd se sh sh_YU son si sk sl sq sr sr@ijekavian sr@ijekavianlatin sr@latin
sr@Latn sr_CS ss st sv sv_SE sw sw_TZ ta ta_IN ta_LK te te_IN tg th ti ti_ER tk
tl tn tr ts ug uk ur_IN ur_PK uz uz@cyrillic ve vi wa xh zh_CN zh_HK zu"
MAKEOPTS="-j2"
PKGDIR="/var/pkg/bin"
PORTAGE_COMPRESS=""
PORTAGE_COMPRESS_FLAGS=""
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress
--force --whole-file --delete --stats --timeout=180 --exclude=/distfiles
--exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/var/pkg/portage"
PORTDIR_OVERLAY="/var/pkg/gentoo-zh /var/pkg/haskell /var/pkg/science
/var/pkg/kde /var/pkg/sunrise /var/pkg/usr"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X a52 aac acl acpi avahi bash-completion berkdb bluetooth branding bzip2
cairo cdda cdr cjk cli consolekit cracklib crypt cups cxx dbus djvu dri dts dvd
dvdr emboss encode exif fam ffmpeg firefox flac fontconfig gdbm gdu gif gpm
gstreamer hardened iconv ipv6 jpeg jpeg2k kde lame lcms ldap libnotify mad mms
mmx mmxext mng modules mp3 mp4 mpeg msn mudflap ncurses nls nptl nptlonly ogg
opengl openmp pam pango pax_kernel pcre pdf pic png policykit ppds pppd
pulseaudio qt3support qt4 readline samba sdl semantic-desktop session spell
sqlite ssl startup-notification svg sysfs syslog taglib tcpd threads tiff
truetype udev unicode upnp urandom usb v4l vaapi vim-syntax vorbis wifi x264
x86 xcb xcomposite xml xorg xulrunner xv xvid xvmc zlib" ALSA_CARDS="ali5451
als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370
ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident
usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy
dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear
meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm
authn_default authn_file authz_dbm authz_default authz_groupfile authz_host
authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir
disk_cache env expires ext_filter file_cache filter headers include info
log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling
status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words
flow plan stage tables krita karbon braindump" CAMERAS="ptp2"
COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog"
DRACUT_MODULES="crypt crypt-gpg syslog" ELIBC="glibc" INPUT_DEVICES="acecad
aiptek elographics evdev fpit hyperpen joystick keyboard mouse mutouch penmount
synaptics wacom" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk
hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark
ast chips cirrus epson geode glint i128 i740 intel mach64 mga neomagic nouveau
r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga
trident tseng v4l vesa via"
Comment 1 moriramar 2011-11-27 16:12:34 UTC
I am using KWebkitPart 1.2.0 from Gentoo Portage.
Comment 2 Dawit Alemayehu 2011-11-30 15:09:00 UTC
This is an upstream QtWebKit issue that can be reproduced with QtTestBrowser. 
Use the instructions at http://trac.webkit.org/wiki/QtWebKitBugs to file the bug report there.
Comment 3 Dawit Alemayehu 2011-12-01 05:20:11 UTC
Upstream ticket https://bugs.webkit.org/show_bug.cgi?id=73519
Comment 4 moriramar 2011-12-01 06:29:51 UTC
Sorry, I forgot to post my upstream report here. I marked https://bugs.webkit.org/show_bug.cgi?id=73447 as a duplicate of yours.

Thanks.