Bug 222195 - Encoding autodetection broken after KDE 4.4 beta
Summary: Encoding autodetection broken after KDE 4.4 beta
Status: RESOLVED FIXED
Alias: None
Product: kate
Classification: Applications
Component: encoding
Version: unspecified
Platform: unspecified Linux
Importance: NOR normal
Target Milestone: ---
Assignee: KWrite Developers
URL:
Keywords:
Duplicates: 222180
Depends on:
Blocks:
 
Reported: 2010-01-11 10:06 UTC by Murz
Modified: 2010-03-15 08:27 UTC
CC List: 3 users

See Also:
Latest Commit:
Version Fixed In:


Attachments
Test file with text "Тестовый текст" in cp1251 encoding. (14 bytes, text/plain)
2010-01-11 10:07 UTC, Murz
test_cp1251.txt (34 bytes, text/plain)
2010-03-11 13:19 UTC, Murz
test_koi8-r.txt (34 bytes, text/plain)
2010-03-11 13:20 UTC, Murz
test_utf8.txt (56 bytes, text/plain)
2010-03-11 13:20 UTC, Murz
bani_text_utf-8.txt (6.81 KB, text/plain)
2010-03-15 08:20 UTC, Murz
cooler_utf-8.html (5.49 KB, text/html)
2010-03-15 08:20 UTC, Murz
joomla_cp1251.php (244.31 KB, application/x-httpd-php)
2010-03-15 08:21 UTC, Murz
joomla_frontend_cp1251.php (13.58 KB, application/x-httpd-php)
2010-03-15 08:21 UTC, Murz
joomla_template_cp1251.php (3.43 KB, application/x-httpd-php)
2010-03-15 08:22 UTC, Murz
kde.ru_index_utf-8.html (20.00 KB, text/html)
2010-03-15 08:22 UTC, Murz
lug_ivanovo_koi8-r.html (10.40 KB, text/html)
2010-03-15 08:23 UTC, Murz
page.tpl_utf-8.php (3.67 KB, application/x-httpd-php)
2010-03-15 08:24 UTC, Murz
qs_index_utf-8.html (11.23 KB, text/html)
2010-03-15 08:25 UTC, Murz
ruskde_koi8-r.htm (13.60 KB, text/html)
2010-03-15 08:25 UTC, Murz
sensi_koi8-r.html (5.01 KB, text/html)
2010-03-15 08:25 UTC, Murz
ubuntuclub.ru_cp1251.html (20.95 KB, text/html)
2010-03-15 08:26 UTC, Murz

Description Murz 2010-01-11 10:06:56 UTC
Version:           3.3.90 (using 4.3.90 (KDE 4.3.90 (KDE 4.4 RC1)), Kubuntu packages)
Compiler:          cc
OS:                Linux (x86_64) release 2.6.31-17-generic

After upgrading from KDE 4.3.2 to KDE 4.4 beta1, I found that encoding autodetection in Kate is broken.
In 4.3 it correctly detected the encoding of a file, but now it always sets the encoding to the default.

How to reproduce:
1. Open Kate.
2. Go to the options, set "Encoding" to "Unicode ( UTF-8 )" and "Encoding autodetection" to "Cyrillic", then press OK.
3. Open the file "test.txt", which contains the text "Тестовый текст" in CP1251 encoding.
The file is opened as UTF-8 in KDE 4.4, but as CP1251 in KDE 4.3!
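
For anyone without the attachment, a file like the one in step 3 can be recreated with a few lines of Python. This is only a sketch: the file name test.txt comes from the steps above, while the names SAMPLE and encode_sample are made up for illustration.

```python
from pathlib import Path

SAMPLE = "Тестовый текст"


def encode_sample(encoding="cp1251"):
    """Return SAMPLE encoded in the given encoding (CP1251 by default)."""
    return SAMPLE.encode(encoding)


if __name__ == "__main__":
    raw = encode_sample()
    # Write the raw bytes so the editor has to guess the encoding itself.
    Path("test.txt").write_bytes(raw)
    print(len(raw), "bytes written to test.txt")
    # The same bytes are not valid UTF-8, so a UTF-8 reading shows
    # replacement characters instead of the Cyrillic text.
    print(raw.decode("utf-8", errors="replace"))
    print(raw.decode("cp1251"))
```

Every Cyrillic letter takes a single byte in CP1251, which is why a UTF-8 reading of the file cannot succeed.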
Comment 1 Murz 2010-01-11 10:07:51 UTC
Created attachment 39762 [details]
Test file with text "Тестовый текст" in cp1251 encoding.
Comment 2 Murz 2010-01-11 10:12:12 UTC
When opening this file, Kate's application output shows:

kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0                                                                                     
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0                                                                                                                  
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512                                                                                                                 
kate(4256)/Kate (Document) KateView::updateView: KateView::updateView                                                                                                                           
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0                                                                                     
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0                                                                                                                  
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512                                                                                                                 
kate(4256)/Kate (Document) KateView::updateView: KateView::updateView                                                                                                                           
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0                                                                                     
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0                                                                                                                  
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512                                                                                                                 
kate(4256)/Kate (App) KateDocManager::slotDocumentNameChanged: docname changed:  "Untitled" -----> "Untitled"                                                                                   
kate(4256)/Kate (Document) KateFileLoader::open: PROBER TYPE:  "Cyrillic"                                                                                                                       
kate(4256)/Kate (Document) KateFileLoader::open: OPEN USES ENCODING:  "windows-1251"
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateView::updateView: KateView::updateView
kate(4256)/Kate (Document) KateBuffer::openFile: Broken UTF-8:  false
kate(4256)/Kate (Document) KateBuffer::openFile: LOADING DONE  1
kate(4256)/Kate (Document) KateModeManager::fileType:
kate(4256)/kdecore (services) KMimeTypeFactory::parseMagic: Now parsing  "/usr/local/share/mime/magic"
kate(4256)/kdecore (services) KMimeTypeFactory::parseMagic: Now parsing  "/usr/share/mime/magic"
kate(4256)/kdecore (services) KMimeTypeFactory::parseMagic: Now parsing  "/home/murz/.local/share/mime/magic"
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateView::updateView: KateView::updateView
kate(4256)/Kate (Code Completion) KateCompletionWidget::abortCompletion:
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateView::updateView: KateView::updateView
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateView::updateView: KateView::updateView
kate(4256)/Kate (App) KateDocManager::slotDocumentNameChanged: docname changed:  "Untitled" -----> "test.txt"
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateView::updateView: KateView::updateView
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateView::updateView: KateView::updateView
kate(4256)/Kate (Document) KateBuffer::doHighlight: HIGHLIGHTED END --- NEED HL, LINESTART:  0  LINEEND:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL UNTIL LINE:  0  MAX:  0
kate(4256)/Kate (Document) KateBuffer::doHighlight: HL DYN COUNT:  0  MAX:  512
kate(4256)/Kate (Document) KateView::updateView: KateView::updateView
kate(4256)/Kate (App) KateViewDocumentProxyModel::opened: QModelIndex(0,0,0x0,KateViewDocumentProxyModel(0x13d1020) )
kate(4256)/Kate (App) KateMainWindow::slotUpdateHorizontalViewBar: slotUpdateHorizontalViewBar()
kate(4256)/Kate (App) KateMainWindow::slotUpdateHorizontalViewBar: KateViewBar(0x1366f30) hiding container
kate(4256)/kio (KDirListerCache) KDirListerCache::slotFileDirty: "/home/murz/Documents/checkpoint_in_progress"
kate(4256)/kio (KDirListerCache) KDirListerCache::slotFileDirty: "/home/murz/Documents"
kate(4256)/kio (KDirListerCache) KDirListerCache::updateDirectory: KUrl("file:///home/murz/Documents")
kate(4256)/Kate (Document) KateView::slotLostFocus: KateView::slotLostFocus
kate(4256)/Kate (Code Completion) KateCompletionWidget::abortCompletion:



The key line is 'OPEN USES ENCODING:  "windows-1251"', yet the file is opened as UTF-8 and I see "�������� ����" instead of the text!
Comment 3 Dominik Haumann 2010-01-11 22:49:02 UTC
*** Bug 222180 has been marked as a duplicate of this bug. ***
Comment 4 Murz 2010-01-29 16:26:03 UTC
The problem is still present in KDE 4.4 RC2!
Comment 5 Murz 2010-02-11 12:27:58 UTC
The bug still exists in the KDE 4.4 release too!
Comment 6 Vovochka 2010-03-03 06:54:12 UTC
4.4.1
Bug still exists.
Comment 7 Christoph Cullmann 2010-03-07 11:42:36 UTC
Removed auto-detection for KDE 4.5, too buggy :(

The new basic idea is:

1. Try the standard encoding.
2. If that does not work, try to detect the encoding from the BOM, or use the fallback encoding (the default is ISO-8859-15, "latin-15"; it can be changed in the config dialog, for example to the encoding you want).
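
The two steps above can be sketched roughly as follows. This is a Python illustration, not Kate's C++ code: the function name decode_with_fallback and its parameters are hypothetical, and "latin-15" is assumed to mean ISO-8859-15.

```python
import codecs

# UTF-32 BOMs are checked before UTF-16 ones, because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_with_fallback(data, standard="utf-8", fallback="iso-8859-15"):
    """Return (text, encoding_used) for the raw bytes in `data`."""
    # Step 1: try the standard encoding strictly.
    try:
        return data.decode(standard), standard
    except UnicodeDecodeError:
        pass
    # Step 2a: look for a byte order mark at the start of the data.
    for bom, name in _BOMS:
        if data.startswith(bom):
            try:
                return data.decode(name), name
            except UnicodeDecodeError:
                break
    # Step 2b: no usable BOM, so use the configured fallback encoding.
    return data.decode(fallback, errors="replace"), fallback
```

Under these assumptions, a CP1251 file without a BOM ends up decoded with the fallback, so it only displays correctly if the fallback happens to be CP1251.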
Comment 8 Christoph Cullmann 2010-03-07 11:42:48 UTC
Removed auto-detection for KDE 4.5, too buggy :(

The new basic idea is:

1. Try the standard encoding.
2. If that does not work, try to detect the encoding from the BOM, or use the fallback encoding (the default is ISO-8859-15, "latin-15"; it can be changed in the config dialog, for example to the encoding you want).
Comment 9 Murz 2010-03-08 07:19:32 UTC
It isn't very buggy, it works very well for me!
It successfully detects Unicode, cp1251, koi8-r, etc.
The basic idea doesn't help, because I have three encodings, while "default" and "fallback" cover only two.
Can I get the autodetection functionality via a separate package or patch in KDE 4.5?
Comment 10 Christoph Cullmann 2010-03-08 07:56:57 UTC
Could you provide me with 2-3 test files? I will look into the issue then once more, perhaps introducing the auto-detection as an interim step before using fallback encoding.
Comment 11 Christoph Cullmann 2010-03-08 07:57:13 UTC
Assigned to me ;)
Comment 12 Murz 2010-03-11 13:19:45 UTC
Created attachment 41530 [details]
test_cp1251.txt
Comment 13 Murz 2010-03-11 13:20:10 UTC
Created attachment 41531 [details]
test_koi8-r.txt
Comment 14 Murz 2010-03-11 13:20:34 UTC
Created attachment 41532 [details]
test_utf8.txt
Comment 15 Murz 2010-03-11 13:22:26 UTC
I have attached 3 files with text in the different Cyrillic encodings that I use most often.
In KDE 4.3 I set encoding autodetection to "Cyrillic" and Kate successfully detected the encoding of all these files.
But in KDE 4.4 I lost this functionality!
Comment 16 Christoph Cullmann 2010-03-11 19:28:20 UTC
My changes are post KDE 4.4.x, therefore they didn't cause this.
But I will have a look and try to get this stuff back for KDE 4.5, in a more reliable way. Thanks a lot for attaching the examples.
Comment 17 Christoph Cullmann 2010-03-11 20:21:13 UTC
SVN commit 1102076 by cullmann:

reintroduce encoding prober, now loading is a four step thingy
documented in code atm
CCBUG: 222195

already works for tests provided in bug, but yes, there will be again global config option
to alter prober type


 M  +17 -8     katetextbuffer.cpp  
 M  +28 -7     katetextloader.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1102076
Comment 18 Christoph Cullmann 2010-03-11 21:27:48 UTC
SVN commit 1102099 by cullmann:

introduce encoding detection again, loading now works this way, the first phase that works will be the last one tried :)

1. standard encoding, or the one taken from filedialog/command line, is used
2. encoding detection runs: BOM check, if that fails the selected prober runs, default "universal"
3. fallback encoding is used
4. again the encoding from 1. is used, the file is loaded read-only, as encoding errors occurred

BUG: 222195

fixes above bug, given standard encoding is utf-8 (fallback encoding doesn't matter), all attached test cases
are opened with right encoding (even if the detection is default == universal, but ok with "cyrillic" too)



 M  +2 -1      buffer/katetextbuffer.cpp  
 M  +20 -0     buffer/katetextbuffer.h  
 M  +0 -3      buffer/katetextloader.h  
 M  +21 -2     dialogs/katedialogs.cpp  
 M  +21 -4     dialogs/opensaveconfigwidget.ui  
 M  +4 -3      document/katebuffer.cpp  
 M  +74 -49    utils/kateconfig.cpp  
 M  +52 -6     utils/kateconfig.h  
 M  +9 -0      utils/kateglobal.cpp  
 M  +7 -1      utils/kateglobal.h  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1102099
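
The four loading steps described in commit 1102099 can be sketched as follows. This is a hedged Python illustration rather than the code in katetextbuffer.cpp: load_text, its parameters and the pluggable prober callback are all hypothetical names.

```python
import codecs


def load_text(data, standard="utf-8", fallback="iso-8859-15", prober=None):
    """Return (text, encoding, read_only) for the raw bytes in `data`.

    `prober` is an optional callable taking bytes and returning an
    encoding name or None; it stands in for the configured prober.
    """

    def try_decode(name):
        try:
            return data.decode(name)
        except (UnicodeDecodeError, LookupError):
            return None

    # 1. Standard encoding (or the one chosen by the user).
    text = try_decode(standard)
    if text is not None:
        return text, standard, False

    # 2. Detection: BOM check first, then the prober if there is no BOM.
    detected = None
    if data.startswith(codecs.BOM_UTF8):
        detected = "utf-8-sig"
    elif data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        detected = "utf-32"
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        detected = "utf-16"
    elif prober is not None:
        detected = prober(data)
    if detected:
        text = try_decode(detected)
        if text is not None:
            return text, detected, False

    # 3. Fallback encoding.
    text = try_decode(fallback)
    if text is not None:
        return text, fallback, False

    # 4. Standard encoding again, lossy, and flagged read-only.
    return data.decode(standard, errors="replace"), standard, True
```

In this sketch a single-byte fallback such as ISO-8859-15 accepts any byte sequence, so step 4 is only reached when the fallback itself can fail (for example "ascii").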
Comment 19 Christoph Cullmann 2010-03-11 21:55:29 UTC
SVN commit 1102106 by cullmann:

add unittests for cyrillic encoding probing
BUG: 222195


 M  +10 -0     CMakeLists.txt  
 A             cp1251.txt  
 A             cyrillic_utf8.txt  
 A             koi8-r.txt  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1102106
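
The kind of check such unit tests perform can be illustrated with a toy prober in Python. This is not the prober used by Kate: probe_cyrillic is a hypothetical helper built on a simple heuristic, namely that CP1251 and KOI8-R place lowercase and uppercase Cyrillic letters in swapped byte ranges, so for ordinary, mostly lowercase Russian text the correct decoding yields more lowercase letters.

```python
def probe_cyrillic(data):
    """Guess among utf-8, cp1251 and koi8_r for mostly lowercase Russian text."""
    # Valid UTF-8 is accepted as-is; legacy Cyrillic text rarely decodes cleanly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    def lowercase_count(encoding):
        text = data.decode(encoding, errors="replace")
        return sum(ch.islower() for ch in text)

    # Pick the legacy encoding whose decoding has more lowercase letters.
    return max(("cp1251", "koi8_r"), key=lowercase_count)


if __name__ == "__main__":
    sample = "Тестовый текст"
    for enc in ("utf-8", "cp1251", "koi8_r"):
        assert probe_cyrillic(sample.encode(enc)) == enc
    print("all three sample encodings detected")
```

The heuristic would misjudge text written mostly in capitals, which is one reason real probers rely on letter-frequency statistics instead.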
Comment 20 Murz 2010-03-15 08:20:16 UTC
Created attachment 41648 [details]
bani_text_utf-8.txt
Comment 21 Murz 2010-03-15 08:20:44 UTC
Created attachment 41649 [details]
cooler_utf-8.html
Comment 22 Murz 2010-03-15 08:21:33 UTC
Created attachment 41650 [details]
joomla_cp1251.php
Comment 23 Murz 2010-03-15 08:21:54 UTC
Created attachment 41651 [details]
joomla_frontend_cp1251.php
Comment 24 Murz 2010-03-15 08:22:16 UTC
Created attachment 41652 [details]
joomla_template_cp1251.php
Comment 25 Murz 2010-03-15 08:22:50 UTC
Created attachment 41653 [details]
kde.ru_index_utf-8.html
Comment 26 Murz 2010-03-15 08:23:14 UTC
Created attachment 41654 [details]
lug_ivanovo_koi8-r.html
Comment 27 Murz 2010-03-15 08:24:54 UTC
Created attachment 41655 [details]
page.tpl_utf-8.php
Comment 28 Murz 2010-03-15 08:25:21 UTC
Created attachment 41656 [details]
qs_index_utf-8.html
Comment 29 Murz 2010-03-15 08:25:41 UTC
Created attachment 41657 [details]
ruskde_koi8-r.htm
Comment 30 Murz 2010-03-15 08:25:59 UTC
Created attachment 41658 [details]
sensi_koi8-r.html
Comment 31 Murz 2010-03-15 08:26:34 UTC
Created attachment 41659 [details]
ubuntuclub.ru_cp1251.html
Comment 32 Murz 2010-03-15 08:27:49 UTC
I searched for and added some files in utf-8, cp1251 and koi8-r Cyrillic encodings for testing, hope it helps.