Bug 96416 - Opening file: Parsing error in the main document
Summary: Opening file: Parsing error in the main document
Status: RESOLVED FIXED
Alias: None
Product: kword
Classification: Miscellaneous
Component: general
Version: 1.5 or before
Platform: Gentoo Packages Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Thomas Zander
Reported: 2005-01-06 03:50 UTC by Chris
Modified: 2008-12-21 00:55 UTC


Attachments
Test case: this file generates the mentioned error. (21.74 KB, application/octet-stream)
2005-01-08 02:30 UTC, Chris
qdomtest.cpp (856 bytes, text/x-c++src)
2005-01-08 12:02 UTC, David Faure
qdomtest.pro (289 bytes, text/plain)
2005-01-08 12:02 UTC, David Faure
BZip2ed output of qdomtest on my machine (110 bytes, application/octet-stream)
2005-01-09 01:02 UTC, Chris

Description Chris 2005-01-06 03:50:32 UTC
Version:           1.3.5 (using KDE 3.3.2)
Installed from:    Gentoo Packages
Compiler:          gcc 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6) 
OS:                Linux

When opening an existing .kwd file (even one created using the *same* *version* of KWord), I receive the following message:

Could not open <filename>

Reason: parsing error in the main document at line 6317, column 3

Error message: unexpected character

This has happened more than once now, although I don't think it happens with every single document. I would be happy to provide a test .kwd file which displays this error, if it would be useful, although it would be five pages long.

Note that this bug has already been reported to Gentoo Bugzilla at:

http://bugs.gentoo.org/show_bug.cgi?id=76722

and has been closed due to the problem apparently existing upstream (i.e. here).
Comment 1 Nicolas Goutte 2005-01-06 19:15:02 UTC
Can you unzip the file and see which character is in maindoc.xml at or near the given position (especially a non-UTF-8 character, a NUL character, or another character not allowed in XML)?
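
A rough way to run part of this check, assuming maindoc.xml has already been extracted and read into a string: the sketch below reports the first control byte that XML 1.0 does not allow (anything below 0x20 except tab, line feed, and carriage return, so NUL is caught too). `BytePosition` and `findForbiddenControlByte` are made-up names for illustration, not part of KOffice or Qt, and the column is counted in bytes, which may differ from what the Qt parser reports.

```cpp
#include <cstddef>
#include <string>

// Result of the scan; line and column are 1-based, like most parser messages.
// (Hypothetical helper type, not a KOffice or Qt type.)
struct BytePosition {
    bool found;
    std::size_t line;
    std::size_t column;
    unsigned char value;
};

// Returns the position of the first byte below 0x20 other than tab, LF, or CR.
// XML 1.0 forbids those control characters, including NUL.
BytePosition findForbiddenControlByte(const std::string& xml) {
    std::size_t line = 1;
    std::size_t column = 1;
    for (std::size_t i = 0; i < xml.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(xml[i]);
        const bool allowedWhitespace =
            byte == 0x09 || byte == 0x0A || byte == 0x0D;
        if (byte < 0x20 && !allowedWhitespace) {
            return BytePosition{true, line, column, byte};
        }
        if (byte == 0x0A) {
            ++line;       // a line feed starts a new line
            column = 1;
        } else {
            ++column;     // columns are counted in bytes, not characters
        }
    }
    return BytePosition{false, 0, 0, 0};
}
```

This only looks at control bytes; bytes that are not valid UTF-8 need a separate check.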

Also it would be nice if you could create a short document that has the problem. (Please attach it to this bug.)

Also perhaps another question: what Qt version?

Have a nice day!
Comment 2 Nicolas Goutte 2005-01-07 14:48:38 UTC
Thinking further: if you do not mind attaching your example file to this bug report (which makes it public), please do so.

(Sorry, I misread your comment as saying that you did not want to make it public, but that does not seem to be what you wrote.)

Have a nice day!
Comment 3 Chris 2005-01-08 02:30:31 UTC
Created attachment 8981 [details]
Test case: this file generates the mentioned error.

This version of the file actually gives line 5407 (not 6317), column 3, and the
text of the message is identical.
Comment 4 Chris 2005-01-08 02:32:29 UTC
I am running Qt version 3.3.3 as compiled by Portage with USE flags reported as "+cups -debug -doc -firebird +gif -icc -immqt -immqt-bc -ipv6 +mysql -nas -odbc +opengl -postgres -sqlite -xinerama +zlib".
Comment 5 Chris 2005-01-08 02:42:25 UTC
I have unzipped the KWord file, and line 5407 of maindoc.xml is part of the list of spell-check words I added to the Ignore All list. The preceding line appears to be the problem. On this line, the word added to the Ignore All list was "hick" immediately followed by an emdash. The emdash was originally produced via the autocorrection feature and double-hyphen notation. The spell-check seems to insist that any word immediately followed by an emdash is actually one word (it includes the emdash in the word). However, in the <SPELLCHECKIGNOREWORD> element, the word attribute was not properly UTF-8 encoded, and the result was an XML file that would not parse. The emdash is properly encoded elsewhere: in the document's actual text it is encoded as 0xE2, 0x80, 0x94, but in the SPELLCHECKIGNOREWORD element it is encoded as 0xE2 alone, which is a lowercase a with circumflex in ISO-8859-1 and an incomplete (invalid) sequence in UTF-8.
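
To make the byte-level observation concrete, here is a minimal sketch of a UTF-8 well-formedness check. It is a simplified validator written for this illustration (the name `firstInvalidUtf8` is hypothetical, and it does not reject overlong forms or surrogates); it is not the check that Qt's XML parser performs. A lead byte 0xE2 announces a three-byte sequence, so the complete emdash 0xE2 0x80 0x94 passes, while 0xE2 followed directly by the closing quote of the attribute is flagged.

```cpp
#include <cstddef>
#include <string>

// Returns the byte offset of the first malformed UTF-8 sequence, or
// std::string::npos if every sequence has a valid lead byte followed by
// the right number of continuation bytes (0x80..0xBF).
std::size_t firstInvalidUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t trailing = 0;
        if (lead < 0x80) {
            trailing = 0;                         // plain ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            trailing = 1;                         // two-byte sequence
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trailing = 2;                         // three-byte sequence (emdash lives here)
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trailing = 3;                         // four-byte sequence
        } else {
            return i;                             // stray continuation or invalid lead byte
        }
        if (bytes.size() - i <= trailing) {
            return i;                             // sequence cut off by end of input
        }
        for (std::size_t k = 1; k <= trailing; ++k) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + k]);
            if ((next & 0xC0) != 0x80) {
                return i;                         // expected a continuation byte
            }
        }
        i += trailing + 1;
    }
    return std::string::npos;
}
```

With this sketch, `firstInvalidUtf8("hick\xE2\x80\x94")` returns `std::string::npos`, while `firstInvalidUtf8("hick\xE2\"")` returns 4, the offset of the lone 0xE2.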
Comment 6 Chris 2005-01-08 02:45:09 UTC
As a matter of fact, simply opening maindoc.xml in a hex editor (I know it's just text, but I wanted to be extremely careful) and removing the offending SPELLCHECKIGNOREWORD element allowed the document to be opened without incident!
Comment 7 David Faure 2005-01-08 12:02:25 UTC
Good find. However, I double-checked the code and I see no reason for the XML to be written out as invalid UTF-8,
unless there's a bug in Qt.

Can you save the two attached files into a new empty directory, and then run
qmake -project
qmake
make
./qdomtest
and tell me what it outputs?
The test works here but maybe you're experiencing a bug in an older version of Qt
(which I doubt, though...). Ah and I'm using utf8 as my locale so this might hide a bug.



Created an attachment (id=8988)
qdomtest.cpp

Created an attachment (id=8989)
qdomtest.pro
Comment 8 Nicolas Goutte 2005-01-08 16:31:06 UTC
I cannot confirm that saving fails on KOffice CVS HEAD either, unfortunately. (I have not tested KOffice 1.3.x.)
Comment 9 Nicolas Goutte 2005-01-08 16:47:39 UTC
KOffice 1.3.x: no problem
The QDom test: no problem

So I still have no idea how to reproduce it. :-(

Comment 10 Chris 2005-01-09 01:02:13 UTC
Created attachment 8996 [details]
BZip2ed output of qdomtest on my machine

This file is a BZip2 of captured standard error output from qdomtest on my
system.
Comment 11 David Faure 2005-01-09 01:08:14 UTC
So the test works for you too, therefore I really can't think of a reason why KWord would have saved the thing wrongly.
Can you reproduce the bug, i.e. if you open the fixed document and re-add the
word to the ignore list using kword?

Comment 12 Nicolas Goutte 2005-01-09 09:10:31 UTC
On Sunday 09 January 2005 01:08, David Faure wrote:
(...)
> So the test works for you too, therefore I really can't think of a reason
> why KWord would have saved the thing wrongly. Can you reproduce the bug,
> i.e. if you open the fixed document and re-add the word to the ignore list
> using kword?

The only idea I have would be an encoding mismatch between the ignored word in 
KSpell and QString. However, I have not checked the code to know if there is 
really such a problem. (Also, without autospellcheck, the "IgnoreAll" of 
KSpell does not seem to add a word to the ignore list in CVS HEAD.)

Have a nice day!

Comment 13 Chris 2005-01-10 03:18:38 UTC
I can positively and reliably reproduce both behaviours: adding a word followed immediately by an emdash to the Ignore All list causes all subsequent opens to fail; unzipping the kwd file and removing the element manually allows the document to open normally.

In case it matters, I'm using ISpell with a British English dictionary and UTF-8 encoding, and my system is, for the most part, set up with UTF-8 internally (although I'm not really too knowledgeable about the business of actually making Linux do Unicode).

Finally, another note: although the Ignore All list is saved as part of the kwd file, simply adding a word to it does not actually cause the open file to be marked as modified, so it doesn't get saved even if you click the Save button. To test this, I had to add a character to the end of the document so that it was marked as modified and could be saved. This might be considered another bug (adding a word to the ignore list should be considered a document modification).

Thanks for all the work that no doubt goes into KOffice anyway. It's a good suite.
Comment 14 David Faure 2005-01-18 00:41:30 UTC
CVS commit by faure: 

Mark document as modified when adding a word to the "ignore list"
CCBUG: 96416


  M +1 -1      kwdoc.cc   1.633.2.8


--- koffice/kword/kwdoc.cc  #1.633.2.7:1.633.2.8
@@ -4548,5 +4548,5 @@ void KWDocument::addIgnoreWordAll( const
         m_spellListIgnoreAll.append( word );
     m_bgSpellCheck->addIgnoreWordAll( word );
-
+    setModified( true );
 }
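
The effect of the one-line change above can be illustrated with a small stand-alone sketch. `SketchDocument` and all of its members are hypothetical stand-ins, not the real KWDocument API; the assumption, taken from Comment 13, is that saving is skipped while the document is not marked as modified.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a document with an "Ignore All" list.
class SketchDocument {
public:
    SketchDocument() : modified_(false) {}

    // Adds the word if it is not already listed, then marks the document as
    // modified so that the next save actually writes the ignore list out.
    void addIgnoreWordAll(const std::string& word) {
        if (std::find(ignoreAll_.begin(), ignoreAll_.end(), word) == ignoreAll_.end()) {
            ignoreAll_.push_back(word);
        }
        modified_ = true;  // the step that was missing before the fix
    }

    // Returns true if something was written; an unmodified document is skipped.
    bool save() {
        if (!modified_) {
            return false;
        }
        modified_ = false;
        return true;
    }

    bool isModified() const { return modified_; }
    std::size_t ignoreAllCount() const { return ignoreAll_.size(); }

private:
    std::vector<std::string> ignoreAll_;
    bool modified_;
};
```

In this sketch, calling `save()` on a fresh document does nothing, whereas calling it after `addIgnoreWordAll("hick")` writes the document and clears the modified flag.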
 


Comment 15 dmoyne 2008-09-01 14:12:07 UTC
I can confirm a similar bug: after saving a very small file with no error reported, I got this error message when trying to reopen the document:
Could not open /home/dmoyne/my_data/COURRIER_data/Courrier_KWord/Bruno/Annulation_Saragosse.kwd
Reason: Parsing error in root at line 1, column 1
Error message: unexpected end of file

Regards