Bug 263489

Summary: Add support for (wide) character strings
Product: [Applications] okteta
Reporter: Brad Hards <bradh>
Component: Structures Tool
Assignee: Alex Richardson <arichardson.kde>
Status: RESOLVED FIXED
Severity: wishlist
CC: kossebau
Priority: NOR
Version: unspecified
Target Milestone: ---
Platform: Ubuntu
OS: Linux

Description Brad Hards 2011-01-17 23:58:57 UTC
Version:           unspecified (using KDE 4.5.1) 
OS:                Linux

[This relates to my work on an Okteta Structure Definition for Compound File Binary File Format, which is described in MS-CFB specification]

It would be nice to have the ability to handle strings (including wide character strings). MS-CFB describes a 64-byte block, which is an array of up to 31 UTF-16 characters followed by a null terminator. I'd like to be able to show the text (as a Unicode string), but the best I can do is an array of 32 uint16 elements, which isn't easy to read.

<array name="Directory Entry Name" length="32">
  <primitive type="UInt16" />
</array>
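
To make the difference concrete, here is a minimal Python sketch (not Okteta code) that shows the same 64-byte field both ways: as 32 uint16 values, which is what the array above gives, and as decoded text. The function name `decode_directory_entry_name` and the sample name "Root Entry" are made up for illustration, and little-endian byte order is assumed.

```python
import struct

FIELD_SIZE = 64   # size of the name field in bytes
UNIT_COUNT = 32   # 31 UTF-16 code units plus a null terminator


def decode_directory_entry_name(block: bytes) -> str:
    """Decode a 64-byte name field (hypothetical helper, assumes UTF-16LE)."""
    if len(block) != FIELD_SIZE:
        raise ValueError("expected a 64-byte block")
    units = struct.unpack("<32H", block)
    # Keep only the code units before the first null terminator, if any.
    end = units.index(0) if 0 in units else UNIT_COUNT
    return struct.pack(f"<{end}H", *units[:end]).decode("utf-16-le")


# Sample field: the text padded with zero bytes up to 64 bytes.
raw = "Root Entry".encode("utf-16-le").ljust(FIELD_SIZE, b"\x00")

as_numbers = struct.unpack("<32H", raw)      # what the uint16 array shows
as_text = decode_directory_entry_name(raw)   # what a string type could show

print([hex(u) for u in as_numbers[:4]])
print(as_text)
```

The first print lists raw code units such as 0x52, the second prints the readable name.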

Some options that could be considered:
 - treat it as an array (as I do now), but add an extra attribute that shows how it should be rendered (like display="number", display="string")
 - add a new primitive type (like type="string") and some attributes for fixed length, null terminated, ucs2/utf8/ascii.
 - add several new primitive types (like type="FixedLengthString", type="FixedLengthWideString", type="NullTerminatedString", type="NullTerminatedWideString")

The display attribute could be used to avoid conflicts between char and int8/uint8, but I recognise that this would still require backwards-compatibility hacks.


Reproducible: Always

Steps to Reproduce:
It's probably easiest to see in a file with UTF-16 strings.

I can provide the .osd and .desktop that I'm working on, if required. 

Actual Results:  
The array is displayed vertically, one element per row, in hex.

Expected Results:  
I'd prefer to show the string.
Comment 1 Alex Richardson 2011-01-18 01:03:54 UTC
Better support for strings is already on my TODO list.
I wanted to add support for null-terminated ASCII C-strings, null-terminated UTF-8,
and both null-terminated and fixed-length UTF-16 and UTF-32.

Probably Latin1 with configurable charset would also be a good idea.

I'm afraid I'm fairly busy for the next few weeks, but I should have enough time to implement this in February.
Comment 2 Alex Richardson 2011-05-04 13:53:03 UTC
SVN commit 1230377 by arichardson:

Add basic support for strings in structures.
Currently supported encodings are ASCII and UTF16-LE/BE.
Strings can be added to .osd files by using the <string> element.
Strings can have a fixed length (byte count or character count),
be terminated by a certain Unicode code point, or both (whichever occurs first).
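
As an illustration of the "whichever occurs first" rule, here is a rough Python sketch. It is not the Okteta implementation: the function `read_string` and its parameters are hypothetical, and it assumes that the terminator is left out of the result and does not count towards the character limit.

```python
import codecs
from typing import Optional


def read_string(data: bytes, encoding: str,
                max_chars: Optional[int] = None,
                max_bytes: Optional[int] = None,
                terminator: Optional[int] = None) -> str:
    """Read a string that ends at a length limit or a terminator code point.

    Hypothetical helper: stops at whichever condition is met first.
    """
    limit = len(data) if max_bytes is None else min(max_bytes, len(data))
    decoder = codecs.getincrementaldecoder(encoding)()
    chars = []
    # Feed one byte at a time; the decoder buffers incomplete code units.
    for i in range(limit):
        for ch in decoder.decode(data[i:i + 1]):
            if terminator is not None and ord(ch) == terminator:
                return "".join(chars)
            chars.append(ch)
            if max_chars is not None and len(chars) == max_chars:
                return "".join(chars)
    return "".join(chars)


ascii_data = b"hello\x00world"
utf16_data = "Root Entry".encode("utf-16-be") + b"\x00\x00" + b"junk"

terminated = read_string(ascii_data, "ascii", terminator=0)
limited = read_string(ascii_data, "ascii", max_chars=3, terminator=0)
wide = read_string(utf16_data, "utf-16-be", max_bytes=64, terminator=0)

print(terminated, limited, wide)
```

In the second call the three-character limit is reached before the terminator, so the limit wins. In the other two calls the terminator is found first.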


CCBUG: 263489

 M  +6 -0      CMakeLists.txt  
 M  +0 -6      view/structures/datatypes/abstractarraydatainformation.cpp  
 M  +2 -3      view/structures/datatypes/abstractarraydatainformation.h  
 M  +22 -11    view/structures/datatypes/datainformation.cpp  
 M  +21 -10    view/structures/datatypes/datainformation.h  
 M  +6 -1      view/structures/datatypes/datainformationbase.cpp  
 M  +3 -2      view/structures/datatypes/datainformationbase.h  
 A             view/structures/datatypes/dummydatainformation.cpp   [License: LGPL]
 A             view/structures/datatypes/dummydatainformation.h   [License: LGPL]
 M  +0 -5      view/structures/datatypes/dynamiclengtharraydatainformation.h  
 A             view/structures/datatypes/strings (directory)  
 A             view/structures/datatypes/strings/asciistringdata.cpp   [License: LGPL]
 A             view/structures/datatypes/strings/asciistringdata.h   [License: LGPL]
 A             view/structures/datatypes/strings/stringdata.cpp   [License: LGPL]
 A             view/structures/datatypes/strings/stringdata.h   [License: LGPL]
 A             view/structures/datatypes/strings/stringdatainformation.cpp   [License: LGPL]
 A             view/structures/datatypes/strings/stringdatainformation.h   [License: LGPL]
 A             view/structures/datatypes/strings/utf16stringdata.cpp   [License: LGPL]
 A             view/structures/datatypes/strings/utf16stringdata.h   [License: LGPL]
 M  +2 -2      view/structures/datatypes/topleveldatainformation.h  
 M  +86 -8     view/structures/parsers/osdparser.cpp  
 M  +3 -2      view/structures/parsers/osdparser.h  
 M  +11 -10    view/structures/structtool.cpp  
 M  +3 -0      view/structures/structtreemodel.cpp  


WebSVN link: http://websvn.kde.org/?view=rev&revision=1230377
Comment 3 Alex Richardson 2011-08-13 21:15:38 UTC
Closing this bug, since support for Latin1, UTF8 and UTF32 has been added by now.
Comment 4 Brad Hards 2011-08-13 23:41:26 UTC
Thanks for this functionality, and all your other work too. Much appreciated.