Version: unspecified (using KDE 4.5.1)
[This relates to my work on an Okteta Structure Definition for the Compound File Binary File Format, which is described in the MS-CFB specification.]
It would be nice to have the ability to handle strings (including wide-character strings). MS-CFB describes a 64-byte block, which is an array of up to 31 UTF-16 characters followed by a null terminator. I'd like to be able to show the text (as a Unicode string), but the best I can do is an array of 32 uint16 elements, which isn't easy to read:
<array name="Directory Entry Name" length="32">
<primitive type="UInt16" />
</array>
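To illustrate the rendering being asked for, here is a minimal Python sketch that decodes such a 64-byte name field into readable text. It is not Okteta code: the function name and the sample data are hypothetical, and little-endian UTF-16 is an assumption about the field's byte order.

```python
def decode_directory_entry_name(block: bytes) -> str:
    """Decode a 64-byte name field, stopping at the first null code unit.

    Assumption: the field holds little-endian UTF-16 code units.
    """
    if len(block) != 64:
        raise ValueError("expected a 64-byte name field")
    end = len(block)
    # Walk the field two bytes (one UTF-16 code unit) at a time.
    for offset in range(0, len(block), 2):
        if block[offset:offset + 2] == b"\x00\x00":
            end = offset
            break
    return block[:end].decode("utf-16-le")


# Hypothetical sample: a short name padded with zero bytes to 64 bytes.
sample = "Root Entry".encode("utf-16-le").ljust(64, b"\x00")
print(decode_directory_entry_name(sample))
```

The terminator is searched for on two-byte boundaries, because a single zero byte is a normal part of most UTF-16 code units and does not end the string.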
Some options that could be considered:
- treat it as an array (as I do now), but add an extra attribute that shows how it should be rendered (like display="number", display="string")
- add a new primitive type (like type="string") and some attributes for fixed length, null terminated, ucs2/utf8/ascii.
- add several new primitive types (like type="FixedLengthString", type="FixedLengthWideString", type="NullTerminatedString", type="NullTerminatedWideString")
The display attribute could be used to avoid conflicts between char and int8/uint8, but I recognise that this would still require backwards-compatibility hacks.
Steps to Reproduce:
It's probably easiest to see in a file with UTF-16 strings.
I can provide the .osd and .desktop that I'm working on, if required.
Currently, the display is a vertical list of the array elements in hex.
I'd prefer to show the string itself.
Better support for strings is already on my TODO list.
I wanted to add support for null-terminated ASCII C strings, null-terminated UTF-8,
and null-terminated and fixed-length UTF-16 and UTF-32.
Latin-1 with a configurable charset would probably also be a good idea.
I'm afraid I'm fairly busy for the next few weeks, but I should have enough time to implement this in February.
SVN commit 1230377 by arichardson:
Add basic support for strings in structures.
Currently supported encodings are ASCII and UTF16-LE/BE.
Strings can be added to .osd files by using the <string> element.
Strings can have a fixed length (byte count or character count),
be terminated by a certain Unicode code point, or both (whichever occurs first).
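
The length and termination rules described above can be sketched in Python as follows. This is only an illustration of the "whichever occurs first" behaviour, not the code from this commit: the function name and parameters are hypothetical, and it assumes the character limit counts code points and the terminator is not part of the result.

```python
from typing import Optional


def read_string(data: bytes, encoding: str = "utf-16-le",
                max_bytes: Optional[int] = None,
                max_chars: Optional[int] = None,
                terminator: Optional[int] = None) -> str:
    """Read characters until a byte limit, a character limit, or a terminator."""
    if encoding not in ("ascii", "utf-16-le", "utf-16-be"):
        raise ValueError("unsupported encoding: " + encoding)
    width = 1 if encoding == "ascii" else 2
    byteorder = "big" if encoding == "utf-16-be" else "little"
    limit = len(data) if max_bytes is None else min(max_bytes, len(data))
    chars = []
    pos = 0
    while pos + width <= limit:
        if max_chars is not None and len(chars) >= max_chars:
            break  # character limit reached
        unit = int.from_bytes(data[pos:pos + width], byteorder)
        size = width
        # A UTF-16 surrogate pair uses two code units for one code point.
        if width == 2 and 0xD800 <= unit <= 0xDBFF and pos + 4 <= limit:
            low = int.from_bytes(data[pos + 2:pos + 4], byteorder)
            if 0xDC00 <= low <= 0xDFFF:
                size = 4
        char = data[pos:pos + size].decode(encoding, errors="replace")
        if terminator is not None and char == chr(terminator):
            break  # terminator found; it is not included in the result
        chars.append(char)
        pos += size
    return "".join(chars)


# Terminator comes first: stops at the null before the byte limit.
print(read_string("hi\0zz".encode("utf-16-le"), max_bytes=64, terminator=0))
# Character limit comes first: stops after three characters.
print(read_string("hello".encode("utf-16-le"), max_chars=3, terminator=0))
```

Passing both a limit and a terminator gives the combined mode: the loop ends at whichever condition is met first.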
M +6 -0 CMakeLists.txt
M +0 -6 view/structures/datatypes/abstractarraydatainformation.cpp
M +2 -3 view/structures/datatypes/abstractarraydatainformation.h
M +22 -11 view/structures/datatypes/datainformation.cpp
M +21 -10 view/structures/datatypes/datainformation.h
M +6 -1 view/structures/datatypes/datainformationbase.cpp
M +3 -2 view/structures/datatypes/datainformationbase.h
A view/structures/datatypes/dummydatainformation.cpp [License: LGPL]
A view/structures/datatypes/dummydatainformation.h [License: LGPL]
M +0 -5 view/structures/datatypes/dynamiclengtharraydatainformation.h
A view/structures/datatypes/strings (directory)
A view/structures/datatypes/strings/asciistringdata.cpp [License: LGPL]
A view/structures/datatypes/strings/asciistringdata.h [License: LGPL]
A view/structures/datatypes/strings/stringdata.cpp [License: LGPL]
A view/structures/datatypes/strings/stringdata.h [License: LGPL]
A view/structures/datatypes/strings/stringdatainformation.cpp [License: LGPL]
A view/structures/datatypes/strings/stringdatainformation.h [License: LGPL]
A view/structures/datatypes/strings/utf16stringdata.cpp [License: LGPL]
A view/structures/datatypes/strings/utf16stringdata.h [License: LGPL]
M +2 -2 view/structures/datatypes/topleveldatainformation.h
M +86 -8 view/structures/parsers/osdparser.cpp
M +3 -2 view/structures/parsers/osdparser.h
M +11 -10 view/structures/structtool.cpp
M +3 -0 view/structures/structtreemodel.cpp
WebSVN link: http://websvn.kde.org/?view=rev&revision=1230377
Closing this bug, since Latin1, UTF8 and UTF32 have been added by now.
Thanks for this functionality, and all your other work too. Much appreciated.