| Summary: | Error while reading database file when Category name starts with numbers | ||
|---|---|---|---|
| Product: | [Applications] kphotoalbum | Reporter: | Victor Lobo <victor.ip.lobo> |
| Component: | Backend | Assignee: | KPhotoAlbum Bugs <kphotoalbum-bugs-null> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | johannes, tl |
| Priority: | NOR | ||
| Version First Reported In: | GIT master | ||
| Target Milestone: | --- | ||
| Platform: | openSUSE | ||
| OS: | Linux | ||
| Latest Commit: | https://invent.kde.org/graphics/kphotoalbum/-/commit/b6ce4c67d07903f38b850624e7b8f7a038fa0b8c | Version Fixed/Implemented In: | 6.1.0 |
| Sentry Crash Report: | |||
|
Description
Victor Lobo
2023-11-26 00:01:53 UTC
The issue here lies with the "compressed" file format using the category name as an XML attribute:
> <category name="2Review" ...>
> ....
> <image file="..." 2Review="test" ...>
That also means that one can create and use such a category in the "uncompressed" file format perfectly well, then convert the database to the "compressed" file format and totally break it.
To fix this on a database level, we need to change the database file format and break compatibility.
Alternatively, we could just prevent creating such a category - but that seems like a band-aid solution to me.
Since we are breaking compatibility anyways, I'm contemplating decoupling the category name from the XML attribute name to avoid the whole escaping/unescaping mechanism altogether.
I'll need a few days to think about which solution will be best to implement...
What about the following: We check all category names. If we have one starting with a number, we force the "uncompressed" format, and we're done. Maybe we inform the user about that fallback (with a "don't show again" message box). Actually, we mix both variants anyway as soon as tagged areas are present. I'm not sure if our users actually know about those two variants and consciously choose one … and if the benefit is really worth keeping and maintaining both …
Meanwhile, I know what's going on here, and I also think I know how to fix this :-)
When the "compressed" file format is used, category names are used as XML attributes. To be able to do so, they are escaped. Our current escaping algorithm produces invalid XML attribute names, depending on the input: It (among other flaws) allows numbers to be the first character of the escaped output. This violates the XML spec (cf. https://www.w3.org/TR/xml/ ), which states that the first character of an XML attribute name must be a NameStartChar. Within ASCII, that is "a-z", "A-Z", ":" or "_". Numbers are allowed later in the attribute name, but not as the first character.
When writing the XML file, the non-compliant attribute name is written nevertheless. When re-opening the database later, the data can't be read anymore though, because the parser finds a number where it expects either the end of the tag ("/>" or ">") or a new attribute (a NameStartChar), cf. the posted error message: "Expected '>' or '/', but got '[0-9]'". It thus fails on the invalid XML.
Just as a side note: The algorithm also can't escape non-Latin-1 characters correctly (they become "?"), and we also have problems with category names containing spaces and underscores when using the "readable" format, which aren't unescaped to what they initially were (all underscores are replaced by spaces and the underscores are lost on the next reading).
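To make the failure easy to reproduce outside of KPhotoAlbum, here is a minimal sketch in Python. It is not project code: the helper name `parses_as_attribute` and the sample `<image>` element are made up for illustration, and it uses Python's standard XML parser rather than the Qt one, so the error text differs from the message quoted in this report.

```python
import xml.etree.ElementTree as ET


def parses_as_attribute(name: str) -> bool:
    """Return True if an element using `name` as an attribute name is well-formed XML.

    Hypothetical helper for illustration only; not part of KPhotoAlbum.
    """
    try:
        # Same shape as the "compressed" format: the category name is the attribute name.
        ET.fromstring(f'<image file="a.jpg" {name}="test"/>')
        return True
    except ET.ParseError:
        # The parser rejects the whole element, e.g. for a leading digit.
        return False


if __name__ == "__main__":
    for candidate in ("2Review", "Review2", "_2Review"):
        print(candidate, parses_as_attribute(candidate))
```

Only the name with the leading digit is rejected. A digit later in the name, or a leading underscore, is accepted.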
The only way to fix the root cause for this is to implement a new escaping algorithm to escape category names to be used as XML attributes that respects the XML spec. My proposal for a compliant implementation can be found at https://invent.kde.org/graphics/kphotoalbum/-/tree/safe_xml_escaping?ref_type=heads . I use a modified URL-style percent encoding using QByteArray's integrated functionality. With this approach, not only the numbers issue is fixed, but one can also use the whole Unicode range in a category name. Also, the spaces and underscores issue is gone for the "readable" format. Needs testing though.
Git commit b6ce4c67d07903f38b850624e7b8f7a038fa0b8c by Johannes Zarl-Zierl.
Committed on 22/04/2025 at 21:58.
Pushed by johanneszarl into branch 'master'.
Simplify Changelog message
Mention fixed bug:
M +5 -1 CHANGELOG.md
https://invent.kde.org/graphics/kphotoalbum/-/commit/b6ce4c67d07903f38b850624e7b8f7a038fa0b8c |
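For readers who want to see what a spec-compliant, reversible escaping in the spirit of the percent-encoding proposal could look like, here is a rough sketch in Python. It is an assumption-laden illustration, not the code from the `safe_xml_escaping` branch: the function names, the choice of `_` as escape character (since `%` is not allowed in XML names), and the handling of a leading digit are all hypothetical.

```python
import string

# ASCII letters and digits are kept verbatim; every other byte is escaped.
# Assumption: "_" replaces "%" as the escape character, so "_" itself is escaped too.
SAFE = set(string.ascii_letters + string.digits)


def escape_attribute_name(category: str) -> str:
    """Encode a category name so that it only contains [A-Za-z0-9_] and never starts with a digit."""
    out = []
    for i, byte in enumerate(category.encode("utf-8")):
        ch = chr(byte)
        # A digit is a valid name character, but not in the first position.
        if ch in SAFE and not (i == 0 and ch.isdigit()):
            out.append(ch)
        else:
            out.append(f"_{byte:02X}")
    return "".join(out)


def unescape_attribute_name(name: str) -> str:
    """Reverse escape_attribute_name()."""
    raw = bytearray()
    i = 0
    while i < len(name):
        if name[i] == "_":
            # "_" is always followed by exactly two hex digits.
            raw.append(int(name[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(name[i]))
            i += 1
    return raw.decode("utf-8")


if __name__ == "__main__":
    print(escape_attribute_name("2Review"))  # prints "_32Review"
```

Because the input is encoded as UTF-8 bytes first, any Unicode category name round-trips, and spaces and underscores stay distinct (`_20` vs. `_5F`).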