Bug 403668

Summary: All keywords/"Schlüsselbegriffe" removed from database with 5.4
Product: [Applications] kphotoalbum
Reporter: Ingo <usenet>
Component: XML backend
Assignee: KPhotoAlbum Bugs <kpabugs>
Status: RESOLVED FIXED
Severity: grave
CC: johannes
Priority: NOR
Version: 5.4
Target Milestone: ---
Platform: openSUSE
OS: Linux
Version Fixed In: 5.4.1

Description Ingo 2019-01-27 20:20:45 UTC
SUMMARY
The automated upgrade to 5.4 destroyed all (!) my keyword tags in index.xml. In my German version the category is called "Schlüsselbegriffe", so I suspect they were dropped because of the umlaut. At the same time, the value of the "Schlüsselbegriffe" attribute changed from a numeric reference to the full text.

STEPS TO REPRODUCE
1. Use kphotoalbum with a German localization 
2. Wait for the 5.4 update ...
3. Panic

OBSERVED RESULT
One example line from an index.xml backup:
<image file="Externe/Kasachstan/KZ_2003_0003.JPG" startDate="2003-08-03T09:02:54" md5sum="0756f21982c2845e4fc040818fe50ccc" width="2272" height="1704" Schlüsselbegriffe="83"/>
The same line after the update, without the "Schlüsselbegriffe" attribute:
<image file="Externe/Kasachstan/KZ_2003_0003.JPG" startDate="2003-08-03T09:02:54" md5sum="0756f21982c2845e4fc040818fe50ccc" width="2272" height="1704"/>

EXPECTED RESULT
<image file="Externe/Kasachstan/KZ_2003_0003.JPG" startDate="2003-08-03T09:02:54" md5sum="0756f21982c2845e4fc040818fe50ccc" width="2272" height="1704">
    <options>
        <option name="Schlüsselbegriffe">
            <value value="Kasachstan 2003"/>
        </option>
    </options>
</image>

SOFTWARE/OS VERSIONS
openSuSE 42.3
Linux/KDE Plasma: 
KDE Frameworks 5.32.0
Qt 5.6.2 (built against 5.6.2)
The xcb windowing system

ADDITIONAL INFORMATION
Tagging images with keywords is my main use case for kphotoalbum. Thanks for the automatic backup of the index.
Comment 1 Johannes Zarl-Zierl 2019-01-28 09:34:08 UTC
Thanks for reporting this.

Just to be sure: are you upgrading from version 5.3?
Comment 2 Ingo 2019-01-28 12:51:17 UTC
@Johannes: To be honest, I'm not sure. 
I'm (still) running openSuSE 42.3 and applying all provided updates. The system is kept up to date, so I assume I was using 5.3 before.
The database itself was created about 10 years ago and has grown continuously since then. It has also seen older OS and kphotoalbum versions.

BTW: The section at the start of index.xml listing all Schlüsselbegriffe was still present; only the attributes on the images disappeared.
BTW2: If a keyword on an image does not match an entry in this list, the keyword is not offered in the GUI. (I wrote a sed script to fix the issue, and it replaced some special characters in the keywords.)
Comment 3 Johannes Zarl-Zierl 2019-02-02 23:10:33 UTC
Using your excerpts from index.xml I can reproduce the issue now.

Regarding the change to a numeric reference: this is expected. Beginning with kphotoalbum 5.4, we use the compressed file format by default. The uncompressed format has no benefit for most users, and the compressed format is considerably faster; both file formats have been extensively tested and are stable and reliable.
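To illustrate what "numeric reference" means here, the following is a minimal C++ sketch, not KPhotoAlbum's code. It assumes that the compressed format lists each category's values once (with an id) in the category section at the top of index.xml, and that each <image> then stores only a comma-separated list of those ids. The function name encodeCompressed and the table contents are hypothetical; that id 83 stands for "Kasachstan 2003" is only inferred from the reporter's expected result.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: encode an image's values for one category as a
// comma-separated id list, given the value -> id table that (by assumption)
// comes from the category section at the top of index.xml.
std::string encodeCompressed(const std::vector<std::string> &values,
                             const std::map<std::string, int> &idTable)
{
    std::string out;
    for (const std::string &value : values) {
        auto it = idTable.find(value);
        if (it == idTable.end())
            continue; // value missing from the category section: skipped in this sketch
        if (!out.empty())
            out += ',';
        out += std::to_string(it->second);
    }
    return out;
}
```

With a table mapping "Kasachstan 2003" to 83, encodeCompressed({"Kasachstan 2003"}, table) yields "83", which matches the attribute value in the report. The trade-off is that each image line is shorter and faster to parse, at the cost of being unreadable without the category section.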

Unfortunately, you seem to have found a bug that only occurs when a database file is converted from the uncompressed to the compressed file format. The snippet from the backup file that you provided should read 'Schl_.FFFFFFFCsselbegriffe="83"', not 'Schlüsselbegriffe="83"'. Attribute names in the compressed index.xml are effectively restricted to plain ASCII, which leads the XML parser to throw away the unescaped attribute.
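The escaped name 'Schl_.FFFFFFFCsselbegriffe' can be reproduced by a scheme like the one below. This is a hedged C++ sketch reconstructed only from that one example string, not from KPhotoAlbum's FileWriter::escape. The function name escapeAttributeName, the set of characters that pass through unchanged, and the use of Latin-1 input are all assumptions. Under this reading, each other character becomes "_." plus a hex code, and FFFFFFFC is 0xFC (ü in Latin-1) sign-extended through a signed char.

```cpp
#include <cstdio>
#include <string>

// Hypothetical escaping scheme inferred from 'Schl_.FFFFFFFCsselbegriffe'.
// Input is assumed to be Latin-1 (one byte per character).
std::string escapeAttributeName(const std::string &latin1Name)
{
    std::string out;
    for (char c : latin1Name) {
        const unsigned char u = static_cast<unsigned char>(c);
        // Assumed set of characters that are kept as-is.
        const bool plain = (u >= 'a' && u <= 'z') || (u >= 'A' && u <= 'Z')
            || (u >= '0' && u <= '9') || u == ':' || u == '_';
        if (plain) {
            out += c;
        } else {
            char buf[16];
            // Going through signed char -> int sign-extends 0xFC to 0xFFFFFFFC.
            const int promoted = static_cast<signed char>(c);
            std::snprintf(buf, sizeof buf, "_.%X", static_cast<unsigned>(promoted));
            out += buf;
        }
    }
    return out;
}
```

Under these assumptions, escapeAttributeName("Schl\xFCsselbegriffe") returns "Schl_.FFFFFFFCsselbegriffe", while a pure-ASCII name such as "Keywords" is left unchanged.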

I'll need to investigate how this inconsistency could be introduced into the index.xml file.

Meanwhile, if you search and replace all occurrences of "Schlüsselbegriffe" in the <image> attributes with "Schl_.FFFFFFFCsselbegriffe", does that fix the problem for you?
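The search-and-replace workaround only needs to touch the attribute form on <image> elements, not the category section, which also contains "Schlüsselbegriffe" as an attribute value. A minimal C++ sketch, assuming index.xml is UTF-8 encoded (so ü is the two bytes C3 BC) and that the attribute always appears as ` Schlüsselbegriffe="` with a leading space. The helper name renameKeywordAttribute is hypothetical.

```cpp
#include <string>

// Hypothetical helper for the workaround: rename the unescaped attribute
// on <image> elements to its escaped form. Only ' Schlüsselbegriffe="'
// (leading space, trailing '="') is matched, so name="Schlüsselbegriffe"
// in the category section is left untouched.
std::string renameKeywordAttribute(std::string xml)
{
    const std::string from = " Schl\xC3\xBCsselbegriffe=\""; // UTF-8 'ü'
    const std::string to = " Schl_.FFFFFFFCsselbegriffe=\"";
    for (std::string::size_type pos = xml.find(from); pos != std::string::npos;
         pos = xml.find(from, pos + to.size())) {
        xml.replace(pos, from.size(), to);
    }
    return xml;
}
```

Reading the whole file into a string, passing it through this function and writing it back (after making a backup) would apply the workaround.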
Comment 4 Ingo 2019-02-03 21:16:10 UTC
I'm kind of happy to hear that my first analysis was going in the right direction, but I'm wondering that I seem to be the only one stumbling over this issue? Am I the only German user of kphotoalbum?

I fixed my database with a dynamically generated sed script that built valid XML <options> entries for the "Schlüsselbegriffe" from the old database. It took a few hours of work, but it fixed the problem for me. Your approach would have been easier.

In the meantime I've upgraded my system to openSUSE Leap 15 and noticed that 5.3 is still the default version of kphotoalbum there. To get 5.4 you have to use the optional KDE:Extra repository. Apparently I was also using KDE:Extra on 42.3.
Comment 5 Johannes Zarl-Zierl 2019-02-03 22:15:30 UTC
Git commit b45dd424da08833004a081347cda62eaeb23eb72 by Johannes Zarl-Zierl.
Committed on 03/02/2019 at 22:00.
Pushed by johanneszarl into branch 'master'.

Flush name cache when changing database compression.

For performance reasons, FileReader::unescape and FileWriter::escape
cache the results of (un)escaping.
With the recent (introduced in v5.4) push to use the compressed file
format by default, kphotoalbum would load an uncompressed file and later
store it as a compressed one. In this process, the cache would be
populated for an uncompressed file and later used to store a compressed
file, producing incorrect results.
On the next attempted database load, these incorrect results may be
discarded if they are invalid XML attribute names, thus leading to data
loss.

M  +7    -4    XMLDB/FileReader.cpp
M  +8    -4    XMLDB/FileWriter.cpp

https://commits.kde.org/kphotoalbum/b45dd424da08833004a081347cda62eaeb23eb72
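The fix described in this commit can be illustrated with a small memoizing escaper whose cache remembers which file format it was filled for and is flushed when the format changes. This C++ sketch shows only the idea; the class name CachedEscaper and its interface are hypothetical, and the actual change lives in XMLDB/FileReader.cpp and XMLDB/FileWriter.cpp.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: a cache of escaped names that is tied to the
// compression mode it was populated for.
class CachedEscaper {
public:
    using EscapeFn = std::function<std::string(const std::string &, bool)>;
    explicit CachedEscaper(EscapeFn fn) : m_escape(std::move(fn)) {}

    std::string escape(const std::string &name, bool compressed)
    {
        if (compressed != m_cacheIsCompressed) {
            // Flush: cached entries were computed for the other format.
            m_cache.clear();
            m_cacheIsCompressed = compressed;
        }
        auto it = m_cache.find(name);
        if (it != m_cache.end())
            return it->second;
        std::string result = m_escape(name, compressed);
        m_cache.emplace(name, result);
        return result;
    }

private:
    EscapeFn m_escape;
    std::unordered_map<std::string, std::string> m_cache;
    bool m_cacheIsCompressed = false;
};
```

Without the flush, a name first escaped while loading an uncompressed file would be returned unchanged when saving in compressed format, which is the "Schlüsselbegriffe" versus "Schl_.FFFFFFFCsselbegriffe" mismatch from this report.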
Comment 6 Johannes Zarl-Zierl 2019-02-03 22:25:39 UTC
> but I'm wondering that I seem to be the only one stumbling over this issue?
> Am I the only German user of kphotoalbum?

I can safely say that you are not (heck, two out of three active devs are German-speaking).

My best guess is that not many users have upgraded to v5.4 yet. Of those, only a subset were not already using the compressed file format, and of those, only a subset use non-English category names.

Still, we should have done a better job testing this seemingly innocent change :-|
Comment 7 Tobias Leupold 2019-02-04 19:38:42 UTC
Git commit c2ebf7edcfb47dd27f9b28974cd7b504db505a02 by Tobias Leupold, on behalf of Johannes Zarl-Zierl.
Committed on 04/02/2019 at 19:15.
Pushed by tleupold into branch 'v5.4-bugfix'.

Add testcase for bug #403668.

A  +58   -0    testcases/db/diacritical/compressed.orig.xml
A  +58   -0    testcases/db/diacritical/compressed.result.xml
A  +102  -0    testcases/db/diacritical/uncompressed-to-compressed.orig.xml
A  +58   -0    testcases/db/diacritical/uncompressed-to-compressed.result.xml
A  +102  -0    testcases/db/diacritical/uncompressed.orig.xml
A  +102  -0    testcases/db/diacritical/uncompressed.result.xml
A  +48   -0    testcases/integration-tests/check_diacritical.sh

https://commits.kde.org/kphotoalbum/c2ebf7edcfb47dd27f9b28974cd7b504db505a02
Comment 8 Tobias Leupold 2019-02-04 19:38:43 UTC
Git commit 2882e3c4fc942c521610f108eacd35de96226392 by Tobias Leupold, on behalf of Johannes Zarl-Zierl.
Committed on 04/02/2019 at 19:15.
Pushed by tleupold into branch 'v5.4-bugfix'.

Flush name cache when changing database compression.

For performance reasons, FileReader::unescape and FileWriter::escape
cache the results of (un)escaping.
With the recent (introduced in v5.4) push to use the compressed file
format by default, kphotoalbum would load an uncompressed file and later
store it as a compressed one. In this process, the cache would be
populated for an uncompressed file and later used to store a compressed
file, producing incorrect results.
On the next attempted database load, these incorrect results may be
discarded if they are invalid XML attribute names, thus leading to data
loss.

M  +7    -4    XMLDB/FileReader.cpp
M  +8    -4    XMLDB/FileWriter.cpp

https://commits.kde.org/kphotoalbum/2882e3c4fc942c521610f108eacd35de96226392