Bug 490441

Summary: Translation memory entries are counted as near 100% matches even if they’re very different
Product: [Applications] lokalize Reporter: Karl Ove Hufthammer <karl>
Component: translation memory Assignee: Simon Depiets <sdepiets>
Status: RESOLVED FIXED    
Severity: normal CC: aacid, fin-w, shafff
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Other   
OS: Linux   
Latest Commit: Version Fixed In:
Sentry Crash Report:

Description Karl Ove Hufthammer 2024-07-18 09:40:30 UTC
SUMMARY
In the latest Git master version, after the upgrade to KF5, the translation memory (TM) feature treats almost every string as a near-100% match. For example, it thinks that this TM string

    Fingerprints can be used in place of a password when unlocking the screen and providing administrator permissions to applications and command-line programs that request them.<nl/>
    <nl/>
    Logging into the system with your fingerprint is not yet supported.

is a 99.33% match for

    Cannot change the account type to Standard unless there is at least one other Administrator account on the system. Without one, authentication would become impossible or require the insecure use of the root password.

The bug was introduced in this commit:

commit 341525d9f6e370f34e9efe0836fd24609488c61a (HEAD)
Author: Volker Krause <vkrause@kde.org>
Date:   Thu May 9 11:24:50 2024 +0200

    Port translation memory away from QRegExp



STEPS TO REPRODUCE
1. Add some translation files to the translation memory (Tools | Manage translation memory)
2. Open a PO file
3. Navigate to almost any entry.

OBSERVED RESULT
A match will almost certainly be shown in the TM pane, and the match percentage will be absurdly high, often 99.33%.

EXPECTED RESULT
The match percentage should be reasonable (and if it’s not higher than the threshold set in the settings, no matches should be shown).


SOFTWARE/OS VERSIONS
Operating System: openSUSE Tumbleweed 20240714
KDE Plasma Version: 6.1.2
KDE Frameworks Version: 6.4.0
Qt Version: 6.7.2
Kernel Version: 6.9.9-1-default (64-bit)
Graphics Platform: X11
Processors: 4 × Intel® Core™ i5-2500 CPU @ 3.30GHz
Memory: 15.6 GiB of RAM
Graphics Processor: NVIDIA GeForce GTX 1060 3GB/PCIe/SSE2
Manufacturer: MSI
Product Name: MS-7673
System Version: 1.0

Comment 1 Bug Janitor Service 2024-07-21 23:20:19 UTC
A possibly relevant merge request was started @ https://invent.kde.org/sdk/lokalize/-/merge_requests/134

Comment 2 Volker Krause 2024-07-22 16:14:29 UTC
Git commit 4aac9757c59109dd9953faf68c891ba53c08d2d7 by Volker Krause, on behalf of Finley Watson.
Committed on 22/07/2024 at 16:14.
Pushed by vkrause into branch 'master'.

Add option QRegularExpression::UnanchoredWildcardConversion to addPart and delPart diff regexes

They need to find unanchored matches, i.e. matches that begin anywhere after the offset int given to delPart.match() and addPart.match(), not only exactly at it.
This restores the match percentages produced before 341525d9f6e370f34e9efe0836fd24609488c61a, the commit that introduced the bug: after it, the regexes almost never matched anything, so match percentages were always very high.

M  +2    -2    src/tm/jobs.cpp

https://invent.kde.org/sdk/lokalize/-/commit/4aac9757c59109dd9953faf68c891ba53c08d2d7
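
For readers unfamiliar with the anchored/unanchored distinction named in the commit message, here is a minimal sketch in Python. It is a model of the idea, not Lokalize or Qt code. The helper names `wildcard_to_regex` and `find_spans` and the `<DEL>`/`<ADD>` marker tags are hypothetical, and the wildcard conversion is deliberately simplified (only `*` is handled). The assumption is that an anchored conversion wraps the pattern so it must match the whole subject, while an unanchored one lets it match anywhere.

```python
import re


def wildcard_to_regex(glob, anchored):
    """Simplified wildcard conversion: '*' becomes '.*?', the rest is literal.

    With anchored=True the result must match the entire subject string,
    which models a wildcard conversion that adds start/end anchors.
    """
    body = "".join(".*?" if ch == "*" else re.escape(ch) for ch in glob)
    return rf"\A(?:{body})\Z" if anchored else body


def find_spans(text, glob, anchored):
    """Collect every substring of `text` matched by the wildcard `glob`."""
    rx = re.compile(wildcard_to_regex(glob, anchored), re.DOTALL)
    spans = []
    pos = 0
    while pos < len(text):
        m = rx.search(text[pos:])
        if m is None:
            break
        spans.append(m.group(0))
        # Always advance, even on an empty match, to avoid looping forever.
        pos += m.end() if m.end() > 0 else 1
    return spans


# Hypothetical diff string with marked deleted and added parts.
diff = "keep <DEL>old</DEL> text <ADD>new</ADD> end"

print(find_spans(diff, "<DEL>*</DEL>", anchored=True))
print(find_spans(diff, "<DEL>*</DEL>", anchored=False))
```

In this model the anchored pattern finds nothing in `diff`, because the marked part sits in the middle of the string, while the unanchored pattern finds the one deleted span. That is consistent with the commit message's description: when the regexes almost never match, hardly any differences are counted and the match percentage stays very high.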

Comment 3 Volker Krause 2024-07-22 16:22:49 UTC
Git commit 6f62af9f3243e97b8adf65eb9ceb5f03ab2c91fc by Volker Krause.
Committed on 22/07/2024 at 16:19.
Pushed by vkrause into branch 'release/24.08'.

Add option QRegularExpression::UnanchoredWildcardConversion to addPart and delPart diff regexes

They need to find unanchored matches, i.e. matches that begin anywhere after the offset int given to delPart.match() and addPart.match(), not only exactly at it.
This restores the match percentages produced before 341525d9f6e370f34e9efe0836fd24609488c61a, the commit that introduced the bug: after it, the regexes almost never matched anything, so match percentages were always very high.


(cherry picked from commit 4aac9757c59109dd9953faf68c891ba53c08d2d7)

7191eaf0 Add option QRegularExpression::UnanchoredWildcardConversion to addPart and delPart diff regexes

Co-authored-by: Finley Watson <fin-w@tutanota.com>

M  +2    -2    src/tm/jobs.cpp

https://invent.kde.org/sdk/lokalize/-/commit/6f62af9f3243e97b8adf65eb9ceb5f03ab2c91fc