Bug 169118

Summary: terminology homogenization of source and target text based on a glossary file.
Product: [Applications] lokalize
Reporter: mvillarino <mvillarino>
Component: general
Assignee: Simon Depiets <sdepiets>
Status: RESOLVED FIXED
Severity: wishlist
CC: adrian, sdepiets
Priority: NOR
Version First Reported In: unspecified
Target Milestone: ---
Platform: Debian testing
OS: Linux
Latest Commit:
Version Fixed/Implemented In:
Sentry Crash Report:

Description mvillarino 2008-08-14 13:24:59 UTC
Version:            (using KDE 4.1.0)
Installed from:    Debian testing/unstable Packages

This is closely related to wishlist bug 65031 (filed by a Finnish translator).

The idea is to have a new check that searches each source text (msgid) for words or expressions that appear in the glossary and, for each match, searches the target text (msgstr) for the corresponding glossary translation. It then reports, or highlights, the words in the source text that have not been properly translated.
Here, "properly translated" means that if a source word appears in the glossary, then a translation for it, as listed in the glossary, must appear in the target text.
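A minimal sketch of the check described above, assuming the glossary has already been loaded into a dict that maps each source term to its accepted translations. The function name `find_untranslated_terms` and this dict format are hypothetical, not Lokalize's. It matches whole words case-insensitively and does no lemmatization, so it only covers the simple case:

```python
import re


def find_untranslated_terms(msgid, msgstr, glossary):
    """Return glossary terms found in msgid whose translations are all absent from msgstr.

    glossary: hypothetical in-memory form {source_term: [accepted translations]}.
    Matching is whole-word and case-insensitive; no stemming or lemmatization.
    """
    missing = []
    for term, translations in glossary.items():
        # Skip terms that do not occur in the source text at all.
        if not re.search(r"\b" + re.escape(term) + r"\b", msgid, re.IGNORECASE):
            continue
        # The term counts as translated if any accepted translation occurs in the target.
        if not any(
            re.search(r"\b" + re.escape(t) + r"\b", msgstr, re.IGNORECASE)
            for t in translations
        ):
            missing.append(term)
    return missing


glossary = {"file": ["ficheiro"], "folder": ["cartafol"]}
print(find_untranslated_terms("Open file in folder", "Abrir ficheiro no directorio", glossary))
# ['folder']
```

Reporting the returned terms, or highlighting them in the msgid, would be the editor's job.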

This is more complicated than it may seem, because the glossary is supposed to contain some kind of "canonical form" of words and expressions, such as verbs in the infinitive. So it is quite probable that three stages are needed: a pre-pre-parsing of the texts that strips out XML tags, shortcut (accelerator) markers and similar things; a pre-parsing, which most likely implies lemmatizing both the texts and the glossary entries; and then the parsing itself, with word/expression matching.
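The three stages could look roughly like this. It is a sketch under stated assumptions: accelerators are taken to be `&` or `_` directly before a word character (which will also join identifiers such as `file_name`, and does not handle entities like `&amp;`), tags are anything between `<` and `>`, and `lemmatize` is a hypothetical placeholder that only lowercases; a real implementation would plug in a language-specific lemmatizer. Multi-word glossary entries are then matched as token sequences:

```python
import re

# Assumed markup conventions; adjust per project and catalog format.
TAG_RE = re.compile(r"<[^>]+>")       # XML/HTML tags such as <b> or </filename>
ACCEL_RE = re.compile(r"[&_](?=\w)")  # "&" or "_" accelerator marker before a word character


def pre_pre_parse(text):
    """Stage 1: replace tags with spaces and drop accelerator markers."""
    return ACCEL_RE.sub("", TAG_RE.sub(" ", text))


def lemmatize(word):
    """Stage 2 placeholder (hypothetical): a real lemmatizer would map forms to canonical ones."""
    return word.lower()


def normalize(text):
    """Tokenize the cleaned text and lemmatize each token."""
    return [lemmatize(w) for w in re.findall(r"\w+", pre_pre_parse(text))]


def contains_phrase(tokens, phrase_tokens):
    """Stage 3: does the normalized phrase occur as a contiguous token sequence?"""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


tokens = normalize("<b>&Save</b> the _File")
print(tokens)  # ['save', 'the', 'file']
print(contains_phrase(tokens, normalize("the file")))  # True
```

Running the glossary entries through the same `normalize` function is what lets canonical forms and inflected text meet in the middle.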
Comment 1 Nick Shaforostoff 2008-09-14 01:42:02 UTC
I bet that the glossary will almost always contain entries that you would like to exclude from this kind of search.

I'm going to implement saving TM searches (search options: shell-like expressions for source and target, inversion bools for source and target independently, filemasks) and the capability of running all of them, or a selection of them, at once.
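One way to model such saved searches, assuming Python's `fnmatch` semantics stand in for the "shell-like expressions". The `SavedSearch` class, its field names and `run_searches` are hypothetical illustrations, not Lokalize code. Each inversion flag flips the result of its own pattern, so source and target can be inverted independently:

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class SavedSearch:
    """A saved TM search (hypothetical structure modelled on the options listed above)."""
    name: str
    source: str = "*"            # shell-like pattern for the source text
    target: str = "*"            # shell-like pattern for the target text
    invert_source: bool = False  # match entries whose source does NOT fit the pattern
    invert_target: bool = False  # same, for the target text
    filemask: str = "*"          # shell-like pattern for catalog file names

    def matches(self, filename, msgid, msgstr):
        if not fnmatchcase(filename, self.filemask):
            return False
        # Comparing with the inversion flag (XOR) flips each side independently.
        source_ok = fnmatchcase(msgid, self.source) != self.invert_source
        target_ok = fnmatchcase(msgstr, self.target) != self.invert_target
        return source_ok and target_ok


def run_searches(searches, entries):
    """Run several saved searches at once over (filename, msgid, msgstr) tuples."""
    return {s.name: [e for e in entries if s.matches(*e)] for s in searches}


entries = [
    ("kate.po", "Open file", "Abrir arquivo"),
    ("kate.po", "Save file", "Gardar ficheiro"),
    ("dolphin.po", "Open file", "Abrir arquivo"),
]
# Source mentions "file" but the target lacks "ficheiro", only in kate*.po catalogs.
check = SavedSearch("file", source="*file*", target="*ficheiro*",
                    invert_target=True, filemask="kate*.po")
print(run_searches([check], entries))
# {'file': [('kate.po', 'Open file', 'Abrir arquivo')]}
```

Running "a selection" then just means passing a subset of the saved list to `run_searches`.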

Then I'll add a feature to generate a list of searches from a glossary (including falseFriends, which are to be added to Lokalize's TBX editor); the list can then be edited. As always, it will be shareable, and I'll encourage putting such QA lists into the language directories in the SVN repository.
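A sketch of the generation step under similar assumptions: the glossary is a dict of source term to preferred translation, false friends are (source term, wrong translation) pairs, and each search is a plain dict with shell-like patterns. All of these formats, and the function name, are hypothetical. An `exclude` set covers glossary entries a team does not want checked, and dumping the list as JSON gives a text file that can be edited and shared:

```python
import json


def searches_from_glossary(glossary, false_friends=(), exclude=()):
    """Build an editable list of searches (hypothetical dict format) from a glossary.

    glossary: {source_term: preferred_translation}
    false_friends: [(source_term, wrong_translation)]
    exclude: source terms the team chooses not to check
    """
    searches = []
    for term, translation in glossary.items():
        if term in exclude:
            continue
        # Flag entries whose source has the term but whose target lacks the translation.
        searches.append({"source": f"*{term}*", "target": f"*{translation}*",
                         "invert_source": False, "invert_target": True})
    for term, wrong in false_friends:
        # Flag entries whose target uses a known false friend for the term.
        searches.append({"source": f"*{term}*", "target": f"*{wrong}*",
                         "invert_source": False, "invert_target": False})
    return searches


searches = searches_from_glossary(
    {"file": "ficheiro", "folder": "cartafol"},
    false_friends=[("library", "libraría")],
    exclude={"folder"},
)
print(json.dumps(searches, ensure_ascii=False, indent=2))
```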
Comment 2 Nick Shaforostoff 2008-09-14 01:49:15 UTC
Ah, and the search-list generation stage would of course use the Snowball stemmer: http://snowball.tartarus.org/
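Stemming lets one generated pattern cover several inflected forms of a glossary entry. The sketch below only shows where a stemmer fits into search generation: `toy_stem` is a crude suffix stripper written for this illustration, explicitly not the Snowball algorithm, and a real implementation would call a Snowball stemmer for the relevant language instead:

```python
from fnmatch import fnmatchcase

# Toy stand-in for a Snowball stemmer: NOT the Snowball algorithm.
# Suffixes are listed longest first so the longest match is stripped.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first (longest) matching suffix, keeping at least 3 characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def stem_pattern(term):
    """Turn a (possibly multi-word) glossary term into a shell-like pattern on its stems."""
    return "*" + "*".join(toy_stem(w) for w in term.split()) + "*"


def matches_stemmed(text, term):
    """Case-insensitive check of text against the stem pattern of term."""
    return fnmatchcase(text.lower(), stem_pattern(term))


print(stem_pattern("Saving files"))  # *sav*fil*
print(matches_stemmed("Saved the file", "Saving files"))  # True
```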
Comment 3 Adrián Chaves (Gallaecio) 2018-05-27 11:58:37 UTC
I feel that this would be better done by further integrating Pology’s check-rules with Lokalize.
Comment 4 Simon Depiets 2018-09-29 11:09:25 UTC
Resolved by Pology integration
https://phabricator.kde.org/D15759