Bug 79682

Summary: When searching for words in msgstr there should be an option to ignore some characters
Product: kbabel Reporter: A Al-Arfaj <aalarfaj>
Component: general Assignee: Stanislav Visnovsky <visnovsky>
Status: RESOLVED UNMAINTAINED    
Severity: wishlist CC: cfeck, sanderkoning
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Mandrake RPMs   
OS: Linux   
Latest Commit: Version Fixed In:

Description A Al-Arfaj 2004-04-15 11:06:51 UTC
Version:            (using KDE 3.2.1)
Installed from:    Mandrake RPMs
OS:          Linux

The search tool in kbabel gives you the option of ignoring the accelerator mark, when possible.

It would be nice if there was another option where the user could specify a list of characters to ignore. This would be helpful especially when searching for words in Arabic, as it would give us the option to ignore inflections on the words. (Inflections are optional in a word, sometimes they are written and sometimes they are not, but when they are, the search tool does not recognize this and considers it a different word).

Hopefully this will be included as a separate option from the "ignore accelerator mark" option.
Comment 1 A Al-Arfaj 2004-04-19 07:40:53 UTC
I might add that what I mean by inflections is composing characters. Arabic uses a lot of them which are not really essential (but they're nice when used). One example of this is the "fatha" (0x064E). Here is an example in UTF-8:
صور
صوَر
Here these two are the same word, but the second one is more decorated, using the "fatha" on the middle character. It would be nice if the search tool would just ignore this composing character and still consider them the same word.

You don't need to hardwire which characters should be ignored. It would be nice if the search tool would give us an open option where we could specify which composing characters to ignore. Thank you.
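
To illustrate the request, here is a minimal Python sketch of a search that skips a user-supplied list of characters. It is not KBabel code; the helper names strip_ignored and contains_ignoring are hypothetical and exist only for this example.

```python
def strip_ignored(text, ignored):
    """Return `text` with every character listed in `ignored` removed."""
    return text.translate({ord(ch): None for ch in ignored})


def contains_ignoring(haystack, needle, ignored):
    """Substring search that skips the user-supplied `ignored` characters."""
    return strip_ignored(needle, ignored) in strip_ignored(haystack, ignored)


FATHA = "\u064e"                          # the "fatha" (0x064E) from the example
plain = "\u0635\u0648\u0631"              # first spelling, without the fatha
decorated = "\u0635\u0648\u064e\u0631"    # second spelling, fatha on the middle letter

print(plain in decorated)                            # exact search
print(contains_ignoring(decorated, plain, FATHA))    # search ignoring the fatha
```

The exact search fails to find the plain spelling inside the decorated one, while the second search finds it. Because the ignore list is just a string, the user can put any characters in it without the tool hardwiring them.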
Comment 2 Sander Koning 2005-03-05 10:49:28 UTC
This could be very handy for other purposes as well.
I've already had numerous occasions where I could have had a correct automatic translation, if there had not been an extra ':' at the end of one of the two items. An option to ignore the colon (in this case) would make life a lot easier for those messages where an exact translation does exist, apart from an extra (or missing) punctuation character.
Comment 3 Nick Shaforostoff 2007-12-02 18:23:58 UTC
Why not just use the regex feature?

From Wikipedia:
'For example, the set containing the three strings "Handel", "Händel", and "Haendel" can be described by the pattern H(ä|ae?)ndel'
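
The quoted pattern can be tried out with Python's re module, used here only as a stand-in; the regex dialect of KBabel's search may differ in details. The extra name "Hendel" is an assumption added to show a non-match.

```python
import re

# The pattern from the quote, with the umlaut written as an escape (U+00E4).
pattern = re.compile("H(\u00e4|ae?)ndel")

names = ["Handel", "H\u00e4ndel", "Haendel", "Hendel"]

# fullmatch() requires the whole string to match the pattern.
matches = [name for name in names if pattern.fullmatch(name)]
print(matches)
```

Only the three spellings from the quote are matched. Note that every variant has to be written into the pattern by hand.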

Regarding Comment #2: kaider's batch translation implementation fuzzy-translates such strings (those lacking or having an additional punctuation symbol); they get a 99% score.
Comment 4 Munzir Taha 2007-12-03 05:21:05 UTC
Shaforostoff, can you please suggest a regex to ignore diacritics so that

Input: ض
Matches: ضَ ضُ ضٌ ضْ ضِ ضٍ ض
Input: a
Matches: a à á â ã ä å
Input: u
Matches: ù ú û ü ũ ū ŭ ů ű ų
Input: r
Matches: r ŕ ŗ ř
?
I am not sure if this is possible with regex, but I know it won't be easy for a normal user to figure it out. In all MS applications, the Find and Replace dialog boxes have an option to ignore diacritics. It's like ignoring case: even if it is possible with regex, an easily accessible option is badly needed.
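
For what it's worth, outside of regex the matching asked for above can be approximated with Unicode normalization: decompose each character (NFD) and drop the combining marks. Below is a minimal Python sketch using the standard unicodedata module; the function name strip_diacritics is hypothetical, and this is not how any KDE tool is known to do it.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose `text` (NFD) and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_ignoring_diacritics(a, b):
    """Compare two strings after removing diacritics from both."""
    return strip_diacritics(a) == strip_diacritics(b)


dad = "\u0636"                 # the Arabic letter from the first input
dad_fatha = "\u0636\u064e"     # the same letter followed by a fatha

print(same_ignoring_diacritics(dad, dad_fatha))
print(strip_diacritics("\u00e0\u00e1\u00e2\u00e3\u00e4\u00e5"))  # the "a" variants
print(strip_diacritics("\u0155\u0157\u0159"))                    # the "r" variants
```

This covers the Arabic marks and the accented Latin letters listed above, since all of them decompose into a base letter plus combining marks. Letters without a canonical decomposition, such as "ø", would pass through unchanged, so a real option would likely need extra handling for those.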
Comment 5 Christoph Feck 2011-06-26 14:10:15 UTC
KBabel is no longer maintained, please use the KDE 4 translator's tool called
"Lokalize" instead. For more information, please visit
http://userbase.kde.org/Lokalize

If this is a request for a feature which is also missing in Lokalize, please add a comment so that I can reassign the request to the Lokalize authors. You could also file a new request for Lokalize.