Bug 205261

Summary: can't properly specify multiple translations
Product: [Applications] parley Reporter: ancow <bugs>
Component: general Assignee: Parley Developers <parley-devel>
Status: CONFIRMED ---    
Severity: wishlist CC: andxav, ansa.ansa, inge, pprkut
Priority: NOR    
Version: unspecified   
Target Milestone: ---   
Platform: Ubuntu   
OS: Linux   
Latest Commit: Version Fixed In:
Attachments: a layout with wide input field

Description ancow 2009-08-26 23:01:33 UTC
Version:           0.9.2 (using KDE 4.3.0)
OS:                Linux
Installed from:    Ubuntu Packages

Often a word from one language has multiple translations in another language that aren't synonymous. (e.g. the English "you" (singular) is translated into German as either "du" or "Sie"; the two translations are in no way synonymous.) In those cases one usually should know all possible translations.

In KVocTrain this was solved using a comma. This allowed for testing all(!) translations without having to remember the order in which they were entered.

In the current version of parley, one can abuse the synonyms function to at least get it to recognise both translations as correct; however, one can't make sure that all possible translations were entered.

This missing feature has rendered parley a lot less useful to me than KVocTrain was.
Comment 1 Inge Wallin 2014-02-19 18:45:23 UTC
I see what you mean.  We need to take a closer look at this.
Comment 2 Ansa 2014-09-09 07:28:20 UTC
This is a duplicate of https://bugs.kde.org/show_bug.cgi?id=156224 Together, the two wishes have 47 votes.
Comment 3 Inge Wallin 2014-09-09 16:22:14 UTC
I have thought about this some and have come to the following conclusion that I would like to have some feedback on.  I don't like to use a comma for separation of translations because this could cause problems when the content is a sentence instead of a word.

But how about using a semicolon ';'?  This is a pretty common separator for computer use and is almost never used in any real text.
Comment 4 Ansa 2014-09-10 19:47:19 UTC
You raise an interesting point. I do not have a definite opinion on this matter, but here are some thoughts:

It seems to me that for my files, there would be no problem if comma serves a double purpose as a translation equivalent separator and a punctuation mark. In general, where order is important, Parley would "think" that I have entered two translation equivalents (the first and second half of the sentence), by chance in the same order in which they are shown in the editor. No problem there. I do not think that it is likely that I would enter the parts of a sentence in wrong order (and if I do, there is a way to change the automatic marking).


Commas are usually also used for one thing where order is very important, and that is basic forms of words in inflective languages. While the Inflection practice is great for practicing full paradigms of a few words, it is also necessary to remember additional information about all items in the practice, and the Inflection practice is too exhaustive for that. For example:

English     come, came, come   (verb forms)
German   das Haus, -es, ä-er    (gender represented by article and two additional noun forms)
Latin         fēmina, æ F               (the genitive form and gender together imply which paradigm the word belongs to - you need to know these two bits of information to be able to produce the other forms, but you do not need to know all the remaining forms by heart)
Finnish    kaupun/ki, gin, kia, keja               (noun forms necessary and sufficient for deriving all other forms of the noun; I use the / to mark the place after which the respective endings are added, so this should be read as "kaupunki, kaupungin, kaupunkia, kaupunkeja")

Most of the time, I think remembering the right order is not a problem here either, but I see that it might be a bit more tricky than in the sentence case.

An example from my actual file is that of two translation equivalents with inflections given. The inflections are separated by a comma and the equivalents by semicolon:
* yhdeksik/kö, ön, köä, köjä; ys/i, in, iä, ejä       (These are Finnish names of the number nine.)
For this particular case, the suggested solution with separation by semicolon only would be the best one.


I also like to group translation equivalents, using comma for synonyms and semicolon for unrelated senses. However, I do this mostly on the "Known" language, so it is either used as a question, or it is used in Flashcard practice; either way, it is not really necessary for Parley to separate the line into a list:
* to act; to function, work, operate
Here, it would be best if both separators could be recognised.


Another interesting point is this: when the "correct" answer is shown, how will it be organised? Will it be reordered to mirror the user's answer more closely, listing missed equivalents at the end? Take the case of my item
* yhdeksik/kö, ön, köä, köjä; ys/i, in, iä, ejä
If I write
ys/i, in, iä, ejä; yhdeksik/ko, on, koa, koja
(making a mistake in the diacritics, and listing the items in a different order), it is much easier to recognise where the error is if the correct answer would be reordered. On the other hand, if there is some kind of automatic reordering, it becomes far more important to conceptually distinguish between punctuation marks and translation equivalent separators.
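
The reordering described above could be sketched roughly as follows. This is only an illustration, not Parley code: the function name `reorder_to_match` is hypothetical, and pairing each given answer with its most similar expected item (via Python's `difflib`) is an assumption about how "mirroring" might be done. The matching is greedy, so a badly wrong answer can still claim an expected item.

```python
from difflib import SequenceMatcher


def reorder_to_match(expected, given):
    """Order the expected items to mirror the user's answer order.

    Each given answer claims the most similar expected item that is still
    unclaimed; expected items the user missed are listed at the end.
    """
    remaining = list(expected)
    ordered = []
    for answer in given:
        if not remaining:
            break
        # Pick the closest remaining expected item (1.0 means identical).
        best = max(
            remaining,
            key=lambda item: SequenceMatcher(None, answer, item).ratio(),
        )
        remaining.remove(best)
        ordered.append(best)
    return ordered + remaining


expected = ["yhdeksik/kö, ön, köä, köjä", "ys/i, in, iä, ejä"]
given = ["ys/i, in, iä, ejä", "yhdeksik/ko, on, koa, koja"]
reordered = reorder_to_match(expected, given)
```

With the item from this comment, `reordered` lists "ys/i, in, iä, ejä" first, so the diacritic errors in the second part line up with the corresponding correct form.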


I checked my file and it does contain sentences with semicolons. They are not all too frequent, but not completely infrequent either.


Lastly, what about users with existing files? I think it is more likely that current users have files where translation equivalents are comma separated than semicolon separated.

As always, I am most fond of giving users the choice (by allowing them to list all separators that should be used as translation equivalent separators). In my case, I would probably choose both the comma and the semicolon and then pay close attention to the cases where they play a different role. I would be grateful for automatic reordering of parts of the answer, as the chances that I list items where order is important in some kind of wrong order are rather small.
Comment 5 ancow 2014-09-10 20:10:13 UTC
Just one question: why use either as a separator, from a user perspective? Why not choose a solution with multiple input fields? It does throw up a *real* problem, though:

You'll frequently run into situations where word A from language X translates into several words (e.g. B and C) from a different language Y. However, B will then go on to translate to A and D, and C to A and E, in X. The point is that translations aren't reversible, so the assumption that they are is faulty from the beginning. What is really needed is a model where there are dictionaries for each language and the translations are represented by uni-directional connections between the words of each dictionary.
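
A minimal sketch of such a model, assuming nothing about Parley's actual data structures (the class `TranslationGraph` and its methods are hypothetical names): each word is keyed by its language, and every translation is stored as a one-way link.

```python
from collections import defaultdict


class TranslationGraph:
    """Per-language words connected by one-way translation links."""

    def __init__(self):
        # (language, word) -> list of (language, word) targets
        self._edges = defaultdict(list)

    def add(self, src_lang, src_word, dst_lang, dst_word):
        """Record that src_word translates to dst_word (not the reverse)."""
        self._edges[(src_lang, src_word)].append((dst_lang, dst_word))

    def translations(self, src_lang, src_word, dst_lang):
        """Return all recorded translations of src_word into dst_lang."""
        targets = self._edges.get((src_lang, src_word), [])
        return [word for lang, word in targets if lang == dst_lang]


graph = TranslationGraph()
# The A/B/C/D/E example from the comment above.
for src, dst in [("A", "B"), ("A", "C")]:
    graph.add("X", src, "Y", dst)
for src, dst in [("B", "A"), ("B", "D"), ("C", "A"), ("C", "E")]:
    graph.add("Y", src, "X", dst)
```

Here `graph.translations("X", "A", "Y")` yields B and C, while D has no recorded translation into Y, because links are never assumed to be reversible.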

To propose a quick solution for the "both commas and semi-colons are used in translations" problem: that's what we have escape sequences for. Give the user a way to enter each alternative translation in a separate text input box, and if the box contains the separator character (sequence), escape it in the save file.
To deal with the migration, just give the new file a new version number, and if an old one is encountered, escape any separator characters/sequences found in the translations before loading.
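
The escaping and migration steps could look roughly like this. It is a sketch under stated assumptions: the semicolon as separator and the backslash as escape character are illustrative choices, and all function names are hypothetical; nothing here reflects the kvtml format itself.

```python
SEP = ";"   # assumed separator between alternative translations
ESC = "\\"  # assumed escape character


def escape(text):
    """Escape the escape character first, then the separator."""
    return text.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)


def join_translations(items):
    """Store one string: escaped alternatives joined by the separator."""
    return SEP.join(escape(item) for item in items)


def split_translations(stored):
    """Inverse of join_translations: split on unescaped separators."""
    items, current, i = [], [], 0
    while i < len(stored):
        ch = stored[i]
        if ch == ESC and i + 1 < len(stored):
            current.append(stored[i + 1])  # keep the escaped character
            i += 2
        elif ch == SEP:
            items.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    items.append("".join(current))
    return items


def migrate_old_entry(stored):
    """Old-version entry: the whole string is one translation, so any
    separator characters in it are escaped before loading."""
    return escape(stored)
```

An old entry such as "Wait; stop" then survives migration as a single translation, while entries typed into separate input boxes round-trip through `join_translations` and `split_translations` unchanged.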
Comment 6 Ansa 2014-09-11 07:30:16 UTC
I like the idea of storing the translation equivalent separator internally with the help of some kind of escape sequence. I am not sure about how multiple input fields would work in the editor - I prefer having everything really compact. In the spirit of giving the user complete control over things (and the developers some extra coding practice), what about this:

"Advanced settings" interface:
Use semicolon as a translation equivalent separator    By default/By default not
    Show checkbox in the editor for changing the default meaning of semicolon    Yes/No
    Apply the default setting to all items in the collection that have not been set yet      Go
Use comma as a translation equivalent separator        By default/By default not
    Show checkbox in the editor for changing the default meaning of comma    Yes/No
    Apply the default setting to all items in the collection that have not been set yet      Go (after pressing Go, the user is asked to confirm that the setting will be set to all items in the collection and it will only be possible to reset it manually for each single item)


In the editor, if the checkbox option would be set to Yes, there would be a small checkbox at the end of each input field (or two, one for commas and one for semicolons), that would specify whether semicolons/commas should be understood as translation equivalent separators FOR THAT ITEM. For items present in the collection before this feature was implemented, the checkbox would show a dashed check mark, meaning that the status of the item has not been set manually and the default will apply to it during practice. For new items, it would be preset to the default value.

In the data, there are three options for an item with respect to treating semicolons, and similar three options for treating commas:
   - the value of "semicolon_is_separator" is not set; semicolon is treated as described in the default option in the settings (if "By default" is selected, then semicolon should be treated as translation equivalent separator for that entry, otherwise not)
   - if the value of "semicolon_is_separator" is set to yes, then semicolon should be treated as separator in both cases, "By default" as well as "By default not"
   - similarly if the value of "semicolon_is_separator" is set to "no", the semicolon should not be treated as separator in either case
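
The three cases above reduce to a small lookup, sketched here with `None` standing for "not set" (the function name `is_separator` is hypothetical):

```python
from typing import Optional


def is_separator(item_setting: Optional[bool], default_on: bool) -> bool:
    """Resolve whether a character separates translation equivalents.

    item_setting: per-item value of e.g. "semicolon_is_separator";
                  None means the item has not been set manually.
    default_on:   True for "By default", False for "By default not".
    """
    if item_setting is None:
        return default_on
    return item_setting
```

The same function would serve for commas, with a separate per-item value and a separate default.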


While I think this would accommodate different kinds of users well, including users whose large existing collections already include semicolons and commas of both kinds (punctuation as well as separators), I am not sure if implementing it would be worth the effort.
Comment 7 ancow 2014-09-11 19:10:01 UTC
This is overly complicated and doesn't solve the basic problem - after all, you can practice phrases/sentences with parley as well as single words. There are a bunch of sentences that will contain both commas and semicolons.

Apart from that, cluttering the input field with other input options is not advisable since it will be extremely hard to understand and not be intuitive at all. Remember that you can't properly label checkboxes in an input field.

Using multiple input fields is comparatively simple: just add a "+" button to the right of any currently displayed input field (possibly to the left if the text orientation is right-to-left, somebody with more GUI design experience should comment on that) and add a new input field below the current one if the + button is pressed, giving it input focus.
This has the added advantage of being both intuitive and relatively easy to use (if you don't want your hand to leave the keyboard like me, just tab to the + button and press space/enter/whatever to activate it).


I sympathise with the wish to keep things compact, but not at the expense of usability.
Comment 8 Ansa 2014-09-11 21:26:05 UTC
I see how the way you suggest is far more intuitive while being only a little less compact. Thanks for the explanation.

I feel that so far the only serious argument against simply using semicolon as the translation equivalent separator is the worry about already existing collections. Implementation of the nice interface could then be postponed as a wish for future releases.
Comment 9 Inge Wallin 2014-09-11 21:59:03 UTC
What I think you all missed is that we need to represent all of this in the file format somehow. The UI in parley can be made to do whatever we want.  But we need to store it within the limits of the kvtml specification.

With the new file format we are designing, it will again be easier because we can take all of this into account to begin with. But right now we need to be backwards compatible with kvtml2 if this bug is to be solved this side of 2015.
Comment 10 Andreas 2014-09-12 15:36:23 UTC
Thanks for the discussion.  I would like to add three comments about "," vs ";" in list items: the first for the current version (before the new file format), the second for the new file format, and the third for a more flexible interpretation of separators.  I would also like to add one comment about list ordering.

As a change to the current version Parley should use , as the primary separator and ; as the secondary separator.  When Parley expects a list of answers it should scan the answers.  If the answers do not contain commas then it should expect commas as the separator in written answers.  If the expected answer items contain commas then it should expect semi-colons as the separator.
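
A minimal sketch of that scan, assuming the expected answers are already available as a list (the function names are hypothetical):

```python
def expected_separator(answers, primary=",", secondary=";"):
    """Use the primary separator unless an expected answer contains it."""
    if any(primary in answer for answer in answers):
        return secondary
    return primary


def split_written_answer(text, answers):
    """Split what the student typed on the separator chosen for this item."""
    separator = expected_separator(answers)
    return [part.strip() for part in text.split(separator) if part.strip()]
```

For the answers "du" and "Sie" the student would separate with a comma; for the Finnish item "yhdeksik/kö, ön, köä, köjä" and "ys/i, in, iä, ejä", whose parts contain commas, a semicolon would be expected instead.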

This solution has the following advantages:
1. It requires no UI changes.
2. It requires no file format changes.
3. It is grammatically correct English.
4. It works intuitively as most English language writers expect it to work.

This solution has the following disadvantages:
1. It only works correctly/intuitively for English.  It doesn't work for languages that don't use comma and semi-colon in the same way as English.  It doesn't work for non-language subjects that assign some other meaning to commas and semi-colons.

For the future file format I have already proposed that we have two fields per language/subject: primary separator and secondary separator character.  They would default to comma and semi-colon.  This will work correctly with non-English subjects.  In light of Ansa's previous comment, I would generalize this to primary separator(s) and secondary separator(s).

Lastly, I think that it would be useful to offer the option to generously/non-strictly interpret separators. Allow the student to use any list separator not included in a list item for the question they are currently answering.  For example, after pre-scanning the expected answers in the list, allow the user to use any non-alphanumeric character that is not in the answer.  This would be ad-hoc per answer, but would always work.  It has the disadvantage that there is no feedback to the student to teach them the correct list separator to use for the given subject.   
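
As a rough sketch of the pre-scan: any non-alphanumeric character that appears in none of the expected answers is accepted as a separator. Excluding whitespace from the candidate separators is an extra assumption made here so that multi-word input is not split apart; the function name is hypothetical.

```python
def split_generously(text, answers):
    """Split the typed text on any non-alphanumeric, non-space character
    that does not occur in any of the expected answers."""
    used = set("".join(answers))
    parts, current = [], []
    for ch in text:
        if not ch.isalnum() and not ch.isspace() and ch not in used:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]
```

With expected answers "du" and "Sie", the inputs "du | Sie", "du/Sie" and "du; Sie" would all be split into the same two parts.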

Currently, Parley doesn't recognize a difference between ordered and unordered lists.  I think that for now, we should accept the answers in any order, but always show the answers in the order from the file, in case the order is significant.  

I think that in the new file format, there should be an option per list ordered/unordered.  Obviously, ordered lists require ordered answers and vice versa.
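
Both behaviours can be sketched in one function, where the `ordered` flag stands in for the proposed per-list option (the name `grade` and the return shape are assumptions for illustration):

```python
from collections import Counter


def grade(expected, given, ordered=False):
    """Return (is_correct, answers_to_show).

    Unordered lists accept the answers in any order; ordered lists
    require the same sequence. The answers shown are always in the
    order from the file.
    """
    if ordered:
        correct = list(given) == list(expected)
    else:
        correct = Counter(given) == Counter(expected)
    return correct, list(expected)
```

Calling `grade(["du", "Sie"], ["Sie", "du"])` accepts the answer and still shows "du", "Sie" in file order; with `ordered=True` the same input would be marked wrong.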
Comment 11 ancow 2014-09-12 16:24:09 UTC
I'll have to answer inline, since there are too many points:

> --- Comment #10 from Andreas <andxav@zoho.com> ---
> As a change to the current version Parley should use , as the primary
> separator and ; as the secondary separator.  When Parley expects a list of
> answers it should scan the answers.  If the answers do not contain commas
> then it should expect commas as the separator in written answers.  If the
> expected answer items contain commas then it should expect semi-colons as
> the separator.

This assumes that parley has a list of possible answers. It doesn't, which is what this bug/wish is about. Also, there should be no problem adding an application-global option to define primary/secondary separators for now, until this can be handled properly. So hardcoding comma and semicolon as separators shouldn't be necessary.
In the answers, you don't even need this whole separator nonsense - just do what KVocTrain did: offer an appropriate amount of input boxes for the possible answers.

> For the future file format I have already proposed that we have two fields
> per language/subject: primary separator and secondary separator character. 

Why have a separator at all in the XML file? Why not "simply" make it possible to link multiple translations?

As I mentioned before, the whole concept of viewing a translation as a one-to-one relationship is flawed. If you're already reworking this, why not do it properly this time?

> Lastly, I think that it would be useful to offer the option to
> generously/non-strictly interpret separators. Allow the student to use any
> list separator not included in a list item for the question they are
> currently answering.  For example, after pre-scanning the expected answers
> in the list, allow the user to use any non-alphanumeric character that is
> not in the answer.

This only works under the assumption that no incorrect alphanumeric character will ever be entered in a test. It would deny parley the ability to properly pinpoint and highlight mistakes. (Also, if parley just offered the appropriate amount of input boxes for the answers, this wouldn't be necessary in the first place.)

> Currently, Parley doesn't recognize a difference between ordered and
> unordered lists.  I think that for now, we should accept the answers in any
> order, but always show the answers in the order from the file, in case the
> order is significant.

In the context of multiple answers to a translation, I don't think there *can* be ordered lists. Do you have a counterexample?
Comment 12 Ansa 2014-09-12 21:15:33 UTC
What about this: In testing, parley provides the user with the possibility to fill up to as many boxes as there are comma-and-semicolon separated substrings of the translation. It is up to the user to fill only as many boxes as are needed, using commas and semicolons inside boxes if appropriate. (The boxes could initially be pretty small, only expanding when the user starts filling them - this would avoid cluttering the screen with unnecessary boxes, which could easily happen on smaller devices.) Parley then throws away the remaining (empty) boxes, and reorders the used boxes on the screen to match the correct solution - the correct solution is still shown in the order which is given in the collection. If order is important, the user will either fill everything into one box, or see that they gave wrong order and adjust the correct/false mark accordingly.

Advantages:
- no changes in existing collections: the files stay the same (-> compatibility with kvtml2 standard)
- works with lists of translation equivalents that are both comma and semicolon separated
- works with commas and semicolons that are not translation equivalents separators
- is also "forward compatible": if the new format allows for some unambiguous way of specifying and storing translation equivalents, then
      - the practice interface does not have to change
      - the only limitation for the new format is not to use unescaped commas and/or semicolons as the (unambiguous) separator
      - if the new format implements complex relationships between words in two languages (where A, B, C in L1 and X, Y in L2 correspond as follows: A->X,Y;   X -> A,B;   Y->A,C), users would be free to use either "the old way" of simply listing a comma/semicolon separated list, or to use the new GUI (in which it possibly could take more time to interlink the words as needed), and they could even use them simultaneously

Generalisation:
- let the user list all possible separators as an option - if that is compatible with the chosen solution in the new data format (e.g. if multiline items are implemented, I can imagine that some users would like to list one translation per line)
Comment 13 ancow 2014-09-12 21:50:38 UTC
In general, I like the idea. However, it causes problems:
 - Removing and adding boxes causes the user interface to move erratically, which would probably be annoying.
 - If Parley unconditionally removes input boxes whenever a separator is entered, placing an incorrect separator will remove the user's ability to give one correct answer (it may even remove an already entered answer, depending on the implementation).
 - If it only removes boxes whenever a separator was placed in the correct position, it is giving the user unacceptable hints about the correctness of the answer (basically marking the answer as (partially) correct before the user chooses to verify it).

OTOH, simply displaying separators+1 boxes may suggest an incorrect amount of answers, which isn't exactly good, either.

The bottom line is, in the current version, this problem is going to be difficult to solve cleanly, so compromises will have to be made. In the new version, there needs to be some mechanism by which the user can cleanly differentiate alternative answers while specifying them, so that Parley can use that information to do cool stuff... ;)
Comment 14 Ansa 2014-09-12 22:28:19 UTC
Created attachment 88684 [details]
a layout with wide input field
Comment 15 Ansa 2014-09-12 22:46:33 UTC
I am using a custom layout where the input field stretches across the whole width of parley (as shown in the attachment). Inspired by this, I was imagining the practice like this:

In the beginning, stock tiny input boxes on the right side of the input line. Only the leftmost box is "open" and expanded across the whole width of the input line. The boxes do not react in any way when commas and semicolons are input. When the user presses tab, the border between the current box and the next box moves gracefully (or jumps?) to the end of the currently entered text, leaving some extra free space for later corrections. (The extra free space could be about one word long, which is a natural length of a tab... it should not feel too shaky.) Filled boxes accumulate on the left, empty boxes stay on the right and the box in focus is between them. On pressing shift tab or clicking into one of the already filled boxes, the input focus moves, the boxes stay the same. Only when corrections require more space than is provided in the box does the box expand and shift the boxes to its right - hopefully, this would not happen too often.

I was thinking that the remaining boxes on the right disappear when the user presses enter, but maybe it is not necessary. The input line stays the same and below it, a copy of the filled boxes is shown in the order that corresponds to the correct answer, to ease checking the solution.

Not to give away the right number of boxes, parley could determine at the beginning of the session what is the maximum number of boxes in that session, and use that number of boxes for all items. Users would quickly get used to having extra boxes, and would not be led astray by them.
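
A sketch of that calculation, counting boxes per item as the number of comma-and-semicolon separated substrings (as in Comment 12) and taking the maximum over the session. The function names are hypothetical.

```python
import re


def box_count(translation):
    """Number of comma-or-semicolon separated, non-empty substrings."""
    parts = re.split(r"[,;]", translation)
    return len([part for part in parts if part.strip()])


def session_box_count(translations):
    """Boxes to show for every item: the largest count in the session."""
    return max((box_count(t) for t in translations), default=1)
```

For a session containing "du" and "to act; to function, work, operate", every item would be shown with four boxes.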

Conversely, if we wanted to tell the user how many boxes he/she has left, without taking away the option to misplace a semicolon into one of them, boxes on the right could change colour (get a dark margin or something) when commas/semicolons are entered. Normally, coloured boxes should be left empty, but it would be possible to fill them if the user wants to.

With languages that are written from right to left, the setup would also be right to left, so that boxes would be filled in natural order (starting with the rightmost box).
Comment 16 ancow 2014-09-13 11:38:26 UTC
I like this idea (it may still have a little too much movement, but what doesn't these days...). I'd propose one little amendment, though. If we don't want to hint at the total amount of answers, just always display one input box to the right. Whenever the user changes focus to it, activate it as proposed above and add a new tiny input box to the right, possibly with a nice little flying-in animation. This way, you don't need to deal with the user clicking on any input but the next, and the GUI doesn't seem as cluttered for questions with fewer/no alternative answers.

There's one problem that still needs to be addressed with this idea, though: in case of small screens/window sizes and/or multiple large answers, scrolling from left to right all the time would be fairly annoying - how do overflows get handled? Simply introducing a dynamic newline type of concept? Or should previous boxes be put on individual lines? What about the case where a single input is too large for the screen?
Comment 17 ancow 2014-09-13 11:44:41 UTC
Just to point this out, though: the main problem is still having a way to tell Parley about multiple translations during the vocabulary entering phase.
With the proposed workaround, this can be postponed until the next big version comes out, but it should be done properly then.