Bug 58340 - Tokens to unicode symbols. Encoding and Decoding. X-symbol.
Status: RESOLVED INTENTIONAL
Alias: None
Product: kate
Classification: Applications
Component: general
Version: 2.1
Platform: openSUSE Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: KWrite Developers
 
Reported: 2003-05-11 13:23 UTC by Jorge Adriano
Modified: 2012-11-25 19:44 UTC


Attachments
Screenshot of Xemacs with ProofGeneral (using X-Symbol). Isabelle source code. (34.09 KB, image/png)
2003-05-11 13:30 UTC, Jorge Adriano

Description Jorge Adriano 2003-05-11 13:23:28 UTC
Version:           2.1 (using KDE 3.1.0)
Installed from:    SuSE
Compiler:          gcc version 2.95.3 20010315 (SuSE)
OS:          Linux (i686) release 2.4.10-4GB

It would be nice to have something similar to XEmacs X-Symbol package.
http://x-symbol.sourceforge.net

With it you can define tokens, associate them with symbols, and specify encoding 
(symbol->token) and decoding (token->symbol) rules. So you can do things like 
typing "\alpha" and have the editor display an alpha.

Abstracting from the defined language would also be interesting. There could be 
a "Kate" way of inserting the symbols, independent of the used token language 
(if any). Something like, "press special key"->"type alpha+return"->"display an 
alpha". This way you wouldn't have to know how each symbol is translated for each 
defined language, but only the Kate way of inserting those symbols.
(This way of inserting unicode chars would be good to have even with no 
encoding/decoding to ascii)

Some examples where it may be useful:

[1] - Editing LaTeX documents
I'm Portuguese, so I'm always using ISO Latin (non-ASCII) chars; I never 
use the proper LaTeX commands though. I type "ac
Comment 1 Jorge Adriano 2003-05-11 13:30:07 UTC
Created attachment 1524
Screenshot of Xemacs with ProofGeneral (using X-Symbol). Isabelle source code.
Comment 2 Jorge Adriano 2003-06-17 21:23:10 UTC
*** This bug has been confirmed by popular vote. ***
Comment 3 Aaron J. Seigo 2003-07-12 23:35:01 UTC
Doesn't this belong at the X11 level, just like multibyte encodings? 
Comment 4 Christoph Cullmann 2003-07-22 20:37:40 UTC
I think so too; this is not a Kate issue, but would need some deeper support in the input stuff. 
Comment 5 Jorge Adriano 2003-07-22 23:36:29 UTC
OK, I wrote too much in the report; let me try to make things clearer. 
My wish report: 
1. encode symbols (in the buffer) into tokens. 
2. parse input and decode tokens into symbols. 
 
With (1) you can do things like use Unicode symbols in your text, and save it as ASCII. 
This is quite useful in some programming/markup languages, for details see above. 
Wish (2) is meant as an inverse of (1). When opening your text, it should be parsed and 
tokens decoded and displayed as symbols.  
The parser should also work "as you type", that is, decoding the tokens into symbols as you 
type them. This way it keeps the display consistent and it provides a means to insert the 
symbols. 
For applications see the original report above. 
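
Wishes (1) and (2) can be sketched roughly as a pair of inverse functions. This is only an illustration in Python, not Kate or X-Symbol code; the table TOKEN_TO_SYMBOL and the names encode/decode are made up for the example, and the LaTeX-style tokens are just one possible token language.

```python
import re

# Hypothetical token table; a real one would be supplied per token language.
TOKEN_TO_SYMBOL = {
    r"\alpha": "\u03b1",   # Greek small letter alpha
    r"\beta": "\u03b2",    # Greek small letter beta
    r"\forall": "\u2200",  # "for all" quantifier
}
SYMBOL_TO_TOKEN = {symbol: token for token, symbol in TOKEN_TO_SYMBOL.items()}

# Longest tokens first, so a longer token is never shadowed by a shorter prefix.
_TOKEN_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(TOKEN_TO_SYMBOL, key=len, reverse=True))
)


def encode(text: str) -> str:
    """Wish (1): replace every known symbol in the buffer by its ASCII token."""
    return "".join(SYMBOL_TO_TOKEN.get(ch, ch) for ch in text)


def decode(text: str) -> str:
    """Wish (2): replace every known token by the symbol it stands for."""
    return _TOKEN_RE.sub(lambda m: TOKEN_TO_SYMBOL[m.group(0)], text)
```

With such a pair, the text on disk stays ASCII (encode on save) while the buffer shows symbols (decode on load or while typing). The sketch ignores token boundaries, so it is only safe when tokens are followed by a non-letter.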
 
Now in the original report, for completeness, I referred to another feature that would be useful. 
3. Some way to insert Unicode symbols. It should work like:  
- press special key (cursor look changes) 
- type symbol name/press return (symbol is displayed) 
It would be quite useful in lots of situations, and great combined with wish (1): this way you 
wouldn't have to know every different syntax you use for the tokens; you would have a 
standard way to insert them. 
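
The lookup step of wish (3), "type symbol name, press return", might look something like the following Python sketch. The SHORT_NAMES table and the function symbol_for are hypothetical; only unicodedata.lookup is a real standard-library call, and it resolves official Unicode character names.

```python
import unicodedata

# Hypothetical short names, mapped to official Unicode character names.
SHORT_NAMES = {
    "alpha": "GREEK SMALL LETTER ALPHA",
    "forall": "FOR ALL",
    "copyright": "COPYRIGHT SIGN",
}


def symbol_for(name: str):
    """Return the character for a typed name, or None if the name is unknown."""
    # Try the short-name table first, then treat the input as a full Unicode name.
    full_name = SHORT_NAMES.get(name.lower(), name)
    try:
        return unicodedata.lookup(full_name)
    except KeyError:
        return None
```

An editor would call something like symbol_for("alpha") when return is pressed and insert the result at the cursor, leaving the buffer untouched when None comes back.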
 
(1) seems like Kate stuff to me, no problem about that IMO. 
(2) might need deeper input support like Christoph puts it (The "parse as you type" feature. 
Parsing when opening a text document or by pressing some key should offer no problem). 
If that is the case where should I report it? 
 
(3) like Aaron says, might belong in the X11 level (at least part of the implementation, as I 
think it would always need some KDE specific code), but so do transparencies, drop shadows, 
etc., and we have all kinds of hacks for faking that stuff because waiting for real support in X 
would take forever! Why not also do it for something that actually seems *useful*, and most 
probably would require much less effort. 
Again, maybe deeper input support might be needed but I have no idea where to report it. 
 
Comment 6 Thiago Macieira 2003-07-23 01:26:57 UTC
For setting those symbols, you can simply configure X already. For instance, 
whenever I hit the Compose key followed by c, o, I get the 
Comment 7 Jorge Adriano 2003-07-23 10:56:56 UTC
Oh I didn't know about that!  
I've googled for it now, but I didn't find much info. Do you have any documentation you can 
point me to? 
 
So, if that is implemented, (3) is done. 
I just thought I'd stress once more that (1) and (2) are the actual wishlist items. 
 
J.A. 
 
 
Comment 8 Thiago Macieira 2003-07-23 12:04:07 UTC
You have to edit your Compose keymap. Mine are: 
 
/usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose 
/usr/X11R6/lib/X11/locale/iso8859-1/Compose 
 
Also note that you cannot enter Greek characters in the ISO-8859-1 
keymap, for instance. So, as I said, you need to be in a UTF-8 or 
Greek environment. 
Comment 9 Thorben Kröger 2003-10-04 11:11:48 UTC
In the discussion on http://dot.kde.org/1065171869/ someone pointed to this bug report. I 
agree with what was said: "Will you implement a possibilty for unicode symbols to be 
shown as unicode symbols instead of \alpha commands? This would be a real benefit." 
 
I like Jorge Adriano's proposals (1) and (2). This would be particularly useful for using LaTeX. 
Kile would greatly benefit from this. 
 
The problem with LaTeX is that you cannot use UTF-8 Encoding. I know there is an effort to 
build a Unicode LaTeX (Lambda/Omega), but I'd prefer to use Latin1 only for portability 
reasons and Omega is still not released anyway. 
 
What's needed when you type \alpha is for kate to look into a character map (possibly 
provided by Kile) that tells Kate \alpha is an alias for the unicode-character alpha. It then 
displays the so-parsed text as unicode. 
When saving, it would just replace the unicode-characters with their respective LaTeX 
counterparts. 
 
Isn't this just a simple matter of search and replace? 
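
It is mostly search and replace, with one catch: a LaTeX command name runs on for as long as letters follow, so a token must not be matched as a prefix of a longer command. The Python sketch below only illustrates that point; CHAR_MAP and the function names are invented for the example and are not a Kate or Kile interface.

```python
import re

# Hypothetical character map, e.g. as it might be provided by Kile.
CHAR_MAP = {r"\alpha": "\u03b1", r"\beta": "\u03b2"}
INVERSE_MAP = {symbol: token for token, symbol in CHAR_MAP.items()}

# A token only counts if no ASCII letter follows it.
_TOKEN_RE = re.compile(
    "(?:"
    + "|".join(re.escape(t) for t in sorted(CHAR_MAP, key=len, reverse=True))
    + ")(?![A-Za-z])"
)


def naive_load(text: str) -> str:
    """Plain search and replace: also rewrites the start of longer commands."""
    for token, symbol in CHAR_MAP.items():
        text = text.replace(token, symbol)
    return text


def careful_load(text: str) -> str:
    """Replace a token only when it is a complete command name."""
    return _TOKEN_RE.sub(lambda m: CHAR_MAP[m.group(0)], text)


def careful_save(text: str) -> str:
    """Write symbols back as tokens, adding {} when a letter follows directly."""
    out = []
    for i, ch in enumerate(text):
        if ch in INVERSE_MAP:
            out.append(INVERSE_MAP[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # Without a separator the next letter would extend the command name.
            if nxt.isascii() and nxt.isalpha():
                out.append("{}")
        else:
            out.append(ch)
    return "".join(out)
```

Here naive_load turns a user command such as \alphabet into an alpha followed by "bet", while careful_load leaves it alone. The {} separator on save is one possible choice; a space would also end the command name.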
Comment 10 Jeroen Wijnhout 2003-10-04 11:24:34 UTC
Subject: Re:  Tokens to unicode symbols. Encoding and Decoding.
 X-symbol.

On Saturday 04 October 2003 11:11, Thorben "Kr
Comment 11 tadu 2003-10-05 21:09:00 UTC
> The problem with LaTeX is that you cannot use UTF-8 Encoding. 
 
This is off-topic to this bug, but I use LaTeX _only_ with UTF-8 encoding. 
Just grab the unicode style from http://www.unruh.de/DniQ/latex/unicode/ and 
use \usepackage[utf8]{inputenc}. Works like a charm, though you need to keep a few things 
in mind: \frac a b is a short-cut for \frac{a}{b}, but \frac 
Comment 12 Jorge Adriano 2003-10-11 20:23:44 UTC
Thanks for the tip! 
Besides the X-Symbol package for xemacs, there is also something similar for 
vi http://www.vandenoever.info/software/vim/ (by Jos). It might be interesting 
to take a look at both. 
 
J.A. 
  
  
Comment 13 Christoph Cullmann 2012-10-29 06:52:04 UTC
Out of scope for Kate, please use UTF-8 file encoding for such stuff, this is really no longer necessary.
Comment 14 Erik Quaeghebeur 2012-10-29 09:21:10 UTC
(In reply to comment #13)
> Out of scope for Kate, please use UTF-8 file encoding for such stuff, this
> is really no longer necessary.

It is still and always will be useful. Most comments talked about conversion between \alpha and α, which is only a sub-use-case. Nevertheless, even in this case, for reliable document exchange in the TeX world, UTF-8 cannot be assumed. Also, the number of symbols is enormous (see, e.g., http://detexify.kirelabs.org/symbols.html ), so accessing the UTF-8 symbols is problematic. Furthermore, sometimes the macros are not directly linked to a symbol, but are semantic and user-defined, e.g., \coefficient may be mapped to \alpha/α, \beta/β, ... So this is not only an encoding issue, rather, it is a more elaborate form of code-folding.

Therefore I kindly request this bug be re-opened.
Comment 15 Jorge Adriano 2012-10-29 11:03:34 UTC
(In reply to comment #13)
> Out of scope for Kate, please use UTF-8 file encoding for such stuff, this
> is really no longer necessary.

Hi Christoph, 

I understand your point. But even when using UTF-8, it is still useful to have a simple mechanism to input these characters. Rather than just having an abbreviation \alpha displayed as α, you could have it expanded to the character α itself. Abbreviations are useful in lots of contexts, to simplify introducing certain tokens, words, or sentences. 

For reference see the emacs abbrev mode: 
http://www.emacswiki.org/AbbrevMode

With that said, I still think that the original proposal is useful, as Erik already pointed out. Consider the LaTeX code $a_i$. It could be interesting to have the "_i" displayed in subscript (and possibly omit the "_"). You can't just substitute using unicode for this. It's a tool to make your code more readable, nothing else.
Comment 16 Christoph Cullmann 2012-10-29 20:55:45 UTC
Hmm, these abbreviations look like just the perfect use case for the "new" snippet implementation in KatePart.
In kate.git master you get it; even apps like Kile that just embed the part can benefit from it.
You can provide such "expanding" snippets that you then get in the auto-completion; would that be something to use for this? Else feel free to reopen the bug, but please change the subject to something like "Emacs-like Abbrev" to make it clearer that it is not solely about the encoding of some chars.
Thanks for the feedback, guess I misunderstood this a bit ;)
Comment 17 Michel Ludwig 2012-10-29 21:23:23 UTC
Just for the record, Kile has an abbreviation-expansion feature already, which can even be combined with launching scripts. 

And regarding the LaTeX symbols that were mentioned here, Kile provides an extensive symbol list from which one can look up either the Unicode character or the corresponding LaTeX command.

But regarding the overlay display of Unicode symbols (or "rendered" LaTeX commands), I think what you are looking for is something like the "Kile inline LaTeX preview" fork:

http://the-user.org/post/kile-inline-preview
http://gitorious.org/kileip/
Comment 18 Jorge Adriano 2012-11-25 19:41:19 UTC
> You can provide such "expanding" snippets that you get then in the
> auto-completion, would that be something to use for this? 

Yes, that is pretty good indeed :) 

> Else feel free to
> reopen the bug, but please change subject so something like this "Emacs like
> Abbrev" or so to make it more clear that it is not solely for the encoding
> of some chars.
> Thanks for the feedback, guess I misunderstood this a bit ;)

Yes, this could be broken into different bugs because there are different features at play here. 

- One is abbreviations. This makes it easier to input certain tokens or sentences. 
- Another is 'alternative display': displaying certain tokens or sentences differently.

Note these two are somewhat related though, in that they require on the fly parsing and association of sentences with a certain encoding (producing an alternative text or an alternative display of this text).
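
The abbreviation half could be sketched like this in Python. It is not the KatePart snippet API or Emacs abbrev mode; the ABBREVIATIONS table and expand_before_cursor are invented names, and the buffer is modelled as a plain string with an integer cursor position.

```python
# Hypothetical abbreviation table; entries would normally be user-defined.
ABBREVIATIONS = {
    "wrt": "with respect to",
    r"\a": r"\alpha",
}


def expand_before_cursor(buffer: str, cursor: int):
    """Expand the word ending at the cursor; return (new_buffer, new_cursor)."""
    # Walk back from the cursor to the start of the current word.
    start = cursor
    while start > 0 and not buffer[start - 1].isspace():
        start -= 1
    expansion = ABBREVIATIONS.get(buffer[start:cursor])
    if expansion is None:
        return buffer, cursor  # nothing to expand, leave the buffer untouched
    new_buffer = buffer[:start] + expansion + buffer[cursor:]
    return new_buffer, start + len(expansion)
```

An editor would run this when a trigger key (say, space or tab) is pressed. An 'alternative display' feature would reuse the same word lookup but change only what is drawn, not the buffer contents.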
Comment 19 Jorge Adriano 2012-11-25 19:44:23 UTC
(In reply to comment #17)

Thanks for letting me know Michel, I had no idea.
That's great :)