Bug 58340 - Tokens to unicode symbols. Encoding and Decoding. X-symbol.
Status: RESOLVED INTENTIONAL
Alias: None
Product: kate
Classification: Applications
Component: general
Version: 2.1
Platform: OpenSUSE Linux
Importance: NOR wishlist
Target Milestone: ---
Assignee: KWrite Developers

Reported: 2003-05-11 13:23 UTC by Jorge Adriano
Modified: 2012-11-25 19:44 UTC
CC: 3 users (bugs.kde.org, cullmann, michel.ludwig)

Attachments
Screenshot of Xemacs with ProofGeneral (using X-Symbol). Isabelle source code. (34.09 KB, image/png)
Jorge Adriano 2003-05-11 13:23:28 UTC
Version: 2.1 (using KDE 3.1.0)
Installed from: SuSE
Compiler: gcc version 2.95.3 20010315 (SuSE)
OS: Linux (i686) release 2.4.10-4GB

It would be nice to have something similar to the XEmacs X-Symbol package:
http://x-symbol.sourceforge.net

With it you can define tokens, associate them with symbols, and define encoding (symbol->token) and decoding (token->symbol) rules. So you can do things like type "\alpha" and have the editor display an alpha.

Abstracting from the defined language would also be interesting. There could be a "Kate" way of inserting the symbols, independent of the token language in use (if any). Something like: "press special key" -> "type alpha+return" -> "display an alpha". This way you wouldn't have to know how each symbol is written in each defined language, only the Kate way of inserting it. (This way of inserting Unicode characters would be good to have even with no encoding/decoding to ASCII.)

Some examples where it may be useful:

[1] - Editing LaTeX documents
I'm Portuguese, so I'm always using isolatin (non-ASCII) chars; I never use the proper LaTeX commands though. I type "ac

Jorge Adriano 2003-05-11 13:30:07 UTC
Created attachment 1524
Screenshot of Xemacs with ProofGeneral (using X-Symbol). Isabelle source code.

Jorge Adriano 2003-06-17 21:23:10 UTC
*** This bug has been confirmed by popular vote. ***

Aaron J. Seigo 2003-07-12 23:35:01 UTC
Doesn't this belong at the X11 level, just like multibyte encodings?

Christoph Cullmann 2003-07-22 20:37:40 UTC
I think so too: this is not a Kate issue, but would need some deeper support in the input stuff.

Jorge Adriano 2003-07-22 23:36:29 UTC
OK, I've written too much in the report; let me try to make things clearer.

My wish report:
1. Encode symbols (in the buffer) into tokens.
2. Parse input and decode tokens into symbols.

With (1) you can do things like use Unicode symbols in your text and save it as ASCII.
This is quite useful in some programming/markup languages; for details see above.

Wish (2) is meant as the inverse of (1). When opening your text, it should be parsed, and tokens decoded and displayed as symbols. The parser should also work "as you type", that is, decoding tokens into symbols as you type them. This keeps the display consistent and provides a means to insert the symbols. For applications see the original report above.

Now, in the original report, for completeness, I referred to another feature that would be useful.

3. Some way to insert Unicode symbols. It should work like:
   - press special key (cursor look changes)
   - type symbol name / press return (symbol is displayed)

It would be quite useful in lots of situations, and great combined with wish (1): you wouldn't have to know every different token syntax you use; you would have a standard way to insert the symbols.

(1) seems like Kate stuff to me, no problem about that IMO.
(2) might need deeper input support, as Christoph puts it (the "parse as you type" feature; parsing when opening a text document or on pressing some key should pose no problem). If that is the case, where should I report it?
(3), as Aaron says, might belong at the X11 level (at least part of the implementation, as I think it would always need some KDE-specific code), but so do transparencies, drop shadows, etc., and we have all kinds of hacks for faking that stuff because waiting for real support in X would take forever! Why not also do it for something that actually seems *useful*, and most probably would require much less effort? Again, deeper input support might be needed, but I have no idea where to report it.

Thiago Macieira 2003-07-23 01:26:57 UTC
For setting those symbols, you can simply configure X already. For instance, whenever I hit the Compose key followed by c, o, I get the

Jorge Adriano 2003-07-23 10:56:56 UTC
Oh, I didn't know about that!
I've googled for it now, but I didn't find much info. Do you have any documentation you can point me to? So, if that is implemented, (3) is done. I just thought I'd stress once more that (1) and (2) are the actual wishlist items.

J.A.

Thiago Macieira 2003-07-23 12:04:07 UTC
You have to edit your Compose keymap. Mine are:
/usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose
/usr/X11R6/lib/X11/locale/iso8859-1/Compose

Also note that you cannot enter Greek characters in the ISO-8859-1 keymap, for instance. So, as I said, you need to be in a UTF-8 or Greek environment.

Thorben Kröger 2003-10-04 11:11:48 UTC
In the discussion on http://dot.kde.org/1065171869/ someone pointed to this bug report. I agree with what was said:

"Will you implement a possibility for unicode symbols to be shown as unicode symbols instead of \alpha commands? This would be a real benefit."

I like Jorge Adriano's proposals (1) and (2). This would be particularly useful for LaTeX; Kile would greatly benefit from it.

The problem with LaTeX is that you cannot use UTF-8 encoding. I know there is an effort to build a Unicode LaTeX (Lambda/Omega), but I'd prefer to use Latin1 only, for portability reasons, and Omega is still not released anyway.

What's needed when you type \alpha is for Kate to look into a character map (possibly provided by Kile) that tells Kate \alpha is an alias for the Unicode character alpha. It then displays the so-parsed text as Unicode. When saving, it would just replace the Unicode characters with their respective LaTeX counterparts. Isn't this just a simple matter of search and replace?

Jeroen Wijnhout 2003-10-04 11:24:34 UTC
Subject: Re: Tokens to unicode symbols. Encoding and Decoding. X-symbol.

On Saturday 04 October 2003 11:11, Thorben "Kr

tadu 2003-10-05 21:09:00 UTC
> The problem with LaTeX is that you cannot use UTF-8 Encoding.

This is off-topic to this bug, but I use LaTeX _only_ with UTF-8 encoding.
Just grab the unicode style from http://www.unruh.de/DniQ/latex/unicode/ and use \usepackage[utf8]{inputenc}. Works like a charm, though you need to keep a few things in mind: \frac a b is a short-cut for \frac{a}{b}, but \frac

Jorge Adriano 2003-10-11 20:23:44 UTC
Thanks for the tip!

Besides the X-Symbol package for XEmacs, there is also something similar for vi, http://www.vandenoever.info/software/vim/ (by Jos). It might be interesting to take a look at both.

J.A.

Christoph Cullmann 2012-10-29 06:52:04 UTC
Out of scope for Kate, please use UTF-8 file encoding for such stuff, this is really no longer necessary.

Erik Quaeghebeur 2012-10-29 09:21:10 UTC
(In reply to comment #13)
> Out of scope for Kate, please use UTF-8 file encoding for such stuff, this
> is really no longer necessary.

It is still useful and always will be. Most comments talked about conversion between \alpha and α, which is only a sub-use-case. Nevertheless, even in this case, UTF-8 cannot be assumed for reliable document exchange in the TeX world. Also, the number of symbols is enormous (see, e.g., http://detexify.kirelabs.org/symbols.html ), so accessing the UTF-8 symbols is problematic. Furthermore, sometimes the macros are not directly linked to a symbol, but are semantic and user-defined; e.g., \coefficient may be mapped to \alpha/α, \beta/β, ... So this is not only an encoding issue; rather, it is a more elaborate form of code folding. Therefore I kindly request that this bug be re-opened.

Jorge Adriano 2012-10-29 11:03:34 UTC
(In reply to comment #13)
> Out of scope for Kate, please use UTF-8 file encoding for such stuff, this
> is really no longer necessary.

Hi Christoph, I understand your point. But even when using UTF-8, it is still useful to have a simple mechanism to input these characters. Rather than just having an abbreviation \alpha displayed as alpha, you could have it encoded to \alpha. Abbreviations are useful in lots of contexts, to simplify introducing certain tokens, words, or sentences.
For reference see the Emacs abbrev mode: http://www.emacswiki.org/AbbrevMode

With that said, I still think that the original proposal is useful, as Erik already pointed out. Consider the LaTeX code $a_i$. It could be interesting to have the "_i" displayed in subscript (and possibly omit the "_"). You can't just substitute Unicode for this. It's a tool to make your code more readable, nothing else.

Christoph Cullmann 2012-10-29 20:55:45 UTC
Hmm, these abbreviations look like just the perfect use case for the "new" snippet implementation in KatePart. You get it in kate.git master, and even apps like Kile that just embed the part can benefit from it. You can provide such "expanding" snippets that you get then in the auto-completion, would that be something to use for this?

Else feel free to reopen the bug, but please change subject to something like "Emacs like Abbrev" or so to make it more clear that it is not solely for the encoding of some chars.

Thanks for the feedback, guess I misunderstood this a bit ;)

Michel Ludwig 2012-10-29 21:23:23 UTC
Just for the record, Kile already has an abbreviation-expansion feature, which can even be combined with launching scripts. And regarding the LaTeX symbols mentioned here, Kile provides an extensive symbol list in which one can look up either the Unicode character or the corresponding LaTeX command.

But regarding the overlay display of Unicode symbols (or "rendered" LaTeX commands), I think what you are looking for is something like the "Kile inline LaTeX preview" fork:
http://the-user.org/post/kile-inline-preview
http://gitorious.org/kileip/

Jorge Adriano 2012-11-25 19:41:19 UTC
> You can provide such "expanding" snippets that you get then in the
> auto-completion, would that be something to use for this?
Yes, that is pretty good indeed :)

> Else feel free to
> reopen the bug, but please change subject to something like "Emacs like
> Abbrev" or so to make it more clear that it is not solely for the encoding
> of some chars.
> Thanks for the feedback, guess I misunderstood this a bit ;)

Yes, this should be broken into different bugs because there are different features at play here.
- One is abbreviations. This makes it easier to input certain tokens or sentences.
- Another is 'alternative display': displaying certain tokens or sentences differently.

Note that these two are somewhat related, though, in that they require on-the-fly parsing and association of sentences with a certain encoding (producing an alternative text or an alternative display of this text).

Jorge Adriano 2012-11-25 19:44:23 UTC
(In reply to comment #17)
Thanks for letting me know, Michel, I had no idea. That's great :)
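
For readers skimming the thread, wishes (1) and (2) boil down to a reversible token/symbol mapping. The following is only a rough Python sketch of that idea, not Kate, Kile, or X-Symbol code. The table contents and the names TOKEN_TO_SYMBOL, encode, and decode are made up for illustration, and the "{}" separator handling is an assumption about how a LaTeX-like token language would keep "\lambda" from running into a following letter.

```python
import re

# Hypothetical token table; a real one would be defined per token language
# (LaTeX, Isabelle, ...). These three entries are illustrative only.
TOKEN_TO_SYMBOL = {
    r"\alpha": "\u03b1",   # Greek small alpha
    r"\beta": "\u03b2",    # Greek small beta
    r"\lambda": "\u03bb",  # Greek small lambda
}
SYMBOL_TO_TOKEN = {sym: tok for tok, sym in TOKEN_TO_SYMBOL.items()}

# Match a known token that is not followed by another ASCII letter (so
# "\alphabet" is left alone), plus an optional "{}" separator after it.
_TOKEN_RE = re.compile(
    "(?:"
    + "|".join(re.escape(t) for t in sorted(TOKEN_TO_SYMBOL, key=len, reverse=True))
    + r")(?![A-Za-z])(?:\{\})?"
)


def decode(text: str) -> str:
    """Wish (2): turn tokens into symbols, e.g. when opening a file."""
    def to_symbol(match: re.Match) -> str:
        token = match.group(0)
        if token.endswith("{}"):
            token = token[:-2]  # drop the separator before the lookup
        return TOKEN_TO_SYMBOL[token]
    return _TOKEN_RE.sub(to_symbol, text)


def encode(text: str) -> str:
    """Wish (1): turn symbols back into tokens, e.g. when saving as ASCII."""
    out = []
    for i, ch in enumerate(text):
        token = SYMBOL_TO_TOKEN.get(ch)
        if token is None:
            out.append(ch)
            continue
        out.append(token)
        nxt = text[i + 1 : i + 2]
        # A following ASCII letter would merge into the token name,
        # so insert an empty group as a separator.
        if nxt.isascii() and nxt.isalpha():
            out.append("{}")
    return "".join(out)
```

With this table, decode would be applied to the buffer on load and encode on save. The "parse as you type" part of wish (2) is not covered here, and neither are user-defined macros or the subscript display discussed in the later comments.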