SUMMARY
The latest development version of KBibTeX (kbibtex-git-r3368.52ab3ba1-1) does not store every encoding set in the file settings.

STEPS TO REPRODUCE
1. Create a file:
----- encoding.bib -------
@comment{x-kbibtex-encoding=us-ascii}

@article{encodingfail2020,
    author = {Author},
    journal = {Journal},
    title = {Encoding with special character $\mu$},
    year = {2020}
}
----------------------------
2. Open the file in KBibTeX
   - The Encoding is correctly set to US-ASCII in the file settings
   - The special escaped LaTeX character \mu is replaced with µ
3. Simulate changing something so that KBibTeX allows saving the file
4. Save the file

OBSERVED RESULT
The encoding line
@comment{x-kbibtex-encoding=us-ascii}
is removed from the file, causing future openings in KBibTeX to interpret the file as UTF-8. Future open/save cycles are stable, leading to the following file:
------------ encoding.bib --------------
@comment{x-kbibtex-encoding=utf-8}

@article{encodingfail2020,
    author = {Author},
    journal = {Journal},
    title = {Encoding with special character $\ensuremath{μ}$},
    year = {2020}
}
-----------------------------------------

EXPECTED RESULT
KBibTeX keeps the file in US-ASCII so there are no problems with old BibTeX versions incapable of handling Unicode.

SOFTWARE/OS VERSIONS
Up-to-date Manjaro, KBibTeX built from the kbibtex-git AUR package
BTW kbibtex stable 0.9.2 does not exhibit this bug.
Furthermore, if one resets the file encoding to US-ASCII by hand each time the file is opened, there is an \ensuremath{...} runaway on each open/save cycle, causing the file to blow up like this:

@article{encodingfail2020,
    author = {Author},
    journal = {Journal},
    title = {Encoding with special character $\ensuremath{\ensuremath{\ensuremath{\ensuremath{\mu}}}}$},
    year = {2020}
}
Sorry for the late response.

This bug report actually documents two problems:
1. The (mis)handling of "US-ASCII"
2. Writing $\ensuremath{μ}$ instead of $\mu$

For the first problem, I have decided to remove US-ASCII from the list of encodings as it is redundant. You effectively have US-ASCII if you choose either UTF-8 or LaTeX (and restrict yourself to characters defined in US-ASCII). However, whereas US-ASCII cannot handle 'Ä' or 'Æ', both UTF-8 and 'LaTeX' can in their own ways.

The second problem was mostly about the 'LaTeX encoder' failing to recognize that there was already a math environment from the dollar signs and thus \ensuremath was unnecessary. The issue with μ is more complex than it seems, as on the LaTeX side there is \mu, \textmu, and \textmugreek and on the Unicode side there is the Greek letter μ (U+03BC), the 'micro' symbol 'µ' (U+00B5), and some more special 'mu' symbols. Mapping between both sides is not obvious. KBibTeX's guesswork on the mapping has hopefully been improved now.

Those changes have been integrated both into the 'master' code (not yet pushed at the time of writing) and a bugfix branch based on 'kbibtex/0.10'. The bugfix branch contains the minimum changes necessary to fix the bug; the 'master' changes include additional refactoring that does not belong in an almost-stable branch. So, please check the bugfix branch first. I would like to refine the commits for 'master' based on the feedback I receive here:
https://invent.kde.org/thomasfischer/kbibtex/commit/423a161dc5f44c8e7f0c873258dadc050f25acd6
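To illustrate the second problem, here is a minimal Python sketch of an encoder that tracks whether it is already inside a $...$ math environment and only adds \ensuremath outside of one. It is not KBibTeX's actual code (which is C++); the function name encode_latex and the one-entry mapping table are assumptions made up for this example, and only the Greek letter μ (U+03BC) is handled. The other 'mu' variants are deliberately left out because their mapping is not obvious.

```python
# Hypothetical one-entry mapping, for illustration only.
# KBibTeX's real tables are larger and may map differently.
MATH_COMMANDS = {
    "\u03bc": r"\mu",  # Greek small letter mu
}


def encode_latex(text: str) -> str:
    # Replace mapped Unicode characters with LaTeX math commands.
    # A command is wrapped in \ensuremath{...} only when the character
    # is found outside of a $...$ math environment.
    out = []
    in_math = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Copy a backslash and the character after it verbatim,
            # so an escaped \$ does not toggle math mode.
            out.append(text[i:i + 2])
            i += 2
            continue
        if ch == "$":
            in_math = not in_math
            out.append(ch)
        elif ch in MATH_COMMANDS:
            cmd = MATH_COMMANDS[ch]
            out.append(cmd if in_math else r"\ensuremath{" + cmd + "}")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(encode_latex("Encoding with special character $\u03bc$"))
    print(encode_latex("Encoding with special character \u03bc"))
```

Because the output contains only ASCII and no mapped characters, running the function on its own output changes nothing, so repeated open/save cycles would not nest \ensuremath any further in this sketch.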
Dear Bug Submitter, This bug has been in NEEDSINFO status with no change for at least 15 days. Please provide the requested information as soon as possible and set the bug status as REPORTED. Due to regular bug tracker maintenance, if the bug is still in NEEDSINFO status with no change in 30 days the bug will be closed as RESOLVED > WORKSFORME due to lack of needed information. For more information about our bug triaging procedures please read the wiki located here: https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging If you have already provided the requested information, please mark the bug as REPORTED so that the KDE team knows that the bug is ready to be confirmed. Thank you for helping us make KDE software even better for everyone!
Git commit 1e649222ed54060eb561fcc5b70568ba7f6098fb by Thomas Fischer.
Committed on 23/12/2020 at 19:18.
Pushed by thomasfischer into branch 'kbibtex/0.10'.

Improving recognizing encoding of a BibTeX file to load

Drawing from commits in the 'master' branch in order to improve recognizing BibTeX/BibLaTeX files' encoding.
- 8e473758e99f30cf3d61fa0b1: Guessing file's encoding based on bit patterns
- fba235cf5d0494b8189a1fca3: Refactoring FileImporterBibTeX
FIXED-IN: 0.10

One effect is that opened ASCII-only files are not directly classified as UTF-8, but stay ASCII-encoded as far as possible by classifying them as 'LaTeX'-encoded.

M  +2    -2    src/io/fileimporter.h
M  +177  -37   src/io/fileimporterbibtex.cpp
M  +2    -0    src/io/fileimporterbibtex.h

https://invent.kde.org/office/kbibtex/commit/1e649222ed54060eb561fcc5b70568ba7f6098fb
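For readers who want a feel for what such a classification might look like, below is a rough Python sketch. It is not the code from fileimporterbibtex.cpp; the function name guess_encoding and the "unknown" fallback are assumptions for illustration. It classifies ASCII-only input as 'LaTeX' and relies on Python's strict UTF-8 decoder to check the multi-byte bit patterns instead of inspecting the bits by hand.

```python
def guess_encoding(data: bytes) -> str:
    # Hypothetical helper: classify raw file content as 'UTF-8', 'LaTeX',
    # or 'unknown'. A sketch only, not KBibTeX's implementation.

    # A UTF-8 byte-order mark is an explicit signal.
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"

    # Every byte below 0x80 means the content is pure ASCII, so any
    # special characters must be written as LaTeX commands.
    if all(byte < 0x80 for byte in data):
        return "LaTeX"

    # Bytes with the high bit set must form valid UTF-8 sequences
    # (110xxxxx 10xxxxxx, 1110xxxx 10xxxxxx 10xxxxxx, ...).
    # The strict decoder raises an error if they do not.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "unknown"
    return "UTF-8"


if __name__ == "__main__":
    print(guess_encoding(b"title = {Encoding with special character $\\mu$}"))
    print(guess_encoding("title = {\u03bc}".encode("utf-8")))
```

In this sketch the file from the original report, which contains only ASCII bytes, would be classified as 'LaTeX' rather than UTF-8, matching the effect described in the commit message.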