SUMMARY It would be nice to have an option (or defaulting) to sorting the BibTeX output file by identifier and the keys as well. STEPS TO REPRODUCE 1. open an unsorted BibTeX file with KBibTeX 2. save it@comment{x-kbibtex-encoding=utf-8} OBSERVED RESULT BibTeX file maintains original unsorted state EXPECTED RESULT BibTeX file is sorted alphabetically by identifier and keys within an entry as well ADDITIONAL INFORMATION Benefits from this: - Easier diffing when tracking the BibTeX file - Consistency when working with multiple people on a BibTeX file - Takes the much of the arbitrariness away from the file formatting (same as strict code formatters like Python's black e.g.) I guess this is just a couple of lines of sorting in fileexporterbibtex.cpp but I got really confused browsing through the code as I don't know any Qt... 😔 I don't know if people might get annoyed if KBibTeX sorts their files, maybe a checkbox in the File Settings widget could help. We would need a way to persist this setting though, maybe a preamble like `@comment{x-kbibtex-encoding=utf-8}` for the file encoding could be used for that (e.g. `@comment{x-kbibtex-sorting=ascending|descending}`).
Wow seems you can't edit the bug description here... the second step-to-reproduce was obviously a copy-paste fail. 🤦
I agree that this is a useful feature. I was able to make a basic implementation, but I want to check it first with a fresh pair of eyes before pushing it to the Git repo. Regarding the source code, you were on the right track. To have a sorted output, one has to look into FileExporterBibTeX::saveAsString(.., const File *). There, I introduced const File *_bibtexfile = sortedByIdentifier ? File::sortByIdentifier(bibtexfile) : bibtexfile; ... and use a search&replace to make use of _bibtexfile instead of the original bibtexfile in the remainder of this function. File::sortByIdentifier(..) is a new function that basically employs std::sort(..) with a custom comparison function to get the desired result. To make it more useful, the code expands on FileImporterBibTeX::Private::Statistics to track if two subsequent entries' ids are in lexicographic order. When opening a BibTeX file, based on this counter it is guessed if this file is "sorted" or not. This information is tracked as a file property and then used when saving a file as described above. The property can also toggled in the per-file properties widget. Activating it will only have an effect when writing. Deactivation has no practical effect, there is no shuffling in this case ;-)
That sound absolutely awesome, thank you Thomas! Looking forward to trying it! 🙂 Am 12.10.22 um 22:27 schrieb Thomas Fischer: > https://bugs.kde.org/show_bug.cgi?id=460315 > > Thomas Fischer <fischer@unix-ag.uni-kl.de> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Ever confirmed|0 |1 > Status|REPORTED |ASSIGNED > > --- Comment #2 from Thomas Fischer <fischer@unix-ag.uni-kl.de> --- > I agree that this is a useful feature. I was able to make a basic > implementation, but I want to check it first with a fresh pair of eyes before > pushing it to the Git repo. > > Regarding the source code, you were on the right track. To have a sorted > output, one has to look into FileExporterBibTeX::saveAsString(.., const File > *). There, I introduced > const File *_bibtexfile = sortedByIdentifier ? > File::sortByIdentifier(bibtexfile) : bibtexfile; > ... and use a search&replace to make use of _bibtexfile instead of the original > bibtexfile in the remainder of this function. > File::sortByIdentifier(..) is a new function that basically employs > std::sort(..) with a custom comparison function to get the desired result. > > To make it more useful, the code expands on > FileImporterBibTeX::Private::Statistics to track if two subsequent entries' ids > are in lexicographic order. When opening a BibTeX file, based on this counter > it is guessed if this file is "sorted" or not. This information is tracked as a > file property and then used when saving a file as described above. > > The property can also toggled in the per-file properties widget. Activating it > will only have an effect when writing. Deactivation has no practical effect, > there is no shuffling in this case ;-) >
(In reply to nobodyinperson from comment #3) > That sound absolutely awesome, thank you Thomas! Looking forward to > trying it! 🙂 So, I had to fix some small issues here and there and added a test function as well. You can see the current commit here: https://invent.kde.org/thomasfischer/kbibtex/commit/658106cd9a5cb50320ab557f881a17c16da1b4a2 You can test the code by following the instructions here: https://userbase.kde.org/KBibTeX/Development#Quick_Start_to_Run_KBibTeX_from_Git with the adjusted script invocation: bash run-kbibtex.sh 'https://invent.kde.org/thomasfischer/kbibtex.git' 'bugs/kde460315'
Seems to work very well! Also very nice the fuzzy detection of the "sortedness" of a file! 👍 Thank you very much, Thomas!
(In reply to nobodyinperson from comment #5) > Seems to work very well! Also very nice the fuzzy detection of the > "sortedness" of a file! 👍 Thank you very much, Thomas! Nice to hear ☺ I guess that means I can merge the change and close this bug report.
Git commit eb510bdfdc46afaa520a9aa71c640c881ea6eaba by Thomas Fischer. Committed on 16/10/2022 at 12:27. Pushed by thomasfischer into branch 'master'. BibTeX file property: sort by identifier A new property can be set for BibTeX files: whether elements (foremost entries and macros) should be sorted by their identifier or key, respectively. This patch includes the following aspects to implement this feature: - when importing a BibTeX file, the Statistics struct counts how often two consecutive entries/macros in an imported file have identifiers/ keys in sorted order and how often not. Depending on the result, the SortedByIdentifier flag is set true or false. - when exporting a BibTeX file and the SortedByIdentifier flag is set, the File list will be copied and then sorted by identifier and key, respectively before writing to disk. - The file settings docklet and the properties dialog at "Save As" operations allow to set the SortedByIdentifier flag. - Sorting is implemented as a static function in class File, in case it will be used in other cases like other exporters in the future. - In KBibTeXDataTest, a new test ('sortFileByIdentifier') has been added to test the sorting functionality. Unsetting the SortedByIdentifier flag has no immediate consequence, there is no shuffling made. M +53 -1 src/config/preferences.cpp M +13 -1 src/config/preferences.h M +7 -0 src/config/preferences.json M +5 -0 src/data/element.cpp M +2 -0 src/data/element.h M +41 -0 src/data/file.cpp M +8 -0 src/data/file.h M +9 -0 src/gui/widgets/filesettingswidget.cpp M +1 -0 src/gui/widgets/filesettingswidget.h M +18 -5 src/io/fileexporterbibtex.cpp M +19 -3 src/io/fileimporterbibtex.cpp M +105 -0 src/test/kbibtexdatatest.cpp https://invent.kde.org/office/kbibtex/commit/eb510bdfdc46afaa520a9aa71c640c881ea6eaba
I forgot to run one test that needs manual configuration. It turns out, the threshold to recognize whether a partially sorted file is to be sorted when writing it to disk was too low. In file src/io/fileimporterbibtex.cpp at line 1297, I would like the multiplicator from 3 to 10. This basically means that are for a file to be recognized as sorted, it must be already somewhat sorted, but to a larger extend than with the previous value of 3. Please check if the value of 10 is still ok for you.
10 still works for me 👍
Git commit f3f7b16ac93eab91033a20f7b7a227ca98dfd187 by Thomas Fischer. Committed on 22/10/2022 at 20:52. Pushed by thomasfischer into branch 'master'. Making detection of sorting less sensitive Previous commit eb510bdfdc46afaa520a9 introduced a heuristic to detect whether entries and macros in a BibTeX file where (partially) sorted and then tried to impose a complete sorting on the file when writing back to disk. As it turned out, the chosen threshold for detecting sorting was too sensitive thus sorting bibliographies that are not meant to be (forcefully) sorted, e.g. to keep the difference between two Git commits small. This commit here decreases the sensitivity to a (still arbitrary) value that seems to be a good compromise that still detects (partially) sorted bibliographies, but does not errornously assess almost unsorted bibliographies as sorted. M +1 -1 src/io/fileimporterbibtex.cpp https://invent.kde.org/office/kbibtex/commit/f3f7b16ac93eab91033a20f7b7a227ca98dfd187