Bug 421612

Summary: When formatting IDs, removed „small words” still get counted, leading to unexpected results
Product: [Applications] KBibTeX
Reporter: nobodyinperson <nobodyinperson>
Component: User interface
Assignee: Thomas Fischer <fischer>
Status: RESOLVED FIXED    
Severity: normal    
Priority: NOR    
Version: 0.9.2   
Target Milestone: ---   
Platform: Manjaro   
OS: Linux   
Latest Commit:
Version Fixed In: 0.10
Attachments: could-be-fix for id title word counting

Description nobodyinperson 2020-05-16 12:15:52 UTC
Created attachment 128510
could-be-fix for id title word counting

SUMMARY

KBibTeX can automatically format entry IDs, a very handy feature. The possibility to define custom formats is also very useful. However, when „small words” are removed (e.g. from the title), the removed words are still counted, leading to an unexpected and varying number of words in the final ID.

STEPS TO REPRODUCE
1. Add a new ID formatting scheme
2. Choose "Title" as the only field
3. Choose "First Word" to "Third Word"
4. Enable removing small words
5. Format an entry with title "The very important Title"

(It is even worse when choosing only "First Word" to "First Word" in order to get the first meaningful word of the title.)

OBSERVED RESULT

ID is formatted as "veryimportant"

(With "First Word" to "First Word": ID is formatted as the empty string "")

EXPECTED RESULT

ID should be formatted as "veryimportanttitle"

(With "First Word" to "First Word": ID should be formatted as "very")


SOFTWARE/OS VERSIONS

Up-to-date Manjaro with KBibTeX 0.9.2 from the official repositories

ADDITIONAL INFORMATION

I couldn't quickly compile it myself, but something along the lines of the attachment should result in the desired behaviour (only increasing the word count when a word was actually appended), at least for the title.
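
For illustration, the counting behaviour described in this report can be sketched in C++ as follows. This is neither the attached patch nor KBibTeX's actual code: the function name idFromTitle, the small-word list, the whitespace splitting and the lowercasing are all assumptions made for this sketch. The flag countSmallWords switches between the reported behaviour (removed words still advance the word index) and the expected one (only kept words are counted).

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Hypothetical small-word list; KBibTeX's real list is not shown in this report.
static const std::set<std::string> smallWords{"the", "a", "an", "of", "and", "in", "on"};

// ASCII-only lowercasing, sufficient for this sketch.
static std::string toLower(std::string s) {
    for (char &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Builds an ID from the words firstWord..lastWord (0-based, inclusive) of title,
// always dropping small words.
// countSmallWords == true:  dropped small words still advance the word index
//                           (the behaviour reported here).
// countSmallWords == false: only words that are kept are counted
//                           (the expected behaviour).
std::string idFromTitle(const std::string &title, int firstWord, int lastWord,
                        bool countSmallWords) {
    assert(firstWord >= 0 && firstWord <= lastWord);
    std::istringstream in(title);
    std::string word, id;
    int index = 0;
    while (in >> word) {  // split on whitespace only
        const std::string lower = toLower(word);
        if (smallWords.count(lower) > 0) {
            if (countSmallWords)
                ++index;  // the dropped word still uses up a position
            continue;
        }
        if (index >= firstWord && index <= lastWord)
            id += lower;
        ++index;
    }
    return id;
}
```

With the title "The very important Title" and the range first to third word, idFromTitle(title, 0, 2, true) yields "veryimportant", while idFromTitle(title, 0, 2, false) yields "veryimportanttitle".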
Comment 1 Thomas Fischer 2020-05-16 20:56:09 UTC
I can confirm this problem and I think I may have fixed it, but I cannot push the fix due to maintenance on KDE's Git servers. I will try again in a few days, and then you can test and confirm whether this problem has been fixed.
Comment 2 nobodyinperson 2020-07-01 05:20:48 UTC
Thanks for your work!

It seems KDE's migration to https://invent.kde.org is now officially live: https://dot.kde.org/2020/06/30/kdes-gitlab-now-live

Are you now able to push?
Comment 3 Thomas Fischer 2020-07-07 11:11:32 UTC
Ok, I pushed a potential fix to my personal clone of KBibTeX's Git repository:
https://invent.kde.org/thomasfischer/kbibtex/commit/5a35c183a3ed6c5a604aac2f1943db2ecfcf772d

To test the code:
0. Uninstall the distribution-provided KBibTeX installation (not sure if this is actually necessary; try it if problems arise in the following steps)
1. Get this script:  https://invent.kde.org/thomasfischer/kbibtex-related/-/raw/master/run/run-kbibtex.sh?inline=false
2. Run as:   bash run-kbibtex.sh https://invent.kde.org/thomasfischer/kbibtex.git bugs/kde421612

I hope all paths are set correctly after the migration to invent.kde.org
Comment 4 nobodyinperson 2020-07-07 12:17:50 UTC
It seems to work. I had to install a couple of dependencies under Manjaro (qt5-networkauth, kdoctools and kate) to make it build. When run, the fonts look really ugly, but everything works. Small words are now not counted anymore, thanks.

Personally, I'd prefer all non-word-characters (dashes, etc...) being word separators, not only whitespace. But I guess that's another issue...
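
One possible reading of this request, as a rough C++ sketch only: every character that is not a letter or digit ends the current word. The function name splitOnNonWordChars is hypothetical and not part of KBibTeX. The sketch is ASCII-only, so bytes of non-ASCII UTF-8 characters would also act as separators here, which a real implementation would need to handle differently.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits text into words, treating every non-alphanumeric character
// (whitespace, dashes, punctuation, ...) as a word separator.
// ASCII-only: this is an assumption of the sketch, not a statement about KBibTeX.
std::vector<std::string> splitOnNonWordChars(const std::string &text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            // A separator ends the word collected so far.
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);  // last word, if the text does not end in a separator
    return words;
}
```

For a title such as "Long-term sea-level rise", splitting on whitespace alone gives three words, whereas this sketch gives five: "Long", "term", "sea", "level", "rise".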

BTW, should one now post issues on GitLab? Or still here?
Comment 5 Thomas Fischer 2020-07-07 18:03:41 UTC
(In reply to nobodyinperson from comment #4)
> It seems to work. I had to install a couple of dependencies under Manjaro
> (qt5-networkauth, kdoctools and kate) to make it build.
That is to be expected.

> When run, the fonts look really ugly, but everything works.
Icons are likely not loading, either. This is mostly a visual issue; KBibTeX should still work, as you report.

> Small words are now not counted anymore, thanks.
Good. Then I will apply the patch for real.

> Personally, I'd prefer all non-word-characters (dashes, etc...) being word
> separators, not only whitespace. But I guess that's another issue...
Indeed. Please open a new issue about this. As I am not sure what you mean here, please provide some examples in your new report.

> BTW, should one now post issues on GitLab? Or still here?
AFAIK, bugs.kde.org is still the primary place to report bugs and request features. For example, at
 https://community.kde.org/Infrastructure/GitLab
it says:
"The KDE community does not generally use GitLab for bug reporting. Please continue to submit bug reports on https://bugs.kde.org."
Comment 6 Thomas Fischer 2020-07-07 18:03:58 UTC
Git commit 5a35c183a3ed6c5a604aac2f1943db2ecfcf772d by Thomas Fischer.
Committed on 07/07/2020 at 10:14.
Pushed by thomasfischer into branch 'kbibtex/0.10'.

When suggesting entry ids, do not count 'small words'

When generating entry ids based on title or journal title, a range of
words, such as from first to fourth word can be specified.
Before this commit, 'small words', despite being removed in the id
generation process, were counted when determining the first, second, ...
word.
If the suggestion template stated 'first word only' and the first word
was a 'small word' such as 'the', an empty suggestion may have been
generated.

M  +1    -0    ChangeLog
M  +3    -2    src/processing/idsuggestions.cpp

https://invent.kde.org/office/kbibtex/commit/5a35c183a3ed6c5a604aac2f1943db2ecfcf772d
Comment 7 Thomas Fischer 2020-07-07 18:06:00 UTC
Git commit 8c4212e4d00ba92f4d9aeebe35cd5393702c2c05 by Thomas Fischer.
Committed on 07/07/2020 at 18:00.
Pushed by thomasfischer into branch 'master'.

When suggesting entry ids, do not count 'small words'

When generating entry ids based on title or journal title, a range of
words, such as from first to fourth word can be specified.
Before this commit, 'small words', despite being removed in the id
generation process, were counted when determining the first, second, ...
word.
If the suggestion template stated 'first word only' and the first word
was a 'small word' such as 'the', an empty suggestion may have been
generated.

Forward-port of commit 5a35c183a3ed6c5a604aa from branch 'kbibtex/0.10'.

M  +1    -0    ChangeLog
M  +3    -2    src/processing/idsuggestions.cpp

https://invent.kde.org/office/kbibtex/commit/8c4212e4d00ba92f4d9aeebe35cd5393702c2c05